Subject: Re: [boost] [histogram] Formal review
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-10-01 10:28:43
> On 30. Sep 2018, at 17:29, Bjorn Reese via Boost <boost_at_[hidden]> wrote:
> On 09/24/18 23:52, Hans Dembinski wrote:
>> The histogram iterators are const, because I cannot imagine a use-case where the histogram has to be bootstrapped with a prior distribution. Only providing const iterators avoids the misconception that a histogram is a container. Histogram can make stronger guarantees about its internal consistency, if users cannot set bin counters arbitrarily.
> If you are using histograms to collect data for later analysis, then
> there may indeed not be much use for a prior distribution. However, if
> you use histograms to make decisions on-the-fly, then you cannot always
> wait for the histogram to be filled up with data in order to make
> reliable decisions. In such cases a prior distribution can help.
> Admittedly, I have not done this with histograms, but I do use CDFs
> (well, quantile approximations) quite a lot for on-the-fly decision-
> The other use case is to restore the histogram with data from an
> earlier run, but serialization already covers that.
I plan to add a struct `unsafe_access` which allows unsafe raw access to the bin counters, like so
auto& storage = unsafe_access::storage(some_histogram);
This will be part of the public interface, but is targeted towards advanced users. It will be used by algorithms that transform histograms and by serialization code. I also want to support other serialization libraries than Boost.Serialization, and unsafe_access makes this easier. With `unsafe_access`, advanced users get full control, but the standard interface still protects from trivial usage errors. The docs of `unsafe_access` then basically say: if you use this, you are on your own. It could be used to set the histogram with a prior, which seems like an advanced use case.
>> There are two solutions, right? Turing weight_counter into a full arithmetic type or removing support for weight_counter in the default storage.
> True, but then I am going to play the zero-overhead card :)
The default value would then be without weight support.
>> Originally, I had a special method called fill(), then I switched to operator(). There are two arguments in favour of operator():
>> - a 1D histogram can be filled with std::for_each or std::accumulate
>> - consistency with Boost.Accumulators, which also use operator()
> Inserting data into the histogram with std::accumulate is odd.
I was wrong, std::accumulate uses += to fill, which is not supported.
> An alternative is to use std::copy (or std::transform) with
> std::inserter. That does requires a histogram::insert(iterator hint,
> value_type value) where the hint is completely useless... but that
> is what std::set is using.
histogram is not a container, so it should not support std::inserter. histogram is a accumulator and accumulators have a functor design, as established by Boost.Accumulator. They eat samples and update their internal state, there is an implied transformative aspect. Inserting a value implies that you can exactly retrieve it later, which is not the case here.
>> I am ok with renaming, but I think we haven't found the best name yet. What do you think about axis::tails_type?
> Seems like you settled on options in another thread. I am fine with
> Related to this, I also like the extent() over shape(). This name is
> also used by Boost.MultiArray and the std::mdspan proposal (P0009R7).
> In the name of consistency, you may also consider renaming dim() to
Good suggestion, thank you!
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk