Boost logo

Boost :

Subject: Re: [boost] [histogram] Formal review
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-10-01 10:28:43


> On 30. Sep 2018, at 17:29, Bjorn Reese via Boost <boost_at_[hidden]> wrote:
>
> On 09/24/18 23:52, Hans Dembinski wrote:
>
>> The histogram iterators are const, because I cannot imagine a use-case where the histogram has to be bootstrapped with a prior distribution. Only providing const iterators avoids the misconception that a histogram is a container. Histogram can make stronger guarantees about its internal consistency, if users cannot set bin counters arbitrarily.
>
> If you are using histograms to collect data for later analysis, then
> there may indeed not be much use for a prior distribution. However, if
> you use histograms to make decisions on-the-fly, then you cannot always
> wait for the histogram to be filled up with data in order to make
> reliable decisions. In such cases a prior distribution can help.
>
> Admittedly, I have not done this with histograms, but I do use CDFs
> (well, quantile approximations) quite a lot for on-the-fly decision-
> making.
>
> The other use case is to restore the histogram with data from an
> earlier run, but serialization already covers that.

I plan to add a struct `unsafe_access` which allows unsafe raw access to the bin counters, like so

auto& storage = unsafe_access::storage(some_histogram);

This will be part of the public interface, but is targeted towards advanced users. It will be used by algorithms that transform histograms and by serialization code. I also want to support other serialization libraries than Boost.Serialization, and unsafe_access makes this easier. With `unsafe_access`, advanced users get full control, but the standard interface still protects from trivial usage errors. The docs of `unsafe_access` then basically say: if you use this, you are on your own. It could be used to set the histogram with a prior, which seems like an advanced use case.

>> There are two solutions, right? Turing weight_counter into a full arithmetic type or removing support for weight_counter in the default storage.
>
> True, but then I am going to play the zero-overhead card :)

Fair enough.
https://github.com/HDembinski/histogram/issues/75

The default value would then be without weight support.

>> Originally, I had a special method called fill(), then I switched to operator(). There are two arguments in favour of operator():
>> - a 1D histogram can be filled with std::for_each or std::accumulate
>> - consistency with Boost.Accumulators, which also use operator()
>
> Inserting data into the histogram with std::accumulate is odd.

I was wrong, std::accumulate uses += to fill, which is not supported.

> An alternative is to use std::copy (or std::transform) with
> std::inserter. That does requires a histogram::insert(iterator hint,
> value_type value) where the hint is completely useless... but that
> is what std::set is using.

histogram is not a container, so it should not support std::inserter. histogram is a accumulator and accumulators have a functor design, as established by Boost.Accumulator. They eat samples and update their internal state, there is an implied transformative aspect. Inserting a value implies that you can exactly retrieve it later, which is not the case here.

>> I am ok with renaming, but I think we haven't found the best name yet. What do you think about axis::tails_type?
>
> Seems like you settled on options in another thread. I am fine with
> that.
>
> Related to this, I also like the extent() over shape(). This name is
> also used by Boost.MultiArray and the std::mdspan proposal (P0009R7).
> In the name of consistency, you may also consider renaming dim() to
> rank().

Good suggestion, thank you!
https://github.com/HDembinski/histogram/issues/112

Best regards,
Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk