Boost logo

Boost :

Subject: Re: [boost] [histogram] should some_axis::size() return unsigned or int?
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-11-30 10:36:37


> On 30. Nov 2018, at 08:51, Alexander Grund via Boost <boost_at_[hidden]> wrote:
>
>
>> Conventionally your size_type should be the same type returned by size() and used for indexing. So I would expect that type to be int, given the above.
>>
>> Having said that, you're already departing from standard container conventions by having size() return a number that is *sometimes* 2 smaller than the "real" number of bins, which might frustrate generic algorithms.
>>
>> Completely without tongue in cheek, I wonder if it might be better to not provide a size() method at all (to avoid container implications which you do not satisfy) and spell it as bin_count() or something distinct instead.
>>
>>
>> Another possibility that might sidestep all of this could be to remove the underflow/overflow bins from the standard indexing and make them only accessible through separate underflow_bin() methods (or similar) instead. But this might complicate other cases such as a find_bin_for_value() that needs to return an underflow/overflow (though that could be solved by returning a bin iterator, not an index).
>
> A high +1:
>
> Having size() makes on expect that "for(i=0;i<size();i++) h[i]" iterates over everything which it seems it does not. Hence it should not be called `size()`

You are overestimating the importance of the *-flow bins, I think. Users usually ignore them when they analyse their histograms. They must be there for other reasons which are explained in the rationale and they are very useful for expert-level statistical analyses and for debugging. The beginner, however, should not notice their presence.

In fact, the `indexed` range adaptor should probably skip them by default, and only iterate over them when that is explicitly requested.

> > I think representing the underflow bin with -1 and the overflow bin with the value returned by size() is very intuitive and elegant.
>
> IMO this is against expectation of every C++ programmer: Being able to index a container with -1 and size() so in every review of code using this it will come up at least as a raised eyebrow.

An axis is not a container. It does not hold values and it has no operator[], precisely to emphasise this difference. It has size() though. See my email to Gavin with a long explanation why I think that makes sense.

> Other idea: If those bins are so special that they don't fit into the [0, size()) range, why not use a different function for getting them, which is not the index operator? high_bin()/low_bin() come to mind.

See explanation to Gavin why this is worse.

> But WHY was this chosen? Wouldn't it be ok if 0 is the first bin which starts at -inf and size()-1 to be the last one spanning to inf? This would allow a histogram of size 1 which has a single bin holding all values.

And why would you want such an axis? It would be pointless and make the histogram operate slower. Nevertheless, you can easily generate such an axis as user-defined axis type

// this axis is valid and can be plugged into boost.histogram
struct my_axis {
  int operator()(double x) const { return 0; } // always return 0
  unsigned size() const { return 1; } // axis has size of 1
};

> This makes me wonder to if you *always* want the +-inf bins or if there are reasonable use cases NOT wanting them and expect out-of-range values to be dropped.

Sure, there are reasonable use cases where you don't want them. It is a compile-time option for the builtin axis types (on by default). Also, user-defined axis types are not required to have them (see example above).

> Disclaimer: I'm not familiar with Boost.histogram and see this from a users perspective only.

No problem, your user perspective is appreciated. It would be good to read the rationale, in the docs of boost.histogram though, since it lists the arguments that have been refined in previous discussions already.

Best regards,
Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk