Boost logo

Boost :

Subject: Re: [boost] [histogram] should some_axis::size() return unsigned or int?
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-11-30 10:16:44


> On 30. Nov 2018, at 00:38, Gavin Lambert via Boost <boost_at_[hidden]> wrote:
>
> On 30/11/2018 06:12, Hans Dembinski wrote:
>> This is not a good match here, because -1 here does not have the meaning of "value is missing", but it really is the logical index for the virtual bin that spans from -infinity to the lower edge of the first bin in the axis.
>> Value arrow:
>> -inf ——————— | ——————— | —————— | —————— |—————————> +inf
>> bin -1 bin 0 bin 1 bin 2 bin 3
>> I think representing the underflow bin with -1 and the overflow bin with the value returned by size() is very intuitive and elegant.
>
> Conventionally your size_type should be the same type returned by size() and used for indexing. So I would expect that type to be int, given the above.

Makes sense.

> Having said that, you're already departing from standard container conventions by having size() return a number that is *sometimes* 2 smaller than the "real" number of bins, which might frustrate generic algorithms.

Generic algorithm operate on iterator intervals, so that is a separate matter, I think.

I think there is no perfect solution which is always unsurprising. In some calculations, you want to loop over the extra bins (underflow/overflow), but in most calculations you don't care about them and you want to ignore them. The current design is build on this insight, which is drawn from experience collected with CERN's ROOT library, where the underflow bin is placed at index 0. In that library, the first bin you are usually interested in starts at index 1. This turned out to be much more confusing and surprising to people, especially beginners.

I originally had two methods, `size()` and `shape()`. The former would return the logical size of the axis, the latter the physical size. People didn't like `shape()` so we discussed other options and settled for `extend()`. `extend()` is now a free function in the axis::traits namespace and not a method anymore. The axis only has a `size()` method. In practice, the end user of the library has little use for `extend()`, it is mainly used internally.

> Completely without tongue in cheek, I wonder if it might be better to not provide a size() method at all (to avoid container implications which you do not satisfy) and spell it as bin_count() or something distinct instead.

I am tempted by your argument, but not convinced whether this will really make the library less surprising. An axis may not be a container, but `for (int i = 0; i < axis.size(); ++i)` is what you would expect to work since it is similar enough to a container. We use these kinds of analogies everywhere, e.g. boost::optional uses pointer semantics, although it is not a pointer. Or iterators in general, they are not pointers and most of them do not provide the full semantics of a real pointer.

> Another possibility that might sidestep all of this could be to remove the underflow/overflow bins from the standard indexing and make them only accessible through separate underflow_bin() methods (or similar) instead. But this might complicate other cases such as a find_bin_for_value() that needs to return an underflow/overflow (though that could be solved by returning a bin iterator, not an index).

Any other solution is worse, I hope the following argument shows this. Remember the value arrow from my other email. There is no other convention for the underflow and overflow indices that is as intuitive as this convention. It also makes looping over all bins, including the extra bins every easy:

for (int i = 0; i < axis.size(); ++i) // loop over internal bins, what you would naively do
for (int i = -1; i <= axis.size(); ++i) // loop over internal and extra bins

In any other convention, I would have to loop over the internal bins and then handle the extra bins outside of the loop.

for (int i = 0; i < axis.size(); ++i)
   do_something_with_index(i);
do_something_with_index(axis.underflow_bin());
do_something_with_index(axis.overflow_bin());

This is clearly less elegant in code.

Best regards,
Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk