Subject: Re: [boost] [histogram] should some_axis::size() return unsigned or int?
From: Alexander Grund (alexander.grund_at_[hidden])
Date: 2018-11-30 11:58:50


On 30.11.18 at 11:36, Hans Dembinski wrote:
> You are overestimating the importance of the *-flow bins, I think. Users usually ignore them when they analyse their histograms. They must be there for other reasons which are explained in the rationale and they are very useful for expert-level statistical analyses and for debugging. The beginner, however, should not notice their presence.
>
> In fact, the `indexed` range adaptor should probably skip them by default, and only iterate over them when that is explicitly requested.
Sounds reasonable: one range excluding the over/underflow bins and one
including them.
> An axis is not a container. It does not hold values and it has no operator[], precisely to emphasise this difference. It has size() though. See my email to Gavin with a long explanation why I think that makes sense.
Your code example was the following:

for (unsigned i = 0; i < axis.size(); ++i) {
   auto x = h[i];
   // do something with bin
}

So it looks like a container, although size() and the []-operator live
on different objects (the axis and the histogram), which feels weird,
but ok.
>> Other idea: If those bins are so special that they don't fit into the [0, size()) range, why not use a different function for getting them, which is not the index operator? high_bin()/low_bin() come to mind.
> See explanation to Gavin why this is worse.
Combining this with "Users usually ignore them[...] the `indexed` range
adaptor should probably skip them by default", I do see the need for
extra functions here too. Your argument against "high_bin()/low_bin()"
was that iteration would have to be split. But your comment above already
suggests that there are iterators which can cover the whole range. Could
they solve this split-iteration problem?
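
To make the question concrete, here is a minimal sketch of a single
iteration order that optionally covers the *-flow bins. The helper
`bin_indices` and its parameters are hypothetical and not part of the
library; the underflow and overflow indices are passed in, so the loop on
the caller's side stays in one piece whatever values are chosen for them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not a Boost.Histogram API.
// Returns the bin indices in visiting order: the regular bins 0..size-1,
// optionally preceded by the underflow index and followed by the overflow
// index. The two special indices are parameters of the sketch.
std::vector<int> bin_indices(int size, bool include_flow,
                             int underflow, int overflow) {
    assert(size >= 0);
    std::vector<int> out;
    if (include_flow) out.push_back(underflow);
    for (int i = 0; i < size; ++i) out.push_back(i);
    if (include_flow) out.push_back(overflow);
    return out;
}
```

A caller would write one range-based for loop over the result and never
needs a separate branch for the special bins.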
>> But WHY was this chosen? Wouldn't it be ok if 0 is the first bin which starts at -inf and size()-1 to be the last one spanning to inf? This would allow a histogram of size 1 which has a single bin holding all values.
> And why would you want such an axis? It would be pointless and make the histogram operate slower.

I was not saying this should be done. It would just be consistent. There
are 2 independent dimensions:
- open-ended bins yes/no
- number of bins
In my mind, enabling open-ended bins does not ADD bins but makes the
first and last ones extend to -inf and +inf:

axis(4,0,10,"",uoflow_type::on) -> [-inf,0), [0,5), [5,10), [10,inf)
axis(4,0,10,"",uoflow_type::off) -> [0,2.5), [2.5,5), [5,7.5), [7.5,10)

Of course this might be confusing, so the default should be "off": since
"users usually ignore them", they are advanced things one does not
generally need, right?
(Side note: The parameter description at
https://hdembinski.github.io/histogram/doc/html/boost/histogram/axis/regular.html
is confusing due to the list order not matching the parameter order.)
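
The two axis examples above can be sketched as a small edge calculation.
This is only an illustration of the convention I have in mind, not how the
library works: `bin_edges` is a hypothetical helper, and it assumes that an
open-ended axis spends two of its n bins on the outer ranges.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper, not part of Boost.Histogram.
// Returns the n + 1 edges of an axis with exactly n bins.
// open_ended == false: n equal-width bins over [lo, hi).
// open_ended == true:  the first and last bin reach to -inf and +inf,
//                      the remaining n - 2 bins split [lo, hi) evenly.
std::vector<double> bin_edges(int n, double lo, double hi, bool open_ended) {
    const double inf = std::numeric_limits<double>::infinity();
    const int inner = open_ended ? n - 2 : n;
    assert(inner >= 1);  // assumption of this sketch: at least one inner bin
    std::vector<double> edges;
    if (open_ended) edges.push_back(-inf);
    for (int i = 0; i <= inner; ++i)
        edges.push_back(lo + (hi - lo) * i / inner);
    if (open_ended) edges.push_back(inf);
    return edges;
}
```

In both cases the number of bins is edges.size() - 1 == n, which is the
consistency I am after.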

So my TLDR of this is: consistency and meeting expectations. If a choice
breaks either, think again about it.

So it is one of the following:

- *-flow bins are more or less regular bins -> included in size() and in
iteration, with the same behavior as regular bins
- *-flow bins are special bins -> not included in size(); they get special
accessors and iterators, and the default ones do not include them.
    Given that: why not have special constants for both the underflow AND
the overflow bin (e.g. -1 and -2) instead of -1 and size(), where the
latter is only known at runtime? Then you could have an `int
find_including_ouflow` and an `unsigned find`, as well as `get(unsigned)`
and `get_with_uoflow(int)`. The idea is to make the special handling
obvious.
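
A minimal sketch of the second option, to show what I mean. Everything
here is hypothetical: the struct, the constant names and the behavior of
`find` for out-of-range values (a precondition in this sketch) are my
assumptions, not existing library code.

```cpp
#include <cassert>

// Hypothetical compile-time constants for the two special bins.
constexpr int underflow_index = -1;
constexpr int overflow_index = -2;

// Hypothetical regular axis with n equal-width bins over [lo, hi).
struct regular_axis_sketch {
    unsigned n;
    double lo, hi;

    // Counts only the regular bins.
    unsigned size() const { return n; }

    // Signed result: a regular index in [0, n) or one of the constants.
    int find_including_ouflow(double x) const {
        if (x < lo) return underflow_index;
        if (!(x < hi)) return overflow_index;  // NaN also ends up here
        const int i = static_cast<int>((x - lo) / (hi - lo) * n);
        const int last = static_cast<int>(n) - 1;
        return i < last ? i : last;  // guard against rounding up to n
    }

    // Unsigned result for the common case.
    // Precondition (assumed for this sketch): lo <= x < hi.
    unsigned find(double x) const {
        const int i = find_including_ouflow(x);
        assert(i >= 0);
        return static_cast<unsigned>(i);
    }
};
```

The signed return type appears only in the function whose name says that
the special bins are involved; code using `find` and `size()` stays in
plain unsigned territory.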

Alex

PS: I don't want to push anything. These are just my thoughts on your
issue, in the hope that they help you find a solution you are happy with.



