Subject: Re: [boost] Formal Review of Proposed Boost.Histogram Library Starts TODAY
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-09-24 09:06:47
> On 21. Sep 2018, at 13:37, <a.hagen-zanker_at_[hidden]> <a.hagen-zanker_at_[hidden]> wrote:
>> axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.
> I think it is best to make the concepts as simple as possible. "optional" to me means that it is not part of the concept.
A concept can have optional traits, e.g. the Allocator concept in the std library has optional traits.
>>> shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.
>> Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ?
> Do you really need this function? I thought that with size(), has_overflow() and has_underflow() you would convey all information you need.
shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know.
>> Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?
> When these strings are provided in runtime, it is easy for the user to make a mistake. For instance trying to add a "Distance bands count" to a "Distance band count".
Firstly, it is better if code breaks in obvious ways than in mysterious ways. If you have a typo in your code like this, this is a bug. It is nice if the library catches that bug for you. And think about the alternative that you are suggesting. For sure, it is better to catch a typo early than to silently add two histograms which binned "age in years" and "yearly income in $1000" without throwing any error. You may think the program works correctly and then ruin a business.
Secondly, spellings mistakes are unlikely. When do you add histograms? When you have generated several histograms in parallel and then you want to add them all up. The individual histograms will be generated by the very same code, so no spelling mistakes.
> http://hdembinski.github.io/histogram/doc/html/histogram/abstract.html mentions "two variants with a common interface", which I interpreted as two implementations.
Ah, ok. That wording is alright. There is only one templated histogram class now, while in the past there were two.
>>> Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a >>vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.
>> It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<â¦>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.
> To me these seem to be implementation details. Why would the user need to know how the axes are stored by the histogram.
This is explained in detail in the guide. You are basically saying: why do I need to make a choice between std::array, std::vector, and std::tuple. They are all containers, how they work is an implementation detail. When you can use compile-time configured histograms, they are much better, but sometimes you can't. Therefore, the library also supports histograms which are configured at run-time. Under the hood, these two implementations are very different. For the most part, the difference is hidden from you. You just either use make_static_histogram or make_dynamic_histogram and that's it. I cannot hide the difference completely, so I need to inform the user about it.
>>> I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").
>> I am not sure what you mean, but I think you want me to change the example?
> Yes, you could use a lambda to have a solution that does not depend on smartness of compiler. Or you could provide a free function to actually do what the example does. Now you seem to say, that a functional style might work, which is not very conclusive.
> auto h = make_and_fill_histogram(axes, data); // a free function you provide
I am against adding this for the sake of maintaining a minimal sufficient interface. I don't think it is a common realistic use case. Often you need histograms when you cannot hold the original data in memory all at once. Therefore, usually you don't have a container with all the data at hand. This is syntactic sugar, so if users actually request it, it can be added later.
In any case, the whole point of the example was to show that histogram is compatible with certain STL algorithms, although there are some caveats. The STL algorithms in question often assume that the functor is cheap to copy. A histogram is not cheap to copy, only cheap to move. I cannot change the way how the STL algorithms are implemented, all I can do is inform my users that they can use these algorithms if they wish (and some explicitly requested this), but that it may not be the most performant use of the library. It is better to give this information than not.