Boost logo

Boost :

Subject: Re: [boost] Formal Review of Proposed Boost.Histogram Library Starts TODAY
From: a.hagen-zanker_at_[hidden]
Date: 2018-09-21 11:37:36




-----Original Message-----
From: Hans Dembinski [mailto:hans.dembinski_at_[hidden]]


> Ideally yes, "Although practicality beats purity." I will come back to this below.

Perhaps not in a Boost library


> axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.

I think it is best to make the concepts as simple as possible. "optional" to me means that it is not part of the concept.

>> shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.

> Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ?

Do you really need this function? I thought that with size(), has_overflow() and has_underflow() you would convey all information you need.

> In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.

OK, will take your word for it. I would be on the purity over practicality side here.


>> Why did you chose to make the label a part of the axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc.

> I don't know what you mean by "std::vector, std::map". I agree with you that you need to be able to make a custom axis without a label if you wish.

I think it is a purity vs practicality issue. A label is not essential to an axis, so I would rather not require it. You say in practice we always have a label, which also makes sense.


>> Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis.

> All the internal axis types have a builtin label, which is a design decision based on the experience that labels are generally useful to a large audience. Once you can write your own axis types that do not use labels, I think this point is settled, right?

Yes

>> Why is it necessary for axes to be comparable? And is is not asking
>> for trouble to also compare the axis title (considering lower/upper
>> case typos, etc.)

>Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?

When these strings are provided in runtime, it is easy for the user to make a mistake. For instance trying to add a "Distance bands count" to a "Distance band count". Also having labels seems to invite programming styles where users keep poor track of their axes, and use the label as a kind of handle.




> I merged the two implementations into one templated class a while ago, which is much prettier, and updated the docs. Since you found an obsolete reference to two implementations, could you please point me to where you read that?

http://hdembinski.github.io/histogram/doc/html/histogram/abstract.html mentions "two variants with a common interface", which I interpreted as two implementations.


>> Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a >>vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.

>It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<…>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.

To me these seem to be implementation details. Why would the user need to know how the axes are stored by the histogram.

> I see no need for the user to use anything other std::vector or std::tuple to store axis types. The implementation could be made more flexible in that regard, but there is simply nothing to gain by doing so, only an increase in code complexity.

Makes sense. But it depends on your definition of complexity. To me the rationale would be, the user needs to supply a range of axes so I ask for a range of axes. Your rationale seems to be, the histogram uses a vector of axes so the user supplies a vector of axes. I don't think it is a big issue, I would just (slightly) prefer to not require the user to use a vector.


>> I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").

> I am not sure what you mean, but I think you want me to change the example?

Yes, you could use a lambda to have a solution that does not depend on smartness of compiler. Or you could provide a free function to actually do what the example does. Now you seem to say, that a functional style might work, which is not very conclusive.

auto h = make_and_fill_histogram(axes, data); // a free function you provide

or

auto h = lambda(data); // a lambda given in the example



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk