Boost logo

Boost :

Subject: Re: [boost] Formal Review of Proposed Boost.Histogram Library Starts TODAY
From: a.hagen-zanker_at_[hidden]
Date: 2018-09-26 10:35:44


Hi Hans,

I did some further hands on trying. In particular I implemented my own axis type and used it with the dynamic historgram variant.

1. I think axis::any is a misnomer. We normally use the word "any" to indicate type erasure. So I would expect axis::any to be able to hold any type that implements the axis concept. Instead, axis::any derives for boost::variant and can hold an axis from a discrete set of types provided by the user. This bites when using your own axis_type in a dynamic histogram, because it means you then also have to specify your own variant of any_std. E.g. I was required to do the following:

using my_any = bh::axis::any<bh::axis::regular<>, bh::axis::category<std::string>, my_axis>;
std::vector<my_any> axes;

You could rename your axis::any to axis::variant, however I think it would be better to more fully achieve type erasure so you can store your dynamic axes as a std::vector<axis::any> and dramatically simplify usage.

2. The bin_type concept is very opaque to me. And, in the end the only two member functions I was required to implement to make my bin_type work were:

std::string value() const { return is_negative_bin ? "negative" : "positive"; }
 operator int() const { return is_negative_bin ? 0 : 1; }

"value()" seems to be a misnomer, since it is used to print a label for the bin (in your second example: getting_started_listing_02.cpp line 41). To me it is not logical for a bin to have a value() member function, since a bin really is a domain of values. Would a function "as_string()" make more sense? Or, better, simply work with ostream << ? It would make the odd cast requirement in your second example obsolete. I ended up with the following cast:

for (auto my_bin : static_cast<const my_axis&>(h.axis(2))) {
    std::printf("%s\n", my_bin.value().c_str());
  
and in my opinion the following would be better

for (auto my_bin : h.axis(2)) {
    std::printf("%s\n", my_bin.as_string().c_str());

or even better

for (auto my_bin : h.axis(2)) {
  std::cout << my_bin << '\n';

operator int() seems inappropriate because why would a bin need to know its index? That should logically be the responsibility of the axis

3. I think this was covered before, but the inheritance from labeled_base is awkward. I am not just required to inherit from it, but also construct it in the appropriate form.

I ended up constructing the base class with:

my_axis() : labeled_base(2, bh::axis::uoflow_type::off, "positive vs. negative", std::allocator<char>{})
 { ...

It seems too onerous to require my own axis to know the number of bins on construction. What if I want to add bins one at a time? What if my number of bins is based on some function?
I also really don't like having to bother with std::allocator<char>; I normally stay away from that kind of complexity. Why should using histograms force me to start thinking about allocators?

4. Using iterator_over might not be efficient

If my axis stores bin_types in a std::vector (which seems reasonable), why can I then not just use the iterator of std::vector? You could achieve that by leaving it up to the user to implement the iterators, and offer iterator_over as the default..

I still think histogram should be accepted, but in my opinion there remains quite some scope to tidy up the concepts to make working with histograms as simple as possible.

Kind regards, Alex


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk