Boost logo

Boost :

Subject: Re: [boost] [histogram] review part 2 (implementation)
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-09-25 19:34:34


Dear Steven,

thank you again for the careful reading of the implementation. The bugs will be fixed. I comment on some issues below.

> On 25. Sep 2018, at 02:22, Steven Watanabe via Boost <boost_at_[hidden]> wrote:
>
> AMDG
>
> arithmetic_operators.hpp:
> […]
> 42: histogram<A, S>&& operator*(histogram<A, S>&& a, const double x)
> histogram::operator*= uses scale_type. I don't really
> care whether you use scale_type or double, but please
> be consistent. If you're hard-coding double here, then
> there's no point to making the internals more flexible.

scale_type was added recently and I forgot to update operator*

> histogram.hpp:
>
> 36: What are the requirements on the Axes parameter?
> I'm deducing that it must be either a std::tuple
> or a std::vector containing Axes, but I didn't
> see this explicitly stated anywhere.

It is a half-hidden implementation aspect, also see corresponding discussion with Alex.

It should be better documented and the histogram class needs a static_assert for the two allowed configurations.

Perhaps it can be completely hidden, I have to think about it more.

> 207: // precondition: argument sequence must be strictly ascending axis
> indices
> This is exactly the sort of information that needs to
> appear in the documentation. Use \pre.
> Also it would be nice if this weren't a requirement,
> and reduce_to could shuffle the axes.

Agreed, this is missing a static_assert. It is a requirement to be consistent with the run-time version in line 231.

In case of the latter, shuffling the indices into correct order has a significant run-time cost: a copy and a full iteration over the indices. Both can be avoided if the indices are already sorted.

> Does/should it work in the degenerate case of
> reducing a histogram to itself?

Yes.

> 234: BOOST_ASSERT_MSG(begin == end ||
> Will reduce_to even work if begin == end?

Good point, the assert should fail if begin == end.

> 261: friend class python_access;
> I don't see a forward declaration of this. Please note
> that the semantics of this can be exceedingly strange
> when there is no prior declaration. See friend.cpp

Interesting, thanks for the explanation why this should always be forward declared.

> 128: if (Archive::is_loading::value) { this->~variable(); }
> Don't call the destructor like this.

How should I call it then?

> - serialize for user defined axis types seems
> error prone, because labelled_base::serialize
> is inherited. If the user forgets to add serialize,
> the code will compile and silently slice the object.

How can I avoid this?

> weight.hpp:
>
> 12: namespace detail {
> Be careful about exposing types from namespace detail
> in any way. It allows ADL in client code to look inside
> your detail namespace.

So better make weight_type and sample_type public?

> 26: detail::weight_type<T> weight(T&& t) {
> return {t};
> No forwarding? The same goes for sample.

Right, that's bad.

> axis/base.hpp:
>
> - I only see operator==, but not operator!=.

Axes types are not required to have operator!=.

> axis/interval_view.hpp:
>
> 39: operator==
> Should two interval_views be considered to be the same or
> different if they refer to distinct, but equal axes?

I don't remember now why I made interval_views equal-comparable. If it is only used by the unit-tests, I should define the equal operator locally in the text. I think there is no need for interval_views to be comparable.

> - Actually, I find interval_view somewhat unconvincing.
> Why can't it just be a pure value that stores the upper
> and lower bounds? The value_type is usually going to
> be a built-in type anyway, not to mention that you
> haven't documented interval_view sufficiently for
> it to be used by user-defined axes.

Let's say an interval is returned as a std::pair<value_type, value_type>. Every time you call axis[k], axis::lower(k) and axis::lower(k+1) are called to fill both values of the pair.

Now assume you want to fill a std::vector with all the bin edges (a realistic use case). You iterate over the axis, get the intervals p, and append p.first to the vector. When you reach the last bin, you also append p.second. All the other p.seconds are filled while iterating over the axis, but not used. For N bins, lower() is called 2 N times while only N+1 times the value is actually used. This is very inefficient.

interval_view was my solution to avoid this superfluous calling of lower(). Is the compiler is smart enough to avoid the superfluous filling of values that are not used? I don't know, maybe. If so then I would also prefer the simpler solution.

Another nice aspect of interval_view: it is explicitly convertible to the bin index.
This allows you to run a range-based for-loop over an axis and query the histogram directly with these instances

for (auto bin : histogram.axis(0)) {
   const auto value = histogram.at(bin); // only works because "bin" knows index
}

> axis/iterator.hpp:
>
> - Does a separate reverse_iterator_over have any benefit over
> std::reverse_iterator?

Yes, but I forgot the reason.

> - It seems inconsistent for axes to have rbegin/rend, when
> histogram itself doesn't.

It makes sense to reverse-iterate over an axis, but it makes no sense to reverse-iterate over a histogram. The iteration order over the counters in a histogram is an implementation detail.

> 332,350: this->~variable();
> Don't call the destructor in the assignment operator.

Why not? See similar question above.

> - Is there a reason that you're not just using std::vector
> for variable and category?

Yes, sizeof(detail::buffer<T>) < sizeof(std::vector<T>) and axis types should be as small as possible. We don't need growth, so there no need to store a capacity in addition to a size.

> detail/axes.hpp:
>
> 521: optional_index call_impl(Tag, const std::tuple<T>& axes,
> It seems that this specialization prevents using a single
> element container with a single element static histogram,
> which seems inconsistent with the documentation and the
> behavior for dynamic histograms.

Correct, because this case is ambiguous. Let's say you have a 1d-histogram h1 and 2d-histogram h2. Both are dynamic histograms and you fill them.

h2(1, 2); // ok
h2(std::make_tuple(1, 2)); // ok, one argument must represent two values
h1(1); // ok
h1(std::make_tuple(1)); // ambiguous!

h1 cannot know at compile-time whether it should unpack the tuple or pass it to the axis unchanged. Perhaps you have a weird axis that accepts tuples as value_type.

I had to make a decision here and the most reasonable one is to pass single arguments of 1d histograms unchanged to the axis.

Best regards,
Hans


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk