Boost logo

Boost :

Subject: Re: [boost] [histogram] review part 1 (documentation)
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-09-18 20:04:34


Dear Steven,

thank you very much for the careful reading of the docs and sorry for the
overuse of bold text. Now that people actually notice the project I can
stop shouting. I will implement all these wording mistakes that you point
out. A few clarifcations below:

abstract.html:
>
> - "The C implementations of the GSL are elegant..."
> Are there multiple implementations?
>

Ok, one implementation per set, but there are two sets of functions, one
set for 1d histograms and one set for 2d histograms.

- Is there a reason that CMakeLists.txt is in
> build/ instead of test/?
>

Mateusz moved the CMakeLists.txt into the root directory which is
better/more straight forward and follows the example of hana and compute.
We could also move it to test as you suggest, but I prefer it in the root
directory.
I read in some Boost guides (I think it was on the blincubator, but I can't
find the source just now and blincubator.com gives me a 403 error today)
that all build scripts should reside in the build directory or next to the
source. So I put it in "build".

- "in addition, the factory also accepts iterators over a sequence of
> axis::any, the polymorphic type that can hold concrete axis types"
> The relationship between axis::any and axis::any_std
> (the type actually used) in unclear.
>

True, this is not explained well. axis::any is a closed-set polymorphic
type derived from boost::variant and templated on a list of types that the
polymorphic type should be able to represent. axis::any_std is a concrete
type where the type list is filled with standard axis types which users are
expected to use most frequently.

> - I wonder whether you could handle the issue that
> h * 2 != h + h, by using h * weight(2) to make it
> clear that multiplication changes the weights of
> the samples instead of changing the number of samples.
>

If I understand correctly, you would only allow scaling by a weight type,
not by an arbitrary number. That makes sense and is more explicit. Sounds
like a good idea.

> benchmarks.html:
>
> - No GSL?
>

There are GSL benchmarks for 1D and 2D, where our static variants are
faster. The GSL does not provide histograms for more than two dimensions.
https://www.gnu.org/software/gsl/manual/html_node/Histograms.html

> - You say uniform and normal distributions, but there's
> only a single plot. Does this mean that you are putting
> both uniform and normal variates into a single histogram?
>

Good point, this is a mistake in my plotting script. I originally wanted to
plot both results but ended up plotting only one. The benchmarks for
uniform and normal data are slightly different, because they trigger the
branches in the axis code differently. By design of the benchmark, branch
prediction works better for the uniform data.

> - "Axis access"
> Not the most fortuitous word combination.
>

As long as you don't try to say it out loud... ;)

- It seems that using weighted fills breaks the special
> overflow handling properties of adaptive_storage. I
> think that this deserves a distinct note.
>

Right.

> - It doesn't seem very sane to have the value_type
> be a reference. Is there any reason for this
> other than the fact that you specify the exact
> signature of axis_type::index?
>

This is a mistake, the method is not required to accept a reference,
pass-by-value is also possible. How should I write it to indicate that
passing by value or reference is possible?

> - I would prefer to set the size of a storage_type
> in the constructor instead of using reset, if
> that's possible.
>

I was flip-flopping on that one. A call `value.reset(n)` can be replaced by
`value = storage_type(n)`. I don't see a good reason to prefer one over the
other, do you?

> - I don't understand the difference between the two
> overloads of storage_type::add.
>

I should explain better, the whole section about concepts is very brief.
The first version of "add" is used when you add two histograms and integral
counts are added to integral counts. The second version of add - which
accepts the weight type - is used when you increment the counter not by 1,
but by a weight. In an earlier stage of this library, the corresponding
method was called `increment_by_weight`. The second overload is required
because it causes an internal transformation from integral counters into
dual counts which track the sum of weights and the sum of weights squared
in case of the adaptive_storage (so that a variance estimate can be
computed). This transformation is not needed when you simply add two
histograms.

> - Unfortunately forward declarations seem to mess
> up the doxygen/boostbook toolchain. As a result
> a number of components are listed in histogram_fwd.hpp
> I'll look into this.
>

Thanks, I am glad for any help on this issue.

> - I'll deal with the reference content along with the
> code. I will note that the reference documentation
> seems a bit lacking overall as there are many components
> without any description.
>

True, just like the concepts it got a bit less love.

> - The passive voice is overused.
>

And in the last paper that I submitted to a journal, they complained about
my use of active voice...

On Mon, Sep 17, 2018 at 10:58 PM Steven Watanabe via Boost <
boost_at_[hidden]> wrote:

> AMDG
>
> abstract.html:
>
> - "the one- and multi-dimensional case"
> "cases" should be plural.
>
> - "..submit this project to Boost, that's why..."
> Comma splice
>
> - Please don't overuse *bold text*
>
> acknowledgments.html
>
> - "converting the documentation, and adding Jamfiles"
> No comma here.
>
> motivation.html
>
> - "The GNU Scientific Library (GSL) and in the ROOT framework"
> Remove "in"
>
> - "The C implementations of the GSL are elegant..."
> Are there multiple implementations?
>
> - "The implementations are not customizable, you have to..."
> Comma splice
>
> build.html
>
> - Is there a reason that CMakeLists.txt is in
> build/ instead of test/?
>
> getting_started.html
>
> - "int main(int, char**) {"
> If you're not using the parameters, `int main()` is a
> valid signature.
>
> - Since you require C++11, you might as well use <random> instead
> of Boost.Random. The Boost implementation of mt19937 has better
> performance, but that doesn't really matter here.
>
> - "in addition, the factory also accepts iterators over a sequence of
> axis::any, the polymorphic type that can hold concrete axis types"
> The relationship between axis::any and axis::any_std
> (the type actually used) in unclear.
>
> guide.html:
>
> - "A histogram consists a number of..."
> "consists /of/ a number of..."
>
> - "which provides safe counting, is fast and memory efficient."
> There are only two elements, so use "and" instead of a comma
> or say "is fast, and /is/ memory efficient."
>
> - "Axes objects with different label do..." ->
> s/Axes/Axis/, s/label/labels/
>
> - "...sequence of bin indices for each axes in order."
> s/axes/axis/
>
> - I wonder whether you could handle the issue that
> h * 2 != h + h, by using h * weight(2) to make it
> clear that multiplication changes the weights of
> the samples instead of changing the number of samples.
>
> - "...remove some axes and look the equivalent lower..."
> "...look /at/ the equivalent..."
>
> - "so it is not worth keeping that axies around"
> s/axies/axis/
>
> - "Users can create new axis classes ..., or implemented..."
> s/implemented/implement/
>
> - "Here is a contrieved example of a..."
> s/contrieved/contrived/
>
> - "The guarantees ... are only valid, if the default..."
> No comma.
>
> - "... you need know what you are doing"
> "you need /to/ know"
>
> benchmarks.html:
>
> - No GSL?
>
> - You say uniform and normal distributions, but there's
> only a single plot. Does this mean that you are putting
> both uniform and normal variates into a single histogram?
>
> rationale.html
>
> - "Axis access"
> Not the most fortuitous word combination.
>
> - "The buildin storage types..."
> s/buildin/builtin/
>
> - It seems that using weighted fills breaks the special
> overflow handling properties of adaptive_storage. I
> think that this deserves a distinct note.
>
> - "Why is Boost.Histogram not build on top..."
> s/build/built/
>
> concepts.html:
>
> - Concept names use CamelCase, by convention.
>
> - It doesn't seem very sane to have the value_type
> be a reference. Is there any reason for this
> other than the fact that you specify the exact
> signature of axis_type::index?
>
> - I would prefer to set the size of a storage_type
> in the constructor instead of using reset, if
> that's possible.
>
> - I don't understand the difference between the two
> overloads of storage_type::add.
>
> reference.html:
>
> - Unfortunately forward declarations seem to mess
> up the doxygen/boostbook toolchain. As a result
> a number of components are listed in histogram_fwd.hpp
> I'll look into this.
>
> - I'll deal with the reference content along with the
> code. I will note that the reference documentation
> seems a bit lacking overall as there are many components
> without any description.
>
> General:
>
> - The passive voice is overused.
>
> In Christ,
> Steven Watanabe
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk