From: John Phillips (phillips_at_[hidden])
Date: 2007-02-19 17:06:49


   The accumulator library submitted by Eric Niebler has been accepted
into Boost. Thanks once again to all of the people who contributed to
the development of the library, to everyone who contributed to the
review, and to Eric for a fine submission.

   During the review, Eric and some of the reviewers had a useful
discussion about issues for the library, and possible improvements. A
condensed version of the outcomes from that discussion is included
below. Also, in perusing the archives for other comments about the
library, I found a couple of other issues that were discussed. I have
included them below, as well.

   In many cases, Eric acknowledged issues and committed to a fix during
the review. Those issues are included below, along with the current
state of some of the more open-ended discussions, to provide easy
reference for all interested parties.

   For my own organizational convenience, I have numbered the entries
below. This does not indicate relative importance.

1) Prior to the review it was asked why the function that returns a
histogram is called the density. Little was done at the time, as Eric
suggested it should be addressed during the review. As far as I can see,
it was not brought back up during the review process. This should be
examined to make sure that density is the best name for the function,
and changed if needed.

2) There are a number of variant implementations of some statistical
functions. The documentation should clearly indicate which variants
focus on quick and dirty implementations, and which provide more numeric
stability. Eric has stated that this will be addressed in the revisions
of the documentation.

3) The question was asked whether or not it is feasible to provide
outlier rejection for some or all of the accumulators. I can find no
further discussion of this question in the review. However, since this
is a very important statistical technique, it would be a good feature to
investigate for future addition.

4) Michael Stevens expressed interest in a form of the variance that
accumulates the differences squared iteratively and divides the result
by n. I saw no direct response, but this is basically what the
immediate_variance does.
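
For reference, the kind of update described in this item can be sketched
as below. This is a minimal Welford-style illustration, not the library's
immediate_variance code; the RunningVariance type and its members are
hypothetical names used only for this example.

```cpp
#include <cstddef>

// Hypothetical illustration type; not part of the accumulators library.
struct RunningVariance {
    std::size_t n = 0;   // number of samples seen so far
    double mean = 0.0;   // running mean
    double m2 = 0.0;     // running sum of squared differences from the mean

    // Fold one sample into the running mean and squared-difference sum.
    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    // Divide by n (not n - 1), as requested in this item.
    double variance() const {
        return n == 0 ? 0.0 : m2 / static_cast<double>(n);
    }
};
```

Each sample is pushed once and the result can be read at any time, so no
second pass over the data is needed.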

5) Interest was expressed in making the compensated sum the default
version of the sum. It could be supplemented by the quick and dirty
version as an alternative. However, since there are reports that many
optimizing compilers turn the compensated sum algorithm into the same
compiled code as the simple sum, tests should be done to see if there is
any real gain before the modification.
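
To make the comparison concrete, here is a minimal sketch of the two
summation styles, assuming the compensated version in question is the
classic Kahan algorithm. The function names simple_sum and
compensated_sum are hypothetical and do not come from the library.

```cpp
#include <vector>

// Plain left-to-right summation.
double simple_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum;
}

// Kahan compensated summation: carries the rounding error of each
// addition forward in a separate correction term.
double compensated_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (double x : xs) {
        const double y = x - c;    // apply the correction to the next term
        const double t = sum + y;  // low-order bits of y may be lost here
        c = (t - sum) - y;         // recover what was lost (algebraically zero)
        sum = t;
    }
    return sum;
}
```

The correction term c is zero in exact arithmetic, which is why
optimization settings that allow floating-point reassociation can reduce
the second function to the first. Any test of the kind suggested above
would need to be run with the compiler flags users are expected to build
with.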

6) There are a number of broken links in the quickbook docs. This has
been acknowledged and is in line for a fix.

7) Steven Watanabe suggested changing the structure to move away from
the fusion vector dependence. After discussion, it appears that fusion
cons may be a better choice. However, there may be a use case for the
original library that precipitated the choice of fusion vector. This
should be checked, and if Steven's idea holds up under scrutiny, it
should be implemented.

8) There is a request that the user guide specify which header each
component is in. This has been acknowledged and is in line for a fix.

9) The macro BOOST_PARAMETER_NESTED_KEYWORD has no description in the
docs. This has been acknowledged and is in line for a fix.

10) There were a number of requests for improvements to the reference
manual, ranging from wording changes to a reorganization that reflects
class structure instead of header structure. In some cases, these
improvements are direct, and they are in line for fixes. In other cases
there is currently no known way to do it with the Boost tool chain.
Eric and many of the other members of Boost would greatly appreciate
help from anyone who has ideas and time to fix the harder problems.

11) There was a suggestion that the documentation could mention the TR1
reference_wrapper as a future solution to the accumulator_set_wrapper
issues, and the implementation could be changed once there is a boost
accepted implementation of the reference_wrapper. My impression was that
I was not the only person who didn't realize that it could solve that
problem.
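
As a sketch of why a reference wrapper helps, assume the underlying
problem is that standard algorithms such as std::for_each take their
function object by value, so the caller's accumulator is copied. The
example uses std::ref from C++11, the standardized descendant of the TR1
reference_wrapper. ToyAccumulator and the two feed functions are
hypothetical stand-ins, not library types.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an accumulator set; not the library's type.
struct ToyAccumulator {
    std::size_t count = 0;
    double sum = 0.0;
    void operator()(double x) { ++count; sum += x; }
};

// Passing the accumulator by value: std::for_each works on a copy,
// so the caller's object is left untouched.
void feed_by_value(ToyAccumulator& acc, const std::vector<double>& xs) {
    std::for_each(xs.begin(), xs.end(), acc);
}

// Passing a reference_wrapper: the algorithm copies the wrapper,
// but every call is forwarded to the caller's object.
void feed_by_reference(ToyAccumulator& acc, const std::vector<double>& xs) {
    std::for_each(xs.begin(), xs.end(), std::ref(acc));
}
```

After feed_by_value the caller's accumulator has seen no samples; after
feed_by_reference it has seen all of them.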

12) Functions that allow a range of values to be pushed into an
accumulator set all at once should be added. This could include forms
that take begin and end iterators and forms that take a sequence. Eric
agreed that this is a good idea.
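
One possible shape for these functions is sketched below. The name
push_range and the ToyMean type are hypothetical; nothing here is an
interface Eric has proposed or that exists in the library.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an accumulator set; not the library's type.
struct ToyMean {
    std::size_t count = 0;
    double sum = 0.0;
    void operator()(double x) { ++count; sum += x; }
    double mean() const {
        return count == 0 ? 0.0 : sum / static_cast<double>(count);
    }
};

// Iterator-pair form: push every element of [first, last).
template <class Accumulator, class InputIt>
void push_range(Accumulator& acc, InputIt first, InputIt last) {
    for (; first != last; ++first) acc(*first);
}

// Sequence form: forwards to the iterator-pair form.
template <class Accumulator, class Sequence>
void push_range(Accumulator& acc, const Sequence& seq) {
    push_range(acc, std::begin(seq), std::end(seq));
}
```

Because the sequence form goes through std::begin and std::end, it
accepts built-in arrays as well as standard containers.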

13) Paul Bristow mentioned that the kurtosis has a confusing naming
history. What the docs refer to as the kurtosis would better be called
the "kurtosis excess." His suggestion was that a name change be
considered and that the docs be modified to acknowledge the confusion
whether the name is changed or not. Eric agreed with this suggestion.
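
For readers unfamiliar with the distinction, the two quantities are
conventionally defined as follows, where \mu_2 and \mu_4 are the second
and fourth central moments. These are the standard textbook definitions,
not text quoted from the library docs.

```latex
\[
\text{kurtosis} = \frac{\mu_4}{\mu_2^{2}},
\qquad
\text{kurtosis excess} = \frac{\mu_4}{\mu_2^{2}} - 3
\]
```

Subtracting 3 makes the excess zero for a normal distribution.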

14) Accessors for the standard deviation, the unbiased variance and the
unbiased (N-1) standard deviation were requested. The fact that John
Maddock fell into the trap of miscalculating the standard deviation
shows that even experts can make mistakes when converting from the
variance. Thus, it is a good idea for inclusion. Eric said he is
interested, and he would also welcome submissions that provide these
functions.
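
The conversions involved are small but easy to get wrong, so a minimal
sketch may help anyone considering a submission. The function names are
hypothetical, and the sketch assumes the variance already available is
the population (divide by N) form.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rescale a population (divide-by-N) variance to the unbiased
// (divide-by-N-1) estimate. Requires at least two samples.
double unbiased_variance(double population_variance, std::size_t n) {
    assert(n > 1);
    return population_variance * static_cast<double>(n)
           / static_cast<double>(n - 1);
}

// Standard deviation is the square root of whichever variance is supplied.
double standard_deviation(double variance) {
    return std::sqrt(variance);
}
```

The N-1 standard deviation is then
standard_deviation(unbiased_variance(v, n)); the rescaling is applied to
the variance before the square root is taken.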

15) The docs mention "Even the extractors can accept named parameters.
In a bit, we'll see a situation where that is useful," but there is no
later mention. Eric has an example he intended that to refer to, but
forgot to include it in the last edition of the docs. He intends to fix
that.

16) The ability to "reset" an accumulator was requested. Eric pointed
out that it is quite possible to make accumulators with reset methods.
He also pointed out that it might be desirable to reset
accumulator_sets, but this could also pose a problem. It is not clear
what should happen if one of the accumulators used by an accumulator set
does not have a reset method. Further thought should be given to this
for possible inclusion in a later revision.

17) Autocorrelation for accumulator_sets should be explored. Matthias
already plans to do this in the coming year for possible inclusion in a
later revision.

18) The documentation should clarify what to expect when one accumulator
is dropped while a second accumulator that depends on the first is not
dropped. Careful thought on this may lead to a change in current
behavior.

19) There is a request that the docs state more clearly what happens
when more than one accumulator maps to the same feature. Eric plans to
include this in documentation revisions.

20) It is not currently possible to combine accumulators. In some very
common use cases, this would be an important feature. However, not all
accumulators can be combined in any sensible way. A solution to this
will require some study and design work, but it is a valuable addition
for a future revision. One possible solution is to have a compile time
check to see if the accumulators are combinable.
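
As a rough illustration of both halves of this item, the sketch below
merges two mean/variance partial results using the usual pairwise update
and guards the operation with an opt-in trait checked at compile time.
All names (Moments, combine, is_combinable, combine_checked) are
hypothetical; this is not a design from the review.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical accumulator holding count, mean, and the sum of squared
// differences from the mean. Not a type from the library.
struct Moments {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;

    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }
};

// Merge two partial results as if all samples had gone into one accumulator.
Moments combine(const Moments& a, const Moments& b) {
    if (a.n == 0) return b;
    if (b.n == 0) return a;
    Moments r;
    r.n = a.n + b.n;
    const double delta = b.mean - a.mean;
    const double na = static_cast<double>(a.n);
    const double nb = static_cast<double>(b.n);
    r.mean = a.mean + delta * nb / (na + nb);
    r.m2 = a.m2 + b.m2 + delta * delta * na * nb / (na + nb);
    return r;
}

// Opt-in trait: accumulators are assumed non-combinable unless they
// say otherwise.
template <class Accumulator>
struct is_combinable : std::false_type {};

template <>
struct is_combinable<Moments> : std::true_type {};

// Rejects non-combinable accumulator types at compile time.
template <class Accumulator>
Accumulator combine_checked(const Accumulator& a, const Accumulator& b) {
    static_assert(is_combinable<Accumulator>::value,
                  "this accumulator type cannot be combined");
    return combine(a, b);
}
```

An accumulator with no sensible merge simply never specializes the trait,
and any attempt to combine it fails to compile.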

21) While finding newer/faster/more robust algorithms is not a bad
thing, the focus of this review has been the interfaces. If the
interfaces are good, improved algorithms can be worked in later. There
is some minimum standard for performance, but it is not the focal point
of the submission.

22) Javier and Hans submitted lists of documentation corrections that
Eric acknowledged and plans to fix.

23) It is agreed that there is a need for a more gentle and thorough
getting started document. This should include some compelling examples
that show why this is a good design decision.

24) More of the formulas should be available in the documentation. Eric
requested volunteers to convert some formulas to LaTeX, and I have
already contacted him to volunteer.

25) The docs for how to incorporate new features should be improved.
This can be helped tremendously if the people who had problems doing
this would send Eric descriptions of where they had problems with the
docs and the process.

   Thanks again to everyone for your time and work. Any problems or
misrepresentations in the above list are purely my fault, and I
apologize for them.

                        John

