Boost logo

Boost :

From: John Phillips (phillips_at_[hidden])
Date: 2007-08-17 13:05:48


   I am pleased to announce that the Time Series library, submitted by
Eric Neibler and developed by him along with Daniel Egloff, Matthias
Troyer, David Abrahams and Daniel Wallin using funding provided by
Zurcher Kantonalbank has been accepted into boost. As seen in the
review, there are some important issues that need to be addressed before
the library is ready for boost distribution, however I am confident that
this is a very good library. Although the donation of time and funding
for Time Series by Zurcher Kantonalbank is laudable and something we
would like to encourage more companies to do, the fact that it was
donated is not important to the review process.

   Something that is important is the well-established reputation of the
involved lead developer. Eric has a number of very high quality
libraries already in boost and has shown his willingness to support them
thoroughly and improve them continually. He has been very responsive to
user feedback and has an obvious desire to have only high quality work
committed to boost with his name on it. For this reason, I am giving him
a little more leeway than might be given to a developer without his
reputation to address the issues in this review and enter the revised
version into boost without a fast tracked second review.

   Many thanks to the participants in the review: Tom Brinkman, Hugo
Duncan, Andrey Tcherepanov, Matthias Schabel, Steven Watanabe, Stjepan
Rajko, Matias Capeletto, Matthias Schabel, Lewis Hyatt, Paul Bristow,
Tobias Schwinger, Matthias Troyer, Phil Endecott, Zach Laine, Dave
Abrahams, Michael Marcin, and Michael Fawcett for their time and
attention. The discussions held in this review provided a strong
foundation for improving the library, and the variety of domains of
expertise represented by the participants shows how broad the desire for
a library of this sort is.

   Most of the points raised in the review have been directly addressed
by Eric and discussed until a solution that is acceptable to all
involved parties was reached. Eric has acknowledged many as now either
fixed in his version or on his To-Do list. However, for the sake of easy
reference I will list the major points and my current understanding of
what to do about them below. The order in which the points are presented
should not be taken as a rating of importance: it is just the order I
made notes about them as I reviewed the discussion.

1) Fix the floating-point offset issues – The answer to this may be a
change in how they are implemented or a complete scrapping of the
floating point offsets as an indexing type. In either case, it would be
a good idea to provide a view facade for time series that applies a
constant multiplicative factor (or possibly a more general function) to
an offset. This may even be a sufficient answer to all the
floating-point issues, but it is not yet clear to me (or to anyone else
that I noticed in the review) what the best answer is. I expect there
will be continued discussion on the developer list to clarify this issue.

2) Polish and include the rolling window average example sent to the
list – This is a very good example of a very potent design idiom for
this library, and one that would help a lot of people better understand
the potential of the library. The use of the circular buffer to collect
multiple points is necessary for many filters. It is such a good case
that you should consider not only including it in the library, but also
making a tutorial out of it that shows how new algorithms can be added
to the library. Ideally, many algorithms that are specific to domains
that the authors are not specialists in will be donated by users who
develop and polish them for inclusion. A very clear tutorial will make
this more likely.

3) If the rolling average is not developed as a tutorial example, some
other filter should be – There is no way the library can include all the
filters in use by programmers in all the various use domains, so the
expectation is that writing your own filters will be common. The
documentation needs to support this very well.

4) The documentation needs improvement in a number of ways –
Reorganization to build from foundational ideas and uses to the more
complex issues is important, as is the addition of overview, tutorial
and rationale sections, as well as a substantial increase in the number
of examples. The learning curve is currently too steep and that seems to
be discouraging users who might find the library a very good fit. An
important step along the way is to add good examples and good pictures
in many places in the documentation. This general note is reinforced in
some of the other comments.

5) Small toy examples for each of the algorithms would make the
documentation of them far more clear

6) Since you maintain both libraries, it would be useful to add to the
documentation for both the Time Series and the Accumulators libraries
some notes on how a user should select between them for specific jobs.
This came up in discussions before the review, so it is likely to be a
problem for more future users.

7) Well chosen pictures can help the readers of this documentation
immensely – Especially in your attempts to clarify the documentation of
the discretization, the offset and sample values, good pictures will be
invaluable.

8) Expand the focus of the documentation so it doesn't appear to a
casual reader that this is purely for econometric/financial time series
– The library has potential uses in a wide variety of fields, so it is
important that potential future users can tell that from even a light
skimming of the documentation.

9) The prenamed discretizations seem to cause more problems than they
solve – Initial testing of the boost::units library to provide
discretization units looks hopeful. If this stands up under testing, not
only remove the current prenamed discretizations, but also document
using boost::units. A good example could do for this.

10) It is only possible to assign across series with the same
discretization type – This should be made more explicit in the
documentation and also since a good example of trouble that would be
caused by trying to assign series with different discretization types
showed up during the review, it might be added to the rationale section
so users can see why the choice is a good one.

11) The reasoning behind having unit series for types that have no
obvious unit member should also appear in the rationale – In general,
there were many good explanations for design choices that should appear
in the rationale section. Many of these reasons are compelling but not
obvious to a developer who has not tried writing a time series library.
Thus they are ideal rationale entries.

12) Currently non-entries in a dense time series are recorded as zeros.
This does not seem to be the best idea in all cases. In specific, in
some time series, the difference between a value of zero an unknown is
very important. Investigate the best choice for a non-entry. Zero gives
simple arithmetic for multiplication (though not addition) that gives
the unknown value placeholder when multiplied with anything. Some
non-zero value would have to have special arithmetic rules, probably. A
universal answer may not be feasible. If not, consider making the
unknown value a parameter available for the user to set.

13) Consider a name change from delta_series to dirac_delta_series and a
link in the documentation to what a Dirac delta function is.

14) Consider renaming the inverse_series to reciprocal_series or some
other more illustrative name.

15) Consider reorganizing to expose the storage containers to outside
developers – They are likely to be useful for a number of applications
aside from time series and this would prevent others from needing to
reinvent the wheel as often.

16) Since the usage of time series is spread across a variety of
different problem domains, there will be terms that aren't familiar to
many of the readers of the documentation. Consider including links to
definitions and descriptions for many of the terms used. Paul's example
of users who don't know what amortized constant complexity implies is
unfortunately not extreme for what you should expect.

17) ordered_inserter is not a great name for the operation it performs -
Determine a better name and change to that. Also consider moving to
named parameters for the start, stop and value arguments to the
ordered_inserter. After all this, the ordered_inserter is a lower level
interface than most users want in most circumstances. Add another layer
for doing appends on top of this interface, and include a push_back
method where sensible for more stl-like syntax.

18) There is a confusion between the template parameter discretization
and the constructor parameter discretization. Consider changing the name
for the template parameter to something like Discretization_type.

19) Include in the documentation a clear description of what it means
for two discretizations to be the same, and point out that the runtime
check for this is an assertion, not an exception. Explain why in the
rationale section, if nowhere else.

20) A number of reviewers found the concepts of discretization, offset
and run to be hard to grasp from the documentation. Work to improve this.

21) The comparison operators for the time_series_facade currently are
tied to the time_series_base class. This could be generalized to remove
that dependency. If this is done, the documentation should no longer say
that time_series_base is the base for all time_series classes, but
instead that it provides a convenient base for implementing time_series
classes. If this isn't done, the documentation should clearly state the
need to inherit from time_series_base.

22) A large number of specific documentation edits were provided in
posts by Steven, Paul and Zach.

23) Fix the documentation to clearly show what kind of iterator the
series_stl_iterator is.

24) This is outside the scope of this review for a complete answer, but
there are questions about the behavior of non-MSVC compilers that define
the _MSC_VER macro to masquerade as MS compilers. Is there a way to know
which (if any) share the behaviors of the actual MS compilers they pose
as, and what is the best way to test for them in code. While the pursuit
of this answer is not part of this library, using it is, so if a good
answer is found it should become part of this library.

25) Provide versions of the fine_grain and coarse_grain functions that
allow for the user to easily provide a sampling function - This is
likely to be a very common use case for this library, so it should be
well supported. An example in the documentation of how to do this is a
good idea.

26) If the review of floating point offsets concludes that they should
remain in the library, the dense series reference page should explicitly
point out that floating point offsets are not usable with dense series.

27) What is the appropriate handling of zero width points. They are a
part of the library in the delta_series, so having them is unavoidable.
They should be on firm conceptual ground if possible.

28) Make the existence and use of the set_at function more obvious in
the documentation – Reviewers didn't notice it and missed the
functionality, so others will, as well.

29) Consider whether it is valuable to provide versions of the transform
algorithm that take more than two series as arguments. I can't think of
a good application off hand, but there may be some. However, the same
reviewer who requested the feature said earlier in the review that he
almost never works on more than one series at a time.

30) Provide an example of using a single pass data stream with a
circular buffer based algorithm.

31) Consider a better name for the scaled_storage_tag.

32) Consider whether users need access to the metafunctions used to
determine return types. If so, consider how they could be exposed in a
user friendly way.

   Once again, my thanks to the reviewers and the developers for the
work, time and attention put into this library and congratulations to
Eric for the accepted work.

    John Phillips
    Review Manager


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk