Boost logo

Boost :

From: Stjepan Rajko (stipe_at_[hidden])
Date: 2007-08-07 13:17:49


On 8/7/07, Stjepan Rajko <stipe_at_[hidden]> wrote:
> On 7/31/07, John Phillips <phillips_at_[hidden]> wrote:
> > My apologies for the delay in this posting, but the review period for
> > the Time Series library submitted by Eric Neibler runs from Monday, July
> > 30 until Wednesday, August 8. From the documentation:
> >
>
> I have started to review the library, and at this point I am confused
> :-) So I thought I would share my thoughts in case someone can
> de-confuse me and I can produce a more useful review.
>

OK, I think I have tracked down many of the sources of my confusion.

As you say in the beginning of the docs, "A time series is a series of
data points, sampled at regular intervals". That I agree with, except
for the fact that it's not necessarily regular intervals.

dense_series and sparse_series implement this definition of time
series well - they behave as containers that have specified values
only at the specified time points. Everywhere else, they are zero
(although, I would argue that for a time series, if you had to examine
a time point that has not been specified, I would be more likely to
call it an "unspecified" or "unknown" rather than a "zero").

The other containers though, they don't really implement what I would
consider a time series. They more or less implement piecewise
constant functions. In some cases, they do so well, and in some
cases, it's tricky, like with piecewise_constant_series. One of the
problems there is that runs might have an intersection - that should
be handled carefully. Even if it is up to the user to make sure there
are no overlaps, there would have to at least be support for open /
half-open intervals so that I can specify that, say f has a value of
10 on [0, 1), a value of 11 on [1, 2), etc.

This is where the concept of a "run" breaks down for me - it gives the
illusion / assumption that you are dealing with a continuous function
(rather than a strict time series), but it is not handled carefully to
provide (IMO) mathematically intuitive behavior - examples are the
fact that sparse_series claims to have a run of unit length, but it is
still zero at any unspecified value, and the surprising handling of
overlapping ranges in piecewise_continuous_series.

IMO, this library deals with two different problem domains -
time_series and picewise constant functions. I also think that these
two domains are too different to be stuck in the same bucket. I think
that a general time_series does not need to address the "run" concept
- perhaps, there can be a notion of a "weight" assigned to each
sample, which can represent time duration or other things. That would
make it behave eqivalent to the run for the "integral" (which should
really just be a sum for time_series, IMO). Also, I think that for
the piecewise constant functions, runs should at least have the option
of being open/half open intervals. With all this in mind, to some
extent, I believe that sparse_series and dense_series (which I see as
time series) should be treated differently than the rest of the
contaners (which I see as piecewise constant functions).

All in all, I think that this library is useful, but it to me it it is
something different than what it claims to be. It attempts to address
a mathematical/numerical concept, but I find that it does so in ways
that are unintuitive to me - maybe to people using time series in
other contexts this makes perfect sense but it left me very confused.
If the library made a separation between time series (values at
discrete specified times only) and piecewise constant functions
(values everywhere), and handled both with some of the changes
suggested above, I'd say "Yes! Accept!". As it stands, I'm not sure.

Oh, and kudos to Zürcher Kantonalbank. And Eric for yet another
impressive implementation, of course :-)

Stjepan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk