From: Matt Hurd (matthurd_at_[hidden])
Date: 2006-12-11 23:09:09
On 12/12/06, Jeff Garland <jeff_at_[hidden]> wrote:
> Eric Niebler wrote:
> > I'm pleased to announce the availability of a new library for computing
> > with time series (http://en.wikipedia.org/wiki/Time_series). From the
> > documentation:
Looks very nicely thought out.
I'm continually re-inventing series containers and would like to stop ;-)
I can see how the discretization type could help many types of
application, though it doesn't suit most of the styles I'm used to
dealing with.
This is because many time series have missing points at similar
periods of time. For example, your library seems to have been
developed with daily and longer financial time frames in mind, though
it is obviously not limited to this. Holidays in different markets
come into play in the sequences even though the discretization would
be the same. To solve this I like the concept of "clocks" from
intensional programming. Basically, if two series use the same clock
then indexed offsets into the sequence make sense; otherwise a
matching procedure has to be used, the most typical being:
    matched time = most recent time <= reference time
Some input clock has to act as the reference clock, which is also used
for the output. It is not the only way; for example, sometimes only
what I call correlated matching makes sense, that is, the time must
exist in both (or all, if there are more than two) inputs.
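To make that matching rule concrete, here is a minimal sketch of my own
(plain standard C++, nothing from the proposed library), assuming each
series is stored as a std::map keyed by time:

// Sketch only: each series as a map from time (e.g. seconds since epoch) to value.
#include <map>
#include <iostream>

typedef std::map<long, double> series;

// "Most recent time <= reference time": the last sample at or before ref,
// or s.end() if the series has nothing that early.
series::const_iterator match_at_or_before(series const& s, long ref)
{
    series::const_iterator it = s.upper_bound(ref); // first sample strictly after ref
    if (it == s.begin())
        return s.end();                             // nothing at or before ref
    return --it;                                    // step back to the match
}

int main()
{
    series nyse, asx;
    nyse[100] = 1.0; nyse[200] = 1.1; nyse[300] = 1.2;
    asx[90]   = 5.0; asx[250]  = 5.5;   // a different clock: some points missing

    // Use nyse's clock as the reference clock for the output.
    for (series::const_iterator r = nyse.begin(); r != nyse.end(); ++r)
    {
        series::const_iterator m = match_at_or_before(asx, r->first);
        if (m != asx.end())
            std::cout << r->first << ": " << r->second << " / " << m->second << "\n";
    }
    return 0;
}

Correlated matching would instead keep only the times present in every
input, i.e. an intersection on the keys.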
This way you get the benefit of direct sequencing when the clocks are
the same and fast lookups when they are not. Fast lookups are based on
the discretization, like looking up a name in the phone book: if it
starts with a V, you go near the back. Calculate the density (number
of points per period), make an educated guess at the location and
binary search from there. This scheme mixes microsecond data and
annual data quite freely.
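As a rough illustration of that density-guided lookup (again my own
sketch, not the library's mechanism), you can guess an index from the
average density and finish with a local binary search:

#include <algorithm>
#include <cstddef>
#include <vector>

// Like std::lower_bound on a sorted vector of times, but starting from a
// density-based guess rather than the middle.
std::vector<long>::const_iterator
guessed_lower_bound(std::vector<long> const& times, long t)
{
    if (times.empty() || t <= times.front()) return times.begin();
    if (t > times.back()) return times.end();

    // Educated guess: assume the points are roughly evenly spread.
    double density = double(times.size() - 1) / double(times.back() - times.front());
    std::size_t guess = std::size_t((t - times.front()) * density);
    if (guess >= times.size()) guess = times.size() - 1;

    // Gallop outwards from the guess until t is bracketed, then binary search.
    std::size_t lo = guess, hi = guess, step = 1;
    while (lo > 0 && times[lo] >= t) { lo = (lo > step ? lo - step : 0); step *= 2; }
    step = 1;
    while (hi + 1 < times.size() && times[hi] < t) { hi = std::min(hi + step, times.size() - 1); step *= 2; }

    return std::lower_bound(times.begin() + lo, times.begin() + hi + 1, t);
}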
Algorithms may chew up degrees of freedom and shorten series, but the
clocks will remain the same. For example, a simple moving average
over 10 days will not be relevant on the first 9 points. You've
chewed up 9 points and your output may reflect this. This is just a
simple case. Windowing functions can chew up points both forwards and
backwards. Some algorithms have accuracy requirements that imply
minimum input requirements. A simple case is determining the number of
points you need to reach a certain accuracy for an exponential moving
average, which is a weighted sum over an infinite number of points.
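For instance (my own back-of-the-envelope, not anything in the
library): an exponential moving average with smoothing factor alpha
weights the sample k steps back by alpha*(1-alpha)^k, so truncating
after n points drops a total weight of (1-alpha)^n, and the smallest
acceptable n follows directly:

#include <cmath>
#include <cstddef>

// Smallest n with (1 - alpha)^n <= tol, i.e. n >= log(tol) / log(1 - alpha).
std::size_t min_ema_points(double alpha, double tol)
{
    return std::size_t(std::ceil(std::log(tol) / std::log(1.0 - alpha)));
}
// e.g. alpha = 0.1 and tol = 0.01 gives 44 points before the EMA is "accurate enough".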
Where this puppy ends up being quite different is that you want times,
real times, associated with the series. The obvious thing to do is to
tuple them, but this messes up passing blocks of data around
efficiently to things that only want to deal with sequences and don't
care about the time; then again, sometimes timed tuples are the more
efficient choice. So you need flexible mappings and alternative types
of containers per situation.
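Two obvious layouts, sketched below just to illustrate the trade-off
(neither is meant to be the library's representation):

#include <utility>
#include <vector>

// (a) Timed tuples: convenient when the time has to travel with each value.
typedef std::vector<std::pair<long, double> > tupled_series;

// (b) Parallel sequences: the values stay a contiguous block that can be
//     handed to code that only cares about the sequence, with the clock
//     kept alongside.
struct clocked_series
{
    std::vector<long>   times;   // the "clock"
    std::vector<double> values;  // same length as times
};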
So, when I look at this lib, it seems a neat way to capture singly
clocked series, but it also appears that it may be meant to handle
multiple clocks, given that the acknowledgements section crediting
Zürcher Kantonalbank suggests daily financial data where holidays come
into play. That is, it seems the discretization is a proxy for a clock
that suits a particular use.
I'm not sure if you can see a way to accommodate a more flexibly
"clocked" point of view rather than your current discretization
scheme. It seems quite different but tantalisingly close to what you
have.
Thus I'd think of this library more as a series library than a time
series library, if such a distinction weren't just made up by me ;-)
Regards,
Matt.