From: Eric Niebler (eric_at_[hidden])
Date: 2006-12-12 12:49:08


Hi, Matt.

Matt Hurd wrote:
> On 12/12/06, Jeff Garland <jeff_at_[hidden]> wrote:
>> Eric Niebler wrote:
>>> I'm pleased to announce the availability of a new library for computing
>>> with time series (http://en.wikipedia.org/wiki/Time_series). From the
>>> documentation:
>
> Looks very nicely thought out.
>
> I'm continually re-inventing series containers and would like to stop ;-)
>
> I can see how the discretization type could help many types of
> applications though it doesn't suit most of the styles I'm used to
> dealing with.
>
> This is due to many timeseries having missing points at similar
> periods of time. For example, your library seems to have been
> developed with financial daily and above time frames in mind, though

Yes.

> it is obviously not limited to this. Holidays in different markets
> come into play in the sequences even though the discretization would
> be the same. To solve this I like the concept of "clocks" from
> intensional programming. Basically, if two series use the same clock
> then indexed offsets into the sequence make sense, otherwise a
> matching procedure has to be used of which the most typical is:
>     matched time = most recent time <= reference time
> Some input clock has to be the reference time which is also used for
> the output. It is not the only way, for example some of the time only
> what I call correlated matching makes sense, that is the time exists
> in both (or all if there are more than two) inputs.

So in this case, a "clock" is a discretization *and* a set of "holiday"
offsets for which there is no data?
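
To make the two matching rules quoted above concrete, here is a minimal sketch. It assumes a "clock" can be modeled as nothing more than a sorted vector of timestamps; the names Clock, match_as_of and match_correlated are hypothetical and are not part of the time series library or of any existing clock implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Assumption for illustration: a clock is a sorted list of the times
// at which a series actually has data (holidays are simply absent).
using Clock = std::vector<double>;

// As-of matching: the most recent time on `clock` that is <= ref.
// Returns clock.end() when every time on the clock is later than ref.
Clock::const_iterator match_as_of(const Clock& clock, double ref) {
    assert(std::is_sorted(clock.begin(), clock.end()));
    // upper_bound finds the first time strictly greater than ref;
    // the element just before it is the most recent time <= ref.
    auto it = std::upper_bound(clock.begin(), clock.end(), ref);
    return it == clock.begin() ? clock.end() : std::prev(it);
}

// Correlated matching: keep only the times present on both clocks.
Clock match_correlated(const Clock& a, const Clock& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    Clock out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

With two markets whose clocks differ only by holidays, match_as_of would be called once per reference time to align one series to the other, while match_correlated yields the common clock on which plain indexed offsets are meaningful again.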

> This way you get the benefit of direct sequencing when clocks are the
> same and fast lookups when they are not. Fast lookups are based on
> the discretization. Like looking up a name in the phone book, if it
> is a V you go near the back. Calculate the density (num points/
> period) and make an educated guess as to the location and binary
> search from there. This scheme mixes microsecond data and annual data
> quite freely.

I'm having a hard time trying to imagine what an interface that uses
clocks instead of discretizations would look like. Could you mock up an
example (pseudo-code is fine)? English is a poor substitute for code.
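
As a starting point, the "phone book" lookup quoted above might be sketched roughly as follows. This is one reading of the description, not a clock interface: it assumes timestamps are stored as a sorted vector of doubles, and the function name guided_lookup is made up for illustration. The density (points per unit time) gives a first guess at the position, the search then widens around the guess in doubling steps, and a binary search finishes inside the bracket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the last time <= ref, or times.size() if
// ref precedes every time in the series.
std::size_t guided_lookup(const std::vector<double>& times, double ref) {
    assert(std::is_sorted(times.begin(), times.end()));
    const std::size_t n = times.size();
    if (n == 0 || ref < times.front()) return n;
    if (ref >= times.back()) return n - 1;

    // Here front <= ref < back, so the span is strictly positive.
    const double span = times.back() - times.front();
    const double density = (n - 1) / span;  // points per unit time
    std::size_t guess =
        static_cast<std::size_t>((ref - times.front()) * density);
    guess = std::min(guess, n - 1);         // guard against rounding

    std::size_t lo = guess, hi = guess, step = 1;
    if (times[guess] <= ref) {
        // Guess is at or before the target: widen to the right.
        while (hi < n && times[hi] <= ref) {
            lo = hi;
            hi = std::min(n, hi + step);
            step *= 2;
        }
    } else {
        // Guess overshot: widen to the left (times[0] <= ref holds).
        while (times[lo] > ref) {
            hi = lo;
            lo = (lo >= step) ? lo - step : 0;
            step *= 2;
        }
    }
    // Now times[lo] <= ref and everything from hi onward is > ref.
    auto it = std::upper_bound(times.begin() + lo, times.begin() + hi, ref);
    return static_cast<std::size_t>(it - times.begin()) - 1;
}
```

When the points are evenly spread the guess lands on or next to the answer, so the cost is close to constant; for badly clustered data the doubling steps keep the worst case logarithmic, the same as a plain binary search.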

> Algorithms may chew up degrees of freedom and shorten series, but the
> clocks will remain the same. For example, a simple moving average
> over 10 days will not be relevant on the first 9 points. You've
> chewed up 9 points and your output may reflect this. This is just a
> simple case. Windowing functions can chew up forward and backwards.
> Some algorithms may have accuracy requirements that may have minimum
> input requirements. A simple case is determining the number of points
> you need to get a certain accuracy for an exponential moving average
> which deals with a weighted sum of infinite points.
>
> Where this puppy ends up being quite different is that you want times,
> real times, associated with the series. The obvious thing to do is
> tuple them, but this messes up passing blocks of data around
> efficiently to things that only want to deal with sequences and don't
> care about the time, but sometimes timed tuples make more efficient
> sense. So you need flexible mappings and alternative types of
> containers per situation.
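
The two "chewed up points" cases in the quoted text can be illustrated with a small sketch. Both functions are hypothetical helpers, not library code. The first computes a simple moving average and returns a series that is shorter than its input by window - 1 points. The second assumes the common EMA recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1), under which the weight left on all points older than the most recent n is (1 - alpha)^n, and returns the smallest n that pushes that neglected weight to eps or below.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simple moving average over `window` points. The first window - 1
// inputs produce no output, so the result has
// in.size() - window + 1 points (or none if the input is too short).
std::vector<double> moving_average(const std::vector<double>& in,
                                   std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    if (in.size() < window) return out;
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        sum += in[i];
        if (i >= window) sum -= in[i - window];  // drop the oldest point
        if (i + 1 >= window) out.push_back(sum / window);
    }
    return out;
}

// Smallest n such that (1 - alpha)^n <= eps, i.e. the number of most
// recent points needed before the truncated EMA tail is negligible.
std::size_t ema_points_needed(double alpha, double eps) {
    assert(alpha > 0.0 && alpha < 1.0);
    assert(eps > 0.0 && eps < 1.0);
    return static_cast<std::size_t>(
        std::ceil(std::log(eps) / std::log(1.0 - alpha)));
}
```

For a 10-point average over 12 inputs the output has 3 points, and the result on its own does not say which input times those 3 points belong to. That is where carrying the clock alongside the values would matter.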

I think the current framework can handle this situation quite naturally.
The offsets need not be integral multiples of the discretization. The
offsets can be floating point, which can be used to represent exact
times. Or as Jeff Garland suggested, the offset could in theory be a
posix ptime (not tested). That way you wouldn't have to pass around a
separate vector representing the times, or tuple your data with times.

Apologies if I misunderstood your suggestion.

> So, when I look at this lib, it looks like a neat way to capture singly
> clocked series, but it also appears that perhaps it is meant to handle
> multiple clocks given it may be handling daily financial data where
> holidays come into play based on the acknowledgements section
> crediting Zürcher Kantonalbank. That is, it seems the discretization
> is a proxy for clock that suits a particular use.
>
> I'm not sure if you can see a way to consider a more flexibly
> "clocked" POV rather than your current discretization scheme. It seems
> quite different but tantalisingly close to what you have.

I'd be interested in hearing more about clocks, but I don't feel I
understand well enough to say whether the current design can accommodate
them.

> Thus I'd think of this library more of a series library than a time
> series library if such a distinction wasn't just made up by me ;-)

Thanks for your feedback!

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
