From: Hurd, Matthew (hurdm_at_[hidden])
Date: 2004-06-02 18:09:43


> -----Original Message-----
> Behalf Of Tom Brinkman
> Subject: [boost] Proposed Boost Library for Data Series Analysis
>
> I placed a small tutorial and motivating example for
> the proposed boost::data_series library.
>
> http://groups.yahoo.com/group/boost/files/data_series/
>
> Working examples for you to criticize will be added
> within the week.

Cute code. Nice job.

Efficiency is often paramount with these types of apps. Being able to
do finite difference approaches, backwards and forwards, and in absolute
terms is important (often you only want the latest data point).

You have the incremental case covered in your notes. Being able to
calculate it at a point in an absolute sense, optional caching of
results, working with series of unknown lengths and accuracy / validity
enabling would be nice too.

By accuracy / validity enabling I mean, for example, that your 5 step
moving average is not valid for the first four points, as these are not
true 5-point averages, so the valid result is shorter than the input.
Another simple example would be an exponential moving average, where
the number of significant digits required determines how far back you
need to go in the series. At another level, a quality of service
setting for some functors, somewhere on the fast-to-accurate spectrum,
is often appropriate.
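
To make the validity idea concrete, here is a minimal sketch. The names
series_result, moving_average and ema_lookback are hypothetical and are
not taken from the proposed library. The moving average reports how many
leading points are only partial averages. For the exponential moving
average I assume the usual recurrence y[t] = alpha*x[t] + (1-alpha)*y[t-1],
for which the total weight on points more than k steps back is
(1-alpha)^k, so the lookback for a given number of decimal digits follows
from a logarithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the values plus the number of leading
// points that should not be trusted (the "chewed up" warm-up points).
struct series_result {
    std::vector<double> values;
    std::size_t invalid_prefix;
};

// n-step moving average. The output has the same length as the input;
// the first (window - 1) entries are partial averages, flagged invalid.
inline series_result moving_average(const std::vector<double>& in,
                                    std::size_t window) {
    assert(window >= 1);
    series_result out{std::vector<double>(in.size()), 0};
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        sum += in[i];
        if (i >= window) sum -= in[i - window];  // drop the point leaving the window
        const std::size_t n = std::min(i + 1, window);
        out.values[i] = sum / static_cast<double>(n);
    }
    out.invalid_prefix = std::min(window - 1, in.size());
    return out;
}

// Smallest k such that the weight on points more than k steps back,
// (1 - alpha)^k, is at most 10^-digits.
inline std::size_t ema_lookback(double alpha, int digits) {
    assert(alpha > 0.0 && alpha < 1.0 && digits > 0);
    const double k = -digits * std::log(10.0) / std::log(1.0 - alpha);
    return static_cast<std::size_t>(std::ceil(k));
}
```

With a window of 5, invalid_prefix comes back as 4, which is the length
modification described above.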

Perhaps expression templates might help here. Compile time results for
validity calcs for series of known lengths would be helpful where
possible.
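
As a rough illustration of compile time validity calcs, the sketch below
lets each stage advertise how many leading points it consumes, and
derives the number of valid output points for a series of known length.
All the type names (moving_average_stage, difference_stage, chained,
valid_points) are made up for this example. It shows only the
bookkeeping, not a real expression template engine.

```cpp
#include <cstddef>

// Hypothetical stage descriptors: each advertises how many leading
// input points it consumes before its output becomes valid.
template <std::size_t Window>
struct moving_average_stage {
    static_assert(Window >= 1, "window must be at least 1");
    static constexpr std::size_t lookback = Window - 1;
};

template <std::size_t Lag>
struct difference_stage {
    static constexpr std::size_t lookback = Lag;
};

// Feeding one stage into another adds their lookbacks.
template <typename First, typename Second>
struct chained {
    static constexpr std::size_t lookback = First::lookback + Second::lookback;
};

// Number of valid output points for an input series of known length N.
template <typename Stage, std::size_t N>
struct valid_points {
    static constexpr std::size_t value =
        N > Stage::lookback ? N - Stage::lookback : 0;
};

// Checked entirely by the compiler: a 5 step average over 100 points
// leaves 96 valid points.
static_assert(valid_points<moving_average_stage<5>, 100>::value == 96,
              "5 step moving average chews up four points");
```

A mismatch between the required and available number of valid points
could then be turned into a compile error rather than a run time
surprise.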

It would be nice if the algorithms could be simply specified, similar to
your approach, and then "mounted" into the appropriate framework.
Perhaps SFINAE or another technique could be used to determine an
algorithm's capabilities. That is, the framework could juggle the most
appropriate methods to use based on the results required. E.g. use
incremental if you can otherwise absolute.
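
A sketch of how SFINAE could detect an algorithm's capabilities. I am
assuming a convention where an incremental algorithm exposes
update(double) and an absolute one exposes at(series, index). Those
member names, the two example algorithms and the latest() entry point
are all invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// C++11-friendly stand-in for std::void_t.
template <typename...> struct make_void_ { using type = void; };
template <typename... Ts> using void_t_ = typename make_void_<Ts...>::type;

// Primary template: assume the algorithm has no incremental interface.
template <typename Algo, typename = void>
struct has_incremental : std::false_type {};

// Chosen only when algo.update(double) is a well-formed expression.
template <typename Algo>
struct has_incremental<
    Algo, void_t_<decltype(std::declval<Algo&>().update(std::declval<double>()))>>
    : std::true_type {};

// Supports both styles.
struct running_sum {
    double total = 0.0;
    double update(double x) { total += x; return total; }
    double at(const std::vector<double>& s, std::size_t i) const {
        return std::accumulate(s.begin(), s.begin() + i + 1, 0.0);
    }
};

// Absolute only: maximum of s[0..i].
struct max_to_date {
    double at(const std::vector<double>& s, std::size_t i) const {
        return *std::max_element(s.begin(), s.begin() + i + 1);
    }
};

// Incremental path: feed every point through update().
template <typename Algo>
double latest_impl(Algo algo, const std::vector<double>& s, std::true_type) {
    double r = 0.0;
    for (double x : s) r = algo.update(x);
    return r;
}

// Fallback path: evaluate in an absolute sense at the last point.
template <typename Algo>
double latest_impl(Algo algo, const std::vector<double>& s, std::false_type) {
    return algo.at(s, s.size() - 1);
}

// The "framework": incremental if you can, otherwise absolute.
template <typename Algo>
double latest(Algo algo, const std::vector<double>& s) {
    assert(!s.empty());
    return latest_impl(algo, s, has_incremental<Algo>{});
}
```

Calling latest(running_sum{}, s) takes the incremental path and
latest(max_to_date{}, s) falls back to the absolute one, without the
algorithm author declaring anything beyond the member functions.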

Based on what I've done in the past, you also end up wanting arbitrary
n-dimensional structures with other data types flowing through as well.
You want to be able to associate ids/names with data items and keep them
associated through sorting/ranking. E.g. tag the series with data
codes, rank them, and you can get the code for the best. You also want
grouping of n-dimensional constructs, splitting of dimensions, and
"slicing" operators that operate across the current dimension, which,
with the appropriate framework, are one and the same. E.g. choose the
minimum or sum the results of a bunch of functions.
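
A small sketch of the tagging and ranking idea in one dimension. The
tagged_value type and the helper names are hypothetical. The point is
only that the code travels with the value through the sort, and that
"choose the minimum" and "sum" are the same reduction with a different
operator.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A data item that keeps its id/name attached.
struct tagged_value {
    std::string code;
    double value;
};

// Rank from best (largest) to worst; ties keep their input order.
inline std::vector<tagged_value> rank_descending(std::vector<tagged_value> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const tagged_value& a, const tagged_value& b) {
                         return a.value > b.value;
                     });
    return items;
}

// The code attached to the best value.
inline std::string best_code(const std::vector<tagged_value>& items) {
    assert(!items.empty());
    return rank_descending(items).front().code;
}

// A "slicing" operator across the items: min, sum, etc. are all the
// same fold with a different binary operation.
template <typename Op>
double reduce_across(const std::vector<tagged_value>& items, double init, Op op) {
    for (const tagged_value& item : items) init = op(init, item.value);
    return init;
}
```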

Then soon you want to split the computation amongst processors and
machines, and you want to treat the computation as a dataflow graph and
parallelise it in some way, which I did a few years ago with BGL's
precursor, the GGCL. A topological sort gets you most of the way
there... Also, you can then use things like Metis to partition your
computation graph in nice ways.
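
For the dataflow graph, a topological sort gives an order in which every
node is evaluated after its inputs. The sketch below uses Kahn's
algorithm on a plain adjacency list so that it stands alone. With BGL
you would more likely call boost::topological_sort. The function name
evaluation_order and the graph representation are assumptions made for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// consumers[u] lists the nodes that take u's output as an input.
// Returns an order in which each node appears after all of its inputs.
inline std::vector<std::size_t> evaluation_order(
    const std::vector<std::vector<std::size_t>>& consumers) {
    const std::size_t n = consumers.size();

    // Count how many inputs each node is still waiting for.
    std::vector<std::size_t> pending(n, 0);
    for (const auto& outs : consumers) {
        for (std::size_t v : outs) {
            assert(v < n);
            ++pending[v];
        }
    }

    // Nodes with no inputs can run straight away.
    std::queue<std::size_t> ready;
    for (std::size_t u = 0; u < n; ++u) {
        if (pending[u] == 0) ready.push(u);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t u = ready.front();
        ready.pop();
        order.push_back(u);
        for (std::size_t v : consumers[u]) {
            if (--pending[v] == 0) ready.push(v);
        }
    }

    // Anything left over is stuck in a cycle, so it is not a dataflow DAG.
    if (order.size() != n) throw std::runtime_error("dataflow graph has a cycle");
    return order;
}
```

Nodes that sit in the ready queue at the same time have no dependency
on each other, so they are the natural candidates to hand to different
processors.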

I think a way to approach it would be an expression template approach
with some adaptability in terms of absolute and incremental analytics,
plus some notion of accuracy (at least something like the number of
points being chewed up and not valid, or some such). Everything else
could come on top of this.

You need to ensure that you can easily reuse code out there. No one
wants to spend their life writing the zillions of numerical algorithms
already out there. For example, if you want a vector ARMA model, there
are only a couple out there I think, and they are in FORTRAN. Wrap it,
don't write it, I'd hope.

You end up with run time versus compile time issues. People want to
interact with this kind of stuff. However, my take on that now is to do
it at compile time and include the compiler as part of your interactive
tool set if need be :-)

Anyhow, it is nice to see a start on this. Good luck.

$0.005,

Matt Hurd.
_______________

Matt Hurd
+61.2.8226.5029
hurdm_at_[hidden]
Susquehanna
_______________


