Boost :

From: Stjepan Rajko (stipe_at_[hidden])
Date: 2007-08-12 23:50:50

On 8/12/07, Eric Niebler <eric_at_[hidden]> wrote:
>
> Stjepan Rajko wrote:
> > On a less nit-picky note though, I still can't find a single outside
> > reference in which something that assigns a value to a whole real set
> > interval is called a time series. Eric, you indicated that your
> > choice for using range runs (as opposed to just points, I assume) was
> > that this yielded superior generic algorithms. But in the floating
> > point case, this is causing most of your structures to represent
> > something that is not really a time series. In rethinking the
> > floating point case, are any of the strategies you are considering
> > looking to put all of your structures in line with the
> > mathematical notion of what a time series is?
>
>
> Would you agree that the time series types that use integral offsets are
> isomorphic to what a time series is, in the mathematical sense? Are

Yes

> there any time series (in the math sense) that are not representable
> using integral offsets?
>

I think that most time-series, and especially those time series used
in practice, could be approximated fairly well using an
integral-offset series with the appropriate discretization. The
problems are:

1) if you don't know much about the series a priori, you might not
know what to set the discretization to initially, and might never get
a good idea of what the discretization really should be.

2) say you have a series coming at you, and the time intervals between
the samples keep getting smaller and smaller, repeatedly beating
your discretization no matter how small it is. You would have to keep
fine_graining, which doesn't seem efficient.

3) a user simply might not want to use discretization. But this is
not really a problem when it comes to your lib, because the sparse
series with floating point offsets would do the trick.
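
To make problem 2 concrete, here is a rough sketch in plain C++. It does
not use the library's API at all: count_collisions and shrinking_samples
are hypothetical helpers I made up for illustration, and the std::map is
just a stand-in for an integral-offset series. It snaps each sample time
to an integral offset and counts how many samples land on an offset that
is already taken.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the Time_series library: snap each
// (time, value) sample to an integral offset using a fixed discretization,
// and return how many samples were lost because an earlier sample already
// occupied the same offset.
inline std::size_t count_collisions(
    const std::vector<std::pair<double, double>>& samples,
    double discretization)
{
    assert(discretization > 0.0);
    std::map<long, double> series;  // integral offset -> value
    std::size_t collisions = 0;
    for (const auto& s : samples) {
        const long offset =
            static_cast<long>(std::floor(s.first / discretization));
        // emplace() leaves the map unchanged if the offset is taken.
        if (!series.emplace(offset, s.second).second)
            ++collisions;
    }
    return collisions;
}

// Hypothetical input: n samples whose spacing halves each time
// (times 1, 1.5, 1.75, 1.875, ...), with the sample index as the value.
inline std::vector<std::pair<double, double>> shrinking_samples(std::size_t n)
{
    std::vector<std::pair<double, double>> out;
    double t = 1.0;
    double gap = 0.5;
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back(t, static_cast<double>(i));
        t += gap;
        gap /= 2.0;
    }
    return out;
}
```

With six such samples, a discretization of 0.25 loses three of them,
while 1/32 keeps all six. Add two more samples and 1/32 starts losing
samples as well, so the discretization has to be refined again.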

>
> > In one of your posts, you mentioned something along the lines of
> > making Point concept a first class citizen of the library - IIUTC,
> > that would be a good approach. Furthermore, I think that the RangeRun
> > should be rethought, so that a RangeRun is in effect equivalent to a
> > countable set of Points even in the floating point case (where by "a
> > countable set", I mean "a countable set significantly smaller than the
> > one including every Point indexable by a floating point number between
> > the start offset and end offset"). If not, then I see this as a
> > Time_series+something else library, which is fine. But with time
> > series, I think a continuous interval is much less useful than a way
> > to specify a number of discrete points in an interval.
>
>
> I'm having a hard time seeing how this is any different than using a
> series with integral offsets and a floating point discretization. The
> time series library provides this functionality already. Can you clarify
> what you're suggesting?

I don't think that the integral offsets + floating point
discretization approach always works (mostly given the problems I list
above). But again, sparse_series with floating point offsets can be
used instead. I gave a slightly more specific example of what I am
suggesting in http://tinyurl.com/ywu53v, but also see below.

>
> I agree that the time series types that use floating point offsets are
> not very time series-ish in the math sense. But some have expressed the
> strong opinion during this review that the functionality they provide is
> useful.
>

Please don't get me wrong - I also think that the floating point
offset series are useful as they are. For example, I think it's really
useful to be able to multiply a sparse_series with a
piecewise_constant_series, to accomplish something like "multiply all
the samples in [0, 100) by 10, and all samples in [100, 200) by 20".
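
As a minimal sketch of that operation in plain C++ (these are
hypothetical stand-ins, not the library's sparse_series or
piecewise_constant_series types or its multiply algorithm): the samples
are a map from time to value, the piecewise constant function is a list
of half-open runs, and I am assuming the function is zero outside its
runs.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a sparse series: time -> value.
using sparse_samples = std::map<double, double>;

// Hypothetical stand-in for one run of a piecewise constant function:
// the function has the given value on the half-open interval [begin, end).
struct run {
    double begin;
    double end;
    double value;
};

// Evaluate the piecewise constant function at time t.
// Assumption: the function is zero outside all runs.
inline double value_at(const std::vector<run>& f, double t)
{
    for (const run& r : f) {
        assert(r.begin < r.end);
        if (r.begin <= t && t < r.end)
            return r.value;
    }
    return 0.0;
}

// Multiply every sample by the value of the function at the sample's time.
// Samples whose product is zero are dropped, so the result stays sparse.
inline sparse_samples multiply(const sparse_samples& s,
                               const std::vector<run>& f)
{
    sparse_samples out;
    for (const auto& kv : s) {
        const double v = kv.second * value_at(f, kv.first);
        if (v != 0.0)
            out[kv.first] = v;
    }
    return out;
}
```

With runs {0, 100, 10} and {100, 200, 20}, a sample of 2 at time 99.9
becomes 20, a sample of 3 at time 100 becomes 60, and a sample at time
250 disappears because it falls outside both runs.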

Also, I agree with Steven in that the floating point offset series can
be divided into two categories - sparse/dense (and delta), which are
pretty consistent with the mathematical concept of a time series as
they are, with the exception of their pre_runs and post_runs, and the
rest, which are closer to modeling a piecewise constant function.

What keeps nagging me is that all these "others" are not time series.
I wouldn't even call them "series", although they are a series of
tuples, because they so much better reflect a piecewise constant
function.

What I do see coming out of the RangeRun concept is a potentially
wonderful foundation for Boost.MathFunction - but in order to get
there, it would need to grow (for example, somehow supporting all
flavors of open/closed/half-open intervals). So all these "others" I
see in limbo - they are not time series, but they are useful to have
with time series, and they are almost really nice implementations of
piecewise constant functions (and with the potential to implement any
function, I think, using the RangeRun concept), but not quite there
either.

So what I'm mostly suggesting is:
* whatever is supposed to be a time series - make it a true time
series. At the end of the day, anything that is a time series should
be convertible to a sequence of discrete time points with values
attached, and nothing more. Integral offset versions of the series
are there. Floating point versions of dense, sparse, and delta series
are also there, except for their pre_runs and post_runs.
* whatever is not a time-series - call it something else, or make it
clear in the documentation that it behaves as something else in
certain circumstances (like floating point offsets). I am not
disputing the fact that they are useful, and not suggesting they be
removed from the library - they are definitely useful in conjunction
with time-series.
* alternatively - make everything a time series, which would require
you to revisit the RangeRun concept so that it is always convertible
to a countable set of discrete (value, time) pairs. The utility I see
here is the following: from a time series perspective, I can't just
say "All samples in [0, 100) have value 10". I have to specify
exactly where all these samples lie. Allowing me to do this concisely
using a modified RangeRun would be very useful, since I wouldn't have
to specify each of the possibly numerous samples separately, nor would
they have to be stored separately.
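
To illustrate the last option, here is one way such a modified RangeRun
could look. This is only a sketch under my own assumptions: sampled_run,
its step member, and points() are hypothetical names, and nothing like
this exists in the library as reviewed. The run stores four numbers, yet
it is always convertible to a finite set of discrete (time, value) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical "sampled run": one value attached to the discrete points
// start, start + step, start + 2*step, ... that lie in [start, end).
// Unlike a continuous run, it always stands for a finite set of points.
struct sampled_run {
    double start;
    double end;
    double step;
    double value;

    // Number of discrete points in the run (simple linear count).
    std::size_t size() const
    {
        assert(step > 0.0 && start <= end);
        std::size_t n = 0;
        while (start + n * step < end)
            ++n;
        return n;
    }

    // Expand the run into explicit (time, value) pairs.
    std::vector<std::pair<double, double>> points() const
    {
        std::vector<std::pair<double, double>> out;
        const std::size_t n = size();
        out.reserve(n);
        for (std::size_t i = 0; i < n; ++i)
            out.emplace_back(start + i * step, value);
        return out;
    }
};
```

For example, sampled_run{0, 100, 25, 10} stands for the four samples at
times 0, 25, 50 and 75, each with value 10, without storing them
separately.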

> I guess what I'm missing is a use case for non-integral offsets. Any
> reanalysis of what floating point offsets mean has to start there. Is it

I hope the above makes a case for some of that. I think in a lot of
cases, users will not want to deal with discretization or any involved
transforms and just want to use their (value, time) pairs as they are.

> simply the desire to index into a sequence of points and interpolate
> between them in some way? If that's the case, then support for floating
> point offsets can be dropped in favor of a flexible interpolating
> facade. (IMO, something like that is needed anyway.)

I have to think about that... I wasn't thinking about that case.

> If someone really
> needs a way to say, "This signal really has the value of X in the time
> interval [Y,Z)" where Y and Z are floating point values, then continuous
> floating point runs are the way to go. That seems like a reasonable
> thing to want, even if it doesn't fit the mathematical definition of
> "time series".

It is a *very* reasonable thing to want. And it fits the mathematical
definition of a function with a domain in the real numbers very well
;-)

Best regards,

Stjepan