From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-07 14:19:06
Stjepan Rajko wrote:
> On 8/7/07, Stjepan Rajko <stipe_at_[hidden]> wrote:
>> On 7/31/07, John Phillips <phillips_at_[hidden]> wrote:
>>> My apologies for the delay in this posting, but the review period for
>>> the Time Series library submitted by Eric Niebler runs from Monday, July
>>> 30 until Wednesday, August 8. From the documentation:
>>>
>> I have started to review the library, and at this point I am confused
>> :-) So I thought I would share my thoughts in case someone can
>> de-confuse me and I can produce a more useful review.
>>
>
> OK, I think I have tracked down many of the sources of my confusion.
>
> As you say in the beginning of the docs, "A time series is a series of
> data points, sampled at regular intervals". That I agree with, except
> for the fact that it's not necessarily regular intervals.
>
> dense_series and sparse_series implement this definition of time
> series well - they behave as containers that have specified values
> only at the specified time points. Everywhere else, they are zero
> (although, I would argue that for a time series, if you had to examine
> a time point that has not been specified, I would be more likely to
> call it an "unspecified" or "unknown" rather than a "zero").
You could think of it that way. When multiplying two series, for
example, the library assumes that where two series do not overlap, the
result is "zero" or "undefined". You can see that as a manifestation of
"0 x Y == 0" or as "<undefined> x Y == <undefined>". I prefer to think
of them as zeros because then I don't need to define arithmetic with
undefined values.
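To make the "zeros" reading concrete, here is a minimal sketch. It does not
use the library's types: `sparse`, `value_at` and `multiply` are hypothetical
names, and a plain std::map stands in for a sparse series. It only
illustrates that "absent means zero" lets the product skip every offset that
the two inputs do not share.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a sparse series: offset -> value.
// This is NOT the library's sparse_series; absent offsets read as zero.
typedef std::map<int, double> sparse;

// Read a value, treating unspecified offsets as zero.
double value_at(const sparse& s, int offset) {
    sparse::const_iterator it = s.find(offset);
    return it == s.end() ? 0.0 : it->second;
}

// Pointwise product. Because 0 x Y == 0, only offsets present in both
// inputs can be nonzero, so nothing is stored anywhere else.
sparse multiply(const sparse& a, const sparse& b) {
    sparse out;
    for (sparse::const_iterator it = a.begin(); it != a.end(); ++it) {
        sparse::const_iterator match = b.find(it->first);
        if (match != b.end()) {
            out[it->first] = it->second * match->second;
        }
    }
    // The product never stores more points than either input.
    assert(out.size() <= a.size() && out.size() <= b.size());
    return out;
}
```

Under the "undefined" reading the stored result would be the same; the only
difference is what `value_at` reports for an offset that is not stored.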
> The other containers though, they don't really implement what I would
> consider a time series. They more or less implement piecewise
> constant functions. In some cases, they do so well, and in some
> cases, it's tricky, like with piecewise_constant_series. One of the
> problems there is that runs might have an intersection - that should
> be handled carefully. Even if it is up to the user to make sure there
> are no overlaps, there would have to at least be support for open /
> half-open intervals so that I can specify that, say f has a value of
> 10 on [0, 1), a value of 11 on [1, 2), etc.
Integral runs are half-open. Floating-point runs are problematic in this
regard. Some extra thought needs to go into this. One possibility would
be to disallow sparse and delta series with floating-point offsets, and
require floating-point runs for piecewise constant series to be half-open
like their integral brethren. That may result in more intuitive behavior.
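As a rough sketch of what half-open runs would mean for both offset types
(again not the library's implementation; `piecewise_fn` and `add_run` are
hypothetical names, and keeping runs non-overlapping is left to the caller):

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical piecewise constant function over half-open runs [lo, hi).
// Offset may be integral or floating-point; both get the same semantics.
template <typename Offset>
class piecewise_fn {
public:
    // Add a run [lo, hi) with the given value.
    // Assumption: the caller never adds overlapping runs.
    void add_run(Offset lo, Offset hi, double value) {
        assert(lo < hi);
        runs_[lo] = std::make_pair(hi, value);
    }

    // Value at offset t; zero outside every run.
    double operator()(Offset t) const {
        // First run that starts strictly after t.
        typename runs_type::const_iterator it = runs_.upper_bound(t);
        if (it == runs_.begin()) return 0.0;   // t precedes every run
        --it;                                  // last run starting at or before t
        // Half-open: the end offset itself is excluded.
        return t < it->second.first ? it->second.second : 0.0;
    }

private:
    typedef std::map<Offset, std::pair<Offset, double> > runs_type;
    runs_type runs_;   // run start -> (run end, value)
};
```

With runs [0, 1) holding 10 and [1, 2) holding 11, the offset 1 belongs only
to the second run, so adjacent runs share a boundary without overlapping.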
> This is where the concept of a "run" breaks down for me - it gives the
> illusion / assumption that you are dealing with a continuous function
> (rather than a strict time series), but it is not handled carefully to
> provide (IMO) mathematically intuitive behavior - examples are the
> fact that sparse_series claims to have a run of unit length, but it is
> still zero at any unspecified value, and the surprising handling of
> overlapping ranges in piecewise_constant_series.
Only for floating-point offsets. And this behavior can be changed.
> IMO, this library deals with two different problem domains -
> time_series and piecewise constant functions. I also think that these
> two domains are too different to be stuck in the same bucket. I think
> that a general time_series does not need to address the "run" concept
> - perhaps, there can be a notion of a "weight" assigned to each
> sample, which can represent time duration or other things. That would
> make it behave equivalently to the run for the "integral" (which should
> really just be a sum for time_series, IMO). Also, I think that for
> the piecewise constant functions, runs should at least have the option
> of being open/half open intervals. With all this in mind, to some
> extent, I believe that sparse_series and dense_series (which I see as
> time series) should be treated differently than the rest of the
> containers (which I see as piecewise constant functions).
You've hit on something important -- I agree time_series currently has a
split personality, but I don't agree that it's the sparse/dense vs.
piecewise constant thing. It's the integral vs. floating-point offset
thing. And I think those problems are fixable.
> All in all, I think that this library is useful, but to me it is
> something different than what it claims to be. It attempts to address
> a mathematical/numerical concept, but I find that it does so in ways
> that are unintuitive to me - maybe to people using time series in
> other contexts this makes perfect sense but it left me very confused.
> If the library made a separation between time series (values at
> discrete specified times only) and piecewise constant functions
> (values everywhere), and handled both with some of the changes
> suggested above, I'd say "Yes! Accept!". As it stands, I'm not sure.
Thanks for your very valuable feedback.
> Oh, and kudos to Zürcher Kantonalbank. And Eric for yet another
> impressive implementation, of course :-)
--
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk