From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2007-08-09 10:19:11
On 8/8/07, Eric Niebler <eric_at_[hidden]> wrote:
> > I'm not really sure why dense_series<> does not allow floating point
> > offsets. I understand that it is supposed to resemble std::vector.
> > However, there is implicitly a relationship between the indices of the
> > underlying vector and the times represented by the indices. It is
> > possible (and quite convenient) for dense_series<> to hold a
> > discretization_type that represents the start offset, and an interval,
> > and perform the mapping for me. The lack of this mapping means that
> > for dense time series of arbitrary discretization (say 0.5), I have to
> > multiply all my times by 0.5 in user code when using the time series.
> > I feel the library should take care of this for me; the fact that the
> > underlying storage is vector-like should not force me to treat
> > dense_series<> discretizations differently from all other series'
> > discretizations.
>
>
> How does it force you to treat discretizations differently? Whether your
> discretization is 1 or 0.5 or whether your underlying storage is dense
> or sparse, it doesn't affect how you index into the series, does it? I'm
> afraid I've missed your point.
I wrote "discretization" when I meant to say "offset". Perhaps it's
even that I'm merely confused, but this is actually making my point --
I see a need for clarifying the relationships among disctretization,
offset, and run. My whole point was that (as I understand it) in
order to represent an offset of 3.14 I need to keep an extrinsic value
somewhere that tells me how to convert between 3.14 and the integer
offset used in dense_series<>' runs. Is that accurate? If so, isn't
this at odds with the other series types, which let me specify double
values for offsets directly?
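To make the bookkeeping concrete, here is roughly what I keep around
in user code today (my own sketch, not anything from the library; the
names are made up):

    #include <cstddef>

    // Extrinsic mapping between real offsets and the integer indices
    // used by dense storage: exactly the state I'd like dense_series<>
    // to carry for me.
    struct uniform_mapping
    {
        double start;    // offset of sample 0, e.g. 3.14
        double interval; // sample spacing, e.g. 0.5

        std::size_t index_of(double offset) const
        { return static_cast<std::size_t>((offset - start) / interval + 0.5); }

        double offset_of(std::size_t i) const
        { return start + i * interval; }
    };

Every read or write of the series goes through index_of(), which is
precisely the conversion I'd rather not see in user code.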
> > Nonetheless, it would be best if it were possible to
> > specify that a sample exists at offset X, where X is double, int, or
> > units::seconds, without worrying about any other details, including
> > discretization. That is, discretization seems useful to me only for
> > regularly-spaced time series, and seems like noise for
> > arbitrarily-spaced time series.
>
>
> Discretizations are useful for coarse- and fine-graining operations that
> resample the data at different intervals. This can be useful even for
> time series that are initially arbitrarily-spaced.
>
> Sometimes you don't care to resample your data at a different
> discretization, or call the integrate() algorithm. In those cases, the
> discretization parameter can be completely ignored. It does tend to
> clutter up the docs, but no more than, say, the allocator parameter
> clutters up std::vector's docs.
Is discretization then properly a property of the series itself? If
the offsets of each sample are not related to the discretization, why
have both in the same container? I find this very confusing. To
accommodate the algorithms you mention above, would it be possible to
simply say that I want to resample using a scale factor instead? What
I'm getting at here is that discretization and offset seem to have a
very muddy relationship. Doing everything in terms of offset seems
clearer to me, and I don't yet see how this simplification loses
anything useful.
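For instance, something with this shape (purely illustrative -- shown
over a plain vector, with names of my own invention):

    #include <cstddef>
    #include <vector>

    // Resampling driven by an explicit scale factor, with no notion of
    // per-series discretization: keep every factor-th sample.
    std::vector<double> coarse_grain(std::vector<double> const& s,
                                     std::size_t factor)
    {
        std::vector<double> out;
        for (std::size_t i = 0; i < s.size(); i += factor)
            out.push_back(s[i]);
        return out;
    }

The caller supplies the scale factor at the call site, and no series
ever needs to store a discretization.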
> > In addition, a sample should be
> > representable as a point like 3.14 or a run like [3.14, 4.2).
>
>
> A zero-width point, like [3.14, 3.14)? What that would mean in the
> context of the time_series library is admittedly still an outstanding
> design issue.
Fair enough.
> > The rest of the algorithm detailed docs have concept requirements, but
> > it would be much easier to use them if the concepts were links to the
> > relevant concept docs; as it is now, I have to do some bit of
> > searching to find each one listed. This applies generally to all
> > references to concepts throughout the docs -- even in the concepts
> > docs, I find names of concepts that I must then look up by going back
> > to the TOC, since they are not links.
>
> Yeah, it's a limitation of our BoostBook tool chain. Doxygen actually
> emits this documentation with cross-links, but our doxygen2boostbook XSL
> transform ignores them. Very frustrating.
That's too bad.
> > * What is your evaluation of the potential usefulness of the library?
> >
> > I think it is potentially quite useful. However, I think its
> > usefulness is not primarily as a financial time series library, but as
> > I mentioned earlier, its current docs make it sound as if it is mainly
> > only useful for that. In addition, I am forced to ask how a time
> > series library is more useful for signal processing than a std::vector
> > and an extrinsic discretization value.
>
> It's for when you want many options for the in-memory representation
> of a series, and efficient and reusable algorithms that work equally
> well on all those different representations.
This is very true, and that's what I was alluding to below, if a bit unclearly.
> > The answer I came up with is
> > that Boost.TimeSeries is really only advantageous when you have
> > arbitrary spacing between elements, or when you want to use two
> > representations of time series in an algorithm. That is, using
> > Boost.TimeSeries' two-series for_each() is almost certainly better
> > than a custom -- and probably complicated -- loop everywhere I need to
> > operate on two time series. However, these cases are relatively rare
> > in signal processing; it is much more common to simply loop over all
> > the samples and do some operation on each element. This can be
> > accomplished just as well with std::for_each or std::transform.
>
> If std::vector and std::for_each meet your needs, then yes I agree
> Time_series is overkill for you. That's not the case for everyone.
>
>
> > The
> > question then becomes, "Does using Boost.TimeSeries introduce
> > clarifying abstractions, or conceptual noise?". The consensus among
> > my colleagues is that the latter is the case.
>
> Sorry you feel that way.
I think this feeling would change rapidly if there were more features
directly applicable to signal processing, as mentioned below.
> > Some specific signal-processing usability concerns:
> > - For many signal processing tasks, the time series used is too large
> > to fit in memory. The solution is usually to use a circular buffer or
> > similar structure to keep around just the part you need at the moment.
> > The Boost.TimeSeries series types seem unable to accommodate this
> > mode of operation.
>
>
> Not "unable to accommodate" -- making a circular buffer model the time
> series concept would be fairly straightforward, and then all the
> existing algorithms would work for it. But no, there is no such type in
> the library at present.
I'm glad to hear that this would be straightforward to do, and I think
it's a must-have for signal processing folks.
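For illustration only, the streaming mode of operation I have in mind
looks something like this (a hand-rolled ring buffer; a real model of
the series concept would of course also need to expose runs and
offsets):

    #include <cstddef>
    #include <vector>

    // Fixed-capacity window over an unbounded series: push() overwrites
    // the oldest sample once the buffer is full, so memory use stays
    // constant no matter how long the series runs.
    class ring_window
    {
    public:
        explicit ring_window(std::size_t capacity)
          : buf_(capacity), next_(0), size_(0) {}

        void push(double sample)
        {
            buf_[next_] = sample;
            next_ = (next_ + 1) % buf_.size();
            if (size_ < buf_.size()) ++size_;
        }

        std::size_t size() const { return size_; }

    private:
        std::vector<double> buf_;
        std::size_t next_, size_;
    };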
> > - It might be instructive to both the Boost.TimeSeries developers and
> > some of its potential users if certain common signal-processing
> > algorithms were implemented with the library, even if just in the
> > documentation. For example, how might one implement a sliding-window
> > normalizer over densely populated, millisecond resolution data? What
> > if this normalization used more than two time series to do its work?
> > It may well be possible with the current framework, but a) it's not
> > really clear how to do it based on the documentation and b) the
> > documentation almost seems to have a bias against that kind of
> > processing.
>
> I wonder why you say that. The library provides a 2-series transform()
> algorithm that is for just this purpose.
That's why I asked about "more than two time series". Such
convolutions of multiple time series can be done in one pass, and
Boost.TimeSeries does this admirably for N=2, but rewriting
transform() for N>2 is a lot for most users to bite off.
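To show what I mean, here is the kind of loop a user ends up writing
for N=3 (my own sketch, covering only the easy dense, equally spaced
case; for arbitrarily spaced series the loop would also have to merge
three sets of runs, which is exactly the complexity I'd like the
library to absorb):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Pointwise combination of three dense series; f is any ternary
    // function object, e.g. a multiply-accumulate.
    template <class F>
    std::vector<double> transform3(std::vector<double> const& a,
                                   std::vector<double> const& b,
                                   std::vector<double> const& c,
                                   F f)
    {
        std::size_t n = std::min(a.size(), std::min(b.size(), c.size()));
        std::vector<double> out;
        out.reserve(n);
        for (std::size_t i = 0; i != n; ++i)
            out.push_back(f(a[i], b[i], c[i]));
        return out;
    }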
> As for the rolling window calculations, I have code that does that, and
> sent it around on this list just a few weeks ago. I hope to add the
> rolling average algorithm soon. It uses a circular buffer, and would
> make a good example for the docs.
I agree. This would be a great addition to the docs.
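Something along these lines, I imagine (this is only a guess at the
shape of it, not Eric's actual code -- a one-pass rolling mean over
dense data, keeping a running sum for the current window):

    #include <cstddef>
    #include <vector>

    // Rolling mean of the last 'width' samples; emits one output value
    // per position once the window is full.
    std::vector<double> rolling_mean(std::vector<double> const& s,
                                     std::size_t width)
    {
        std::vector<double> out;
        double sum = 0.0;
        for (std::size_t i = 0; i != s.size(); ++i)
        {
            sum += s[i];
            if (i >= width)
                sum -= s[i - width];        // drop the sample leaving the window
            if (i + 1 >= width)
                out.push_back(sum / width); // window is full; emit
        }
        return out;
    }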
> > As it stands, no. If there were clearly-defined relationships between
> > samples and their extents and offsets; better support for large and/or
> > piecewise-mutable time series; a rolling-window algorithm; and better
> > customizability of coarse_grain() and fine_grain(), I would probably
> > change my vote.
>
>
> I'm still not clear on what you mean by "clearly-defined relationships
> between samples and their extents and offsets." The rest is all fair.
> Rolling-window is already implemented, but not yet included.
I was alluding to my issue with the relationships among
discretization, offset, and run that I mentioned earlier.
Zach Laine