Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-10 12:14:57


Zach Laine wrote:
> You misunderstand me. I'm all for DiscretizationType (the template
> parameter -- yay typesafety), and I'm all for discretization (the
> value and associated fuctions) for dense series. What I'm against is
> using discretization (the value) for non-dense series. (Note that we
> have run into the Discretization/discretization naming ambiguity again
> here.) I find it a confusing value to keep around, especially since
> it can be simply ignored, as you pointed out in a previous email. A
> data value that you can access and that is put forward as a
> first-class element of the design -- that is also ignored -- suggests
> a better design is possible. Here's what I suggest:
>
> The Discretization template parameter becomes OffsetType, which is IMO
> a more accurate name, and follows the RangeRun concepts. I will be
> using that name below.
> The "OffsetType discretization" ctor parameter, the "OffsetType
> discretization()" accessors, and "void discretization(OffsetType d)"
> mutators should only be applied to the dense_series<> type.
> The user should be able to access any data in any type exclusively by
> using offsets. This means that dense_series<> seamlessly handles the
> mapping between (possibly floating point) offset and sample index I
> requested initially.
>
> This has these advantages:
> The notion of discretization is not introduced into types for which it
> has questionable meaning, e.g. piecewise_constant_series<>.
> Offsets can be used exclusively, both on input (as before) and output
> (as before, but now including dense_series<> with floating point
> offsets).
> Floating point and integral offsets are now treated much more uniformly.
>
> Is this reasonable? Have I perhaps missed something fundamental?
>

Reasonable, but problematic for reasons of unclear semantics and sub-par
performance. A similar design was considered early on.

Semantically, it makes it unclear what indexing into a dense series
means. Consider:

   dense_series<int> d( discretization = 5 );

In your design as I understand, d[0], d[1], d[2], d[3] and d[4] all
refer to the same element in the dense series. To me, this interface
changes the meaning of a dense series. It is no longer a dense sampling
of points at regular intervals. Now it looks piecewise constant.

To illustrate the sorts of confusion this can cause, what happens now if
I use set_at() to try to change the value at offset 2? It's nonsensical.

As another example, say I have a dense series with discretization of 5ms
and samples at 0ms, 5ms, and 10ms. Now I also have a sparse series (no
discretization) with samples at 0, 5 and 10. Now I add the two. What
does that mean? Does the dense series have a value at offset 3? If I
understand your design, yes. Does the sparse series? No. Will the result
of adding the two? It seems the answer is yes, but it should be no.

Discretization for the other series types is not useless. For sparse,
for example, it enforces that samples may only exist at offsets that are
a multiple of the discretization.

There are performance implications too. Indexing into a dense series is
currently not much more expensive than indexing into an array. With this
change, every access to the dense series would incur an integer
division. And traversing a dense series would incur an integer
multiplication at each iteration.

But the real performance problems creep in when a dense series has a
discretization, but other series do not. In the current design,
algorithms check the discretization of two series once for
compatibility, and then it can be safely ignored for the rest of the
computation. Traversing two series in parallel involves conditionally
bumping one or both cursors depending on if and how the current runs
overlap. Dense and sparse both currently have "indivisible" runs. If
they start at the same offset, I know they overlap perfectly and can
bump both cursors. If I don't know that both series are using the same
discretization, I can't do this, because the notion of indivisibility is
intricately tied to discretization. What constitutes an indivisible run
for the dense series [0ms,5ms) is different than an indivisible run for
sparse series [0ms,1ms).

The only way to fix these problems would be to add back a discretization
to all the series types. Then the indexing thing (whether it indices are
dense and logically multiplied by the discretization, or sparse and
actually multiplied) becomes a matter of convention, convenience and
efficiency. And on those counts I think the current design is a winner.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk