Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-12 14:21:16


Steven Watanabe wrote:
> AMDG
>
> Eric Niebler <eric <at> boost-consulting.com> writes:
>
>> My point is, the phrase "if you want buffering" doesn't make any sense.
>> The user (or generic algorithm) doesn't know and shouldn't care what the
>> optimal insertion strategy for a given series type is.
>
> Is there any data structure for which buffering actually helps?

If we're talking about buffering of sorted data being appended to the
end of a sequence, then I think I agree with you. No *useful* data
structure comes to mind.

But IMO the problem is larger than "buffering sorted data being appended
to the end of a sequence." More below.

>> Let's back up. Why do you believe that
>>
>> back_insert (v1, from1, to1)
>> back_insert (v2, from2, to2)
>> back_insert (v3, from3, to3)
>>
>> should be lower-level than
>>
>> begin transmission
>> back_insert (v1, from1, to1)
>> back_insert (v2, from2, to2)
>> back_insert (v3, from3, to3)
>> end transmission
>>
>> ?
>>
>
> It is possible to implement either
> in terms of the other so in a sense neither
> one is intrinsically "lower-lever."

I disagree with this. If anything non-trivial is going on in commit(),
then the second version cannot be *efficiently* implemented in terms of
the first, which means the second is necessarily the more primitive of
the two.

> The first is simpler.

Agreed.

> I do not think the
> second gains enough flexibility to justify
> the added complexity.

This is a judgment call, and I have done a poor job justifying the added
complexity.

> I've just been running
> some tests with binary trees and the results
> are surprising. Neither msvc 8.0 nor gcc 3.4
> implement the range insertion in linear time
> for sorted ranges (In spite of the standard).
> Both just do an element by element insert.

Gah! That's terrible. :-(

> Second, I discovered that the map_back_inserter
> I posted is as fast as a buffered version (
> the buffering is insignificant, BTW).

Very interesting data points. Thanks for the tests.

The inserter design (with commit()) came out of a design discussion with
Dave A., which I took the time to review this morning. He was coming at
the problem from the perspective of MTL4 -- matrices and vectors. For
MTL4, he wanted generic ways of representing sparse data and of building
sequences. We wanted a mechanism that was general. He imagined a desire
to temporarily violate the invariants of a sequence while building it,
and then using commit() to "make it good".

Some examples of invariant breaking insertions might be inserting data
that is unsorted, or sorted in reverse order, or inserting at the front
of a sequence (not necessarily invariant-breaking, but possibly
inefficient otherwise). For these purposes, there would be a collection
of inserters, not just the "ordered_inserter" in Time_series, which is a
rather uninteresting example.

This perhaps explains why we've been talking past each other. I've had
this desire for extreme generality in the back of my mind. The Sequence
and RangeRunStorage concepts are general and can be reused in other
contexts, where alternate representations may be more common and
pre-sorted data may be less.

Where does this leave inserters and the time series library? Perhaps we
decide that within the time series domain, inserting pre-sorted data at
the end of the sequence is the only scenario that matters, and
everything else is unnecessary complication. For the purpose of
generality though, the inserter interface should remain on the
lower-level RangeRunStorage interface. Perhaps for the TimeSeries
concepts, the inserters are hidden, and push_back How It's Done.

How does this sound to you? Personally, I wasn't ready to say that
appending pre-sorted data is the only scenario that matters; hence, I
exposed the more general interface. Perhaps someone more steeped in the
time series and signal processing domains could weight in.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk