From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-10 15:57:08


Steven Watanabe wrote:
> AMDG
>
> Eric Niebler <eric@boost-consulting.com> writes:
>
>> I don't doubt it. std::map does it for its range-based insert so it must
>> be possible. And this may seem like splitting hairs to you, but what I
>> said is true -- you can't efficiently build a std::map by pushing in
>> sorted data in one element at a time, because the interface doesn't
>> allow for it. You have to pass it a range of sorted elements. You may
>> say, "Well, then don't use a std::map to build a time series!" Fair
>> enough. But when a perfectly satisfactory data structure like std::map
>> cannot be adapted to satisfy a concept, it suggests a faulty concept.
>
> Or that the container is not perfectly satisfactory
> for the particular task. std::map is simply not designed
> for inserting many elements at the end. Buffering the
> input still increases the overhead significantly because
> std::map is not allowed to assume that the input range
> is sorted. It must check. So using std::map
> is worse than a custom container for two reasons--it needs
> temporary storage and it must check the range before inserting it.

I agree that with a push-based insertion interface, std::map will be
worse than a custom type designed with a push-based interface in mind.
The question is whether models of the concept are *required* to have a
push-based interface, or whether they can expose an opaque "collector"
(as you refer to it below) that gives them a more flexible collection
strategy.
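
For concreteness, here is roughly what the two std::map paths look like
(the spellings are standard; only the data is made up):

   #include <map>
   #include <utility>
   #include <vector>

   typedef std::map<int, double> series_map;

   void build(std::vector<std::pair<int, double> > const & sorted)
   {
       // Range-based: the standard guarantees linear complexity when
       // the input range is already sorted.
       series_map fast(sorted.begin(), sorted.end());

       // One element at a time through the push interface: each
       // insert pays its own lookup, even on sorted data.
       series_map slow;
       std::vector<std::pair<int, double> >::const_iterator it;
       for (it = sorted.begin(); it != sorted.end(); ++it)
           slow.insert(*it);
   }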

>> And (crucially) you would do away with commit() on both ordered_assign
>> and ordered_append? Or would you keep them on the low-level interface
>> and add a higher level interface that called commit for you and handled
>> the 90% case?
>
> I would remove commit and require buffering
> to be explicit.
>
>> The thought I'm not sold on is:
>>
>> - get rid of commit() on the lower level interface.
>
> I consider immediate insertion lower level
> than buffering.

This is misrepresenting my position. Buffering is not inherent in what
I've been endorsing. The low-level interface I've been suggesting is:

   begin transmission
   back_insert (v1, from1, to1)
   back_insert (v2, from2, to2)
   back_insert (v3, from3, to3)
   end transmission

Whether or not buffering happens during this operation is entirely up to
the container. This is a very general form for a push-based interface.
It can accommodate immediate insertion *or* buffering.
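
In code, the shape I have in mind is something like this (the names
begin_transmission/end_transmission are placeholders, not actual
library spellings):

   // Placeholder names throughout -- a shape, not the library's API.
   template <class Series>
   struct transmission
   {
       explicit transmission(Series & s) : series_(s)
       {
           series_.begin_transmission(); // container may buffer, or not
       }

       template <class Value, class Offset>
       void back_insert(Value const & v, Offset from, Offset to)
       {
           series_.back_insert(v, from, to); // immediate or deferred
       }

       ~transmission()
       {
           series_.end_transmission();   // flush anything deferred
       }

   private:
       Series & series_;
   };

A container that inserts eagerly makes begin/end_transmission no-ops; a
container that buffers does its one-shot insert in end_transmission.
Either way, the calling code is identical.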

>> OK. It seems to me that such a thing could be built on top of an
>> ordered_append that has a commit(), simply by calling commit() after
>> every appended element. Would you agree?
>
> I would prefer to implement the "commit" version
> in terms of the plain version. See below.
>
>> Anyway, the STL-ish assign(begin,end) doesn't have this problem because
>> the end is specified explicitly and it has extra benefits (the size may
>> be known), but this doesn't mesh well with the signal processing domain.
>> In assign(begin,end), the container *pulls* the data. In signal
>> processing, data is often read by polling, and then *pushed* into a
>> container, hence the desire for something like a back_inserter.
>
> Just wondering...
> Isn't it possible to have a pull model
> from the container's side *and* a push model
> from the data's side with coroutines?

Yes, if we had them. And in some sense we do (see last year's Google
Summer of Code coroutine project).
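
To sketch that inversion: in a C++ with language-level coroutines (not
the C++ we have today -- take the spelling below as illustrative), the
producer pushes by yielding and the container pulls one value at a
time. poll_some_value() is a stand-in:

   #include <coroutine>
   #include <exception>
   #include <optional>
   #include <utility>

   template <class T>
   class generator
   {
   public:
       struct promise_type
       {
           std::optional<T> current;
           generator get_return_object()
           {
               return generator(
                   std::coroutine_handle<promise_type>::from_promise(*this));
           }
           std::suspend_always initial_suspend() noexcept { return {}; }
           std::suspend_always final_suspend() noexcept { return {}; }
           std::suspend_always yield_value(T v)
           {
               current = std::move(v);
               return {};
           }
           void return_void() {}
           void unhandled_exception() { std::terminate(); }
       };

       explicit generator(std::coroutine_handle<promise_type> h) : h_(h) {}
       generator(generator && o) noexcept : h_(std::exchange(o.h_, {})) {}
       ~generator() { if (h_) h_.destroy(); }

       // Container-side *pull*: resume the producer and take its value.
       // Stop calling once it returns an empty optional.
       std::optional<T> next()
       {
           h_.resume();
           if (h_.done())
               return std::nullopt;
           return h_.promise().current;
       }

   private:
       std::coroutine_handle<promise_type> h_;
   };

   double poll_some_value() { return 0.0; } // stand-in for real polling

   // Data-side *push*: the producer yields values as they arrive.
   generator<double> produce()
   {
       for (int i = 0; i != 3; ++i)
           co_yield poll_some_value();
   }

   void drain()
   {
       generator<double> g = produce();
       while (std::optional<double> v = g.next())
           (void)*v; // in real code: hand *v to the series
   }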

>> Time passes ...
>>
>> If we ditched the assigners and the appenders and went with the
>> range-based STL-ish interface, what happens? Any convenient push-based
>> interface we build on top would have to resort to sticking in one value
>> at a time, just like std::back_inserter. Do you really feel that's an
>> improvement? It's an honest question.
>
> Yes I do.
>
> If you want buffering it doesn't have to
> be done behind the scenes. By adjusting the
> interface slightly it can be implemented in terms
> of concepts that already exist.
>
> typedef get_efficient_collector<series_type>::type collector_type;
> collector_type collector;
> back_inserter<collector_type> inserter(collector);
> for(;;)
>     *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
> series.assign(std::move(collector));

Precisely. This is how a buffered ordered inserter works today. Now
consider the case of a generic time series algorithm that needs to
construct a new series. It doesn't know the optimal way to build the
series, so it has to ask the series for its "efficient collector". In
short, the efficient collector is the de facto standard way to build a
series generically.

My point is, the phrase "if you want buffering" doesn't make any sense.
The user (or generic algorithm) doesn't know and shouldn't care what the
optimal insertion strategy for a given series type is. Only the series
type knows that. This is why I made the "efficient collector" interface
the default. Everything else can be built on top.
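
Concretely, a generic algorithm ends up written against the collector
rather than against any particular insertion strategy (spellings here
are hypothetical, following your snippet above):

   // Assumed provided by the library; spelled as in the snippet above.
   template <class Series> struct get_efficient_collector;

   template <class InSeries, class OutSeries, class UnaryOp>
   void transform_series(InSeries const & in, OutSeries & out, UnaryOp op)
   {
       typedef typename get_efficient_collector<OutSeries>::type
           collector_type;
       collector_type collector;

       typename InSeries::const_iterator it = in.begin(), end = in.end();
       for (; it != end; ++it)
           collector.push_back(op(*it)); // buffered or immediate -- the
                                         // series type decided, not us
       out.assign(collector);            // one-shot hand-off
   }

The algorithm never commits to a strategy; it funnels values through
whatever the series says is efficient.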

Let's back up. Why do you believe that

   back_insert (v1, from1, to1)
   back_insert (v2, from2, to2)
   back_insert (v3, from3, to3)

should be lower-level than

   begin transmission
   back_insert (v1, from1, to1)
   back_insert (v2, from2, to2)
   back_insert (v3, from3, to3)
   end transmission

?

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
