Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-09 14:22:36


Steven Watanabe wrote:
> AMDG
>
> Eric Niebler <eric <at> boost-consulting.com> writes:
>
>> It's not exactly what you're asking for, but if you want to modify a
>> time_series in-place, you can do that with range_run_storage::set_at().
>> It's a terrible way to build a series though, because you need to find
>> the place in the series where this element should be inserted (could be
>> O(log N)) and then insert it (could be O(N)).
>>
>> You seem to be asking for an ordered_append feature. That might be
>> useful to add.
>>
>>> Is there some reason why this doesn't work? If I need the
>>> other behavior I can always create a new series fill it and swap
>>> myself.
>> Fill it how? That's exactly what ordered_inserter is for! So you can
>> see, ordered_inserter is necessary. But the name isn't right.
>
> I would fill it using ordered_append.

How would you suggest ordered_append be implemented?

>> The reason for ordered_inserter's existence is that it is the fastest
>> way to build a time_series from scratch, regardless of the internal
>> representation. When you know elements are sorted, you can just bang
>> them into a dense vector, sparse vector, map, or what-have-you -- very
>> efficient. Once you have a time series, there are other ways to
>> manipulate it.
>>
>> The reason for .commit() is that, since you're pushing in one datum at a
>> time, there may be a more efficient intermediate representation than the
>> final time series data structure. For the scenarios I've encountered,
>> being able to efficiently build a series by pushing in one datum at a
>> time is important.
>
> Do you have an example where pushing them using a different
> intermediate representation is significantly more efficient?

Imagine the series storage is std::map<offset, value>. Inserting one
value with insert(value) is O(log N). Inserting N values this way is O(N
log N). But if you use insert(begin, end), where begin and end are
iterators to sorted data, the insert is faster. O(N).

So, if you're collecting data one sample at a time (common in signal
processing), a good strategy might be to stuff pair<int,value> into a
vector or deque first, then do one O(N) insert into your map at the end.
  That works out to be 2 O(N) operations, rather than one O(N log N)
operation. Better.

That's a made up example, though. No series types use std::map as a
backing store. Rather, they generally use a sorted vector or something.
Except delta, characteristic, heaviside and inverse heaviside which
don't need growable storage at all. And some series types might not have
any storage at all, but just write values to a stream or to disk or
something. The point is, you should be able to efficiently be able to
write ordered data to a series without any concern for how that data is
being stored, or even if it's being stored.

> Even if there is such a case should we really use an interface
> for insertion that departs from established practice even when
> it isn't needed? Would it be possible to have a range assign
> function template in addition to the inserter that takes a sequence
> of tuples? Could this be as efficient as the current ordered inserter?

Yes, and I think this would be good to have. When you have a sorted
range of data, it would be nice to have a clean high-level
ordered_assign(series, begin, end) that does the series assignment. It
can use the lower-level stuff under the covers.

But the lower-level stuff is still needed so that series types can be
efficiently built in generic code (like the time series algorithms, for
instance), which can't make any assumptions.

>> The alternative would be use a range-based interface like the STL uses.
>> E.g.:
>>
>> series.assign( from, to );
>>
>> Here, "from" and "to" can't really be iterators though, or, if they are,
>> they should yield a 3-tuple of (value, offset, end offset). An interface
>> like this is very unwieldy in signal processing, where you often collect
>> data by polling a value at an interval.
>>
>> for(;;)
>> series.push_back(poll_some_value(), last_time, current_time);
>>
>> Of course, push_back() may not be the most efficient way to build up
>> some kinds of data structures. What the ordered_inserter gives is a way
>> have something as efficient as a range-based assign() while retaining
>> the poll-and-push-back convenience.
>>
>> Does that make sense?
>
> No.
>
> std::vector<tuple<...> > elements;
> for(;;)
> elements.push_back(make_tuple(poll_some_value(), last_time, current_time));
> series.assign(elements);

Every element is copied twice.

>
> vs.
>
> ordered_inserter inserter(series);
> for(;;)
> *insert++ = make_tuple(poll_some_value(), last_time, current_time);
> inserter.commit();

Every element is copied once. Have I missed your point, or did you just
make mine for me? ;-)

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk