Subject: Re: [boost] lifetime of ranges vs. iterators
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-09-03 11:50:49
On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave_at_[hidden]> wrote:
> on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
>>> Great, but what are you doing with these stacks? Normally, the answer
>>> would be something like "I'm applying the X algorithm to a range of
>>> elements that represent Y but have been adapted to look like Z"
>> Usually is a matter of converting a text document to a point in a
>> feature vector space as a preprocessing stage:
> e.g., for the purposes of search? Just trying to get a feel for what
> you're describing.
Near duplicate elimination and document clustering in general. Most
clustering algorithms work on representations of documents as points
in a very high dimension space.
>> Most pipelines start with tokenization, normalization, filtering and
>> hashing, with a 'take' operation at the end.
> By 'take' you mean a destructive read?
What do you mean with 'destructive'?
'take(range, n)' returns a range adaptor that iterates over the first
'n' elements of 'range'. This way I do not have to decide how much of
the original document (which theoretically could be of unlimited
length) I need to until the end of the pipeline (most importantly, I
want to set the final length *after* filtering).
>> Most of the uses of lazy ranges are in relatively non performance
>> critical part of the application, so I do not aim for absolute zero
>> abstraction overhead.
> Okay, so... is it worth breaking up the iterator abstraction for this
Do you think we are breaking the iterator abstraction? How? Or I've
misunderstood the question?
We are simply trying to come up with a simple and clean trick to keep
iterator size under control.
>> A nice thing of lazy ranges is that are useful to control peak memory
>> usage of the application (in fact if you are careful, you can do very
>> little dynamic memory allocation)
>> I'm interested in using dynamic iterators in the near future for code
> You mean, like, any_iterator?
>> and It would be a pity if these ranges couldn't fit in a small object
>> optimization buffer.
> Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer
big enough for the average iterator size. Currently the iterators I
use are often over 400 bytes, which is a bit to big :), thus the need
to squeeze their size down as much as possible. A hundred bytes could
be enough .
 100 bytes for SBO might seems a lot, but, I've used a custom
string with a very large SBO buffer (over 100 bytes), been happily
copying it around and performance was much better than with standard
std::string (which, admittedly, did do reference counting but no SBO).
I guess that the size of the buffer depends a lot on what you are
going to do with it.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk