Subject: Re: [boost] lifetime of ranges vs. iterators
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-09-03 11:50:49


On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave_at_[hidden]> wrote:
>
> on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
>
>>> Great, but what are you doing with these stacks? Normally, the answer
>>> would be something like "I'm applying the X algorithm to a range of
>>> elements that represent Y but have been adapted to look like Z"
>>>
>>
>> Usually is a matter of converting a text document to a point in a
>> feature vector space as a preprocessing stage:
>
> e.g., for the purposes of search? Just trying to get a feel for what
> you're describing.

Near-duplicate elimination and document clustering in general. Most
clustering algorithms work on representations of documents as points
in a very high-dimensional space.

>
>> Most pipelines start with tokenization, normalization, filtering and
>> hashing, with a 'take' operation at the end.
>
> By 'take' you mean a destructive read?
>

What do you mean by 'destructive'?

'take(range, n)' returns a range adaptor that iterates over the first
'n' elements of 'range'. This way I do not have to decide how much of
the original document (which in principle could be of unbounded
length) I need to process until the end of the pipeline (most
importantly, I want to set the final length *after* filtering).
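
To make the 'take' part a bit more concrete, here is roughly the
shape of such an adaptor (a throwaway sketch with made-up names, not
the code I actually use): the adapted iterator carries the underlying
iterator plus a countdown, and two adapted iterators compare equal
when either the countdowns or the underlying iterators do, so
iteration stops after 'n' elements or at the real end of the input,
whichever comes first.

#include <cstddef>
#include <iterator>
#include <utility>

// Hypothetical 'take' adaptor: iterate over at most n elements.
template <class Iter>
struct take_iterator
{
    Iter base;          // position in the underlying range
    std::size_t left;   // how many elements may still be produced

    take_iterator(Iter it, std::size_t n) : base(it), left(n) {}

    typename std::iterator_traits<Iter>::reference
    operator*() const { return *base; }

    take_iterator& operator++() { ++base; --left; return *this; }

    // equal when the budgets match (both exhausted) or the underlying
    // iterators meet, so we stop after min(n, input length) elements
    friend bool operator==(take_iterator a, take_iterator b)
    { return a.left == b.left || a.base == b.base; }
    friend bool operator!=(take_iterator a, take_iterator b)
    { return !(a == b); }
};

// A real version would return a proper range type; a pair of
// iterators is enough for the sketch.
template <class Range>
std::pair<take_iterator<typename Range::const_iterator>,
          take_iterator<typename Range::const_iterator> >
take(Range const& r, std::size_t n)
{
    typedef typename Range::const_iterator it_t;
    return std::make_pair(take_iterator<it_t>(r.begin(), n),
                          take_iterator<it_t>(r.end(), 0));
}

In a pipeline the 'take' is simply the outermost adaptor, so only as
much of the document as is needed to produce 'n' filtered tokens is
ever pulled through tokenization/normalization/filtering.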

<snip>
>> Most of the uses of lazy ranges are in relatively non-performance-critical
>> parts of the application, so I do not aim for absolute zero
>> abstraction overhead.
>
> Okay, so... is it worth breaking up the iterator abstraction for this
> purpose?

Do you think we are breaking the iterator abstraction? How? Or have
I misunderstood the question?

We are simply trying to come up with a simple and clean trick to keep
iterator size under control.
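
To give an idea of the kind of trick I have in mind (again just a
sketch of mine, not a settled design): the per-adaptor state that
makes iterators fat, e.g. a filtering predicate, is stored once in
the range object, and the iterators carry only a back-pointer to
their range plus the underlying position. The iterators stay two
words big regardless of the size of the predicate; the price, of
course, is that the range must outlive its iterators, which is
exactly the lifetime question of this thread.

#include <iterator>

// Sketch: the predicate lives in the range, iterators keep a pointer
// to their range, so iterator size does not grow with the state.
template <class Iter, class Pred>
struct filtered_range
{
    Iter first, last;
    Pred pred;                       // possibly large state, stored once

    struct iterator
    {
        filtered_range const* owner; // back-pointer: range must outlive us
        Iter pos;

        typename std::iterator_traits<Iter>::reference
        operator*() const { return *pos; }

        iterator& operator++()
        {
            ++pos;
            while (pos != owner->last && !owner->pred(*pos))
                ++pos;
            return *this;
        }

        friend bool operator==(iterator a, iterator b) { return a.pos == b.pos; }
        friend bool operator!=(iterator a, iterator b) { return !(a == b); }
    };

    iterator begin() const
    {
        iterator it = { this, first };
        while (it.pos != last && !pred(*it.pos))
            ++it.pos;
        return it;
    }

    iterator end() const
    {
        iterator it = { this, last };
        return it;
    }
};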

>
>> A nice thing about lazy ranges is that they are useful to control peak
>> memory usage of the application (in fact, if you are careful, you can
>> do very little dynamic memory allocation)
>
> Yes.
>
>> I'm interested in using dynamic iterators in the near future for code
>> decoupling
>
> You mean, like, any_iterator?
>

Yes.

>> and it would be a pity if these ranges couldn't fit in a small object
>> optimization buffer.
>
> Do you know how big that target ("small object optimization buffer") is?

I'm going to implement my own any_iterator, so I'll make the buffer
big enough for the average iterator size. Currently the iterators I
use are often over 400 bytes, which is a bit too big :), hence the
need to squeeze their size down as much as possible. A hundred bytes
could be enough [1].

[1] 100 bytes for SBO might seem a lot, but I've used a custom
string with a very large SBO buffer (over 100 bytes), happily copied
it around, and performance was much better than with the standard
std::string (which, admittedly, did reference counting but no SBO).
I guess the right buffer size depends a lot on what you are going to
do with it.
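
For what it's worth, the storage scheme I have in mind is the usual
one for type erasure, something along these lines (a much reduced
sketch with made-up names, not the actual implementation; copying,
equality and the iterator-category machinery are left out):

#include <cstddef>
#include <new>

// Very reduced sketch of a type-erased iterator with a small buffer
// optimization; 'Capacity' would be tuned to the typical concrete
// iterator size (the ~100 bytes mentioned above).
template <class Value, std::size_t Capacity = 100>
class any_iterator
{
    struct concept_t
    {
        virtual ~concept_t() {}
        virtual Value const& deref() const = 0;
        virtual void increment() = 0;
    };

    template <class Iter>
    struct model_t : concept_t
    {
        Iter it;
        explicit model_t(Iter i) : it(i) {}
        Value const& deref() const { return *it; }
        void increment() { ++it; }
    };

    union
    {
        char buffer[Capacity];  // small object buffer
        double align_;          // crude alignment; something like
    };                          // boost::aligned_storage in real code
    concept_t* self;            // points into buffer, or to the heap
    bool on_heap;

public:
    template <class Iter>
    explicit any_iterator(Iter it)
    {
        if (sizeof(model_t<Iter>) <= Capacity)
        {
            self = new (buffer) model_t<Iter>(it);  // fits: in place
            on_heap = false;
        }
        else
        {
            self = new model_t<Iter>(it);           // too big: heap
            on_heap = true;
        }
    }

    ~any_iterator()
    {
        if (on_heap) delete self;
        else         self->~concept_t();            // destroy in place
    }

    Value const& operator*() const { return self->deref(); }
    any_iterator& operator++() { self->increment(); return *this; }

private:
    any_iterator(any_iterator const&);              // copying omitted
    any_iterator& operator=(any_iterator const&);   // in this sketch
};

The interesting knob is 'Capacity': make it big enough for the
typical concrete iterator and the common case never touches the heap,
while oversized iterators still work, just more slowly.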

-- 
gpd
