Subject: Re: [boost] lifetime of ranges vs. iterators
From: David Abrahams (dave_at_[hidden])
Date: 2008-09-03 13:21:14
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
> On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave_at_[hidden]> wrote:
>> on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
>>>> Great, but what are you doing with these stacks? Normally, the answer
>>>> would be something like "I'm applying the X algorithm to a range of
>>>> elements that represent Y but have been adapted to look like Z"
>>> It's usually a matter of converting a text document to a point in a
>>> feature vector space as a preprocessing stage:
>> e.g., for the purposes of search? Just trying to get a feel for what
>> you're describing.
> Near duplicate elimination and document clustering in general. Most
> clustering algorithms work on representations of documents as points
> in a very high-dimensional space.
>>> Most pipelines start with tokenization, normalization, filtering and
>>> hashing, with a 'take' operation at the end.
>> By 'take' you mean a destructive read?
> What do you mean by 'destructive'?
Never mind; it was a stab in the dark, and you explain what you mean below.
> 'take(range, n)' returns a range adaptor that iterates over the first
> 'n' elements of 'range'. This way I do not have to decide how much of
> the original document (which theoretically could be of unlimited
> length) I need until the end of the pipeline (most importantly, I
> want to set the final length *after* filtering).
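Just so we're talking about the same thing, here is a minimal sketch of
what I understand such a 'take' adaptor to be; every name and interface
detail below is made up for illustration, not your actual code:

#include <iterator>
#include <cstddef>

// Lazy "take" view over the first n elements of an underlying range.
template <class Iterator>
class take_range
{
public:
    class iterator
    {
    public:
        typedef std::input_iterator_tag iterator_category;
        typedef typename std::iterator_traits<Iterator>::value_type value_type;
        typedef typename std::iterator_traits<Iterator>::reference reference;
        typedef typename std::iterator_traits<Iterator>::pointer pointer;
        typedef typename std::iterator_traits<Iterator>::difference_type difference_type;

        iterator(Iterator pos, std::size_t remaining)
          : pos_(pos), remaining_(remaining) {}

        reference operator*() const { return *pos_; }
        iterator& operator++() { ++pos_; --remaining_; return *this; }

        // Comparison is only meant for loop termination against end():
        // we stop when the element budget runs out *or* the underlying
        // range is exhausted.
        friend bool operator==(iterator const& a, iterator const& b)
        { return a.remaining_ == b.remaining_ || a.pos_ == b.pos_; }
        friend bool operator!=(iterator const& a, iterator const& b)
        { return !(a == b); }

    private:
        Iterator pos_;
        std::size_t remaining_;
    };

    take_range(Iterator first, Iterator last, std::size_t n)
      : first_(first), last_(last), n_(n) {}

    iterator begin() const { return iterator(first_, n_); }
    iterator end() const   { return iterator(last_, 0); }

private:
    Iterator first_, last_;
    std::size_t n_;
};

// take(range, n): the adaptor described above
template <class Range>
take_range<typename Range::const_iterator>
take(Range const& r, std::size_t n)
{
    return take_range<typename Range::const_iterator>(r.begin(), r.end(), n);
}

So take(tokens, 1000), say, would give a view over at most the first
1000 tokens without deciding up front how much input to consume.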
>>> Most of the uses of lazy ranges are in relatively non-performance-critical
>>> parts of the application, so I do not aim for absolute zero
>>> abstraction overhead.
>> Okay, so... is it worth breaking up the iterator abstraction for this?
> Do you think we are breaking the iterator abstraction? How? Or have I
> misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator
abstraction gets broken down into smaller bits for the purpose of
storage, complicating some code [like a generic range for example ;-)]
and it imposes an extra coding duty on writers of high-quality iterator
adaptors. I'm asking if it's worth it.
> We are simply trying to come up with a simple and clean trick to keep
> iterator size under control.
Really? IIUC so far we're only discussing compressing the size of
ranges, unless we're willing to pay for indirection inside of iterators.
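To make concrete the kind of compression I mean (and the indirection it
trades against), here is a rough, purely hypothetical sketch, not an
existing Boost interface: a filter iterator normally carries its
predicate and the underlying end iterator, so a range built from two of
them stores that heavy state twice; a "factored" range would keep it
once and store only two lightweight positions.

// Hypothetical sketch only: the heavy, shared part of a filter iterator
// is split out so the range stores it once.

template <class Iterator, class Predicate>
struct filter_shared_part           // stored once per range
{
    Predicate pred;
    Iterator  underlying_end;

    filter_shared_part(Predicate p, Iterator e) : pred(p), underlying_end(e) {}
};

template <class Iterator>
struct filter_position              // stored twice per range (begin and end)
{
    Iterator pos;
    explicit filter_position(Iterator p) : pos(p) {}
};

template <class Iterator, class Predicate>
class factored_filter_range
{
public:
    factored_filter_range(Iterator first, Iterator last, Predicate pred)
      : shared_(pred, last), begin_(first), end_(last)
    {
        // Advance begin_ to the first element satisfying the predicate.
        while (begin_.pos != shared_.underlying_end && !shared_.pred(*begin_.pos))
            ++begin_.pos;
    }

    // A usable version would hand out iterators holding a pointer back to
    // shared_; that back-pointer is the "indirection inside of iterators"
    // referred to above.

private:
    filter_shared_part<Iterator, Predicate> shared_;  // one copy of the heavy state
    filter_position<Iterator> begin_, end_;           // two small positions
};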
>>> I'm interested in using dynamic iterators in the near future for code
>> You mean, like, any_iterator?
>>> and it would be a pity if these ranges couldn't fit in a small object
>>> optimization buffer.
>> Do you know how big that target ("small object optimization buffer") is?
> I'm going to implement my own any_iterator, so I'll make the buffer
> big enough for the average iterator size.
Oh, that's not the small object optimization. The SOO is only done by
compilers: it means passing small objects in registers instead of on the
stack. IIUC you're talking about the too-specifically-named "small
string optimization."

I don't understand why you would implement any_iterator yourself when
there are so many extant implementations, but maybe it's none of my
business.
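In any case, if I follow what you mean by the buffer, the rough shape
would be something like the following: an illustrative sketch only, with
made-up names, an arbitrary 64-byte buffer, and most of the iterator
interface left out.

#include <cstddef>
#include <new>

// Type-erased forward iterator whose wrapped iterator is stored inline
// when it fits in a small buffer and on the heap otherwise.  A real
// implementation also needs copying, equality, proper alignment of the
// buffer (e.g. boost::aligned_storage), and the full iterator interface.
template <class Value>
class any_forward_iterator
{
    struct concept_t
    {
        virtual ~concept_t() {}
        virtual Value const& deref() const = 0;
        virtual void increment() = 0;
    };

    template <class Iterator>
    struct model_t : concept_t
    {
        explicit model_t(Iterator it) : it_(it) {}
        Value const& deref() const { return *it_; }
        void increment() { ++it_; }
        Iterator it_;
    };

public:
    template <class Iterator>
    explicit any_forward_iterator(Iterator it)
    {
        // "Small buffer optimization": construct in place when the wrapper
        // fits, otherwise fall back to a heap allocation.
        if (sizeof(model_t<Iterator>) <= sizeof(buffer_))
        {
            impl_ = new (buffer_) model_t<Iterator>(it);
            heap_allocated_ = false;
        }
        else
        {
            impl_ = new model_t<Iterator>(it);
            heap_allocated_ = true;
        }
    }

    ~any_forward_iterator()
    {
        if (heap_allocated_)
            delete impl_;
        else
            impl_->~concept_t();   // destroy the inline copy in place
    }

    Value const& operator*() const { return impl_->deref(); }
    any_forward_iterator& operator++() { impl_->increment(); return *this; }

private:
    // Copying omitted to keep the sketch short.
    any_forward_iterator(any_forward_iterator const&);
    any_forward_iterator& operator=(any_forward_iterator const&);

    char       buffer_[64];      // the small buffer (illustrative size)
    concept_t* impl_;
    bool       heap_allocated_;
};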
> Currently the iterators I use are often over 400 bytes, which is a bit
> too big :), thus the need to squeeze their size down as much as
> possible. A hundred bytes could be enough.
Again, have you measured?
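The sort of quick check I have in mind is nothing fancier than printing
sizeof for a stack of adaptors. Something like the following, where the
functor types are throwaway stand-ins for real tokenizing/filtering/
hashing stages, with padding to mimic the state they'd carry:

#include <boost/iterator/filter_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Stand-ins for real pipeline stages; the padding simulates the state
// (stop-word tables, hash seeds, ...) that inflates adaptor sizes.
struct keep_token
{
    bool operator()(std::string const& s) const { return !s.empty(); }
    char state[32];
};

struct hash_token
{
    typedef std::size_t result_type;
    std::size_t operator()(std::string const& s) const { return s.size(); }
    char state[32];
};

int main()
{
    typedef std::vector<std::string>::const_iterator base;
    typedef boost::filter_iterator<keep_token, base> filtered;
    typedef boost::transform_iterator<hash_token, filtered> hashed;

    std::cout << "base:     " << sizeof(base) << " bytes\n"
              << "filtered: " << sizeof(filtered) << " bytes\n"
              << "hashed:   " << sizeof(hashed) << " bytes\n";

    // A range of two `hashed` iterators stores the functors and the
    // underlying end iterator twice: the duplication this thread is
    // trying to factor out of the range representation.
    return 0;
}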
>  100 bytes for SBO might seem like a lot, but I've used a custom
> string with a very large SBO buffer (over 100 bytes), happily
> copying it around, and performance was much better than with standard
> std::string (which, admittedly, did do reference counting but no SBO).
> I guess that the size of the buffer depends a lot on what you are
> going to do with it.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com