Boost logo

Boost Users :

Subject: Re: [Boost-users] Overloading boost::for_each()
From: Cory Nelson (phrosty_at_[hidden])
Date: 2010-08-29 16:55:07


On Sun, Aug 29, 2010 at 5:34 AM, Mathias Gaunard
<mathias.gaunard_at_[hidden]> wrote:
> On 29/08/2010 00:53, Dave Abrahams wrote:
>
>>>> I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
>>>
>>> I had heard of segmented iterators but hadn't looked at them in detail,
>>> so thanks for the link.  I'm not sure I want something that's just two-level
>>
>>
>> Read the paper. It works for as many levels as you want. The localiterator
>> can itself be segmented.
>
> I see some potential for some interesting discussion here, sorry in advance
> for hijacking this thread.
>
> I think the whole segmented iterators/ranges is a bad idea, as it has quite
> a few disadvantages and there are alternative, simpler solutions that
> provide the same benefits.
>
> segmented ranges have the following disadvantages:
>
> - while segmented iterators are easier to write than flat iterators in
> certain cases, they're still not as easy to write as they could be

I don't think anyone is suggesting replacing flat iterators with
segmented iterators -- they would coexist. I don't see anything wrong
with letting generic algorithms (like for_each) transparently take
advantage of the underlying storage of a sequence.

> - making non-trivial operations (any operation that needs to manipulate
> several adjacent elements [1]) deal with segmented ranges can be so
> complicated that they can realistically only be written with a flattened
> non-segmented view of the range.
>
> [1] examples include character encoding conversion, base64
> encoding/decoding, gzip encoding/decoding, etc.
>

While developing a HTTP parser, I wanted to be able to parse between
two or more buffers (segments), but that was quite slow. I wanted
full performance for the 99% of the time that a valid sequence would
exist in a single buffer. I wanted to avoid complicated code to move
between segments, and didn't want to copy buffers around.

I ended up letting the parser keep taking flat iterators as input --
as you say, clearly these are the simplest kind to work with. On top
of that I built another parser that first uses a parser<const char*>
as far as it can go on the first segment, and if that doesn't have
enough data to return something, it uses a parser<flat_iterator> on as
little of the full sequence as possible. Worked wonders, required
very little code to do, and kept the core parsing logic simple.

Granted HTTP is not all things, and base64/character encoding
conversion/gzip are not all things, but I think this way would work
fine for at least those. I don't think it's right to say "the whole
segmented iterators/ranges is a bad idea" -- I can see plenty of
applications where writing for segmented iterators *would* get overly
complicated, but I don't see any reason why we'd be forced to use them
if something better was available.

-- 
Cory Nelson
http://int64.org

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net