Re: [Boost-users] Overloading boost::for_each()

29 Aug 2010

      On Sun, Aug 29, 2010 at 5:34 AM, Mathias Gaunard
<mathias.gaunard@ens-lyon.org> wrote:
...
On 29/08/2010 00:53, Dave Abrahams wrote:
...
...
...
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail,
so thanks for the link.  I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The localiterator
can itself be segmented.
I see some potential for some interesting discussion here, sorry in advance
for hijacking this thread.
I think the whole segmented iterators/ranges is a bad idea, as it has quite
a few disadvantages and there are alternative, simpler solutions that
provide the same benefits.
segmented ranges have the following disadvantages:
- while segmented iterators are easier to write than flat iterators in
certain cases, they're still not as easy to write as they could be
I don't think anyone is suggesting replacing flat iterators with
segmented iterators -- they would coexist.  I don't see anything wrong
with letting generic algorithms (like for_each) transparently take
advantage of the underlying storage of a sequence.
...
- making non-trivial operations (any operation that needs to manipulate
several adjacent elements [1]) deal with segmented ranges can be so
complicated that they can realistically only be written with a flattened
non-segmented view of the range.
[1] examples include character encoding conversion, base64
encoding/decoding, gzip encoding/decoding, etc.
While developing a HTTP parser, I wanted to be able to parse between
two or more buffers (segments), but that was quite slow.  I wanted
full performance for the 99% of the time that a valid sequence would
exist in a single buffer.  I wanted to avoid complicated code to move
between segments, and didn't want to copy buffers around.

I ended up letting the parser keep taking flat iterators as input --
as you say, clearly these are the simplest kind to work with.  On top
of that I built another parser that first uses a parser<const char*>
as far as it can go on the first segment, and if that doesn't have
enough data to return something, it uses a parser<flat_iterator> on as
little of the full sequence as possible.  Worked wonders, required
very little code to do, and kept the core parsing logic simple.

Granted HTTP is not all things, and base64/character encoding
conversion/gzip are not all things, but I think this way would work
fine for at least those.  I don't think it's right to say "the whole
segmented iterators/ranges is a bad idea" -- I can see plenty of
applications where writing for segmented iterators *would* get overly
complicated, but I don't see any reason why we'd be forced to use them
if something better was available.

-- 
Cory Nelson
http://int64.org