
Subject: Re: [Boost-users] Overloading boost::for_each()
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2010-08-29 13:03:58


On Sun, 29 Aug 2010, Mathias Gaunard wrote:

> On 29/08/2010 00:53, Dave Abrahams wrote:
>
>>>> I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
>>>
>>> I had heard of segmented iterators but hadn't looked at them in detail, so
>>> thanks for the link. I'm not sure I want something that's just two-level
>>
>>
>> Read the paper. It works for as many levels as you want. The localiterator
>> can itself be segmented.
>
> I see some potential for some interesting discussion here, sorry in advance
> for hijacking this thread.
>
> I think the whole segmented iterators/ranges approach is a bad idea, as it
> has quite a few disadvantages, and there are alternative, simpler solutions
> that provide the same benefits.
>
> Segmented ranges have the following disadvantages:
>
> - while segmented iterators are easier to write than flat iterators in
> certain cases, they're still not as easy to write as they could be
>
> - any code or algorithm that deals with regular ranges has to be rewritten,
> not just to obtain the performance gains, but to get basic functionality
> working as well. That's about as bad as it gets from an integration point
> of view.
>
> - only trivial operations that manipulate elements one by one (for_each,
> copy, transform, accumulate, erase_if) can really exploit the performance
> gains of segmented ranges
>
> - making non-trivial operations (any operation that needs to manipulate
> several adjacent elements [1]) deal with segmented ranges can be so
> complicated that such operations can realistically only be written against
> a flattened, non-segmented view of the range.
>
> [1] examples include character encoding conversion, base64 encoding/decoding,
> gzip encoding/decoding, etc.

I agree with all of this; an algorithm that works on segmented iterators
would effectively need to treat the leaves of an arbitrary tree as its
sequence. Additionally, in my reading of the paper, the depth of the tree
is encoded into the iterator types (and so must have a compile-time
bound).
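
To make that concrete, here is a minimal sketch (mine, not the paper's) of
the two-level case, using std::vector<std::vector<T> > as the segmented
sequence; a third level would need yet another layer of this dispatch,
which is how the nesting depth ends up in the iterator types:

#include <vector>

template <typename T, typename F>
F segmented_for_each(std::vector<std::vector<T> >& seq, F f)
{
    typedef typename std::vector<std::vector<T> >::iterator segment_iterator;
    typedef typename std::vector<T>::iterator local_iterator;
    for (segment_iterator s = seq.begin(); s != seq.end(); ++s)  // over segments
        for (local_iterator i = s->begin(); i != s->end(); ++i)  // over one flat segment
            f(*i);
    return f;
}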

>>> and still based on iterators, though; one case I was thinking of is
>>> compressed data that truly has a linear sequence but it might be faster to
>>> have the decompressor control the program's control flow rather than
>>> having the algorithm do it (of course, that isn't as general as iterators,
>>> but works for some algorithms).
>>
>> Ah, push or pull—the eternal struggle.
>
> Push and pull are the opposite ends of the same thing.
> When you want to generate data, you'd rather push than be pulled; when you
> want to read it, you'd rather pull than be pushed.
>
> So what's the solution?
> Generators. And by that, I mean a function that writes data to an output
> iterator.
>
> A generator is a pure-push primitive, and is trivial to write efficiently.
> Trivial operations can be implemented with no cost on top of a generator, by
> using something such as boost::function_output_iterator.

Yes, we just don't have them implemented.
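
As a sketch of what such a generator might look like (the generator itself
is a made-up example; boost::make_function_output_iterator is real):

#include <boost/function_output_iterator.hpp>
#include <iostream>

// A generator in this sense: a function that pushes its output
// through an output iterator.
template <typename OutputIterator>
void generate_squares(int n, OutputIterator out)
{
    for (int i = 0; i < n; ++i)
        *out++ = i * i;
}

struct print_int
{
    void operator()(int x) const { std::cout << x << '\n'; }
};

int main()
{
    // A trivial per-element operation layered on the generator at
    // essentially no cost.
    generate_squares(10, boost::make_function_output_iterator(print_int()));
    return 0;
}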

> There are generic techniques to build a (normal, non-segmented) range that
> you can then easily pull from on top of a generator, but those are more
> complicated and introduce some overhead; overhead you would need anyway to do
> any non-trivial operation.
>
> There are two techniques to achieve this.
>
> The first one, the magical silver bullet, involves using a coroutine. Every
> time you generate a value, you yield. This gives control back to the other
> end, which can do whatever it wants with the element; when it pulls again,
> control returns to the generator exactly where it left off, and it can keep
> on pushing.
> The code keeps swapping between a pushing and a pulling context and achieves
> exactly the push->pull integration we wanted.
> It raises a few concerns, however: context size, allocation and management,
> and potentially excessive context switching.

That's my concern, plus implementation complexity.
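
For what it's worth, the asymmetric-coroutine approach can be sketched with
Boost.Coroutine2, which post-dates this thread but implements exactly this
push-to-pull inversion:

#include <boost/coroutine2/all.hpp>
#include <iostream>

typedef boost::coroutines2::coroutine<int> coro_t;

int main()
{
    // The generator is a pure pusher: each sink(...) suspends it and
    // hands one value to the pulling side.
    coro_t::pull_type source(
        [](coro_t::push_type& sink) {
            for (int i = 0; i < 5; ++i)
                sink(i * i);  // "yield"
        });

    // The pulling side sees an ordinary range; each increment resumes
    // the generator exactly where it left off.
    for (int v : source)
        std::cout << v << '\n';
    return 0;
}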

> The other technique is to generate data step by step, each step having a
> fixed maximum size.
> Then you can run a step on a temporary buffer, read off that temporary
> buffer, then run a new step once everything has been read. An iterator that
> does this transparently on-top of a generator can be written. Albeit it does
> introduce some overhead, it could be less than that of the coroutine
> solution. The biggest concern is that it makes the iterator fat, since the
> temporary buffer is part of the iterator; and iterators are typically passed
> by value.

For this special kind of iterator, though, algorithms that know about the
special kind can pass iterators by reference/rvalue reference and avoid
copying them.
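
A rough sketch of such a fat iterator, with a made-up step-generator
interface (gen.step(buf, n) writes up to n values into buf and returns how
many it produced, 0 meaning exhausted); all names here are mine:

#include <cstddef>

// Input-iterator-like adaptor over a step-generator.  The fixed
// buffer lives inside the iterator, which is what makes it fat.
template <typename StepGenerator, typename T, std::size_t BufferSize = 64>
class stepped_iterator
{
public:
    explicit stepped_iterator(StepGenerator& gen)
        : gen_(&gen), pos_(0), end_(0) { refill(); }
    stepped_iterator() : gen_(0), pos_(0), end_(0) {}  // past-the-end

    const T& operator*() const { return buffer_[pos_]; }

    stepped_iterator& operator++()
    {
        if (++pos_ == end_)
            refill();  // buffer drained: run one more step
        return *this;
    }

    bool operator==(const stepped_iterator& other) const
    { return gen_ == other.gen_; }
    bool operator!=(const stepped_iterator& other) const
    { return !(*this == other); }

private:
    void refill()
    {
        pos_ = 0;
        end_ = gen_ ? gen_->step(buffer_, BufferSize) : 0;
        if (end_ == 0)
            gen_ = 0;  // exhausted: now compares equal to past-the-end
    }

    StepGenerator* gen_;
    T buffer_[BufferSize];
    std::size_t pos_, end_;
};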

> The latter approach is the one I used to implement my generic Converter
> system as part of my Unicode library. A converter is a step-generator, except
> it takes input to convert as well. I've got an iterator adaptor that can then
> run a converter as the data is being iterated, and thus lazily perform any
> arbitrary conversion.
>
> Note that the end result, in the case of Unicode, is not very different from
> the Unicode iterator adaptors hand-written by John Maddock, except the
> implementation is much simpler, and can be easily used for efficient pushing
> as well.
>
> So, at the end of this long message, what do I propose to solve the original
> problem?
>
> Have your container provide a step-generator. You can then implement your
> iterator on top of it, or not, your choice (but I'd recommend using it to
> avoid unnecessary code duplication and maintenance).

The problem is that, without real coroutines, I'm concerned about the
performance of this approach. The example I'm thinking of is some code
that I wrote a few years ago that encoded decompressor state in control
flow (as I assume a fast UTF-8 decoder would), and requiring that code to
be steppable without saving the program counter and registers (i.e.,
without a "real" coroutine library) would rule out that technique.
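
As an illustration of the kind of code I mean (my example, not the original
decompressor), consider a run-length decoder whose position in the stream
lives entirely in its control flow:

#include <iterator>

template <typename InputIterator, typename OutputIterator>
void rle_decode(InputIterator first, InputIterator last, OutputIterator out)
{
    while (first != last) {
        unsigned count = static_cast<unsigned>(*first++);  // run header
        typename std::iterator_traits<InputIterator>::value_type value = *first++;
        while (count--)        // "where we are in the current run"
            *out++ = value;    // exists only in the program counter
    }
}

To make this steppable without a real coroutine, count, value, and the
phase (reading a header versus emitting a run) would have to be hoisted
into an explicit state structure and the loops rewritten as a switch,
which is exactly the transformation the control-flow version avoids.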

> A type trait could then be introduced to tell whether a sequence provides a
> generator or not.
> Boost.Range could make use of that type trait to specialize trivial
> operations to use the generator instead of the iterators.

I do agree with that, and that's kind of what I was thinking. That's how
I would view a specialized for_each(): write algorithms in terms of
for_each() when that is efficient, and ranges that have a faster
for_each() operation than the default iterator-based one would overload
that function to get performance improvements in the algorithms.
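
A minimal sketch of that scheme, with a hypothetical chunked container
standing in for a range that knows how to traverse itself faster:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical container whose storage is a sequence of chunks.
template <typename T>
struct chunked_sequence
{
    std::vector<std::vector<T> > chunks;
};

// The generic default, written once in terms of iterators.
template <typename Iterator, typename F>
F for_each(Iterator first, Iterator last, F f)
{
    return std::for_each(first, last, f);
}

// The overload for the chunked container: algorithms phrased in terms
// of for_each() pick up the tight per-chunk loops without being
// rewritten themselves.
template <typename T, typename F>
F for_each(chunked_sequence<T>& s, F f)
{
    for (std::size_t c = 0; c < s.chunks.size(); ++c) {
        std::vector<T>& chunk = s.chunks[c];
        for (std::size_t i = 0; i < chunk.size(); ++i)
            f(chunk[i]);
    }
    return f;
}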

> We end up with:
>
> - an easy to write solution
> - compatibility with all code that takes ranges as input
> - only the operations that want to can be specialized to exploit the
> performance gains
> - no restriction imposed on operations that cannot be implemented in terms of
> the generator directly

Yes, I agree with all of this, except for the issue of requiring
generators to be steppable. I think just overloading for_each() and
writing algorithms in terms of that when possible would get the same
benefits, though.

-- Jeremiah Willcock

