Subject: Re: [boost] Stacking iterators vs. dataflow
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-09-03 14:37:18


On Wed, Sep 3, 2008 at 7:59 PM, Phil Endecott
<spam_from_boost_dev_at_[hidden]> wrote:
> Mathias Gaunard wrote:
>>>
>>> a framework that allowed buffering of "sensible size" chunks and
>>> potentially distributed the work between threads could be a good
>>> solution.
>>
>> If you perform n transformations, adaptors will give you loop fusion for
>> free.
>
> Maybe, subject to the fusion of all the levels' termination tests; I think
> this is what Dave has been talking about but I'm not knowledgeable about the
> area.
>
>> That kind of optimization seems more interesting to me than work
>> distribution and buffering.
>
> You're lucky if you get to work on "interesting" things, rather than
> "important" things :-)
>
> Here's a practical example:
>
> cat email_with_attached_picture | decode_base64 | decode_jpeg | resize_image > /dev/framebuffer
>
> How can I convert that shell pipeline into C++? Naive approach:
>
> vector<byte> a = read_file("/path/to/email");
> vector<byte> b = decode_base64(a);
> vector<byte> c = decode_jpeg(b);
> vector<byte> d = resize_image(c);
> write_file("/dev/framebuffer",d);
>
> The problem with that is that I don't start to decode anything until I've
> read in the whole of the input. The system would be perceptibly faster if
> the decoding could start as soon as the first data were available.
>
> So I can use some sort of iterator adaptor stack or dataflow graph to
> process the data piece at a time. But it's important that I process it in
> pieces of the right size. Base64 decoding converts 4 input bytes into 3
> output bytes, but it would be a bad idea to read the data from the file 4
> bytes at a time; we should probably ask for BUFSZ bytes. libjpeg works in
> terms of lines, and you can ask it (at runtime, after it has read the file
> header) how many lines it suggests processing at a time (it's probably the
> height of the DCT blocks in the image). Obviously that corresponds to a
> variable number of bytes in the input.
>
> I would love to see how readers would approach this problem using the
> various existing and proposed libraries.
>

Do you really think that the buffering size needs to be configurable?
Given an appropriate buffer size (a memory page?) you could hide
the buffering step inside an iterator adaptor, which, instead of
producing every N'th value on the fly, would batch the production of
enough elements to fill the buffer.
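
To make that concrete, here is a rough sketch of such an adaptor. It is only an illustration under my own assumptions: buffered_source, count_to and drain are made-up names, not part of any existing or proposed Boost library, and the producer is assumed to be a callable of the form bool(T&) that returns false when it runs dry. The consumer still pulls one element at a time, but the upstream work happens in batches of at most `capacity` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adaptor: hides a fixed-size buffer behind an
// element-at-a-time interface. Producer is assumed to be bool(T&).
template <class T, class Producer>
class buffered_source {
public:
    buffered_source(Producer produce, std::size_t capacity)
        : produce_(produce), capacity_(capacity),
          pos_(0), refills_(0), done_(false) {
        assert(capacity > 0);
        buf_.reserve(capacity);
    }

    // Hand out the next element; returns false once everything is consumed.
    bool next(T& out) {
        if (pos_ == buf_.size() && !refill()) return false;
        out = buf_[pos_++];
        return true;
    }

    // Number of batches produced so far (only here to observe the batching).
    std::size_t refills() const { return refills_; }

private:
    // Produce up to capacity_ elements in one go.
    bool refill() {
        if (done_) return false;
        buf_.clear();
        pos_ = 0;
        T value;
        while (buf_.size() < capacity_) {
            if (!produce_(value)) { done_ = true; break; }
            buf_.push_back(value);
        }
        if (buf_.empty()) return false;
        ++refills_;
        return true;
    }

    Producer produce_;
    std::size_t capacity_;
    std::vector<T> buf_;
    std::size_t pos_;
    std::size_t refills_;
    bool done_;
};

// Hypothetical producer for the example: yields next, next+1, ..., limit-1.
struct count_to {
    int next, limit;
    bool operator()(int& out) {
        if (next >= limit) return false;
        out = next++;
        return true;
    }
};

// Helper for the example: pull everything out of a buffered source.
template <class Source>
std::vector<int> drain(Source& src) {
    std::vector<int> result;
    int v;
    while (src.next(v)) result.push_back(v);
    return result;
}
```

With count_to producing ten values and a capacity of 4, the consumer sees the ten values in order while the producer is driven in three batches (4, 4 and 2 elements). In a real adaptor the capacity would presumably be picked once (a page, BUFSZ, ...) rather than exposed to the user.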

David: BTW, I think that you can use exactly the same abstraction used
for segmented iterators to expose the buffering capability of a
buffered iterator adaptor.
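
Something along these lines, as a sketch only. It is loosely modelled on the segment/local split of segmented iterators, not on any actual Boost interface; segmented_buffer, iota_reader and sum_segmented are hypothetical names, and the upstream is assumed to offer an fread-like bulk read(int*, size_t). The adaptor exposes each filled buffer as one contiguous segment, so a segment-aware algorithm gets an outer loop per buffer and a tight inner loop over plain pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical upstream with a bulk-read interface, in the style of fread():
// writes up to n values into dst and returns how many were written.
struct iota_reader {
    int next, limit;
    std::size_t read(int* dst, std::size_t n) {
        std::size_t i = 0;
        for (; i < n && next < limit; ++i) dst[i] = next++;
        return i;
    }
};

// Hypothetical adaptor that exposes its internal buffer as a segment.
template <class Upstream>
class segmented_buffer {
public:
    typedef const int* local_iterator;                         // walks one buffer
    typedef std::pair<local_iterator, local_iterator> segment; // [first, second)

    segmented_buffer(Upstream up, std::size_t capacity)
        : up_(up), buf_(capacity) {
        assert(capacity > 0);
    }

    // Refill the buffer and return it as a contiguous range.
    // An empty range means the upstream is exhausted.
    segment next_segment() {
        std::size_t n = up_.read(&buf_[0], buf_.size());
        return segment(&buf_[0], &buf_[0] + n);
    }

private:
    Upstream up_;
    std::vector<int> buf_;
};

// Segment-aware algorithm: one outer iteration per buffer, and a tight
// inner loop over the contiguous range with no per-element refill check.
template <class Source>
long sum_segmented(Source& src, std::size_t* segments_seen) {
    long total = 0;
    *segments_seen = 0;
    for (;;) {
        typename Source::segment s = src.next_segment();
        if (s.first == s.second) break;
        ++*segments_seen;
        for (typename Source::local_iterator p = s.first; p != s.second; ++p)
            total += *p;
    }
    return total;
}
```

Summing 1..10 through a capacity-4 buffer visits three segments (4, 4 and 2 elements). Algorithms that don't know about segments could still fall back to a flat element-at-a-time view over the same adaptor.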

"All programming is an exercise in caching." -- Terje Mathisen

-- 
gpd
