Subject: Re: [boost] Stacking iterators vs. dataflow
From: Stjepan Rajko (stipe_at_[hidden])
Date: 2008-09-03 11:50:39


On Wed, Sep 3, 2008 at 5:36 AM, Phil Endecott
<spam_from_boost_dev_at_[hidden]> wrote:
> I just noticed this in the "lifetime of ranges vs. iterators" thread (which
> I've not really been following):
>
> Arno Schödl wrote:
>>
>> rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) |
>> filtered( funcD ) | filtered( funcE )
>
> I thought it worth pointing out the similarity, and also the difference,
> between this and the proposed dataflow notation. Here, operator| is being
> used like a shell pipe operator. In dataflow, operator| has a quite
> different meaning: it's a vertical line, distributing the output of "rng" to
> the inputs of the funcs in parallel. Confusing, perhaps?

Yes, I was anticipating that there would be possible confusion between
the dataflow library use of "|" for branching and the common use of
"|" for piping. A different operator could be used in dataflow, if
preferable.
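
To make the two readings of "|" concrete, here is a minimal sketch in
plain C++. It uses neither library: Stage, pipe_through and branch_to are
hypothetical names invented for this illustration, and a single int stands
in for the range.

```cpp
#include <functional>
#include <vector>

// Hypothetical stage type for illustration only; not part of any Boost library.
using Stage = std::function<int(int)>;

// Shell-pipe reading: the output of each stage becomes the input of the next.
inline int pipe_through(int input, const std::vector<Stage>& stages) {
    int value = input;
    for (const Stage& stage : stages) {
        value = stage(value);
    }
    return value;
}

// Branching reading: the same input is distributed to every stage in parallel
// (conceptually; the calls here are made one after another).
inline std::vector<int> branch_to(int input, const std::vector<Stage>& stages) {
    std::vector<int> outputs;
    for (const Stage& stage : stages) {
        outputs.push_back(stage(input));
    }
    return outputs;
}
```

With an "add one" stage followed by a "double" stage, piping applies them
in sequence, while branching hands the original input to each of them
separately and yields one result per stage.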

> Anyway you could
> presumably write something like
>
> rng >>= funcA >>= funcB ....
>
> and I would be interested to hear how the two implementations compare. Is
> it true to say that stacked iterators implement a "data pull" style, while
> dataflow implements "data push"?
>

Dataflow.Signals networks are typically implemented as push networks,
but they can also be used for pull-processing:
http://www.dancinghacker.com/code/dataflow/dataflow/signals/introduction/tutorial/pull.html

The direction indicated by >>= aligns with the direction of the signal
(function call), but the data can flow either way (sent forward in the
function call argument, or sent back through the return value). So, you
could do

rng >>= funcA >>= funcB

or

funcB >>= funcA >>= rng

depending on how the func and rng components are implemented.
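
The distinction can be sketched without the library at all. The code below
is only an illustration under assumed names (PushDoubler, PullDoubler,
run_push and run_pull are hypothetical and not Dataflow.Signals
components): in the push version the data travels in the call argument,
in the pull version it travels back in the return value.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Push style: the upstream side makes the call and sends data forward
// in the function call argument.
struct PushDoubler {
    std::function<void(int)> next;  // downstream consumer
    void operator()(int x) const { next(x * 2); }
};

// Pull style: the downstream side makes the call and the data comes
// back through the return value.
struct PullDoubler {
    std::function<int()> upstream;  // upstream producer
    int operator()() const { return upstream() * 2; }
};

// Calls go rng -> doubler -> sink; data moves in the same direction.
inline std::vector<int> run_push(const std::vector<int>& rng) {
    std::vector<int> sink;
    PushDoubler doubler{[&sink](int x) { sink.push_back(x); }};
    for (int x : rng) {
        doubler(x);
    }
    return sink;
}

// Calls go sink -> doubler -> rng; data moves in the opposite direction.
inline std::vector<int> run_pull(const std::vector<int>& rng) {
    std::size_t pos = 0;
    PullDoubler doubler{[&rng, &pos]() { return rng[pos++]; }};
    std::vector<int> sink;
    for (std::size_t i = 0; i < rng.size(); ++i) {
        sink.push_back(doubler());
    }
    return sink;
}
```

Both functions produce the same doubled sequence; what differs is which
end of the chain initiates the calls.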

> I also note that Arno wants to use stacked iterators because this
> alternative:
>
> result = fn1( fn2( fn3( fn4( huge_document ) ) ) );
>
> creates large intermediates and requires dynamic allocation. Again, a
> framework that allowed buffering of "sensible size" chunks and potentially
> distributed the work between threads could be a good solution.
>

As far as the dataflow library goes, some sort of an "automatic task
division" library would indeed be great in conjunction with dataflow,
but I see this as orthogonal to dataflow. Automatic task division
could be useful without dataflow, and dataflow could be useful without
automatic task division. Is it your opinion that some sort of a task
division strategy would be necessary for the dataflow library to be
useful?

Kind regards,

Stjepan

