
Subject: Re: [boost] [review] Dataflow Library
From: Stjepan Rajko (stipe_at_[hidden])
Date: 2008-09-07 14:57:21


On Sun, Sep 7, 2008 at 8:43 AM, Paul Baxter <pauljbaxter_at_[hidden]> wrote:
>
> I really like the dataflow work and am watching with interest, but ongoing
> changes in my circumstances mean I can't do a full review or participate
> more fully.
>

Thank you for taking the time to join the discussion!

> I would be interested in what the use cases are that dataflow is tackling.
> e.g. which of:
>

I will answer your questions from the standpoint of what the
Dataflow.Signals framework / layer offers, since that one is the focus
of the review.

Dataflow.Signals is intended for component-run, signal-driven
processing. By that I mean that the network has no brains whatsoever.
It is up to the components to send signals, decide when to propagate
signals and when not to, etc. It is possible to have something
controlling the network from the outside (e.g., activating a
component, inserting some data, grabbing a result), but none of this
is done by the framework - it has to be done by the user.
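
To make the "no brains" point concrete, here is a minimal C++ sketch that does not use Dataflow.Signals at all. The names output_signal and threshold_gate are hypothetical, and a signal is modelled simply as a list of std::function slots. The component, not the network, decides whether an incoming value is propagated, and the caller plays the role of the outside controller that inserts data.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, not the Dataflow.Signals API: an output "signal" is
// just a list of connected slots, and it never decides anything by itself.
template <typename T>
class output_signal {
public:
    void connect(std::function<void(const T &)> slot) { slots_.push_back(std::move(slot)); }
    // Calls every connected slot, in connection order.
    void operator()(const T &value) const {
        for (const auto &slot : slots_) slot(value);
    }
private:
    std::vector<std::function<void(const T &)>> slots_;
};

// Hypothetical component that decides for itself whether to propagate:
// it forwards only values strictly greater than its threshold.
class threshold_gate {
public:
    explicit threshold_gate(int threshold) : threshold_(threshold) {}
    void operator()(int value) const {
        if (value > threshold_) out(value);
    }
    output_signal<int> out;
private:
    int threshold_;
};

// Usage: the caller acts as the outside controller that inserts data.
std::vector<int> run_gate_example() {
    threshold_gate gate(10);
    std::vector<int> received;
    gate.out.connect([&received](const int &v) { received.push_back(v); });
    for (int v : {5, 15, 8, 20}) gate(v);
    assert((received == std::vector<int>{15, 20}));  // 5 and 8 were not propagated
    return received;
}
```

Nothing in output_signal schedules or filters anything; all of the decision-making lives in threshold_gate's call operator.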

Below, I will provide examples of what components you might need
(unless I specify that a component is a Dataflow.Signals component,
the component would have to be implemented by the user) and how you'd
connect them and run the network.

> 1) 'Single shot' data flow - all components run once. Typically the data
> being passed is pre-sized such that the size of the block on inputs gives
> rise to the answer on the output.
> e.g. Take the whole music file and then filter it all in one hit to give one
> answer

Example:

// this would be a component that can read a music file.
// It provides a member function .send() that will send the
// contents of the file via a signal of signature void(const MusicFile &)
whole_music_file_reader reader("file.in");

// this would be a filter component that takes as input a music file
// and outputs a filtered version
music_filter filter;

// this component is provided by Dataflow.Signals. It
// will store values from incoming signals.
signals::storage<void(const MusicFile &)> result;

// connect
reader >>= filter >>= result;

// run once
reader.send();

// we have the result - the at<0> member function will access it
// (0 because the MusicFile is the 1st parameter of the signature)
result.at<0>().play();
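
The role of the storage component in this example can be pictured with the following sketch. It assumes only what the comments above say (the component keeps the values of the last incoming signal, and at<N>() returns the N-th parameter); simple_storage is a hypothetical stand-in, not the Dataflow.Signals implementation. In this sketch parameter types are stored decayed, so a const MusicFile & parameter would be kept as a MusicFile copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for a storage component (not the library's code):
// it keeps a copy of each parameter of the most recently received signal.
template <typename Signature>
class simple_storage;

template <typename... Args>
class simple_storage<void(Args...)> {
    using stored = std::tuple<typename std::decay<Args>::type...>;
public:
    // Acts as a slot: each incoming signal overwrites the stored values.
    void operator()(Args... args) { values_ = stored(args...); }
    // at<N>() returns the N-th parameter (0-based) of the last signal.
    template <std::size_t N>
    const typename std::tuple_element<N, stored>::type &at() const {
        return std::get<N>(values_);
    }
private:
    stored values_;
};

// Usage: a two-parameter signature, so at<0>() and at<1>() are valid.
std::string run_storage_example() {
    simple_storage<void(const std::string &, int)> result;
    result(std::string("first"), 1);
    result(std::string("second"), 2);  // overwrites the earlier values
    assert(result.at<1>() == 2);
    return result.at<0>();
}
```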

>
> 2) Packetised data flow (components laid out once and then operating on data
> packets) with explicit packet aggregation by components and having
> 'do_processing' events driven by the availability of a packet from its
> inputs. A component may consume three input packets and deliver one.
> e.g. Process music in 20ms lumps e.g. using FFTs for filtering
>

You could do this as long as the components could figure out
everything locally - but depending on the details this might not be
the most suited task for Dataflow.Signals. Here is something you
could do:

// A Dataflow.Signals component which will run in its own thread,
// and generate periodic void() signals
signals::timed_generator<void()> timer;

// A packetised sound source. Each time it receives a void() signal,
// it sends a void(const Packet &) signal
sound_source source;

// Some packet filters. Let's say that the filter will consume
// 3 packets before producing 1, and then produce a packet
// on each packet received. That is, if its input packets are
// ip1, ip2, ip3, ip4... and its output packets are opa, opb, opc...
// the filter would use ip1, ip2, ip3 to produce opa, it would
// use ip2, ip3, ip4 to produce opb, etc.
packet_filter filter1(some_filter_fn), filter2(some_other_filter_fn);

// connect the network
connect(timer, source);
connect(source, filter1);
connect(source, filter2);

// set the timer to produce a signal every 20ms
timer.enable(0.02);

So far so good - the timer will drive the source, which will provide
the input to the filters (the filters need to take care of buffering
the inputs themselves). But now we get to places where the
suitability of Dataflow.Signals breaks down. For example, what about
the results of filter1 and filter2? If they each go to separate
outputs (call them output1 and output2), no problem:
connect(filter1, output1);
connect(filter2, output2);

but what if they both go to the same output (which combines them in
some way)? Then that output needs to be smart about how to handle two
inputs. It has to provide two input ports:
connect(filter1, output.input_port_1());
connect(filter2, output.input_port_2());

This is not the problem - the problem is that output will get calls
from each of the filters and will have no idea which frames of source
data the filtered data corresponds to. It could use a naive strategy where
it waits to get both an input from port_1 and an input from port_2,
then combine, then repeat, but that is not very robust (what if
filter2 was connected a few frames after filter1?). So, we'd need to
add some information to the data, which would allow the components to
be "smart enough". In this case, we would explicitly need to add
something like a frame number to the Packet, and the component would
have to figure out what to do with data originating from different
frames. Not ideal.
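
The frame-number approach can be sketched as follows, assuming a hypothetical tagged_packet that carries the number of the source frame it was derived from and a hypothetical frame_combiner exposing the two input ports. The combiner pairs inputs by frame number rather than by arrival order, so an input connected a few frames late stays aligned; a naive strategy that pairs the first unmatched input from each port would instead combine frame 0 from port 1 with frame 2 from port 2.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical packet that records which source frame it was derived from.
struct tagged_packet {
    long frame;
    double value;
};

// Hypothetical two-input output component: it pairs inputs by frame number
// rather than by arrival order, so a late-connected input stays aligned.
class frame_combiner {
public:
    explicit frame_combiner(std::function<void(long, double, double)> sink)
        : sink_(std::move(sink)) {}

    void input_port_1(const tagged_packet &p) { receive(p, pending_1_, pending_2_, true); }
    void input_port_2(const tagged_packet &p) { receive(p, pending_2_, pending_1_, false); }

private:
    using pending_map = std::map<long, double>;  // frame number -> value

    void receive(const tagged_packet &p, pending_map &mine, pending_map &other, bool from_port_1) {
        auto match = other.find(p.frame);
        if (match == other.end()) {  // partner for this frame not here yet: wait
            mine[p.frame] = p.value;
            return;
        }
        double v1 = from_port_1 ? p.value : match->second;
        double v2 = from_port_1 ? match->second : p.value;
        other.erase(match);
        sink_(p.frame, v1, v2);
    }

    std::function<void(long, double, double)> sink_;
    pending_map pending_1_, pending_2_;
};

// Usage: filter2's output reaches port 2 only from frame 2 onwards.
std::vector<long> run_combiner_example() {
    std::vector<long> combined_frames;
    frame_combiner output([&combined_frames](long frame, double a, double b) {
        assert(a == frame * 10.0 && b == frame * 100.0);  // both from the same frame
        combined_frames.push_back(frame);
    });
    for (long f = 0; f < 4; ++f) output.input_port_1({f, f * 10.0});
    for (long f = 2; f < 4; ++f) output.input_port_2({f, f * 100.0});
    assert((combined_frames == std::vector<long>{2, 3}));
    return combined_frames;
}
```

Frames 0 and 1 from port 1 are never matched and remain pending; a real component would also need a policy for discarding such stale frames, which is one more piece of logic each component has to carry.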

> 3) Streaming use cases (components laid out once and then operating on data
> streams). Each component called at a sample rate but perhaps only processing
> or packaging a 'packet' 1 out of every N calls or when all inputs have
> provided enough data to satisfy a block algorithm)
> e.g. do continuous processing at front end FIR filters, but explicitly
> decimate and manage a slower processing rate in the back end of the
> processing.

If you could put all of the logic into the components, you could do
this, but this is where the Dataflow.Signals framework would probably
be unsuitable. I started working on a different framework (called
Dataflow.Managed), which is an example of a smarter network. In what
I have so far, the framework takes care of things like "only invoke a
component when its inputs have changed", and figures out the correct
order of invocation, but one could similarly extend Dataflow.Managed
(or create a new framework) that would allow the user to specify
things like "only invoke a component when all its inputs are ready",
or other things that are needed.
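
Putting the logic into the components, as the first sentence above suggests, could look like the following sketch for the "package a packet 1 out of every N calls" case. The hypothetical block_packer is invoked once per sample at the full rate and emits one block of N samples on every N-th call, so whatever is connected to its output runs at 1/N of the rate. The output is a plain callback; this is an illustration, not a Dataflow.Signals component.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical component: called once per sample at the full rate, it emits
// one block of n samples on every n-th call, so its consumers run at 1/n.
template <typename Sample>
class block_packer {
public:
    block_packer(std::size_t n, std::function<void(const std::vector<Sample> &)> out)
        : n_(n), out_(std::move(out)) { block_.reserve(n); }

    void operator()(const Sample &s) {
        block_.push_back(s);
        if (block_.size() == n_) {  // block complete: hand it to the slower back end
            out_(block_);
            block_.clear();
        }
    }
private:
    std::size_t n_;
    std::function<void(const std::vector<Sample> &)> out_;
    std::vector<Sample> block_;
};

// Usage: 10 sample-rate calls with n = 4 produce 2 blocks.
std::vector<std::vector<int>> run_packer_example() {
    std::vector<std::vector<int>> blocks;
    block_packer<int> packer(4, [&blocks](const std::vector<int> &b) { blocks.push_back(b); });
    for (int sample = 0; sample < 10; ++sample) packer(sample);
    assert(blocks.size() == 2);  // samples 8 and 9 still wait for a full block
    return blocks;
}
```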

>
> This one is akin to much hardware I've designed where one may have clock
> domains and manage data events separate from processing events. In some
> places one can do processing every clock cycle, other places things are more
> complex.
>

One could make a Dataflow.Signals network where each component is
connected to a clock, so that each component does get control at every
clock cycle and does what it wants. But, the user would have to make
sure that the order of invocation is correct - and that's why I'd say
Dataflow.Signals is not the right tool for a network like this.
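
The invocation order that the user would otherwise have to get right by hand is exactly what a smarter framework can work out. The sketch below is hypothetical and is not Dataflow.Managed's code: managed_network computes a topological order of its nodes with Kahn's algorithm and, on each update(), invokes a node only if one of its inputs changed (every node is invoked on the first update). Node values are plain ints to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch of a "smarter" network (not Dataflow.Managed's code).
// Node values are ints; the network derives the invocation order itself
// (Kahn's topological sort) and invokes a node only if an input changed.
class managed_network {
public:
    using compute_fn = std::function<int(const std::vector<int> &)>;

    // Sources are set from outside; computed nodes have a compute_fn.
    std::size_t add_source(int value) { return add(nullptr, value, true); }
    std::size_t add_node(compute_fn fn) { return add(std::move(fn), 0, false); }
    // Nodes can be created in any order; connect() records a dependency.
    void connect(std::size_t from, std::size_t to) { nodes_[to].inputs.push_back(from); }
    void set(std::size_t id, int v) {
        if (nodes_[id].value != v) { nodes_[id].value = v; nodes_[id].changed = true; }
    }
    int value(std::size_t id) const { return nodes_[id].value; }
    std::size_t invocations() const { return invocations_; }

    void update() {
        for (std::size_t id : topological_order()) {
            node &n = nodes_[id];
            if (!n.fn) continue;                 // sources are never invoked
            bool dirty = !n.computed;            // the first update computes every node
            std::vector<int> args;
            for (std::size_t in : n.inputs) {
                dirty = dirty || nodes_[in].changed;
                args.push_back(nodes_[in].value);
            }
            if (!dirty) continue;                // no input changed: skip this node
            ++invocations_;
            int v = n.fn(args);
            n.changed = !n.computed || v != n.value;
            n.value = v;
            n.computed = true;
        }
        for (node &n : nodes_) n.changed = false;  // all changes have been propagated
    }

private:
    struct node { compute_fn fn; int value; bool changed; bool computed; std::vector<std::size_t> inputs; };

    std::size_t add(compute_fn fn, int value, bool changed) {
        nodes_.push_back(node{std::move(fn), value, changed, false, {}});
        return nodes_.size() - 1;
    }

    // Kahn's algorithm: every producer is placed before its consumers.
    std::vector<std::size_t> topological_order() const {
        std::vector<std::size_t> indegree(nodes_.size(), 0), order;
        std::vector<std::vector<std::size_t>> consumers(nodes_.size());
        for (std::size_t id = 0; id < nodes_.size(); ++id)
            for (std::size_t in : nodes_[id].inputs) { consumers[in].push_back(id); ++indegree[id]; }
        std::queue<std::size_t> ready;
        for (std::size_t id = 0; id < nodes_.size(); ++id)
            if (indegree[id] == 0) ready.push(id);
        while (!ready.empty()) {
            std::size_t id = ready.front(); ready.pop();
            order.push_back(id);
            for (std::size_t c : consumers[id])
                if (--indegree[c] == 0) ready.push(c);
        }
        if (order.size() != nodes_.size()) throw std::logic_error("cycle in network");
        return order;
    }

    std::vector<node> nodes_;
    std::size_t invocations_ = 0;
};

// Usage: the consumer `sum` is deliberately created before its producers.
int run_managed_example() {
    managed_network net;
    auto sum = net.add_node([](const std::vector<int> &in) { int s = 0; for (int v : in) s += v; return s; });
    auto twice = net.add_node([](const std::vector<int> &in) { return 2 * in[0]; });
    auto a = net.add_source(1), b = net.add_source(5);
    net.connect(a, twice);
    net.connect(twice, sum);
    net.connect(b, sum);
    net.update();                    // order: a, b, twice, sum -> 2 + 5
    assert(net.value(sum) == 7 && net.invocations() == 2);
    net.set(b, 5);                   // same value as before, so nothing changed
    net.update();
    assert(net.invocations() == 2);  // no node was re-invoked
    net.set(a, 3);
    net.update();                    // twice -> 6, then sum -> 6 + 5
    assert(net.invocations() == 4);
    return net.value(sum);
}
```

Because the order comes from the connections rather than from creation order, sum is created first yet still runs after twice, and setting b to the value it already has triggers no invocations.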

> 4) Not only are data events managed, but the processing flow itself may be
> dynamically altered - e.g. music contains some vocals so I'll add a flow
> through a 'remove vocals' process (normally don't bother) or perhaps other
> dynamic constraints from user interaction will dynamically alter the flows
> being used and there are sufficient permutations not to pre-allocate a full
> network with switched decision points.
>

This is not a problem - the network can be dynamically altered.
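
As an illustration of altering a network at run time (for example, inserting a "remove vocals" stage only when vocals are detected), here is a minimal sketch with a hypothetical rewirable_signal whose connect() returns a handle that disconnect() accepts. It is not the Dataflow.Signals API; the extra stage is a lambda that forwards a modified packet to its own output signal.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signal whose connections can change while the program runs:
// connect() returns a handle that disconnect() accepts.
template <typename T>
class rewirable_signal {
public:
    std::size_t connect(std::function<void(const T &)> slot) {
        slots_.emplace(next_id_, std::move(slot));
        return next_id_++;
    }
    void disconnect(std::size_t id) { slots_.erase(id); }
    // Calls every connected slot, in connection order (ids only increase).
    void operator()(const T &value) const {
        for (const auto &entry : slots_) entry.second(value);
    }
private:
    std::map<std::size_t, std::function<void(const T &)>> slots_;
    std::size_t next_id_ = 0;
};

// Usage: route source -> output, then insert a "remove vocals" stage.
std::vector<std::string> run_rewire_example() {
    std::vector<std::string> played;
    auto output = [&played](const std::string &s) { played.push_back(s); };

    rewirable_signal<std::string> source;       // hypothetical music source
    rewirable_signal<std::string> remover_out;  // output of the extra stage
    auto remove_vocals = [&remover_out](const std::string &s) {
        remover_out(s + " (vocals removed)");
    };

    std::size_t direct = source.connect(output);
    source("intro");
    // Vocals detected: reroute to source -> remove_vocals -> output.
    source.disconnect(direct);
    source.connect(remove_vocals);
    remover_out.connect(output);
    source("verse");
    assert(played.size() == 2);
    return played;
}
```

This sketch does not support disconnecting a slot from inside the same signal's emission (erasing from the map while iterating over it would invalidate the iterator); a production signal library has to handle that case.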

> Which of the above is dataflow suited to?
>

I hope the above illustrated what the Dataflow.Signals layer is
suitable or unsuitable for. Ideally, the Dataflow library as a whole
will evolve to a point where it can accommodate frameworks that can
handle all of the above appropriately - then it will be a matter of
providing implementations of frameworks that function in the
appropriate way (or writing support layers for frameworks that already
do).

> Thanks to all who have provided various interesting links. To add to that
> list...
> I found the descriptions and tutorials accompanying the product Gedae
> (www.gedae.com) to nicely decouple and present many of the concepts familiar
> to hardware designers that a software based dataflow library may also want
> to consider.
>
> I also found the research and development pages at insomniac games provide
> another useful perspective where they manage both data and component
> instantiation and flow on the parallel cores of the playstation 3.
> http://www.insomniacgames.com/tech/techpage.php . While they largely deal
> with implementation issues with their SPU shaders and Dynamic component
> system, the focus is making things work in a multi-core system where
> dataflow also needs awareness of the resources on which to run.
>
> Regards and good luck with the review
>

Thanks!

Kind regards,

Stjepan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk