Subject: Re: [boost] [review] Dataflow Library
From: Paul Baxter (pauljbaxter_at_[hidden])
Date: 2008-09-07 11:43:16


> In Verilog (or VHDL), if I have two components that I want to "pipe"
> together I need to declare a wire that will be the channel for the
> communication and then declare the two components with this wire connected
> to the appropriate port. Something like this:
>
> wire[7:0] a;
> ExampleSource src (.the_output(a));
> ExampleSink sink (.the_input(a));
>
> As far as I am aware, neither language has syntax to pipe them together
> more concisely, i.e.
>
> ExampleSource src >>= ExampleSink sink;

I've been lurking and have faced many of these problems both in VHDL and
more recently in designing multi-processor pluggable signal processors in
software that go even further than dataflow with their dynamic
reconfigurability.

I've found the separate concept of a wire to be useful in some situations
(separating data distribution from the processing), particularly where
flows can be 'one to many' (either as copies or read-only), where in-place
data operations provide performance benefits, and where persistence of the
data (even if just short term) is usefully managed by some form of
controller. Typically these all involve multiple threads, so simple
assumptions about data availability cannot be made.

That said, when I've not needed it, the software complexity of having
explicit wires over something implicit in the linking of components is an
overhead.
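
To make the trade-off concrete, here is a minimal single-threaded sketch of
the two styles. All names (Wire, Source, Sink and the two demo functions)
are hypothetical and are not taken from the Dataflow library under review.
The explicit wire owns the data and fans it out read-only to several
readers, while the implicit link simply lets the source call the sink.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the Dataflow library's API.

// Explicit "wire": a channel object that holds the latest value and
// distributes it read-only to any number of attached readers.
struct Wire {
    std::vector<std::function<void(const int&)>> readers;
    int last = 0;  // short-term persistence lives in the wire, not the components

    void attach(std::function<void(const int&)> r) { readers.push_back(std::move(r)); }

    void write(int v) {
        last = v;
        for (auto& r : readers) r(last);  // one-to-many, no copies handed out
    }
};

// A sink that just accumulates what it is given.
struct Sink {
    int sum = 0;
    void operator()(const int& v) { sum += v; }
};

// Implicit link: the source calls its consumer directly, no channel object.
struct Source {
    std::function<void(const int&)> out;
    void emit(int v) { if (out) out(v); }
};

// Two sinks fed from one wire; returns the combined total they received.
inline int explicit_wire_demo() {
    Wire w;
    Sink a, b;
    w.attach(std::ref(a));
    w.attach(std::ref(b));
    w.write(3);
    w.write(4);
    assert(w.last == 4);  // the wire still holds the most recent value
    return a.sum + b.sum;
}

// One sink linked straight to one source; returns what the sink received.
inline int implicit_link_demo() {
    Source src;
    Sink s;
    src.out = std::ref(s);
    src.emit(3);
    src.emit(4);
    return s.sum;
}
```

The wire version costs an extra object and an extra indirection, which is
the overhead referred to above, but it is also the natural place to hang
fan-out, buffering or locking once several threads are involved.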

I really like the dataflow work and am watching with interest, but ongoing
changes in my circumstances mean I can't do a full review or participate
more fully.

I would be interested to know which use cases dataflow is tackling,
e.g. which of:

1) 'Single shot' data flow - all components run once. Typically the data
being passed is pre-sized such that the size of the block on inputs gives
rise to the answer on the output.
e.g. take the whole music file and filter it all in one hit to give one
answer.

2) Packetised data flow (components laid out once and then operating on data
packets) with explicit packet aggregation by components and having
'do_processing' events driven by the availability of a packet from its
inputs. A component may consume three input packets and deliver one.
e.g. process music in 20 ms lumps, using FFTs for filtering.

3) Streaming use cases (components laid out once and then operating on data
streams). Each component is called at the sample rate, but perhaps only
processes or packages a 'packet' on 1 out of every N calls, or when all
inputs have provided enough data to satisfy a block algorithm.
e.g. do continuous processing in front-end FIR filters, but explicitly
decimate and manage a slower processing rate in the back end of the
processing.

This one is akin to much hardware I've designed, where one may have clock
domains and manage data events separately from processing events. In some
places one can do processing every clock cycle; in other places things are
more complex.

4) Not only are data events managed, but the processing flow itself may be
dynamically altered. e.g. the music contains some vocals, so I'll add a flow
through a 'remove vocals' process (normally I don't bother). Or perhaps
other constraints from user interaction will dynamically alter the flows
being used, and there are too many permutations to pre-allocate a full
network with switched decision points.
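
As an illustration of case 2, here is a minimal sketch of a component whose
'do_processing' step is driven by packet availability, consuming three
input packets and delivering one. The names (Packet, Aggregator, on_packet,
aggregate_demo) are hypothetical and not part of the Dataflow library, and
simple concatenation stands in for a real block algorithm such as an FFT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the Dataflow library's API.
using Packet = std::vector<double>;

// Buffers incoming packets and runs its processing step only once
// `needed` packets have arrived, then emits a single output packet.
class Aggregator {
public:
    Aggregator(std::size_t needed, std::function<void(const Packet&)> downstream)
        : needed_(needed), downstream_(std::move(downstream)) {
        assert(needed_ > 0);
    }

    // Called by the upstream component whenever a packet is ready.
    void on_packet(Packet p) {
        pending_.push_back(std::move(p));
        if (pending_.size() == needed_) do_processing();
    }

private:
    // Placeholder for a block algorithm: concatenate the buffered packets.
    void do_processing() {
        Packet out;
        for (const Packet& p : pending_) out.insert(out.end(), p.begin(), p.end());
        pending_.clear();
        if (downstream_) downstream_(out);
    }

    std::size_t needed_;
    std::function<void(const Packet&)> downstream_;
    std::vector<Packet> pending_;
};

// Feeds seven 4-sample packets through a 3-to-1 aggregator and records
// the size of every packet delivered downstream.
inline std::vector<std::size_t> aggregate_demo() {
    std::vector<std::size_t> sizes;
    Aggregator agg(3, [&sizes](const Packet& p) { sizes.push_back(p.size()); });
    for (int i = 0; i < 7; ++i) agg.on_packet(Packet(4, 0.0));
    return sizes;
}
```

With seven input packets the aggregator fires twice, delivering two
12-sample packets, and the seventh packet stays buffered until two more
arrive. The streaming case 3 differs mainly in that the component is called
on every sample and decides for itself when enough data has accumulated.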

Which of the above is dataflow suited to?

Thanks to all who have provided various interesting links. To add to that
list...
I found the descriptions and tutorials accompanying the product Gedae
(www.gedae.com) to nicely decouple and present many of the concepts familiar
to hardware designers that a software based dataflow library may also want
to consider.

I also found that the research and development pages at Insomniac Games
provide another useful perspective, where they manage both data and
component instantiation and flow on the parallel cores of the PlayStation 3:
http://www.insomniacgames.com/tech/techpage.php
While they largely deal with implementation issues with their SPU shaders
and Dynamic component system, the focus is on making things work in a
multi-core system where dataflow also needs awareness of the resources on
which to run.

Regards and good luck with the review

Paul Baxter