Boost logo

Boost :

Subject: Re: [boost] Flow-based programming library for Boost?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-15 17:32:40


on Wed Dec 12 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:

> Dave Abrahams wrote:
>
>> If you want such a library to be truly efficient, you should be able to
>> wire together the dataflows at compile-time and not pay any overhead for
>> composing large computations out of smaller ones. The accumulators
>> library uses something like a dataflow model, with the "wiring" done
>> completely at compile-time.
>
> Because of your remark I studied the Accumulators library, and I must
> say it's very elegant indeed. Dependencies between computations are
> resolved at compile-time (wow!) so they don't add unnecessary overhead
> at run-time and everything will be executed in the most efficient
> order.
>
> However, note the word "order". Boost.Accumulators was clearly
> designed under the assumption of single-threaded operation and tightly
> interdependent computations. I believe it might very well be the best
> possible approach under those assumptions, but when the assumptions
> don't apply other considerations may call for a different solution.
>
> "Dataflow" is a very general term, but the people who advertise with
> that word typically mean something more specific, i.e. lock-free
> multithreading with message passing. The underlying assumptions are
> more or less opposite to the assumptions in Boost.Accumulators:
> computations can run concurrently without waiting for each other, as
> long as they have input available and they can dispose of their output
> somewhere. If computations can take place /at the same time/ it does
> not really matter whether connections are made at compile-time or run-
> time; /waiting/ is eliminated altogether so the building of
> connections at run-time adds only constant overhead.

It's not just the cost of *building* of dynamic connections but the cost
of traversing them. Of course if your sub-computations happen to have
large granularity, these costs may be swamped by others, but a truly
flexible system would allow you to tune the approach to the problem.

The accumulators library was first used in a massively-parallel context
with each process running thousands or millions of accumulator sets.
Obviously it really depends on your problem domain but that might be a
more efficient way to break down the parallel processing. Of course a
truly flexible system would (as Jeremiah suggested his design does)
allow you to combine static and dynamic wiring (sort of like Xpressive
does).

> That constant overhead is worth it if accepting it helps you to avoid
> other problems. Note that the "wires" built by Accumulators at
> compile-time are /logical/ connections between /types/ that represent
> operations. Message passing "wires" are /physical/ connections between
> /threads/.

The accumulators library's wires are just as "physical" as any others,
they're just encoded in the instruction stream instead of in mutable
data. I don't see the relevance of your emphasis on the word "types."

> Instantiating the latter kind of connection at compile-time seems very
> hard, if not impossible.

No, you could easily use threads with a system like the accumulators
framework.

> Actually I think the Accumulators framework and lock-free message
> passing are completely orthogonal.

Exactly. It seems as though you're now contradicting yourself.

> One could use a "dataflow" library to connect threads at run-time
> while composing computations for the individual threads at
> compile-time with Boost.Accumulators, and get the best of both worlds.
>
> I hope I'm making sense. Please let me know what you think.

I think you eventually made sense :-)

-- 
Dave Abrahams
BoostPro Computing                  Software Development        Training
http://www.boostpro.com             Clang/LLVM/EDG Compilers  C++  Boost

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk