Subject: Re: [boost] Flow-based programming library for Boost?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-17 13:05:27
on Sun Dec 16 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:
>> In the degenerate case where the dependency graph is strictly linear,
>> concurrency might be completely wasted unless you're able to take
>> advantage of pipelining.
> Isn't execution of linearly dependent operations always a pipeline?
Yes, but pipelining doesn't help if the input period exceeds the
> Assuming that, isn't every vertex in a dependency graph a pipeline by
I suppose it's a degenerate pipeline, but I don't know what you're
> Assuming that, isn't pipelining the entire purpose of any form of
> computational composition?
IMO the purpose of computational composition is to manage complexity
>> And there's still a question of how that concurrency is carved up. If
>> you have a complicated dependency graph and you can run several of these
>> flow graphs in parallel, but each in a single thread (as described
>> below), that seems likely to induce less synchronization than if you try
>> to run the nodes of a single flow graph in parallel. Of course whether
>> that decomposition can work depends on your problem.
> You have a point. But: suppose that you have two operations, A and B,
> and you want to output B(A(item)) for each item out of a collection of
> inputs. The alternatives that you're suggesting are
> (1) have operation A executed by one thread and operation B by the
> other, then pipeline these threads;
> (2) have two (or more) identical threads, each taking an equal share
> of the inputs and executing B o A.
> In scenario (2) the threads don't need to communicate, so you're right
> that it will involve less waiting if you generalize this to more
> complex computations, but now we're not taking into account that (1)
> is only a member out of a much larger class of solutions to the same
> problem. If we realistically assume that one operation takes longer
> than the other, let's say that B takes three times the amount of time
> of A, then the following network will prove superior to (1) and (2) on
> the condition that you have a CPU with at least four hardware threads:
> (1b) We use four execution threads (or any multiple). The first takes
> the inputs, applies operation A on them and divides the intermediate
> results between the other three threads. The other three threads apply
> operation B, each to their own share of the intermediate results.
> This also generalizes to more complex dependency graphs. In fact, the
> more operations are involved, the more likely you are to find
> significantly different execution times.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk