Subject: Re: [boost] Flow-based programming library for Boost?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-16 15:03:19
on Sat Dec 15 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:
> Dave, I'm a bit confused after reading your post. In part I think
> we're not actually disagreeing about the relative merits of static and
> dynamic wiring and the possibility to combine them. In part I'm
> puzzled by what seems to me to be your bolder claim: the possibility
> of statically wired asynchronous concurrent message passing.
I don't see why it shouldn't be possible.
> In part I wonder whether my implicit assumptions about inter-thread
> message passing might have been too implicit. I'll try to touch upon
> each of these issues below, as coherently as I can manage.
> Dave Abrahams wrote:
>> on Wed Dec 12 2012, Julian Gonggrijp wrote:
>>> [...] If computations can take place /at the same time/ it does
>>> not really matter whether connections are made at compile-time or run-
>>> time; /waiting/ is eliminated altogether so the building of
>>> connections at run-time adds only constant overhead.
>> It's not just the cost of *building* of dynamic connections but the cost
>> of traversing them.
> I was assuming dynamic connections of the following type: a fixed-size
> contiguous (circular) buffer in memory, a producer thread that is
> continuously appending data to the end of the buffer, and a consumer
> thread that is continuously removing data from the front of the
> buffer. This can be done lock-free and wait-free, but you're right
> that there's still some cost involved because of pointer arithmetic
> and the occasional cache miss.
I see. That could be less likely to benefit from static "wiring."
> Still I don't see how you could avoid this cost without giving up on
> the asynchronous behaviour altogether. I'll return to that question below.
>> Of course if your sub-computations happen to have
>> large granularity, these costs may be swamped by others,
> Here I believe we're not disagreeing at all. You seem to be implying
> that for very tightly interdependent computations static, single-
> threaded connections are most efficient, while for more coarse-grained
> computations concurrency is the best solution.
In the degenerate case where the dependency graph is strictly linear,
concurrency might be completely wasted unless you're able to take
advantage of pipelining.
And there's still a question of how that concurrency is carved up. If
you have a complicated dependency graph and you can run several of these
flow graphs in parallel, but each in a single thread (as described
below), that seems likely to induce less synchronization than if you try
to run the nodes of a single flow graph in parallel. Of course whether
that decomposition can work depends on your problem.
>> The accumulators library was first used in a massively-parallel context
>> with each process running thousands or millions of accumulator sets.
>> Obviously it really depends on your problem domain but that might be a
>> more efficient way to break down the parallel processing. Of course a
>> truly flexible system would (as Jeremiah suggested his design does)
>> allow you to combine static and dynamic wiring (sort of like Xpressive
>> does), and you could easily use threads with a system like the
>> accumulators framework.
> Given the context of my assumptions and the text that you're replying
> to, in all of the above you seem to be suggesting that the
> asynchronous, buffered message passing that I described could also be
> implemented statically but still concurrently and asynchronously, in
> such a way that the mutable buffer is replaced by something in the
> instruction stream. I find it very hard to imagine how this could be
> done, because at compile time it's impossible to predict how the
> instruction streams of both threads will be aligned during run-time.
> If something like that could actually be done, *please* enlighten me.
Well, I have to admit, that sounds difficult. But you don't necessarily
need more than a single-value buffer for each sub-computation, do you?
> Alternatively I could take your remark "you could easily use threads
> with a system like the accumulators framework" in isolation, and then
> I'd interpret it completely differently: "you can build coarse-grained
> single-threaded computations from fine-grained ones using
> Accumulators, and then connect multiple of those coarse-grained
> computations asynchronously using dynamic buffers". As I said I agree
> with that, but given the context I'm unsure whether that is what you
> meant.
Honestly, I'm not sure I had thought it through completely.
>>> Actually I think the Accumulators framework and lock-free message
>>> passing are completely orthogonal.
>> Exactly. It seems as though you're now contradicting yourself.
> I meant to say that the Accumulators framework and the (buffered
> asynchronous) lock-free message passing serve different needs and that
> these needs may occur independently, so there can be programs with
> both needs, hence calling for application of both techniques in
> parallel. As far as I'm concerned this is in line with everything I've
> said before.
> If this is your point, I must conclude that all of our discussion has
> been one big misunderstanding, because it was my point as well.
I no longer know exactly what my point *was*, but I could
comfortably say that's what my point *is*.
>>> One could use a "dataflow" library to connect threads at run-time
>>> while composing computations for the individual threads at
>>> compile-time with Boost.Accumulators, and get the best of both worlds.
>>> I hope I'm making sense. Please let me know what you think.
>> I think you eventually made sense :-)
> Assuming that our discussion was based on misunderstanding and that
> we're actually agreeing, one thing is still bothering me. You appear
> to believe that the same library should provide for both ways of
> composing computations, i.e. the static single-threaded one (like
> Accumulators) and the dynamic asynchronous one. I doubt that would be
> the right thing to do, because the modes of composition are not
> related in their underlying logic (contrary to the static and dynamic
> expressions in Xpressive) and because they pose very different
> requirements on the subcomputations that are to be composed.
You may be right.
> In the dynamic multithreaded mode of composition the producer and the
> consumer can be as unrelated as any function that writes to a
> std::vector and any other function that reads from it. As the
> Accumulators mode of composition involves much tighter coupling
> between the operations, they need to have much more in common, e.g.
> features and dependencies in Accumulators.
Hmm, I don't think I buy that last part of the argument, but I also
don't think we're very far apart now.
-- 
Dave Abrahams
BoostPro Computing, Software Development Training
http://www.boostpro.com
Clang/LLVM/EDG Compilers, C++, Boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk