Subject: Re: [boost] Flow-based programming library for Boost?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-16 15:03:19


on Sat Dec 15 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:

> Dave, I'm a bit confused after reading your post. In part I think
> we're not actually disagreeing about the relative merits of static and
> dynamic wiring and the possibility to combine them. In part I'm
> puzzled by what seems to me to be your bolder claim: the possibility
> of statically wired asynchronous concurrent message passing.

I don't see why it shouldn't be possible.

> In part I wonder whether my implicit assumptions about inter-thread
> message passing might have been too implicit. I'll try to touch upon
> each of these issues below, as coherently as I can manage.
>
> Dave Abrahams wrote:
>
>> on Wed Dec 12 2012, Julian Gonggrijp wrote:
>>
>>> [...] If computations can take place /at the same time/ it does
>>> not really matter whether connections are made at compile-time or run-
>>> time; /waiting/ is eliminated altogether so the building of
>>> connections at run-time adds only constant overhead.
>>
>> It's not just the cost of *building* of dynamic connections but the cost
>> of traversing them.
>
> I was assuming dynamic connections of the following type: a fixed-size
> contiguous (circular) buffer in memory, a producer thread that is
> continuously appending data to the end of the buffer, and a consumer
> thread that is continuously removing data from the front of the
> buffer. This can be done lock-free and wait-free, but you're right
> that there's still some cost involved because of pointer arithmetic
> and the occasional cache miss.

I see. That could be less likely to benefit from static "wiring."
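
Just to make sure we're picturing the same thing, I take it you mean
roughly this kind of structure (an untested C++11 sketch, names made up,
capacity assumed to be a power of two):

#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer: the producer appends at
// the tail, the consumer removes from the head, and the only shared
// state is the two indices.
template <class T, std::size_t Capacity>
class spsc_ring
{
    T buf_[Capacity];
    std::atomic<std::size_t> head_;   // advanced by the consumer
    std::atomic<std::size_t> tail_;   // advanced by the producer

public:
    spsc_ring() : head_(0), tail_(0) {}

    bool try_push(T const& v)         // producer thread only
    {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == Capacity)
            return false;             // full
        buf_[t % Capacity] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    bool try_pop(T& out)              // consumer thread only
    {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;             // empty
        out = buf_[h % Capacity];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};

The index arithmetic and the cache traffic on head_/tail_ are exactly
the traversal costs I had in mind.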

> Still I don't see how you could avoid this cost without giving up on
> the asynchronous behaviour altogether. I'll return to that question
> below.
>
>> Of course if your sub-computations happen to have
>> large granularity, these costs may be swamped by others,
>
> Here I believe we're not disagreeing at all. You seem to be implying
> that for very tightly interdependent computations static, single-
> threaded connections are most efficient, while for more coarse-grained
> computations concurrency is the best solution.

s/is/might be/

In the degenerate case where the dependency graph is strictly linear,
concurrency might be completely wasted unless you're able to take
advantage of pipelining.

And there's still a question of how that concurrency is carved up. If
you have a complicated dependency graph and you can run several of these
flow graphs in parallel, but each in a single thread (as described
below), that seems likely to induce less synchronization than if you try
to run the nodes of a single flow graph in parallel. Of course whether
that decomposition can work depends on your problem.
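
Concretely, I'm picturing something like this (a rough sketch;
Boost.Accumulators here just stands in for whatever statically wired
computation each thread runs, and the final merge is hand-waved):

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
#include <thread>
#include <vector>

namespace ba = boost::accumulators;
typedef ba::accumulator_set<
    double, ba::stats<ba::tag::mean, ba::tag::variance> > stats_set;

// Each worker owns one statically wired accumulator set and runs the
// whole flow graph over its own chunk of the input; there is no
// inter-thread traffic until the results are combined at the end.
void run_chunk(std::vector<double> const& chunk, stats_set& out)
{
    for (std::size_t i = 0; i < chunk.size(); ++i)
        out(chunk[i]);
}

void parallel_stats(std::vector<std::vector<double> > const& chunks)
{
    std::vector<stats_set> sets(chunks.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i)
        workers.push_back(
            std::thread(run_chunk, std::cref(chunks[i]), std::ref(sets[i])));
    for (std::size_t i = 0; i < workers.size(); ++i)
        workers[i].join();
    // ...combine the per-chunk means/variances here...
}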

>> The accumulators library was first used in a massively-parallel context
>> with each process running thousands or millions of accumulator sets.
>> Obviously it really depends on your problem domain but that might be a
>> more efficient way to break down the parallel processing. Of course a
>> truly flexible system would (as Jeremiah suggested his design does)
>> allow you to combine static and dynamic wiring (sort of like Xpressive
>> does).

<schnipp>

>> you could easily use threads with a system like the accumulators
>> framework.
>
> Given the context of my assumptions and the text that you're replying
> to, in all of the above you seem to be suggesting that the
> asynchronous, buffered message passing that I described could also be
> implemented statically but still concurrently and asynchronously, in
> such a way that the mutable buffer is replaced by something in the
> instruction stream. I find it very hard to imagine how this could be
> done, because at compile time it's impossible to predict how the
> instruction streams of both threads will be aligned during run-time.
> If something like that could actually be done, *please* enlighten me.
> :-)

Well, I have to admit, that sounds difficult. But you don't necessarily
need more than a single-value buffer for each sub-computation, do you?
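
That is, per link, something as small as this might do (again, only a
sketch):

#include <atomic>

// One-value "mailbox" between a producer and a consumer: full_ flags
// whether value_ currently holds an unconsumed result.
template <class T>
struct single_slot
{
    std::atomic<bool> full_;
    T value_;

    single_slot() : full_(false) {}

    bool try_put(T const& v)          // producer thread only
    {
        if (full_.load(std::memory_order_acquire))
            return false;             // consumer hasn't taken the last one
        value_ = v;
        full_.store(true, std::memory_order_release);
        return true;
    }

    bool try_take(T& out)             // consumer thread only
    {
        if (!full_.load(std::memory_order_acquire))
            return false;             // nothing there yet
        out = value_;
        full_.store(false, std::memory_order_release);
        return true;
    }
};

Whether one slot is enough obviously depends on how evenly the producer
and consumer keep pace with one another.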

> Alternatively I could take your remark "you could easily use threads
> with a system like the accumulators framework" in isolation, and then
> I'd interpret it completely differently: "you can build coarse-grained
> single-threaded computations from fine-grained ones using
> Accumulators, and then connect multiple of those coarse-grained
> computations asynchronously using dynamic buffers". As I said I agree
> with that, but given the context I'm unsure whether that is what you
> meant.

Honestly, I'm not sure I had thought it through completely.

>>> Actually I think the Accumulators framework and lock-free message
>>> passing are completely orthogonal.
>>
>> Exactly. It seems as though you're now contradicting yourself.
>
> I meant to say that the Accumulators framework and the (buffered
> asynchronous) lock-free message passing serve different needs and that
> these needs may occur independently, so there can be programs with
> both needs, hence calling for application of both techniques in
> parallel. As far as I'm concerned this is in line with everything I've
> said before.
>
> If this is your point, I must conclude that all of our discussion has
> been one big misunderstanding, because it was my point as well.

I no longer know exactly what my point *was*, but I could
comfortably say that's what my point *is*.

 ;-)

>>> One could use a "dataflow" library to connect threads at run-time
>>> while composing computations for the individual threads at
>>> compile-time with Boost.Accumulators, and get the best of both worlds.
>>>
>>> I hope I'm making sense. Please let me know what you think.
>>
>> I think you eventually made sense :-)
>
> Assuming that our discussion was based on misunderstanding and that
> we're actually agreeing, one thing is still bothering me. You appear
> to believe that the same library should provide for both ways of
> composing computations, i.e. the static single-threaded one (like
> Accumulators) and the dynamic asynchronous one. I doubt that would be
> the right thing to do, because the modes of composition are not
> related in their underlying logic (contrary to the static and dynamic
> expressions in Xpressive) and because they pose very different
> requirements on the subcomputations that are to be composed.

You may be right.
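
Even if they end up as separate libraries, the combination you describe
is easy enough to picture; roughly (reusing the spsc_ring and stats_set
from the sketches above, still untested):

#include <atomic>
#include <vector>

// Dynamic wiring between the threads, static wiring inside each one:
// the producer pushes raw samples through the lock-free ring, and the
// consumer feeds them into a statically composed accumulator set.
spsc_ring<double, 1024> channel;
std::atomic<bool> done(false);

void producer(std::vector<double> const& input)
{
    for (std::size_t i = 0; i < input.size(); ++i)
        while (!channel.try_push(input[i]))
            ;                         // spin; a real program would back off
    done.store(true, std::memory_order_release);
}

void consumer(stats_set& stats)
{
    double x;
    for (;;)
    {
        if (channel.try_pop(x))
            stats(x);
        else if (done.load(std::memory_order_acquire))
        {
            while (channel.try_pop(x))  // drain anything pushed before "done"
                stats(x);
            break;
        }
    }
}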

> In the dynamic multithreaded mode of composition the producer and the
> consumer can be as unrelated as any function that writes to a
> std::vector and any other function that reads from it. As the
> Accumulators mode of composition involves much tighter coupling
> between the operations, they need to have much more in common, e.g.
> features and dependencies in Accumulators.

Hmm, I don't think I buy that last part of the argument, but I also
don't think we're very far apart now.

-- 
Dave Abrahams
BoostPro Computing                  Software Development        Training
http://www.boostpro.com             Clang/LLVM/EDG Compilers  C++  Boost
