From: Yigong Liu (yigongliu_at_[hidden])
Date: 2007-06-14 03:27:04
On 6/13/07, Giovanni Piero Deretta <gpderetta_at_[hidden]> wrote:
> I'm not really familiar with either the implementation of the Join
> library nor with lock-free design, but from a quick glance at the
> code, I see in boost/join/base/port.hpp:asynch_f  that you use
> mutex protected deque to enqueue async calls. This is probably one
> place that would benefit from a lock-free deque; also probably the
> fact that the max queue size is bounded might help with this
Thanks for spending time looking into code and reply. This is encouraging
that some experts start checking details.
Each object with async / synch / chords and inheriting actor class will have
only one mutex protect ALL states related to concurrency / synchronization /
asynchrony, which are distributed over async / synch / actor classes. One
scenario may better explain, suppose an async method is called, the calling
thread will first grab the mutex, then do the following:
1> push passed argument to queue,
2> turn on a bit in a bitmap representing the message arrival status of all
async / synch methods of this object,
3> check this "global" bitmap against the bitmaps of all chords to find
which chord has all the messages it need to be ready to fire
4> if a chord is ready to fire, do the following:
4.1> extract all the arguments from the queues of async / synch
methods of this chord, and bind theses arguments to the defined chord body
4.2> if the chord has a synch method, wake up the calling thread of
that synch method to run the chord body,
4.3> otherwise spawn a new task in executor's thread pool to run the
5> The calling thread of async method will release the mutex and return.
So the mutex is not only for protecting queue, it is used to protect the
whole process, its data structures and operations. All the queues and data
structures don't have any their own protections, they are plain STL
containers, all protected by this global mutex.
The above call-sequence and similar ones are critical to the performance of
Join. From Jocaml to Cw, people have done much to optimize these critical
paths. My implementations mostly follows Cw.
I don't know about lock-free algorithms. From what i learnt from browsing
web, i only see lock-free algorithms for basic data structures (stacks ,
queues) and algorithms, which may not be able to provide much help for the
above described call-sequence with heavily customized data structures. I
could be wrong, please correct.
> Not at all. Plain well designed lock-full (?) algorithm are fast
> enough 99% of the times (even as fast as lock-free). Is just that a
> more optimized implementation would cover even the remaining 1%. Do
> you have any benchmark? And not just synthetic benchmarks, but some
> real application written with both classic locks & condvars and with
> Join patterns. This would be useful to compare both the benefits in
> code simplicity and in performance (if any). Even if Join where
> slower, if it would really simplify the design of concurrent
> applications it would be a win.
I cannot agree more about this. I am looking any ways to improve under
current design. I have done some initial optimization with template partial
specializatin, for example. For async methods, there are two template
1> async<void(arg)>: this is async method which does pass argument, this
class use a queue to buffer the message
2> async<void(void)>: this is a async method which does not pass any
arguments, this class will not use queues, instead it use a integer to count
how many times it has been called. This ways the semantics is maintained
with much less overhead.
Based on method signatures, compiler will automatically choose the best-fit
template classes to use.
Similarly, for synch<> methods, there are four template classes defined to
allow compiler to choose.
I haven't done any benchmark comparison yet. I'll try to do some profiling
and benchmarking when i get the time and resources.
Some implementation of design documentation would be greatly useful,
> but I think most of it can be deduced from the Microsoft papers, is
> that right?
Yes, the current implementation mostly follow Cw paper. The most notable
differences are the handling of synchronous methods, exceptions and chord