Boost logo

Boost :

From: williamkempf_at_[hidden]
Date: 2001-03-15 19:23:53


--- In boost_at_y..., terekhov_at_y... wrote:
> --- In boost_at_y..., williamkempf_at_h... wrote:
> [...]
>
> > It would be even
> > more beneficial if some other group had evaluated your solutions
> > prior to this groups evaluation (bearing in mind that this group
is
> > full of C++ experts but has few people that are _truly_ "expert"
in
> > multi-threaded programming).
>
> sorry, but this is not me how is/was asking for review/evaluation
> here (critique with respect to your _threads_ library). in response
> to your request i've just made a claim that your CV impl. is
> incorrect and pointed to materials so that you and anybody else
> (who is interested) could make his own decision with respect to
> my claim (is valid or not). later _you_ requested more info from
> me and now you say to me that "You're not being too helpful
here" ???
> well..

I warned you that it could be taken as a flame, but was not meant to
be. Let me try and explain better now.

Your first post simply stated the implementation was incorrect.
Nothing more. This was not at all helpful. When I asked why not you
only told me to look on comp.programming.threads. Also not very
helpful. When I asked for more clarification you sent me to a link
with an article you wrote that did *NOT* prove the implementation was
incorrect, though it was much more helpful than what I'd gotten
before this. You also said you had a solution, which would have been
very helpful so I asked that you send it. You instead sent multiple
solutions with nothing to evaluate which, if any, of them produced a
correct implementation. More helpful than previous, but still it's
like pulling teeth on my end and what I have is not
evaluated/validated by anyone other than yourself.

Now do you understand why I said you're not being too helpful?

> > You stated at one point that
> > there was "a proper implementation for Win32 available"
> [...]
> > Firstly, your solutions are "available" only in the
> > sense that you posted them to the Usenet.
>
> please check my messages again. I never said that these are
> "my" solutions (in fact they are a product of joint effort)
> nor did I ever claim that these are "correct" solutions.
> I said _hopefully_ correct.

Even if it's a collaboration you had something to do with these
solutions. Further, you _DID_ claim that these solutions were "more
correct than ACE/pthread-win32" unless I majorly misread your post.
 
> as for evidence/reviews.. here is some info:
>
> From: Mike Haller_at_IBMUS on 03/13/2001 01:40 AM
> Subject: Win32 condition variables
>
>
> Alexander,
>
> I have read your postings on the problems with Doug Schmidt's
> approach to condition variables on Win32. I found his code
> while looking for a strategy to implement condition variables
> on win32, and noticed the same problems while reading the code.
>
> You made a request in your post for reviews of correctness
> and effeciency of your code, which I will happily oblige.
> I am wondering if you would be interested in reviewing my
> implementation as well? It seems a bit simpler that what you
> are proposing (it only requires one event and some
> manipulation with thread handles), but is restricted to
> NT & 2000 because of dependence on SignalObjectAndWait(). Let
> me know if you are interested and I will send you the code.
>
> Regards,
> Mike Haller
>
> [Mike's impl...
>
> //------------------------------------------------------------------

--
> --------
> // This implementation is unduly complicated due to the fact that 
> Windows does
> // not have a synchronization object with characteristics similar 
to 
> that of
> // a condition variable.  Note that the widely avaialable and copied
> // implementation of condition variables for Win32 described by 
> Douglas C.
> // Schmidt (http://www.cs.wustl.edu/~schmidt/win32-cv-1.html) has 
> fatal flaws,
> // so if there are bugs in this version, do not attepmt to replace 
it 
> with
> // that implementation.
> //
> ]
> 
> 
> Please respond to Ross Johnson <rpj_at_i...>
> Subject:  Re: win32 conditions: sem+counter+event = 
broadcast_deadlock
>       +spur.wakeup/unfairness/incorrectness ??
> 
> 
> 
>  Hi,
> 
>  Your message is much appreciated.
> 
>  I'm just back from vacation and am looking through the
>  discussion and patch. Expect to see your fix in the next
>  snapshot which I hope to get out soonish.
> 
>  Cheers.
>  Ross
> 
>  TEREKHOV_at_d... wrote:
>  >
>  > fyi.. (more detailed problem description/demos + possible 
> fix/patch)
>  >
>  > regards,
>  > alexander.
>  >
>  > Alexander Terekhov
>  > 31.01.2001 17:43
> 
>  - --
>  +----------------------+---+
>  | Ross Johnson         |   | E-Mail: rpj_at_i...
>  | Info Sciences and Eng|___|
>  | University of Canberra   | FAX:    +61 6 2015227
>  | PO Box 1                 |
>  | Belconnen  ACT    2616   | WWW:    
> http://willow.canberra.edu.au/~rpj/
>  | AUSTRALIA                |
>  +--------------------------+
> 
> 
> Please respond to "Nanbor Wang" <nanbor_at_c...> 
> cc:	"Louis Thomas" <lthomas_at_a...>, 
> rpj_at_i..., 
> "Douglas C. Schmidt" <schmidt_at_c...> 
> Subject:	Re: Win32 Conditions
> 
> 
> Hi Alexander,
> 
> Thank you very much for keeping us informed on your finding and 
> providing
> us detailed information on your research.  I am the current 
maintainer
> of CV implementation in ACE.  I will look over the information over 
> the
> weekend and verify/deny/comment your finding next week (I am not 
> ignoring
> you :-)  However, since we are in the process of preparing a beta 
> release,
> changes won't reflect on our release immediately.
> 
> Thanks,
> 
> nanbor
> 
> 
> Please respond to Louis Thomas <lthomas_at_a...> 
> To:	rpj_at_i..., Thomas Pfaff <tpfaff_at_g...>, 
> Nanbor Wang <nanbor_at_c...>
> cc:	 
> Subject:	RE: Win32 Conditions
> 
> 
> In this message, I am going to present 3 algorithms and discussion 
> about
> different approaches. I believe that one of these algorithms is the 
> best so
> far, so we may not want to create a new PThreads release until 
people 
> have
> had a chance to critique what I am going to present here.
> 
> I have studied Alexander's algorithm, and I believe it is correct. 
> However,
> I believe it has two shortcomings which can be overcome. One 
> shortcoming is
> that threads that time out waiting on the semaphore can induce 
> spurious
> wakeups later. This is easy to fix. The other shortcoming is that
> Alexander's locking is too strict. It blocks threads attempting to 
do
> signal/broadcast that don't really need to be blocked. I will get 
to 
> this,
> but I think it's best to start at the beginning.
> 
> There are many ways to implement condition variables in Win32. Some 
> ways,
> such as having an event associated with each waiting thread, are 
not 
> even
> being considered because they are too globally invasive. We have 
been
> concentrating on implementations with a small fixed number of Win32 
> sync
> objects per condition. One class of implementations has the threads 
> wait on
> one or more event objects. Schmidt & Pyarali list many of these in 
> their
> paper "Strategies for Implementing POSIX Condition Variables on 
Win32"
> [http://www.cs.wustl.edu/~schmidt/win32-cv-1.html], and, as they 
> indicate,
> none of them are correct. However, there is one that they did not 
> mention
> that is interesting because it is correct. Here it is in pseudocode 
> form:
> ---------- Algorithm 1 ----------
> given:
> ex_mutex - mutex
> cond - autoreset event
> nWaiters - int
> 
> wait(timeout) {
>   nWaiters++
>   bTimedOut=SignalAndWait(ex_mutex, cond, timeout)
>   lock ex_mutex
>   nWaiters--
>   return bTimedOut
> }
> 
> signal(bAll) {
>   do (bAll?nWaiters:1 times) {
>     PulseEvent(cond)
>   }
> }
> ---------- ---------- ----------
> This pseudocode assumes that ex_mutex is always held by the thread 
> calling
> signal, but is easily corrected by adding a CritSec or mutex to 
> protect
> nWaiters. I think that this algorithm meets all the requirements 
> specified
> by PThreads. There is no signal stealing, double broadcast 
deadlocks, 
> or
> spurious wake ups. (There may be spurious PulseEvents but they do 
not 
> cause
> wake ups.) It has the nice property that it requires the same 
number 
> of sync
> objects as the Unix version. However, it has one major drawback 
that 
> is not
> even mentioned in the PThread spec. In Win32, a thread that is in 
the
> 'suspended' state is not in the 'waiting' state. This means that if 
a 
> thread
> is waiting on an auto-reset event but is suspended when the 
PulseEvent
> occurs, it will miss the notification because it was not really 
> waiting on
> the event. Debuggers suspend and resume threads all the time. This 
> means
> that single stepping a program while a signal occurs can very 
easily 
> cause
> any number of wake ups to be missed and deadlock to occur. Because 
of 
> this
> problem, this implementation is basically unusable. PThreads does 
not 
> even
> say whether or not threads can be suspended, let alone how this 
should
> affect threads waiting on a condition.
> 
> The next class of implementations uses a semaphore to wake up 
waiting
> threads. A semaphore gives us control over exactly how many threads 
> are
> woken up, but we have no control over how long it takes them to 
wake 
> up.
> This makes semaphore implementations susceptible to signal 
stealing. 
> To
> prevent this, we have a new rule that our implementations must obey 
> in order
> to be correct:
>     A thread may not begin a wait on the semaphore until it is 
> guaranteed
> not to steal a signal.
> The "SignalObjectAndWait Solution" of Schmidt & Pyarali does not 
> follow this
> rule and is not correct. The current implementations in PThreads-
> Win32 and
> ACE use this solution and are not correct, as Alexander has 
> demonstrated.
> 
> How can we guarantee that we follow this rule? We need some kind of 
> gate. A
> thread must pass through the gate to wait on our semaphore. We can 
> close the
> gate after we do a signal, and the last thread to wake up can open 
> the gate
> again. We must be able to wait on the gate (if we try to pass and 
it 
> is
> closed) and we must be able to close the gate in one thread and 
open 
> it in
> another. The Win32 sync objects that meet these requirements are
> manual-reset events and semaphores. We have one more requirement: 
We 
> must
> know at all times exactly how many threads are in the waiting 
state. 
> This
> means that passing through the gate and incrementing the number of 
> waiting
> threads must be atomic.
> 
> We can do this pretty easily using a manual reset event, if the 
> object that
> protects nWaiters is a mutex.
> ---------- Algorithm 2 ----------
> given:
> gate - manual event
> queue - semaphore
> counters - mutex
> ex_mutex - CS or mutex
> nWaiters - int
> nSignaled - int
> 
> wait(timeout) {
>   waitmultiple(gate, counters, both) // pass gate
>   nWaiters++
>   unlock counters
>   unlock ex_mutex
>   bTimedOut=sem_wait(queue, timeout)
>   lock counters
>   if (bTimedOut && nWaiters>0) {
>     nWaiters--
>   } else {
>     if (bTimedOut) {
>       sem_wait queue  // better now than spurious later
>       bTimedout=false
>     }
>     nSignaled--
>     if (0==nSignaled) {
>       set gate // open gate
>     }
>   }
>   unlock counters
>   lock ex_mutex
>   return bTimedOut?ETIMEDOUT:0
> }
> 
> signal(bAll) {
>   lock counters
>   if (0!=nWaiters) {
>     reset gate // close gate
>     if (!bAll) {
>       sema_post queue
>       nSignaled++
>       nWaiters--
>     } else {
>       sema_post (queue, nWaiters)
>       nSignaled=nWaiters;
>       nWaiters=0;
>     }
>   }
>   unlock counters
> }
> ---------- ---------- ----------
> This implementation has no signal stealing, double broadcast 
> deadlocks, or
> spurious wake ups. It is nice and straight forward. On the other 
> hand, it
> requires 'counters' to be a mutex and not a critical section, which 
> means a
> speed hit. Can we do better? The big problem is the last requirement
> mentioned: passing through the gate and incrementing the number of 
> waiting
> threads must be atomic. If we use a critical section, we can't do 
the
> WaitForMultipleObjects that satisfied that atomicity requirements.
> 
> Now is a good time to look at Alexander's implementation. I present 
> it here
> in pseudocode I extracted from his PThread patch. My apologies if I 
> missed
> something.
> ---------- Alexander's Algorithm ----------
> given:
> BlockLock - semaphore
> BlockQueue - semaphore
> ex_mutex - mutex or CS
> UnblockLock - mutex or CS
> nWaitersBlocked - int
> nWaitersToUnblock - int
> nWaitersUnblocked - int
> 
> wait(timeout) {
> 
>     sema_wait BlockLock
>     nWaitersBlocked++
>     sema_post BlockLock
> 
>     unlock ex_mutex
>     bTimedOut=sema_wait(BlockQueue, timeout)
> 
>     lock UnblockLock
>     nWaitersUnblocked++
>     if (0==nWaitersToUnblock) {
>         lastsignal=-1
>     } else {
>         nWaitersToUnblock--
>         if (0==nWaitersToUnblock) {
>             lastsignal=1
>         } else {
>             lastsignal=0
>         }
>     }
>     unlock UnblockLock
> 
>     if (1==lastsignal) {
>         sema_post BlockLock
>     }
> 
>     lock ex_mutex;
>     return bTimedOut?ETIMEOUT:0
> }
> 
> signal(bAll) {
> 
>     sema_wait BlockLock
>     lock UnblockLock
>     if (nWaitersUnblocked!=0) {
>         nWaitersBlocked-=nWaitersUnblocked
>         nWaitersBlocked=0
>     }
>     if (nWaitersBlocked>0) {
>         toUnblock=
>             nWaitersToUnblock=
>                 bAll?nWaitersBlocked:1
>         unlock UnblockLock
>         sema_post(BlockQueue, toUnblock)
>     } else {
>         sema_post(BlockLock)
>         unlock UnblockLock
>     }
> }
> ---------- ---------- ----------
> As you can see, Alexander has probably followed reasoning similar 
to 
> my own,
> and is using the BlockLock semaphore to ensure that a thread may 
not 
> begin a
> wait on the BlockQueue semaphore until it is guaranteed not to 
steal a
> signal. This implementation has no signal stealing or double 
broadcast
> deadlocks. One shortcoming is that threads that time out waiting on 
> the
> semaphore can induce spurious wakeups later. Again, this is easy to 
> fix and
> is not really important. The other shortcoming is that Alexander's 
> locking
> is too strict. It blocks threads attempting to do signal/broadcast 
> that
> don't really need to be blocked. For example, assume there are 
three 
> threads
> waiting on a condition, and a different thread 
> does "pthread_cond_signal();
> pthread_cond_broadcast();". The broadcast will block until the 
signal
> completes, although it should be able to execute 'concurrently'. 
Our 
> rule
> states that a thread may not begin a wait on the BlockQueue 
semaphore 
> while
> the 'signal' is processing. The rules does NOT prevent us from 
> releasing
> more threads, so we should do so. In fact, when trying to do
> "pthread_cond_broadcast(); pthread_cond_signal();" this algorithm 
> will block
> on the signal even though the signal is probably a no-op!
> 
> Can we relax the blocking and make it more like the Algorithm 2? 
> There are
> two problems: One is maintaining the atomicity of passing the gate 
an
> incrementing nWaiters. (Note that by atomically incrementing 
nWaiters 
> I
> really mean locking 'counters'.) In order to allow critical 
sections, 
> we
> have to do this in two calls. The other problem is that closing the 
> gate
> when the gate is a semaphore is a blocking call (sema_wait). We 
can't 
> pass
> the gate while holding the 'counters' lock, because we will 
deadlock 
> when
> the gate is closed. We can't close the gate while holding 
> the 'counters'
> lock, because we will deadlock when the gate is being passed.
> * If we inc nWaiters before passing the gate, we could close the 
gate 
> and
> think we have more waiters than we do, and thus deadlock waiting 
for 
> the
> last waiter.
> * If we inc nWaiters after passing the gate, we will think we have 
> less
> waiters than we do, which means some won't be signaled. This is 
could 
> be OK,
> since we could claim we did a signal before those threads were 
really
> waiters, but we can't guarantee which threads will not wake up. We 
> will have
> signal stealing, which is incorrect.
> Note that this is a race condition that only happens when one 
thread 
> is
> trying to do a wait while another thread that did not lock the 
> ex_mutex
> tries to do a signal at the same time. If we always hold the 
> ex_mutex, we
> never run in to this situation.
> It turns out that in the second case, we can detect this situation 
> and deal
> with it! Then we can maintain the no stealing rule and we win. Here 
> is the
> code. Detecting the race condition makes it a bit messier.
> ---------- Algorithm 3 ----------
> given:
> gate - semaphore
> queue - semaphore
> counters - CS or mutex
> ex_mutex - CS or mutex
> nWaiters - int
> nSignaled - int
> bSigAll - bool // could be sign bit of nSignaled
> 
> wait(timeout) {
>   sem_wait gate // pass gate
>   lock counters
>   if (0==nSignaled) {
>     // normal
>     nWaiters++
>     sem_post gate
>   } else {
>     // oh no - in the middle of a signalOne or signalAll - gate 
stays 
> shut
>     if (!bSigAll) {
>       // we don't have to wake up. Be a waiter and see what happens
>       nWaiters++
>     } else {
>       // we have to wake up. We could just return - see discussion 
> below
>       nSignaled++;
>       sem_post queue
>     }
>   }
>   unlock counters
>   unlock ex_mutex
>   bTimedOut=sem_wait(queue, timeout)
>   lock counters
>   if (bTimedOut && nWaiters>0) {
>     nWaiters--;
>   } else {
>     if (bTimedOut) {
>       sem_wait queue  // better now than spurious later
>       bTimedout=false;
>     }
>     nSignaled--;
>     if (0==nSignaled) {
>       bSigAll=false;
>       sem_post gate // open gate
>     }
>   }
>   unlock counters
>   lock ex_mutex
>   return bTimedOut?ETIMEDOUT:0;
> }
> 
> signal(bAll) {
>   lock counters
>   if (0!=nWaiters) {
>     sema_wait(gate, 0) // close gate
>     if (!bAll) {
>       sema_post queue
>       nSignaled++;
>       nWaiters--;
>     } else {
>       bSigAll=true;
>       sema_post (queue, nWaiters)
>       nSignaled=nWaiters;
>       nWaiters=0;
>     }
>   }
>   unlock counters
> }
> ---------- ---------- ----------
> At one point we have the option of just returning. I think it is 
> probably
> better not to. The PThreads spec says that a timeout will always 
> unlock then
> lock ex_mutex. If we don't return immediately, but do full 
> processing, we
> match this, and we don't have to do anything extra to handle 
> cancellation.
> 
> This implementation has no signal stealing, double broadcast 
> deadlocks, or
> spurious wake ups. It requires four sync objects compared to Unix's 
> two, but
> no one has been able to find a good way around this. It can utilize 
> CritSecs
> for all the mutexes, which are faster than mutexes when there is 
not 
> much
> contention  (like for the 'counters' lock). It does not block any 
> signal()
> calls that can immediately succeed. It only blocks wait() calls 
that 
> can't
> proceed. It behave correctly even when threads are suspended and 
> resumed
> while it is executing, assuming threads are not suspended 
> indefinitely.
> 
> I think this makes Algorithm 3 the best available algorithm. What 
do 
> you
> think? Are there any bugs that I missed?
> 
> 	Later,
> 		-Louis! :)
This is definately helpful.  Just remember here that your dealing 
with people "not in the know" on this and you should do better at 
helping out with this.
Bill Kempf

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk