Boost :

williamkempf_at_[hidden]

--
> --------
> // This implementation is unduly complicated due to the fact that 
> Windows does
> // not have a synchronization object with characteristics similar 
to 
> that of
> // a condition variable.  Note that the widely avaialable and copied
> // implementation of condition variables for Win32 described by 
> Douglas C.
> // Schmidt (http://www.cs.wustl.edu/~schmidt/win32-cv-1.html) has 
> fatal flaws,
> // so if there are bugs in this version, do not attepmt to replace 
it 
> with
> // that implementation.
> //
> ]
> 
> 
> Please respond to Ross Johnson <rpj_at_i...>
> Subject:  Re: win32 conditions: sem+counter+event = 
broadcast_deadlock
>       +spur.wakeup/unfairness/incorrectness ??
> 
> 
> 
>  Hi,
> 
>  Your message is much appreciated.
> 
>  I'm just back from vacation and am looking through the
>  discussion and patch. Expect to see your fix in the next
>  snapshot which I hope to get out soonish.
> 
>  Cheers.
>  Ross
> 
>  TEREKHOV_at_d... wrote:
>  >
>  > fyi.. (more detailed problem description/demos + possible 
> fix/patch)
>  >
>  > regards,
>  > alexander.
>  >
>  > Alexander Terekhov
>  > 31.01.2001 17:43
> 
>  - --
>  +----------------------+---+
>  | Ross Johnson         |   | E-Mail: rpj_at_i...
>  | Info Sciences and Eng|___|
>  | University of Canberra   | FAX:    +61 6 2015227
>  | PO Box 1                 |
>  | Belconnen  ACT    2616   | WWW:    
> http://willow.canberra.edu.au/~rpj/
>  | AUSTRALIA                |
>  +--------------------------+
> 
> 
> Please respond to "Nanbor Wang" <nanbor_at_c...> 
> cc:	"Louis Thomas" <lthomas_at_a...>, 
> rpj_at_i..., 
> "Douglas C. Schmidt" <schmidt_at_c...> 
> Subject:	Re: Win32 Conditions
> 
> 
> Hi Alexander,
> 
> Thank you very much for keeping us informed on your finding and 
> providing
> us detailed information on your research.  I am the current 
maintainer
> of CV implementation in ACE.  I will look over the information over 
> the
> weekend and verify/deny/comment your finding next week (I am not 
> ignoring
> you :-)  However, since we are in the process of preparing a beta 
> release,
> changes won't reflect on our release immediately.
> 
> Thanks,
> 
> nanbor
> 
> 
> Please respond to Louis Thomas <lthomas_at_a...> 
> To:	rpj_at_i..., Thomas Pfaff <tpfaff_at_g...>, 
> Nanbor Wang <nanbor_at_c...>
> cc:	 
> Subject:	RE: Win32 Conditions
> 
> 
> In this message, I am going to present 3 algorithms and discussion 
> about
> different approaches. I believe that one of these algorithms is the 
> best so
> far, so we may not want to create a new PThreads release until 
people 
> have
> had a chance to critique what I am going to present here.
> 
> I have studied Alexander's algorithm, and I believe it is correct. 
> However,
> I believe it has two shortcomings which can be overcome. One 
> shortcoming is
> that threads that time out waiting on the semaphore can induce 
> spurious
> wakeups later. This is easy to fix. The other shortcoming is that
> Alexander's locking is too strict. It blocks threads attempting to 
do
> signal/broadcast that don't really need to be blocked. I will get 
to 
> this,
> but I think it's best to start at the beginning.
> 
> There are many ways to implement condition variables in Win32. Some 
> ways,
> such as having an event associated with each waiting thread, are 
not 
> even
> being considered because they are too globally invasive. We have 
been
> concentrating on implementations with a small fixed number of Win32 
> sync
> objects per condition. One class of implementations has the threads 
> wait on
> one or more event objects. Schmidt & Pyarali list many of these in 
> their
> paper "Strategies for Implementing POSIX Condition Variables on 
Win32"
> [http://www.cs.wustl.edu/~schmidt/win32-cv-1.html], and, as they 
> indicate,
> none of them are correct. However, there is one that they did not 
> mention
> that is interesting because it is correct. Here it is in pseudocode 
> form:
> ---------- Algorithm 1 ----------
> given:
> ex_mutex - mutex
> cond - autoreset event
> nWaiters - int
> 
> wait(timeout) {
>   nWaiters++
>   bTimedOut=SignalAndWait(ex_mutex, cond, timeout)
>   lock ex_mutex
>   nWaiters--
>   return bTimedOut
> }
> 
> signal(bAll) {
>   do (bAll?nWaiters:1 times) {
>     PulseEvent(cond)
>   }
> }
> ---------- ---------- ----------
> This pseudocode assumes that ex_mutex is always held by the thread 
> calling
> signal, but is easily corrected by adding a CritSec or mutex to 
> protect
> nWaiters. I think that this algorithm meets all the requirements 
> specified
> by PThreads. There is no signal stealing, double broadcast 
deadlocks, 
> or
> spurious wake ups. (There may be spurious PulseEvents but they do 
not 
> cause
> wake ups.) It has the nice property that it requires the same 
number 
> of sync
> objects as the Unix version. However, it has one major drawback 
that 
> is not
> even mentioned in the PThread spec. In Win32, a thread that is in 
the
> 'suspended' state is not in the 'waiting' state. This means that if 
a 
> thread
> is waiting on an auto-reset event but is suspended when the 
PulseEvent
> occurs, it will miss the notification because it was not really 
> waiting on
> the event. Debuggers suspend and resume threads all the time. This 
> means
> that single stepping a program while a signal occurs can very 
easily 
> cause
> any number of wake ups to be missed and deadlock to occur. Because 
of 
> this
> problem, this implementation is basically unusable. PThreads does 
not 
> even
> say whether or not threads can be suspended, let alone how this 
should
> affect threads waiting on a condition.
> 
> The next class of implementations uses a semaphore to wake up 
waiting
> threads. A semaphore gives us control over exactly how many threads 
> are
> woken up, but we have no control over how long it takes them to 
wake 
> up.
> This makes semaphore implementations susceptible to signal 
stealing. 
> To
> prevent this, we have a new rule that our implementations must obey 
> in order
> to be correct:
>     A thread may not begin a wait on the semaphore until it is 
> guaranteed
> not to steal a signal.
> The "SignalObjectAndWait Solution" of Schmidt & Pyarali does not 
> follow this
> rule and is not correct. The current implementations in PThreads-
> Win32 and
> ACE use this solution and are not correct, as Alexander has 
> demonstrated.
> 
> How can we guarantee that we follow this rule? We need some kind of 
> gate. A
> thread must pass through the gate to wait on our semaphore. We can 
> close the
> gate after we do a signal, and the last thread to wake up can open 
> the gate
> again. We must be able to wait on the gate (if we try to pass and 
it 
> is
> closed) and we must be able to close the gate in one thread and 
open 
> it in
> another. The Win32 sync objects that meet these requirements are
> manual-reset events and semaphores. We have one more requirement: 
We 
> must
> know at all times exactly how many threads are in the waiting 
state. 
> This
> means that passing through the gate and incrementing the number of 
> waiting
> threads must be atomic.
> 
> We can do this pretty easily using a manual reset event, if the 
> object that
> protects nWaiters is a mutex.
> ---------- Algorithm 2 ----------
> given:
> gate - manual event
> queue - semaphore
> counters - mutex
> ex_mutex - CS or mutex
> nWaiters - int
> nSignaled - int
> 
> wait(timeout) {
>   waitmultiple(gate, counters, both) // pass gate
>   nWaiters++
>   unlock counters
>   unlock ex_mutex
>   bTimedOut=sem_wait(queue, timeout)
>   lock counters
>   if (bTimedOut && nWaiters>0) {
>     nWaiters--
>   } else {
>     if (bTimedOut) {
>       sem_wait queue  // better now than spurious later
>       bTimedout=false
>     }
>     nSignaled--
>     if (0==nSignaled) {
>       set gate // open gate
>     }
>   }
>   unlock counters
>   lock ex_mutex
>   return bTimedOut?ETIMEDOUT:0
> }
> 
> signal(bAll) {
>   lock counters
>   if (0!=nWaiters) {
>     reset gate // close gate
>     if (!bAll) {
>       sema_post queue
>       nSignaled++
>       nWaiters--
>     } else {
>       sema_post (queue, nWaiters)
>       nSignaled=nWaiters;
>       nWaiters=0;
>     }
>   }
>   unlock counters
> }
> ---------- ---------- ----------
> This implementation has no signal stealing, double broadcast 
> deadlocks, or
> spurious wake ups. It is nice and straight forward. On the other 
> hand, it
> requires 'counters' to be a mutex and not a critical section, which 
> means a
> speed hit. Can we do better? The big problem is the last requirement
> mentioned: passing through the gate and incrementing the number of 
> waiting
> threads must be atomic. If we use a critical section, we can't do 
the
> WaitForMultipleObjects that satisfied that atomicity requirements.
> 
> Now is a good time to look at Alexander's implementation. I present 
> it here
> in pseudocode I extracted from his PThread patch. My apologies if I 
> missed
> something.
> ---------- Alexander's Algorithm ----------
> given:
> BlockLock - semaphore
> BlockQueue - semaphore
> ex_mutex - mutex or CS
> UnblockLock - mutex or CS
> nWaitersBlocked - int
> nWaitersToUnblock - int
> nWaitersUnblocked - int
> 
> wait(timeout) {
> 
>     sema_wait BlockLock
>     nWaitersBlocked++
>     sema_post BlockLock
> 
>     unlock ex_mutex
>     bTimedOut=sema_wait(BlockQueue, timeout)
> 
>     lock UnblockLock
>     nWaitersUnblocked++
>     if (0==nWaitersToUnblock) {
>         lastsignal=-1
>     } else {
>         nWaitersToUnblock--
>         if (0==nWaitersToUnblock) {
>             lastsignal=1
>         } else {
>             lastsignal=0
>         }
>     }
>     unlock UnblockLock
> 
>     if (1==lastsignal) {
>         sema_post BlockLock
>     }
> 
>     lock ex_mutex;
>     return bTimedOut?ETIMEOUT:0
> }
> 
> signal(bAll) {
> 
>     sema_wait BlockLock
>     lock UnblockLock
>     if (nWaitersUnblocked!=0) {
>         nWaitersBlocked-=nWaitersUnblocked
>         nWaitersBlocked=0
>     }
>     if (nWaitersBlocked>0) {
>         toUnblock=
>             nWaitersToUnblock=
>                 bAll?nWaitersBlocked:1
>         unlock UnblockLock
>         sema_post(BlockQueue, toUnblock)
>     } else {
>         sema_post(BlockLock)
>         unlock UnblockLock
>     }
> }
> ---------- ---------- ----------
> As you can see, Alexander has probably followed reasoning similar 
to 
> my own,
> and is using the BlockLock semaphore to ensure that a thread may 
not 
> begin a
> wait on the BlockQueue semaphore until it is guaranteed not to 
steal a
> signal. This implementation has no signal stealing or double 
broadcast
> deadlocks. One shortcoming is that threads that time out waiting on 
> the
> semaphore can induce spurious wakeups later. Again, this is easy to 
> fix and
> is not really important. The other shortcoming is that Alexander's 
> locking
> is too strict. It blocks threads attempting to do signal/broadcast 
> that
> don't really need to be blocked. For example, assume there are 
three 
> threads
> waiting on a condition, and a different thread 
> does "pthread_cond_signal();
> pthread_cond_broadcast();". The broadcast will block until the 
signal
> completes, although it should be able to execute 'concurrently'. 
Our 
> rule
> states that a thread may not begin a wait on the BlockQueue 
semaphore 
> while
> the 'signal' is processing. The rules does NOT prevent us from 
> releasing
> more threads, so we should do so. In fact, when trying to do
> "pthread_cond_broadcast(); pthread_cond_signal();" this algorithm 
> will block
> on the signal even though the signal is probably a no-op!
> 
> Can we relax the blocking and make it more like the Algorithm 2? 
> There are
> two problems: One is maintaining the atomicity of passing the gate 
an
> incrementing nWaiters. (Note that by atomically incrementing 
nWaiters 
> I
> really mean locking 'counters'.) In order to allow critical 
sections, 
> we
> have to do this in two calls. The other problem is that closing the 
> gate
> when the gate is a semaphore is a blocking call (sema_wait). We 
can't 
> pass
> the gate while holding the 'counters' lock, because we will 
deadlock 
> when
> the gate is closed. We can't close the gate while holding 
> the 'counters'
> lock, because we will deadlock when the gate is being passed.
> * If we inc nWaiters before passing the gate, we could close the 
gate 
> and
> think we have more waiters than we do, and thus deadlock waiting 
for 
> the
> last waiter.
> * If we inc nWaiters after passing the gate, we will think we have 
> less
> waiters than we do, which means some won't be signaled. This is 
could 
> be OK,
> since we could claim we did a signal before those threads were 
really
> waiters, but we can't guarantee which threads will not wake up. We 
> will have
> signal stealing, which is incorrect.
> Note that this is a race condition that only happens when one 
thread 
> is
> trying to do a wait while another thread that did not lock the 
> ex_mutex
> tries to do a signal at the same time. If we always hold the 
> ex_mutex, we
> never run in to this situation.
> It turns out that in the second case, we can detect this situation 
> and deal
> with it! Then we can maintain the no stealing rule and we win. Here 
> is the
> code. Detecting the race condition makes it a bit messier.
> ---------- Algorithm 3 ----------
> given:
> gate - semaphore
> queue - semaphore
> counters - CS or mutex
> ex_mutex - CS or mutex
> nWaiters - int
> nSignaled - int
> bSigAll - bool // could be sign bit of nSignaled
> 
> wait(timeout) {
>   sem_wait gate // pass gate
>   lock counters
>   if (0==nSignaled) {
>     // normal
>     nWaiters++
>     sem_post gate
>   } else {
>     // oh no - in the middle of a signalOne or signalAll - gate 
stays 
> shut
>     if (!bSigAll) {
>       // we don't have to wake up. Be a waiter and see what happens
>       nWaiters++
>     } else {
>       // we have to wake up. We could just return - see discussion 
> below
>       nSignaled++;
>       sem_post queue
>     }
>   }
>   unlock counters
>   unlock ex_mutex
>   bTimedOut=sem_wait(queue, timeout)
>   lock counters
>   if (bTimedOut && nWaiters>0) {
>     nWaiters--;
>   } else {
>     if (bTimedOut) {
>       sem_wait queue  // better now than spurious later
>       bTimedout=false;
>     }
>     nSignaled--;
>     if (0==nSignaled) {
>       bSigAll=false;
>       sem_post gate // open gate
>     }
>   }
>   unlock counters
>   lock ex_mutex
>   return bTimedOut?ETIMEDOUT:0;
> }
> 
> signal(bAll) {
>   lock counters
>   if (0!=nWaiters) {
>     sema_wait(gate, 0) // close gate
>     if (!bAll) {
>       sema_post queue
>       nSignaled++;
>       nWaiters--;
>     } else {
>       bSigAll=true;
>       sema_post (queue, nWaiters)
>       nSignaled=nWaiters;
>       nWaiters=0;
>     }
>   }
>   unlock counters
> }
> ---------- ---------- ----------
> At one point we have the option of just returning. I think it is 
> probably
> better not to. The PThreads spec says that a timeout will always 
> unlock then
> lock ex_mutex. If we don't return immediately, but do full 
> processing, we
> match this, and we don't have to do anything extra to handle 
> cancellation.
> 
> This implementation has no signal stealing, double broadcast 
> deadlocks, or
> spurious wake ups. It requires four sync objects compared to Unix's 
> two, but
> no one has been able to find a good way around this. It can utilize 
> CritSecs
> for all the mutexes, which are faster than mutexes when there is 
not 
> much
> contention  (like for the 'counters' lock). It does not block any 
> signal()
> calls that can immediately succeed. It only blocks wait() calls 
that 
> can't
> proceed. It behave correctly even when threads are suspended and 
> resumed
> while it is executing, assuming threads are not suspended 
> indefinitely.
> 
> I think this makes Algorithm 3 the best available algorithm. What 
do 
> you
> think? Are there any bugs that I missed?
> 
> 	Later,
> 		-Louis! :)
This is definately helpful.  Just remember here that your dealing 
with people "not in the know" on this and you should do better at 
helping out with this.
Bill Kempf