Boost logo

Boost :

Subject: Re: [boost] [thread] Request review of new synchronisation object, boost::permit<>
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-05-05 13:37:26


On 5 May 2014 at 17:26, Peter Dimov wrote:

> "You will never see spurious wakeups from a permit object -- [...]. This
> means you don't need to do any wait predicate checking with permits..."
>
> I don't think that this is true. Suppose we have two producer threads Tp1
> and Tp2 and two consumer treads Tc1 and Tc2, and a predicate P. I'll treat
> the predicate as a simple boolean variable below, for simplicity.
>
> A producer thread sets P=1 and calls notify_one. A consumer thread calls
> wait and then sets P=0. (It has to set P=0 regardless of how many producers
> have set the predicate to 1, because - I assume - the permit object doesn't
> count the pending notify_one calls, unlike a semaphore and like an event.)
>
> So you have
>
> Initially P is 0.
> Tc1, Tc2 call wait.
> Tp1 sets P=1 and calls notify_one. Tc1 is unblocked, returns from wait, but
> is immediately suspended by the scheduler for unfathomable scheduling
> reasons, like its code causing a page fault.
> Tp2 sets P=1 and calls notify_one. Tc2 is unblocked, returns from wait, sets
> P=0.
> Tc1 is resumed, sees P=0, panics.
>
> I suspect that your guarantees, that grant/notify_one calls are blocking and
> mutually exclusive, were designed to prevent this, but they don't and can't,
> in theory. They _can_ make the above extremely improbable, but they aren't
> theoretically sound.

That isn't the reason actually, but I'll get back to that.

Firstly, permits are a notification object, not a serialisation
object. If you have some predicate P whose state you are changing you
must protect it with a mutex, just like any other code. This is why
permit.wait() takes a locked mutex.

Reworking your example thusly:

Initially P is 0 and p_mutex is unlocked.
Tc1, Tc2 lock p_mutex and call wait.
Tp1 locks p_mutex, sets P=1 and calls notify_one. Upon Tp1 releasing
the mutex, Tc1 is unblocked with mutex locked, returns from wait,
but is immediately suspended by the scheduler for unfathomable
scheduling
reasons, like its code causing a page fault. Tp2 tries to set P=1,
but gets blocked on the mutex being held by Tc1. When Tc1 is resumed
and eventually unlocks the mutex, everything else proceeds as normal.

Maybe you meant the fact that the permit contains its own predicate,
and hence no need for predicate checking? If so, then P equals the
state of the permit, and:

Initially P is 0.
Tc1, Tc2 call wait.
Tp1 sets P=1. Tc1 is unblocked, thus atomically resetting P=0,
returns from wait, but is immediately suspended by the scheduler for
unfathomable scheduling reasons, like its code causing a page fault.
Tp2 sets P=1. Tc2 is unblocked, thus atomically resetting P=0,
returns from wait.
Tc1 is resumed, all is well. Exactly the number of waiters were freed
as granters.

Perhaps though in fact you were more concerned about two threads
granting a permit, but only one thread getting woken? The problem
here is that in some cases this is exactly what you want, in other
cases it is a lost wakeup and you should use a semaphore instead.
Similarly, revoking non-consuming permits is always racy unless you
have added synchronisation to ensure you're not being stupid.

All threading primitives have their gotchas, no doubt. The question
here is if the presented implementation of this permit is a wise
addition to Boost.

> I suspect that your guarantees, that grant/notify_one calls are blocking and
> mutually exclusive, were designed to prevent this, but they don't and can't,
> in theory. They _can_ make the above extremely improbable, but they aren't
> theoretically sound.

Grants being blocking for non-consuming permits is actually a time
complexity guarantee, so you are being guaranteed progress no matter
what. The Windows kernel can "cheat" here in ways we cannot in
portable code.

> The specification of condition variables allows spurious wakeups not
> because they can't be prevented by the implementation; it permits spurious
> wakeups to make client code assume spurious wakeups, because code that is
> written to assume spurious wakeups is more likely to be correct. Or
> conversely, code that does not assume spurious wakeups is likely to be
> incorrect even if no spurious wakeups occur.

I would have said spurious wakeups come exclusively from kernel bugs
and the fact POSIX allows signals to escape syscalls, which is
definitely a pre-threading era legacy design choice. On Windows
spurious wakeups are trapped for you in user space so you never see
them and therefore never have to deal with them, unless of course you
elect to do so (e.g. any of the wait APIs able to return
WAIT_IO_COMPLETION).

Spurious wakeups absolutely can be prevented by the implementation,
just for backwards compatibility POSIX cannot do so.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk