Boost logo

Boost Users :

Subject: Re: [Boost-users] thread_group::interrupt_all is not reliable
From: Stonewall Ballard (sb.list_at_[hidden])
Date: 2009-12-01 12:52:39


On Dec 1, 2009, at 11:43 AM, Peter Dimov wrote:

> Stonewall Ballard wrote:
>> On Dec 1, 2009, at 9:50 AM, Peter Dimov wrote:
>>
>>> I can't read the boost.thread code well enough to be able to
>>> diagnose the problem, but from a cursory look, it looks possible to
>>> me that condition_variable::wait may perform its interruption check,
>>> be preempted, the interrupt can proceed with setting
>>> interrupt_requested and doing a broadcast, and then
>>> condition_variable::wait to block on its pthread_cond_wait.
>>
>> I've found that all the threads are waiting on the condition variable
>> at the time that I interrupt them, so that scenario doesn't seem to
>> apply.
>
> They are before interrupt_all is issued, but then, as you yourself write:
>
>> I think that what's actually happening here is that 16 threads are
>> waiting on the same condition, then they're individually interrupted
>> (using the above code). Each interrupt wakes all the threads waiting
>> on the condition, where all but one thread locks the mutex, see that
>> it's not interrupted, then unlocks the mutex and waits.
>
> they are getting interrupted and awakened one by one, the end result being that condition_variable::wait and thread::interrupt are being called in parallel.
>
>> The actual interrupt code in pthread/thread.cpp is:
> ...
>> As you can see, each thread has a mutex protecting its internal
>> state, and the interrupter locks that mutex before setting
>> interrupt_requested to true. I think that this precludes a race
>> condition there.
>
> It doesn't.
>
> inline void condition_variable::wait(unique_lock<mutex>& m)
> {
> detail::interruption_checker check_for_interruption(&cond);
>
> // thread::interrupt is executed here
>
> BOOST_VERIFY(!pthread_cond_wait(&cond,m.mutex()->native_handle()));
> }
>

On Dec 1, 2009, at 12:12 PM, Anthony Williams wrote:

> Peter is right.
>
> The mutex that is locked across the broadcast protects the interrupt
> flag, but is distinct from the mutex that is associated with the
> condition variable. It is thus possible for the waiting thread to miss
> an interruption between when it checks for interruption and when it
> enters pthread_cond_wait.
>
> Anthony

I see that now. It's necessary for the interrupter to lock the condition variable mutex before interrupting the thread. The likelihood of failure would depend on the number of threads waiting on the condition variable, and the use of an interrupt_all() to interrupt them repeatedly.

So this really is a bug in boost::thread, at least with the pthreads implementation. I suppose that makes it easier to fix than a bug in pthreads.

Peter's suggestion:
On Dec 1, 2009, at 9:50 AM, Peter Dimov wrote:

> Your workaround does prevent it, but I think that boost.thread should take care of that internally, by storing a mutex pointer in the thread data, along with the cv pointer.

seems like the right thing to do for a general fix.

Thanks to all for your help. I'm confident now that my workaround will keep this from happening in my app, but a proper fix to boost::threads is necessary.

I'm surprised that I can't find any other reports of this problem.

 - Stoney

-- 
Stonewall Ballard 
stoney_at_[hidden]           http://stoney.sb.org/

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net