

Subject: Re: [Boost-users] thread_group::interrupt_all is not reliable
From: Stonewall Ballard (sb.list_at_[hidden])
Date: 2009-12-01 09:53:46


On Dec 1, 2009, at 3:29 AM, Roland Bock wrote:

> Stonewall Ballard wrote:
>> I think I found the cause of this problem. It seems that the caller of interrupt_all should be holding the mutex associated with the condition on which the threads are waiting.
>> This gave me the clue to try that:
>> <http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_broadcast.html>
>>> The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
>> thread::interrupt() calls pthread_cond_broadcast in pthread/thread.cpp.
>> Although "predictable scheduling" doesn't seem like it should include a failure to wake up, taking the mutex around the call to thread_group::interrupt_all() appears to be 100% reliable.
>> I can patch my app to do that, but I don't think there's a general solution. The documentation should include a note that thread::interrupt() isn't reliable unless the caller is holding the mutex associated with the condition variable on which the interrupted thread is waiting.
>> Of course, this could be a bug in the OS X pthreads implementation as well.
>
> Hi,
>
> FWIW, I ran that test of yours several times with varying parameters on my machine (quad core, 64bit, linux) and it did not show a single failure. Of course, since it is not a deterministic effect even on your machine, failure to reproduce does not really mean much, but well, I thought you might like to hear anyway :-)

Thanks, but this doesn't surprise me. Since the reliability drops rapidly as I add threads, I suspect it has something to do with running this on an 8-core (16 hyperthread) machine. I also suspect that it's a Mac OS bug.

> And I totally agree: Predictable scheduling should not be required to wake up all threads, especially since the document also says
>
> <cite>
> The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex [...]
> </cite>

Of course, that could just be a promise that it won't crash. I hope you're right, though.

> As for boiling down the application for others to inspect:
> Your debugger showed that the thread is still in wait() after the interrupt call.
>
> Can you ensure that ALL worker threads are in wait() prior to the interrupt?
> * If yes: There seems to be no connection with the interlocked queue,
> the sleep and so on. It should be possible to get rid of all that
> for a much simpler test program
> * If no: OK, there seems to be a connection between the wait(), the
> interrupt and the sleep and/or mutex.

Yes, I added code to check that, and all 16 threads were waiting on that condition when I interrupted them.

> In any case, I would assume that by analysing the situation right before the interrupt, you should be able to reproduce the problem with much less code.

If I understand this correctly, I should be able to reproduce it with pthreads alone (no boost). I'm going to try that when I get some time and file a bug report.
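
A minimal sketch of what such a pthreads-only test might look like, not the actual test program from this thread. All names here (Shared, worker, run_trial) are hypothetical. It assumes the predicate is always set while holding the mutex, so on a conforming implementation every waiter should wake whether or not the broadcaster still holds the mutex; Boost's interrupt() path may differ in that respect. A result smaller than the thread count would point at the pthreads implementation rather than at Boost.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <vector>

// Hypothetical shared state; every field is guarded by `mutex`.
struct Shared {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool stop;     // predicate the workers wait on
    int waiting;   // workers that have reached the wait loop
    int woken;     // workers that have left the wait loop
};

void* worker(void* arg) {
    Shared* s = static_cast<Shared*>(arg);
    pthread_mutex_lock(&s->mutex);
    ++s->waiting;
    while (!s->stop)                       // loop guards against spurious wakeups
        pthread_cond_wait(&s->cond, &s->mutex);
    ++s->woken;
    pthread_mutex_unlock(&s->mutex);
    return nullptr;
}

// Returns how many workers woke within roughly timeout_ms of the broadcast.
int run_trial(int n_threads, bool hold_mutex, int timeout_ms) {
    Shared s;
    pthread_mutex_init(&s.mutex, nullptr);
    pthread_cond_init(&s.cond, nullptr);
    s.stop = false;
    s.waiting = 0;
    s.woken = 0;

    std::vector<pthread_t> threads(n_threads);
    for (pthread_t& t : threads)
        pthread_create(&t, nullptr, worker, &s);

    // Once waiting == n_threads is observed under the mutex, every worker
    // has released the mutex inside pthread_cond_wait.
    for (;;) {
        pthread_mutex_lock(&s.mutex);
        bool all_waiting = (s.waiting == n_threads);
        pthread_mutex_unlock(&s.mutex);
        if (all_waiting) break;
        usleep(1000);
    }

    // Set the predicate under the mutex, then broadcast either while
    // holding the mutex or after releasing it.
    pthread_mutex_lock(&s.mutex);
    s.stop = true;
    if (hold_mutex) {
        pthread_cond_broadcast(&s.cond);
        pthread_mutex_unlock(&s.mutex);
    } else {
        pthread_mutex_unlock(&s.mutex);
        pthread_cond_broadcast(&s.cond);
    }

    // Poll for up to roughly timeout_ms for all workers to report in.
    int woken = 0;
    for (int waited = 0; waited <= timeout_ms; ++waited) {
        pthread_mutex_lock(&s.mutex);
        woken = s.woken;
        pthread_mutex_unlock(&s.mutex);
        if (woken == n_threads) break;
        usleep(1000);
    }

    // Give any stuck worker a second broadcast so the joins can finish.
    if (woken != n_threads) {
        pthread_mutex_lock(&s.mutex);
        pthread_cond_broadcast(&s.cond);
        pthread_mutex_unlock(&s.mutex);
    }
    for (pthread_t& t : threads)
        pthread_join(t, nullptr);

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return woken;
}
```

Calling run_trial(16, false, 2000) in a loop and comparing the result against 16, then repeating with hold_mutex set to true, would show whether holding the mutex changes anything on a given machine.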

> Hope that helps in any way?

Yes, thanks. Reasoning about threads requires code review.

 - Stoney

-- 
Stonewall Ballard 
stoney_at_[hidden]           http://stoney.sb.org/

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net