Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Anthony Williams (anthony.ajw_at_[hidden])
Date: 2012-12-05 07:39:12


On 05/12/12 11:33, Gaetano Mendola wrote:
> On 05/12/2012 09.16, Anthony Williams wrote:
>> On 04/12/12 18:32, Gaetano Mendola wrote:
>>> Hi all,
>>> I was investigating a rare deadlock when issuing an interrupt and
>>> a timed_join in parallel. I came up with the following code
>>> showing the behavior.
>>>
>>> The deadlock is rare, so sometimes you need to wait a while.
>>>
>>> I couldn't try it with boost 1.52 because the code is invalid
>>> due to the precondition of "thread joinable" when issuing the
>>> timed_join.
>>
>> That's a hint.
>>
>>> Is the code not valid or a real bug?
>>
>> The code is invalid: you keep trying to interrupt and join even after
>> the thread has been joined! Once the thread has been joined, the thread
>> handle is no longer valid, and you should exit the loop.
>
> I haven't seen this statement in the documentation.

The thread object itself is not thread-safe --- a given thread object
should have a single owner. Only one thread can perform an operation on
that thread object at a time. If you wish to call member functions on it
from multiple threads then they need synchronizing.
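
A minimal sketch of that kind of synchronization follows. It is not
Boost.Thread code: std::thread stands in for boost::thread and an atomic
flag stands in for interrupt(), and GuardedThread, request_stop, join_once
and count_successful_joins are hypothetical names chosen for illustration.
Every access to the thread handle goes through one mutex, so callers
racing to join are serialized and only the first one actually joins.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: every access to the thread handle takes mutex_.
class GuardedThread {
public:
    GuardedThread() : stop_(false), worker_([this] { run(); }) {}
    ~GuardedThread() { request_stop(); join_once(); }

    // Stand-in for interrupt(): touches only the atomic flag, never the
    // handle, so it cannot race with join_once().
    void request_stop() { stop_.store(true); }

    // Returns true only for the caller that actually performed the join.
    bool join_once() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!worker_.joinable()) return false;  // already joined by someone else
        worker_.join();
        return true;
    }

private:
    void run() {
        while (!stop_.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    // Declaration order matters: the flag and mutex exist before the thread starts.
    std::atomic<bool> stop_;
    std::mutex mutex_;
    std::thread worker_;
};

// Several callers race to stop and join the same handle.
int count_successful_joins(int callers) {
    assert(callers > 0);
    GuardedThread guarded;
    std::atomic<int> joins(0);
    std::vector<std::thread> pool;
    for (int i = 0; i < callers; ++i)
        pool.emplace_back([&] {
            guarded.request_stop();
            if (guarded.join_once()) ++joins;
        });
    for (auto& t : pool) t.join();
    return joins.load();
}
```

Calling count_successful_joins(4) reports one successful join; the other
callers find the handle no longer joinable and back off.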

> The loop was meant to exploit exactly this, so you are confirming that
> interrupting a joined thread is not valid. How do I safely interrupt
> a thread, then?

Yes, you cannot interrupt (or do ANYTHING at all to) a joined thread.
After joining with a thread the thread object is no longer associated
with the thread --- the thread itself has terminated and all resources
are cleaned up.

To interrupt a thread you call interrupt(). Just don't call it
concurrently with anything else.

> There is no "atomic" check_joinable_then_interrupt; looking at the
> interrupt code it seems that the check is done inside. I'm lost.

True, there is none, but one is not necessary. A simple
if(t.joinable()) t.interrupt() will suffice, since no other thread can be
legally accessing your thread object.
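
The single-owner pattern can be sketched as below. This is an
illustration under assumptions, not Boost.Thread code: std::thread stands
in for boost::thread, setting an atomic flag stands in for t.interrupt(),
and waiting on a future with a timeout stands in for timed_join(). Worker
and stop_and_join are hypothetical names. The owner repeats "interrupt,
then timed wait" and leaves the loop as soon as the thread has been joined.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for an interruptible thread owned by one caller.
struct Worker {
    std::atomic<bool> stop{false};
    std::promise<void> finished;
    std::future<void> done = finished.get_future();
    std::thread handle;  // declared last so the members above exist first

    Worker() : handle([this] {
        while (!stop.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        finished.set_value();  // signal completion to the owner
    }) {}
};

// Called by the single owner only. Returns how many attempts were needed.
int stop_and_join(Worker& w, std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    int attempts = 0;
    while (w.handle.joinable()) {      // plays the role of if (t.joinable())
        ++attempts;
        w.stop.store(true);            // plays the role of t.interrupt()
        if (w.done.wait_for(timeout) == std::future_status::ready)
            w.handle.join();           // not joinable afterwards, so the loop ends
    }
    return attempts;
}
```

A second call to stop_and_join on the same Worker does nothing, because
the joinable() check fails before the handle is touched.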

> In order to cope with a bug in 1.40 (an interrupt to a thread could have
> been lost) I have implemented my own ThreadGroup:
>
> ThreadGroup::interrupt_all() {
>   for_each_thread(
>     boost::thread::interrupt();
>     if ( boost::thread::timed_join() ) {
>       move_to_next_thread
>     }
>   )
> }
>
> along with the fact that boost::thread_group doesn't provide a method
> "join_any" with the semantic to issue an interrupt_all if any of the
> threads terminate I have implemented join_any this way:
>
> ThreadGroup::join_any() {
>   while(true) {
>     for_each_thread(
>       if ( boost::thread::timed_join() ) {
>         interrupt_all();
>       } else {
>         move_to_next_thread
>       }
>     )
>   }
> }
>
> This has been working well for 2 years now. Since upgrading to 1.48 I'm
> experiencing deadlocks and core dumps. The backtrace shows that
> a timed_join crashes if somehow the thread terminates at the same
> time. Given that the 1.48 documentation says nothing about not being
> able to call timed_join concurrently with interrupt, and that no
> precondition is specified on the interrupt method, I supposed the
> above code would be harmless with 1.48.

If a concurrent interrupt() and timed_join() call worked then that was a
bonus. It is not guaranteed. Just because a thread object manages a
thread does not mean it is itself thread-safe.

Anthony

-- 
Author of C++ Concurrency in Action     http://www.stdthread.co.uk/book/
just::thread C++11 thread library             http://www.stdthread.co.uk
Just Software Solutions Ltd       http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
