Boost logo

Boost :

Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2012-12-06 17:31:16


Le 05/12/12 21:02, Gaetano Mendola a écrit :
> On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:
>> Le 05/12/12 18:49, Gaetano Mendola a écrit :
>>> On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:
>>>> Le 05/12/12 16:59, Gaetano Mendola a écrit :
>>>>> On 05/12/2012 16.29, Vicente Botet wrote:
>>>>>> Gaetano Mendola-3 wrote
>>>>>>> On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:
>>>>>>>> Le 05/12/12 12:33, Gaetano Mendola a écrit :
>>>>>>>>> On 05/12/2012 09.16, Anthony Williams wrote:
>>>>>>>>>> On 04/12/12 18:32, Gaetano Mendola wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I was investigating a rare deadlock when issuing an
>>>>>>>>>>> interrupt and
>>>>>>>>>>> a timed_join in parallel. I come out with the the following
>>>>>>>>>>> code
>>>>>>>>>>> showing the behavior.
>>>>>>>>>>>
>>>>>>>>>>> The deadlock is rare so sometime you need to wait a bit.
>>>>>>>>>>>
>>>>>>>>>>> I couldn't try it with boost 1.52 because the code is invalid
>>>>>>>>>>> due the precondition of "thread joinable" when issuing the
>>>>>>>>>>> timed_join.
>>>>>>>>>>
>>>>>>>>>> That's a hint.
>>>>>>>>>>
>>>>>>>>>>> Is the code not valid or a real bug?
>>>>>>>>>>
>>>>>>>>>> The code is invalid: you keep trying to interrupt and join even
>>>>>>>>>> after
>>>>>>>>>> the thread has been joined! Once the thread has been joined, the
>>>>>>>>>> thread
>>>>>>>>>> handle is no longer valid, and you should exit the loop.
>>>>>>>>>
>>>>>>>>> I haven't seen this statement in the documentation.
>>>>>>>>> The loop was meant to exploit exactly this, then you are
>>>>>>>>> confirming
>>>>>>>>> that interrupting a joined thread is not valid. How do I safely
>>>>>>>>> interrupt then a thread?
>>>>>>>>> There is no "atomic" check_joinable_then_interrupt, whatching at
>>>>>>>>> the
>>>>>>>>> interrupt code it seems that the check is done inside. I'm lost.
>>>>>>>> Boost.Thread and std::thread are designed so that there is only
>>>>>>>> one
>>>>>>>> owner of the thread. That is only one thread can join/interrupt a
>>>>>>>> thread
>>>>>>>> safely.
>>>>>>>
>>>>>>> Unless I have totally missed it the documentation doesn't mention
>>>>>>> anything about thread safety (would that be an hint about it?).
>>>>>>
>>>>>> From the 1.48 documentation
>>>>>> "Member function timed_join()
>>>>>>
>>>>>> bool timed_join(const system_time& wait_until);
>>>>>>
>>>>>> template<typename TimeDuration>
>>>>>> bool timed_join(TimeDuration const& rel_time);
>>>>>>
>>>>>> Preconditions:
>>>>>>
>>>>>> this->get_id()!=boost::this_thread::get_id()
>>>>>>
>>>>>> Postconditions:
>>>>>>
>>>>>> If *this refers to a thread of execution on entry, and
>>>>>> timed_join
>>>>>> returns true, that thread of execution has completed, and *this no
>>>>>> longer
>>>>>> refers to any thread of execution. If this call to timed_join
>>>>>> returns
>>>>>> false,
>>>>>> *this is unchanged.
>>>>>> "
>>>>>>
>>>>>> Your second call doesn't satisfy the pre-conditions, so that the
>>>>>> outcome of
>>>>>> this second call is undefined.
>>>>>
>>>>> That precondition tests that your are not interrupting yourself
>>>>> doesn't say anything about thread safety. Am I missing something ?
>>>>>
>>>> Hi,
>>>>
>>>> no function is considered been thread-safe until it is stated
>>>> explicitly.
>>>
>>> You are completely right, it was my own fault considering it thread
>>> safe.
>>>
>>> So even thread_group is not thread safe? The implementation is full
>>> thread safe.
>> How it is thread-safe? Only one thread can call to the thread_group
>> functions at a time.
>
> The implementation is straight.
> The boost::thread_group keeps a std::list of boost::threads pointers.
>
> the public methods:
>
> create_thread
> add_thread
> remove_thread
> join_all
> interrupt_all
> size
>
> are all protecting the internal status and then all the threads in the
> list with a common mutex.
>
> It appears completely thread safe to me.
You are right. I didn't see the implementation. Anthony, I don't know
what was the intent of the initial design. Do you have an idea?
What do you suggest? update the documentation?

Best,
Vicente


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk