Boost logo

Boost :

Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Gaetano Mendola (mendola_at_[hidden])
Date: 2012-12-07 14:26:49


On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:
> Le 07/12/12 01:23, Gaetano Mendola a écrit :
>> On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:
>>> Le 05/12/12 21:02, Gaetano Mendola a écrit :
>>>> On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:
>>>>> Le 05/12/12 18:49, Gaetano Mendola a écrit :
>>>>>> On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:
>>>>>>> Le 05/12/12 16:59, Gaetano Mendola a écrit :
>>>>>>>> On 05/12/2012 16.29, Vicente Botet wrote:
>>>>>>>>> Gaetano Mendola-3 wrote
>>>>>>>>>> On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:
>>>>>>>>>>> Le 05/12/12 12:33, Gaetano Mendola a écrit :
>>>>>>>>>>>> On 05/12/2012 09.16, Anthony Williams wrote:
>>>>>>>>>>>>> On 04/12/12 18:32, Gaetano Mendola wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> I was investigating a rare deadlock when issuing an
>>>>>>>>>>>>>> interrupt and
>>>>>>>>>>>>>> a timed_join in parallel. I come out with the the following
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>> showing the behavior.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The deadlock is rare so sometime you need to wait a bit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I couldn't try it with boost 1.52 because the code is invalid
>>>>>>>>>>>>>> due the precondition of "thread joinable" when issuing the
>>>>>>>>>>>>>> timed_join.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's a hint.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is the code not valid or a real bug?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The code is invalid: you keep trying to interrupt and join
>>>>>>>>>>>>> even
>>>>>>>>>>>>> after
>>>>>>>>>>>>> the thread has been joined! Once the thread has been
>>>>>>>>>>>>> joined, the
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> handle is no longer valid, and you should exit the loop.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't seen this statement in the documentation.
>>>>>>>>>>>> The loop was meant to exploit exactly this, then you are
>>>>>>>>>>>> confirming
>>>>>>>>>>>> that interrupting a joined thread is not valid. How do I safely
>>>>>>>>>>>> interrupt then a thread?
>>>>>>>>>>>> There is no "atomic" check_joinable_then_interrupt,
>>>>>>>>>>>> whatching at
>>>>>>>>>>>> the
>>>>>>>>>>>> interrupt code it seems that the check is done inside. I'm
>>>>>>>>>>>> lost.
>>>>>>>>>>> Boost.Thread and std::thread are designed so that there is only
>>>>>>>>>>> one
>>>>>>>>>>> owner of the thread. That is only one thread can
>>>>>>>>>>> join/interrupt a
>>>>>>>>>>> thread
>>>>>>>>>>> safely.
>>>>>>>>>>
>>>>>>>>>> Unless I have totally missed it the documentation doesn't mention
>>>>>>>>>> anything about thread safety (would that be an hint about it?).
>>>>>>>>>
>>>>>>>>> From the 1.48 documentation
>>>>>>>>> "Member function timed_join()
>>>>>>>>>
>>>>>>>>> bool timed_join(const system_time& wait_until);
>>>>>>>>>
>>>>>>>>> template<typename TimeDuration>
>>>>>>>>> bool timed_join(TimeDuration const& rel_time);
>>>>>>>>>
>>>>>>>>> Preconditions:
>>>>>>>>>
>>>>>>>>> this->get_id()!=boost::this_thread::get_id()
>>>>>>>>>
>>>>>>>>> Postconditions:
>>>>>>>>>
>>>>>>>>> If *this refers to a thread of execution on entry, and
>>>>>>>>> timed_join
>>>>>>>>> returns true, that thread of execution has completed, and *this no
>>>>>>>>> longer
>>>>>>>>> refers to any thread of execution. If this call to timed_join
>>>>>>>>> returns
>>>>>>>>> false,
>>>>>>>>> *this is unchanged.
>>>>>>>>> "
>>>>>>>>>
>>>>>>>>> Your second call doesn't satisfy the pre-conditions, so that the
>>>>>>>>> outcome of
>>>>>>>>> this second call is undefined.
>>>>>>>>
>>>>>>>> That precondition tests that your are not interrupting yourself
>>>>>>>> doesn't say anything about thread safety. Am I missing something ?
>>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> no function is considered been thread-safe until it is stated
>>>>>>> explicitly.
>>>>>>
>>>>>> You are completely right, it was my own fault considering it thread
>>>>>> safe.
>>>>>>
>>>>>> So even thread_group is not thread safe? The implementation is full
>>>>>> thread safe.
>>>>> How it is thread-safe? Only one thread can call to the thread_group
>>>>> functions at a time.
>>>>
>>>> The implementation is straight.
>>>> The boost::thread_group keeps a std::list of boost::threads pointers.
>>>>
>>>> the public methods:
>>>>
>>>> create_thread
>>>> add_thread
>>>> remove_thread
>>>> join_all
>>>> interrupt_all
>>>> size
>>>>
>>>> are all protecting the internal status and then all the threads in the
>>>> list with a common mutex.
>>>>
>>>> It appears completely thread safe to me.
>>> You are right. I didn't see the implementation. Anthony, I don't know
>>> what was the intent of the initial design. Do you have an idea?
>>> What do you suggest? update the documentation?
>>
>> Even the implementation can be improved, even if it's thread safe it
>> doesn't avoid to issue a join to an already joined thread, or for the
>> matter to interrupt a thread already joined/interrupted.
>>
> I don't know if you comment has something to be with these two tickets
>
> https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all()
> should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition
there, indeed a thread can detach itself between the call to joinable
and the join at that point even if the check was made when the join is
issued the precondition doesn't hold anymore.

Basically that is the question I'm asking: how can we be sure that a
thread is joinable when we call the join on it, for sure checking the
"joinable" has some problems. I have solved this in my thread group
implementation with a flag saying if the thread was already
joined or not and this works only because our threads do not detach
them self so the thread group is the only responsible for the status
change.

> https://svn.boost.org/trac/boost/ticket/7669: thread_group::join_all()
> should catch resource_deadlock_would_occur

That is another story about the exception safety of thread_group, and
as you can see it's not exception safe.

>> I also believe the create_thread returning a boost::thread pointers is
>> very dangerous, indeed doing so the boost::thread_group doesn't have
>> anymore the ownership of it, it would be better if create_thread would
>> have returned a thread id instead.
> I don't want to invest to much on the thread_group class, as for me this
> class was designed before threads were movable and a complete redesign
> should be done. You can of course create a Trac ticket requesting this
> new feature.

Due the limitation of this class I have wrote my own thread group with
also the method join_any having the semantic to interrupt all threads
if any of the threads terminate. I believe seeing the current status of
thread_group class the best to do is or fix it for good or deprecate it.

>> Sure the programmer can shoot himself in the foot in other more
>> spectacular way but at least having a thread group interface obbeying
>> the principle of least astonishment can be a plus.
> Best,
> Vicente

Regards
Gaetano Mendola


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk