Boost logo

Boost :

Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Gaetano Mendola (mendola_at_[hidden])
Date: 2012-12-06 19:11:46


On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:
> Le 05/12/12 21:05, Gaetano Mendola a écrit :
>> On 05/12/2012 19.57, Vicente J. Botet Escriba wrote:
>>> Le 05/12/12 18:39, Gaetano Mendola a écrit :
>>>> On 05/12/2012 18.08, Anthony Williams wrote:
>>>>> On 05/12/12 15:59, Gaetano Mendola wrote:
>>>>>> On 05/12/2012 16.29, Vicente Botet wrote:
>>>>>>> template<typename TimeDuration>
>>>>>>> bool timed_join(TimeDuration const& rel_time);
>>>>>>>
>>>>>>> Preconditions:
>>>>>>>
>>>>>>> this->get_id()!=boost::this_thread::get_id()
>>>>>>>
>>>>>>> Postconditions:
>>>>>>>
>>>>>>> If *this refers to a thread of execution on entry, and
>>>>>>> timed_join
>>>>>>> returns true, that thread of execution has completed, and *this no
>>>>>>> longer
>>>>>>> refers to any thread of execution. If this call to timed_join
>>>>>>> returns
>>>>>>> false,
>>>>>>> *this is unchanged.
>>>>>>> "
>>>>>>>
>>>>>>> Your second call doesn't satisfy the pre-conditions, so that the
>>>>>>> outcome of
>>>>>>> this second call is undefined.
>>>>>>
>>>>>> That precondition tests that your are not interrupting yourself
>>>>>> doesn't
>>>>>> say anything about thread safety. Am I missing something ?
>>>>>
>>>>> Your code had two bugs. Firstly, it called interrupt and timed_join on
>>>>> the same thread object from multiple threads. Secondly, it called
>>>>> interrupt and timed_join in a loop without ever checking the result of
>>>>> timed_join.
>>>>>
>>>>> Vicente's quote from the docs addresses the second issue: if
>>>>> timed_join
>>>>> returns true then looping round to call interrupt and timed_join again
>>>>> is now a precondition violation, and thus undefined behaviour.
>>>>
>>>> Then I'm not reading well the precondition.
>>>>
>>>> this->get_id(): the thread id of the instance on which I'm calling the
>>>> boost::thread::interrupt().
>>>>
>>>> boost::this_thread::get_id(): gives the caller thread id.
>>>>
>>>> from documentation I'm reading that if an instance of boost::thread
>>>> doesn't refer to a thread of execution then it returns a
>>>> default-constructed boost::thread::id that represents Not-a-Thread.
>>>>
>>>> This means that in case of an already joined thread this->get_id()
>>>> returns a default-constructed boost::thread::id, how can that be equal
>>>> to current (caller) thread id?
>>>>
>>>> I'm reading the precondition as: "you can not time_join your self"
>>>>
>>>> Indeed looking at 1.52 code in the interrupt method I can read:
>>>>
>>>> if (this_thread::get_id() == get_id()) {
>>>> boost::throw_exception(
>>>> thread_resource_error(
>>>> system::errc::resource_deadlock_would_occur,
>>>> "boost thread: trying joining itself"));
>>>> }
>>>>
>>>> **boost thread: trying joining itself**
>>>>
>>>> I'm sure I'm still missing something.
>>>>
>>>>
>>> My bad. You are right the pre-conditions in 1.48 is not the correct one.
>>> I have updated the documentation for join() since 1.50 and since 1.52
>>> for all the other join functions. In particular
>>>
>>>
>>> Member function |timed_join()|
>>>
>>> <http://www.boost.org/doc/libs/1_52_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join>
>>>
>>>
>>>
>>>
>>> bool timed_join(const system_time& wait_until);
>>>
>>> template<typename TimeDuration>
>>> bool timed_join(TimeDuration const& rel_time);
>>>
>>> Preconditions:
>>>
>>> the thread is joinable.
>>>
>>>
>>> So you are right there is a bug up to 1.51, with a fix on 1.52.
>>>
>>> Sorry for the troubles this issue could caused you.
>>
>> I believe boost project deserve a wiki documentation site.
>>
>>
> And your suggestion is?

If I had time and resources I would have put up a boost doc site
wiki style so it can be easily modified by almost everyone without
bashing doc maintainers for even little adjustments.

As said most of the problems I'm seeing is a "RTFM" problem but some
time is even a WTFM related issue.

I understand that is not easy to maintain documentation in shape
especialy when you have so many "active" boost version around.
Consider that last Ubuntu LTS deploys by default boost 1.46 and not
all IT department do accept to install libraries that are not part
of the distribution (we have to face it).

To add salt to the wound boost doesn't back-port bugs corrections
(AFAIK) so various Linux distributions are left with boost versions
containing annoying bugs. Don't take me wrong, no project is bug-less,
but at least a "warning" on documentation can save people time.

As example we used then 1.40 boost 2 years ago (shipped with the
previous Ubunt LTS version) and it had the bug of sporadic
boost::thread::interrupt not working, we spend 3 days to figure it out,
I would have put a "warning" in the 1.40 documentation about the
boost::thread::interrupt missing.

Regards
Gaetano Mendola

--
http://cpp-today.blogspot.it/

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk