Boost logo

Boost :

Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2012-12-12 02:33:26


Le 12/12/12 00:32, Gaetano Mendola a écrit :
> On 11/12/2012 21.23, Vicente J. Botet Escriba wrote:
>> Le 11/12/12 19:12, Gaetano Mendola a écrit :
>>> On 11/12/2012 16.13, Vicente Botet wrote:
>>>> Gaetano Mendola-3 wrote
>>>>>
>>>>> 1) Thread group is now thread safe, it can be used concurrently by
>>>>> multiple threads
>>>>
>>>> Why a thread group should be inherently thread-safe? It seems to me
>>>> that
>>>> having a thread container is already useful.
>>>
>>> It can manage without pestering the developers the fact that one
>>> entity spawns a batch of threads and then wait for the completion
>>> waiting on
>>> the join() while another entity (an user interface as example) can
>>> stop the whole process if it's taking too much time. Otherwise as soon
>>> someone performs a boost::thread_group::join then nothing can be done
>>> from outside to stop the process. It seems a natural use to me.
>> OK I think I understand your use case. Here it is an alternative that
>> don't use any mutex to protect the group of thread.
>> I will choose and owner of all these threads, insert them on a
>> container. Only this thread is able to join/interrupt the threads.
>> I will use some way to transfer the request from the user interface
>> thread to the owner that this is taking too much time (using atomic?).
>> The owner will try to join each thread using try_join_until with the
>> desired expiration time. If the thread is joined the thread is removed
>> from the container. If there is a timeout the owner will check the
>> protected state 'take_too_much_time' and will interrupt all the other
>> threads and then join all of them. As you can see the contention is
>> reduced.
>>
>> Note that this is a specific behavior that can not be added to the
>> thread_group class. I will be for the addition of an algorithm/free
>> function that try to join the threads on a container/range during a
>> given duration or until an expiration time (removing the joined
>> threads).
>
> I don't know who the boost thread maintainer is and how/who decides
> if a design is good to be implemented or it work the other way around?
Here I was talking about my alternative solution.
>
>>>>> 2) thread_group now maintains a list of handlers with the
>>>>> responsibility
>>>>> to:
>>>>> -) Avoid join and interrupts to be called concurrently on a
>>>>> thread
>>>>> -) Avoid to call join on a joined thread
>>>>> -) Avoid to call interrupt on a joined/interrupt thread
>>>>
>>>> IMO, all the threads in a thread_group are owned by the group, and
>>>> use move
>>>> semantics, no need to use pointer to threads. As a consequence there
>>>> is no
>>>> need for the handler/wrapper.
>>>
>>> This is true if the thread_group does not permits to be used by
>>> multiple
>>> threads interrupting/joining.
>> I understand now why you did this way. But I will not do that.
>
> Then the maintainer is you?
Here I was talking as a user, that is, that I will not use the design of
your application.
And yes, I'm the maintainer with Anthony Williams that is the principal
author.
>
>>>>> 4) Due the fact mutex are not fair a thread issuing an interrupt_all
>>>>> most likely will go in starvation if a thread is issuing a
>>>>> join_all
>>>>> (especialy if the group contains a single thread). I can work at
>>>>> it.
>>>>
>>>> Could you clarify your concern?
>>>
>>> Sure, if a thread performs a closed loop:
>>>
>>> <snip>
>>> then it goes in starvationm we have observed this (even in a
>>> deterministic way), then I had to make the two interrupt/timed_join on
>>> the thread handler fair each other.
>>> Our platform is a linux platform with a 3.2.0 kernel.
>>>
>> IMO, only one thread should join/interrupt all the threads. This avoid
>> all these issues.
>
> Avoid the issues at thread group level, but those issues will be present
> into an upper layer.
Maybe. Have you identified these issues on the design I have proposed above?
>
>
>>>> As I said I'm for deprecating thread_group in Boost.Thread. The
>>>> implementation you are proposing has not changed my mind.
>>>
>>> Fine with me, leaving it as it is now is worst than
>> The class thread_group is not a big one. You are free to propose to
>> Boost whatever you consider is good for the Boost community.
>
> I'm proposing to decide what thread_group has to do because as it is
> now it have design flaws.
I agree completely with the design flaw. This thread_group class should
be a specific user class so that it can adapt it to its specific needs.
>
>>>> I would prefer an approach where data structures (containers) and
>>>> algorithms
>>>> (join, interrupt, ...) are separated, thread-safety is not mandatory,
>>>> ...
>>>
>>> I agreed with you about the fact if not explicitly written a library is
>>> not meant to be thread safe, this rule doesn't fit a thread library.
>> This is your opinion and I respect it even if I don't share it.
>>> I suggest to explicitly write, even in bold, the fact that thread_group
>>> and thread classes are not thread safe.
>> I believed that we agreed that the thread group was thread-safe! Of
>> course the thread class is not thread-safe. I could add something like
>> all the functions in this library are not thread-safe until stated
>> explicitly, as I did it for Boost.Chrono since the beginning.
>
> From an academic pure point of view the thread group is thread safe
> because it protects his internal list so it can be used by multiple
> threads, but to the other side if used (as it is now) from multiple
> threads then the best you can get is "everyone guess" joining a batch
> of already joined threads or for the matter to interrupt an already
> joined group.
If the current design doesn't respond to your expectations you should
use an alternative.
> It seems it was coded when it was not a problem for a thread to be
> joined even if joined/interrupted.
You are right, the last change in thread had some some undesirable
impacts on thread_group. IMO, the two fixes I reported in this thread
resolve the issues.
Please let me know if this is not the case.

Best,
Vicente


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk