Subject: Re: [Boost-users] [Thread] Timed join returning true before thread terminated
From: John Rocha (jrr_at_[hidden])
Date: 2012-03-15 12:47:13


Hello,

First I want to thank you for helping with this. How did you come to this
answer so quickly? Is this a technique or tool that I can learn, or was this
wisdom from having worked with your library for so long?

Next, I'd like to ensure that I fully understand what was occurring. Can you
please confirm that I've got this right? I've read and thought about your
answer, looked at the code, and I believe these are the details.

Main
    Uses SQ.shutdown() to shut down the SQ thread. It:
         sets the SQ.m_shutdown flag
         sends the SQ thread a boost thread interrupt
         joins on the thread

SQ Thread
    This thread is running under its thread_main() and __happens__ to be in
    the is_shutdown() method, after it checked the interruption_point() but
    BEFORE it checks the m_shutdown flag. SQ.m_shutdown was just changed to
    true, so is_shutdown() sees that and throws a boost::thread_interrupted
    exception. However, it should also be noted that the boost thread data
    still has its internal interrupt-requested flag set too.

    Now the SQ thread "throws" out of thread_main(), is caught by the base
    class's operator() method, and enters the SQ.thread_shutdown method.

    SQ.thread_shutdown needs to shut down its lookup child thread. So it
    invokes the LU.shutdown method.

    The LU.shutdown method is just like above:
        sets the LU.m_shutdown flag
        sends the LU thread a boost thread interrupt
        joins on the LU thread

    However, join() is an interruption point, and I haven't "cleared the
    interrupt" for the boost thread yet. Therefore, when the SQ thread invokes
    join(), it checks if there are any outstanding interrupts it needs to
    honor. There are, so it throws another thread_interrupted exception, which
    exits the join() and which I don't catch; therefore the SQ thread exits
    prematurely due to my faulty logic.
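
The sequence above can be sketched with a toy model in standard C++. This is
an assumption-laden illustration, not Boost's implementation: the toy
namespace, its thread class, the pending flag, and join_was_interrupted() are
hypothetical names. The model only checks for a pending interrupt on entry to
join(), whereas the real boost::thread::join() can also be woken while
blocked. The point it tries to show is that a hand-thrown exception leaves the
interrupt request pending, so the next interruption point (here, join() on the
child) throws again.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

namespace toy {  // hypothetical stand-ins, NOT Boost's implementation

struct interrupted {};  // plays the role of boost::thread_interrupted

// Points at the calling thread's "interrupt requested" flag (null on main).
thread_local std::atomic<bool>* pending = nullptr;

// Throws if an interrupt is pending; only this path clears the flag.
void interruption_point() {
    if (pending && pending->exchange(false)) throw interrupted{};
}

class thread {
    std::shared_ptr<std::atomic<bool>> flag_ =
        std::make_shared<std::atomic<bool>>(false);
    std::thread impl_;
public:
    template <class F>
    explicit thread(F body) {
        auto flag = flag_;
        impl_ = std::thread([flag, body]() mutable {
            pending = flag.get();
            try { body(); } catch (const interrupted&) {}
        });
    }
    void interrupt() { flag_->store(true); }
    // Modeled as an interruption point for the CALLING thread.
    void join() {
        assert(impl_.joinable());
        interruption_point();
        impl_.join();
    }
};

}  // namespace toy

// Returns true if the SQ thread's join() on its LU child was cut short.
bool join_was_interrupted() {
    std::atomic<bool> m_shutdown{false}, go{false}, cut_short{false};
    toy::thread sq([&] {
        while (!go.load()) std::this_thread::yield();  // no interruption point
        try {
            // Hand-rolled check fires first: the request stays pending.
            if (m_shutdown.load()) throw toy::interrupted{};
        } catch (const toy::interrupted&) {
            // Shutdown path: stop the child and wait for it.
            toy::thread lu([] {
                for (;;) { toy::interruption_point(); std::this_thread::yield(); }
            });
            lu.interrupt();
            try {
                lu.join();  // throws: the SQ thread's own request is pending
            } catch (const toy::interrupted&) {
                cut_short = true;
                lu.join();  // request now consumed, so this join completes
            }
        }
    });
    m_shutdown = true;
    sq.interrupt();
    go = true;
    sq.join();
    return cut_short.load();
}
```

Under these assumptions, join_was_interrupted() returns true: the first
lu.join() is abandoned because of the interrupt that was aimed at the SQ
thread itself, not because the LU thread had finished.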

Yes, I know, a wordy explanation for the brilliant summary you gave me. I
just want to double check that I've got the details down correctly.

One final word. I feel ungrateful for bringing this up, and I'm still looking
into this. However, I've encountered a new symptom. I've added the
boost::this_thread::disable_interruption object to the beginning of my
thread_base_c::shutdown() function and I've removed all references to
m_shutdown -- as you recommended.
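
A minimal sketch of what the guard does, again as a toy model in standard C++
rather than Boost itself. The toy namespace, its disable_interruption
look-alike, and shutdown_with_guard() are hypothetical names introduced for
illustration. It assumes, as I understand the Boost documentation, that
disabling interruption only masks the request and does not clear it, so the
request is delivered at the first interruption point after the guard is
destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

namespace toy {  // hypothetical stand-ins, NOT Boost's implementation

struct interrupted {};  // plays the role of boost::thread_interrupted

thread_local std::atomic<bool>* pending = nullptr;  // caller's request flag
thread_local bool enabled = true;                   // caller's enable state

// Throws only if interruption is enabled AND a request is pending.
void interruption_point() {
    if (enabled && pending && pending->exchange(false)) throw interrupted{};
}

// RAII guard modeled on boost::this_thread::disable_interruption.
class disable_interruption {
    bool previous_;
public:
    disable_interruption() : previous_(enabled) { enabled = false; }
    ~disable_interruption() { enabled = previous_; }
    disable_interruption(const disable_interruption&) = delete;
    disable_interruption& operator=(const disable_interruption&) = delete;
};

class thread {
    std::shared_ptr<std::atomic<bool>> flag_ =
        std::make_shared<std::atomic<bool>>(false);
    std::thread impl_;
public:
    template <class F>
    explicit thread(F body) {
        auto flag = flag_;
        impl_ = std::thread([flag, body]() mutable {
            pending = flag.get();
            try { body(); } catch (const interrupted&) {}
        });
    }
    void interrupt() { flag_->store(true); }
    void join() {  // modeled as an interruption point for the caller
        assert(impl_.joinable());
        interruption_point();
        impl_.join();
    }
};

}  // namespace toy

struct result {
    bool join_completed;       // join() on the child ran to completion
    bool still_pending_after;  // request was delivered after the guard ended
};

result shutdown_with_guard() {
    std::atomic<bool> go{false}, completed{false}, pending_after{false};
    toy::thread sq([&] {
        while (!go.load()) std::this_thread::yield();  // no interruption point
        toy::thread lu([] {
            for (;;) { toy::interruption_point(); std::this_thread::yield(); }
        });
        {
            toy::disable_interruption guard;  // mask interrupts while joining
            lu.interrupt();
            lu.join();                        // cannot throw under the guard
            completed = true;
        }
        try {
            toy::interruption_point();        // guard gone: request delivered
        } catch (const toy::interrupted&) {
            pending_after = true;
        }
    });
    sq.interrupt();
    go = true;
    sq.join();
    return {completed.load(), pending_after.load()};
}
```

In this model both fields come back true: the join finishes under the guard,
and the masked request is still delivered afterwards. Code that runs after
the guard's scope therefore still has to expect the exception.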

However now, I occasionally get a deadlock during the shutdown. I call it a
deadlock when the shutdown process stalls for longer than 5 minutes. I've run
the test multiple times and I'll see the deadlock on rare occasions. I've run
seven different test cycles with the deadlock occurring at different times for
each: 189, 398, 797, 999, 1282, 1527, 3416 (not in that order)

This could be an artifact of my simulation, and I'm just now starting to crawl
through the gdb output -- nicely enough I can connect to the running process
and see its current running state.

Do you have any debugging insights or tips that I should apply for this
investigation?

Thank you for all your help,

-=John

On 3/14/2012 3:23 PM, Anthony Williams wrote:
> On 14/03/12 20:09, John Rocha wrote:
>> I have been able to extract the thread start/stop logic from our code
>> base into a standalone program that illustrates the problem. Even this is
>> still sort of long, 900 lines or so.
>
> Thanks for the example. Your problem is that you are overlaying TWO
> interruption mechanisms --- boost::thread::interrupt() and your own m_shutdown
> flag.
>
> thread::join() is an interruption point, so if your thread sees the
> m_shutdown flag before the boost::thread::interrupt(), then it will pick up
> the interrupt when it calls join() on its own worker threads.
>
> I would suggest that you avoid the use of m_shutdown, since it is redundant.
> Also, wrap your calls to join() in a scope with a
> boost::this_thread::disable_interruption object so that the join cannot be
> interrupted.
>
>> void shutdown(const std::string &s_caller) {
>>     EE_LOG_MSG(EE_TRACE, "%s shutting down %s",
>>                s_caller.c_str(), m_name.c_str());
>>
>>     ptime start_time(microsec_clock::local_time());
>>
>>     m_shutdown = true;
>>     m_thread.interrupt();
>>     m_thread.join();
>
>> inline void check_for_shutdown () {
>>     boost::this_thread::interruption_point();
>>
>>     if (m_shutdown) {
>>         throw boost::thread_interrupted();
>>     }
>> }
>
> Anthony

