Subject: Re: [Boost-users] [Thread] Timed join returning true before thread terminated
From: Anthony Williams (anthony.ajw_at_[hidden])
Date: 2012-03-15 13:25:13


On 15/03/12 16:47, John Rocha wrote:
> First I want to thank you for helping with this. How did you come to this
> answer so quickly? Is this a technique or tool that I can learn, or was
> this wisdom from having worked with your library for so long?

I tried it out, saw the problem manifest, trapped it in gdb, and
examined the code. I guess it's just experience.

>
> Main
> Uses SQ.shutdown() to shut down the SQ thread. It:
> sets the SQ.m_shutdown flag
> sends the SQ thread a boost thread interrupt
> joins on the thread
>
>
> SQ Thread
> This thread is running under its thread_main() and __happens__ to be in
> the is_shutdown() method, after it checked the interruption_point() but
> BEFORE it checks the m_shutdown logic. SQ.m_shutdown was just changed to
> true, so is_shutdown() sees that and throws a boost::thread_interrupted
> exception. However, it should also be noted that the boost thread data has
> its internal interrupt requested flag set too.
>
> Now the SQ thread "throws" out of thread_main(), is caught by the base
> class's operator() method, and enters the SQ.thread_shutdown method.
>
> SQ.thread_shutdown needs to shut down its lookup child thread. So it
> invokes the LU.shutdown method.
>
> The LU.shutdown method is just like above:
> sets the LU.m_shutdown flag
> sends the LU thread a boost thread interrupt
> joins on the LU thread
>
> However, join() is an interruption point, and I haven't "cleared the
> interrupt" for the boost thread yet. Therefore, when the SQ thread invokes
> join(), it checks if there are any outstanding interrupts it needs to
> honor. There are, so it throws another thread interrupted exception, which
> exits the join() and which I don't catch; therefore the SQ thread exits
> prematurely due to my faulty logic.

Yes, that matches my understanding.

> One final word. I feel ungrateful for bringing this up, and I'm still
> looking into this. However, I've encountered a new symptom. I've added the
> boost::this_thread::disable_interruption object to the beginning of my
> thread_base_c::shutdown() function and I've removed all references to
> m_shutdown -- as you recommended.
>
> However now, I occasionally get a deadlock during the shutdown. I call it a
> deadlock when the shutdown process stalls for longer than 5 minutes. I've run
> the test multiple times and I'll see the deadlock on rare occasions. I've run
> seven different test cycles with the deadlock occurring at different times for
> each: 189, 398, 797, 999, 1282, 1527, 3416 (not in that order)
>
> This could be an artifact of my simulation, and I'm just now starting to
> crawl through the gdb output -- nicely enough I can connect to the running
> process and see its current running state.
>
> Do you have any debugging insights or tips that I should apply for this
> investigation?

My first thought is to check that the code is not blocked in a
non-interruptible call.

Anthony

-- 
Author of C++ Concurrency in Action     http://www.stdthread.co.uk/book/
just::thread C++11 thread library             http://www.stdthread.co.uk
Just Software Solutions Ltd       http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
