Boost logo

Threads-Devel :

From: Johan Nilsson (r.johan.nilsson_at_[hidden])
Date: 2007-04-19 09:18:08


[cross-posting this to boost.threads.devel as I tried putting out a request
for support there as well]

Peter Dimov wrote:

[snip]

>
> It seems that the static destructors are being run after
> on_thread_exit for the main thread; that's pretty odd. I see some
> logic in tss.cpp that looks like it should handle the case of
> ordering/race issues between ~thread_specific_ptr and the thread exit
> cleanup, but the code is too complicated to tell. Maybe you need to
> instrument tss.cpp and tss_hooks.cpp to see what is being called and
> when.

As I couldn't find any problem with my own code, and as the problem
seemed to spread to all of my Windows machines, I finally had to give
in and instrumented some of the tss code.

As you said - the static thread_specific_ptr<> destructors are run after
on_thread_exit for the main thread (actually, some of them before, and one
of them after, which is even more confusing). Reading the comments in
tss_pe.cpp, it seems to be a deliberate decision to run on_thread_exit from
the main thread before the static destructors are run. Perhaps to be able to
use static objects safely during "normal" thread exits?

Anyway, the real problem seems to be that the value of the native TSS slot
is not reset to 0 (zero) after the pointed-to data has been destroyed
(tss_slots) inside cleanup_slots. This is not a problem in general, as this
is called when the actual thread is ending. But - in the context of the main
thread cleanup_slots is called prematurely - this leaves a dangling pointer
which is later erroneously dereferenced. In my case in this particular code
segment (some comments added):

--- tss.cpp ---

void* tss::get() const
{
    tss_slots* slots = get_slots(false);

    // slots can actually reference a dangling pointer when called
    // from the main thread during static object destruction, as
    // on_thread_exit might have been called previously for
    // this thread
    //
    if (!slots)
        return 0;

    //
    // The statement slots->size() causes an access violation
    // occasionally, when slots refers to a dangling pointer
    //
    if (m_slot >= slots->size())
        return 0;

    return (*slots)[m_slot];
}

--------------

My suggestion is to modify on_thread_exit (Win32 specific) to cleanup
the native TSS slot after cleanup_slots is called:

------- tss.cpp -------

void __cdecl tss_thread_exit()
{
    tss_slots* slots = get_slots(false);

    if (slots)
    {
        cleanup_slots(slots);
        ::TlsSetValue(tss_native_data_key, NULL); // New code
    }
}

--------------

The above fixes my problems (at least I've been able to run my testsuite 100
consecutive times without further crashes).

Now if it would only be possible to get the above fix into the 1.34 release.
All tests under libs/thread/test passes on my SMP+HT Win XP machine after
the change.

A related idea: on_thread_exit should perhaps instead be
scheduled for invocation after the user defined static objects destructors
have been run, but before the library provided static objects destructor
have been run. IIRC it should be possible under MSVC, as suggested by the
"pragma init_seg" documentation.

If possible, it would probably make more sense ... or?

/ Johan


Threads-Devel list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk