From: Roland Schwarz (roland.schwarz_at_[hidden])
Date: 2005-09-14 16:38:44


While reviewing the condition variable implementation,
I came across a case where a "lost wakeup" seems to
take place.

The example is admittedly artificial, but I think it is still
correct code. (I hope someone can prove me wrong.)

Could this be an indication that the current implementation
is not using Alexander Terekhov's algorithm?
If this really is the case, we should definitely change our
code. On the other hand, I constructed the test case
by carefully studying Alexander's algorithm, in order
to better understand the design.

I ran the tests on W2K with VC7.1.

My questions:

1) Is this correct code with respect to condition variable
    semantics (POSIX thread semantics)?

2) If 1) applies: Can someone reproduce this on another
    platform too?

Thank you,
Roland

#include <iostream>

#include <boost/thread.hpp>

using namespace boost;

mutex guard;
typedef mutex::scoped_lock lock_type;
condition cond;

long global_count = 0;
long count_for_test = 10000;

long global_number_of_threads = 0;
long number_of_threads_for_test = 1000;

bool global_flag = true;

void do_work()
{
    lock_type lock(guard);
    ++global_number_of_threads;
    cond.notify_one();
    while (global_count < count_for_test)
        cond.wait(lock);
}

void do_flag()
{
    lock_type lock(guard);
    ++global_number_of_threads;
    cond.notify_one();
    while (global_count < count_for_test) {
        cond.wait(lock);
        global_flag = false;
        // Step 3: The following signal is occasionally
        // lost. See below.
        cond.notify_one();
    }
}

int main()
{
   
    thread_group pool;

    for (long i=0; i < number_of_threads_for_test; ++i)
    {
        pool.create_thread( do_work );
    }
    pool.create_thread( do_flag );

    { // we wait until all threads are up
        lock_type lock(guard);
        while (global_number_of_threads <
            number_of_threads_for_test+1)
            cond.wait(lock);
    }

    // the loop will run for some time and
    // then suddenly deadlock
    for (long j=0; j<count_for_test; ++j) {
        lock_type lock(guard);
        ++global_count;
        // Step 1: We do a heavy notification
        cond.notify_all();
        // Step 2: Then we wait on the same condition.
        // Since we hold the lock, we can be sure
        // all threads are waiting at this point.
        // Entering the wait below should atomically
        // release the lock and wait.
        // In turn, I expect all threads to be woken up
        // except the main thread, since its wait
        // occurs after the notify_all().
        // The do_flag thread will signal after
        // we have entered the wait below.
        // However, the signal is occasionally
        // lost. Is this a kind of
        // "lost wakeup"?
        while(global_flag) cond.wait(lock); // deadlocks
        // When you break in with the debugger
        // as the deadlock occurs, you can see
        // that the flag is indeed false, which
        // shows that the signal has been sent.
        global_flag = true;
    }

    
    pool.join_all();

    // result check
    std::cout << "We should get here without a deadlock\n";
      
    return 0;
}
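
A possible cross-check for question 1), offered only as a sketch under assumptions that are not part of the test above: under POSIX-style semantics notify_one may wake any one thread that is currently blocked on the condition, and a do_work thread that has already re-entered cond.wait() after Step 1 is such a thread. It may therefore be worth ruling out that the Step 3 signal is delivered to a worker rather than lost. The variant below uses notify_all for every signal. It is written with C++11 std:: primitives instead of Boost.Thread so that it builds without external libraries, and run_handshake is a hypothetical helper name chosen for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): runs the same
// handshake as the test program, but every signal is a notify_all().
// Returns the number of rounds the driving thread completed.
long run_handshake(long workers, long rounds)
{
    assert(workers >= 0 && rounds >= 0);

    std::mutex guard;
    std::condition_variable cond;
    long count = 0;    // plays the role of global_count
    long started = 0;  // plays the role of global_number_of_threads
    bool flag = true;  // plays the role of global_flag

    auto do_work = [&] {
        std::unique_lock<std::mutex> lock(guard);
        ++started;
        cond.notify_all();
        while (count < rounds)
            cond.wait(lock);
    };

    auto do_flag = [&] {
        std::unique_lock<std::mutex> lock(guard);
        ++started;
        cond.notify_all();
        while (count < rounds) {
            cond.wait(lock);
            flag = false;
            // Step 3 as a broadcast: a worker that has re-entered
            // wait() cannot be the only thread that is woken.
            cond.notify_all();
        }
    };

    std::vector<std::thread> pool;
    for (long i = 0; i < workers; ++i)
        pool.emplace_back(do_work);
    pool.emplace_back(do_flag);

    {   // wait until all threads are up
        std::unique_lock<std::mutex> lock(guard);
        while (started < workers + 1)
            cond.wait(lock);
    }

    long completed = 0;
    for (long j = 0; j < rounds; ++j) {
        std::unique_lock<std::mutex> lock(guard);
        ++count;
        cond.notify_all();      // Step 1
        while (flag)
            cond.wait(lock);    // Step 2
        flag = true;
        ++completed;
    }

    for (std::thread& t : pool)
        t.join();
    return completed;
}
```

Calling, for example, run_handshake(1000, 10000) from main() corresponds to the thread and iteration counts used above. If this variant runs to completion where the original hangs, that would point at the choice of woken thread rather than at a lost signal; it does not by itself say anything about the Boost implementation.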

