On Mon, Jan 18, 2010 at 8:00 AM, girish hilage <girish_hilage@yahoo.com> wrote:

   The code I have now written is as follows (which seems to work fine) :

This code is broken. I find it hard to read and understand, and it contains 4 different mutexes, which is asking for trouble. But besides those 'aesthetic' complaints, it really is broken.

(IIUC) It appears that you are trying very hard to make sure someone is waiting on the condition before notifying. I was tempted to say this can't be done, but you have done it nonetheless. Although I am not sure it helps.

Please research 'spurious wakeups' of condition variables:

void mywait (boost::condition_variable &cond, boost::mutex &mut, boost::mutex &flag_mut, bool &flag)
{
   boost::unique_lock<boost::mutex> lock(mut);

   enable_flag (flag_mut, flag);

   cond.wait (lock);

This cond.wait(lock) may wake up even though cond.notify_one() was NOT called. This may seem strange, but it is allowed because otherwise implementing conditions becomes very hard. And by allowing it, it encourages developers to code with conditions properly. To use a condition you MUST do so in a loop which checks the global *state* that the condition is communicating. ie in your case 'new job available'. The standard could have stored state inside the condition variable, but it was harder, and it was also assumed that you can determine the state better on your own, so it is outside the condvar. It also makes it easier for multiple threads to agree - they can all look at the external state (available jobs) and decide if something needs to be done, or if another thread has already done it.

Another way to look at things: spurious wakeups mean that, in effect, notify_one might wake up more than one thread (they still battle for the mutex of course), but notify_one is essentially (worse case) the same as notify_all - ie all the other threads could 'spuriously' wake up. ie notify_one could be implemented by just calling notify_all (it would be a poor implementation, but valid). So notify_one is just an optimization (*probably* only one thread will awaken), not a guarantee.

So what happens in your code if cond.wait() wakens spuriously? It appears a job is 'done' (or attempted) even though no job exists.

If you can tell (from the worker thread) that there is no job to do, then make THAT your external state, and wrap the cond.wait() inside a loop:

}

void worker_function (void)
{

 for (;;)
   {

while ( ! /* there is work to be done */) // note the '!' at the front

cond.wait(mutex);

      ..... read new job .....

// probably need to hold lock while reading job (from some shared queue or whatever)

// then unlock so you can do the work *in parallel* with other threads

mutex.unlock(); // unlock before doing work

      ..... do the work .....

mutex.lock(); // lock again before checking /* there is work to be done */, unlocked inside cond.wait()

   }
}

int main (void)
{
   ..... create worker threads .....

   while (/*there is work to be done */)
   {

cond.notify_all(); // there may be more than one job to do, so notify all

      /* wait for job_done notification from any of the busy thread */
      mywait (job_done_condition, job_done_mutex, job_done_notification_reached_flag_mutex, job_done_notification_reached_flag);

Why do you wait for the job to be done?

If you send a job to another thread, then do nothing but wait for it to be done, why not just do it yourself?

ie it appears that your code will run like this:

Thread 1 Thread 2

wait:

check for job .....

notify wake

wait for done: do job

..... keep doing that job

..... job done!

wake! wait for new job (ie repeat)

do the 2 threads ever do any real work at the same time, or is one always waiting for the other?

   }
}

   I hope this will not create any problems.
   (I have also edited the problem statement in the mail below to make it more clear. Shown by __TEXT_EDITED__.)

Regards,

Girish