Subject: [Boost-users] [Interprocess] deadlocking race condition in emulation interprocess_condition.hpp
From: Young, Zachariah L (zachariah.l.young_at_[hidden])
Date: 2009-09-11 21:49:26


This appears related to the discussion found here:
http://lists.boost.org/boost-users/2009/03/46051.php
 
Bug:
 
There is a set of conditions where a process can manage to enter
do_timed_wait, increment m_num_waiters, and exit without decrementing
it.
 
Boost 1.39.0
 
Sequence of events:
 
We join our hero, Process A (P_A), in
boost/interprocess/sync/emulation/interprocess_condition.hpp.
 
P_A is executing a do_timed_wait(true, lock, abs_time) call, and is
spinning at the while loop at line 124.
tout_enabled == true, and abs_time is a microsecond in the future (about
to expire but hasn't yet).
Process B, P_A's trusty sidekick, sends a notify_all on the condition,
breaking P_A out of the while loop at line 124.
abs_time arrives (i.e., P_A got to line 149 with
microsec_clock::universal_time() >= abs_time and timed_out == false).
With these conditions, P_A gets to line 163 and calls the constructor
for scoped_lock.
P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114.
P_A executes mp_mutex->timed_lock(abs_time) at line 115.
P_A jumps to
boost/interprocess/sync/emulation/interprocess_mutex.hpp line 49.
P_A takes a reading of now at line 56.
P_A finds that (now >= abs_time) at line 58 and is sent packing with a
return value of false.
P_A arrives back in
boost/interprocess/sync/emulation/interprocess_condition.hpp on line
163.
P_A gets to line 171 and finds lock is false. He panics! He sets
timed_out to true and unlock_enter_mut to true, but in his haste to
break out of evil Dr. while(1)'s clutches, he forgets to atomically
decrement m_num_waiters!
Manic laughter can be heard behind him as he tries in vain to acquire
the lock on line 214.
"You fool! You fell into my trap!", shouts Dr. while(1). "Process B
grabbed that very lock and attempted to free you again! He is at line
56 of this very header file, waiting for a call from you that will never
come, and he's holding your precious lock! Your deadlock is complete!
HAHAHAHAHAHAH!!"
 
Ahem.
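
To make the timed_lock step of the story concrete, here is a minimal sketch of a spinning timed lock that checks the deadline before its first attempt, as described above. It is not the Boost code: SpinMutexSketch and its members are hypothetical names, and std::chrono::steady_clock stands in for microsec_clock::universal_time(). The point is only that an already-expired abs_time makes the call return false even when nobody holds the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an emulated spin mutex; not the Boost class.
struct SpinMutexSketch {
    std::atomic<bool> locked{false};

    bool try_lock() {
        bool expected = false;
        return locked.compare_exchange_strong(expected, true);
    }

    void unlock() {
        assert(locked.load());  // unlocking a free mutex would be a bug
        locked.store(false);
    }

    // The deadline is tested before the first try_lock, so an expired
    // deadline fails immediately, whether or not the mutex is free.
    bool timed_lock(Clock::time_point abs_time) {
        if (Clock::now() >= abs_time) return false;
        while (!try_lock()) {
            if (Clock::now() >= abs_time) return false;
            std::this_thread::yield();  // give up the time slice and retry
        }
        return true;
    }
};
```

With a deadline in the past the call fails and leaves the mutex untouched; with a deadline in the future and no contention it succeeds on the first try.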
 
I do not know the proper way to fix this problem.
 
My straightforward idea is to add the atomic_dec32 call at line 172 of
boost/interprocess/sync/emulation/interprocess_condition.hpp:
 
         if(!lock){
            detail::atomic_dec32(const_cast<boost::uint32_t*>(&m_num_waiters));
            timed_out = true;
            unlock_enter_mut = true;
            break;
         }
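
The bookkeeping effect of that one line can be illustrated with a small single-threaded model. This is only a sketch under assumptions: WaiterCountSketch, timed_wait_pass and apply_fix are hypothetical names, the counter is a plain integer rather than an atomic, and the model assumes the normal notified path decrements the counter before returning. It says nothing about whether the decrement is safe with respect to notify_one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the waiter counter; not the Boost implementation.
struct WaiterCountSketch {
    std::uint32_t num_waiters = 0;

    // One pass through a timed wait that has already been notified.
    // lock_acquired: result of the timed lock attempt after the notification.
    // apply_fix: whether the early-exit branch decrements the counter.
    void timed_wait_pass(bool lock_acquired, bool apply_fix) {
        ++num_waiters;                  // waiter registers itself on entry
        if (!lock_acquired) {
            if (apply_fix) {
                assert(num_waiters > 0);
                --num_waiters;          // the decrement proposed above
            }
            return;                     // timed_out = true; break
        }
        assert(num_waiters > 0);
        --num_waiters;                  // assumed normal exit path
    }
};
```

Without the fix, a pass with lock_acquired == false leaves num_waiters at 1 after the waiter has gone, which is the leaked count a later notify would wait on. With the fix the counter returns to 0.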
 
However, the comment included with the block in question:
 
//Notification occurred, we will lock the checking interprocess_mutex so that
//if a notify_one notification occurs, only one thread can exit
...makes me wary.
 
One thing seems certain to me: there's something not quite right here.
I could increase my timeout window to lower the chances of this
occurring, but that is not a solution. That is masking a problem.
 
Anyone got a take on this? Am I missing something obvious? That whole
block with the comment really makes me suspicious.


