Bug:
There is a set of conditions where a process can manage to enter do_timed_wait, increment m_num_waiters, and exit without decrementing it.

Boost 1.39.0

Sequence of events:
We join our hero, Process A (P_A), in boost/interprocess/sync/emulation/interprocess_condition.hpp.
P_A is executing a do_timed_wait(true, lock, abs_time) call, and is spinning in the while loop at line 124.
tout_enabled == true, and abs_time is a microsecond in the future (about to expire but hasn't yet).
Process B, P_A's trusty sidekick, sends a notify_all on the condition, breaking P_A out of the while loop at line 124.
abs_time arrives (i.e., P_A got to line 149 with microsec_clock::universal_time() >= abs_time and timed_out = false).
With these conditions, P_A gets to line 163 and calls the constructor for scoped_lock.
P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114.
P_A executes mp_mutex->timed_lock(abs_time) at line 115.
P_A jumps to boost/interprocess/sync/emulation/interprocess_mutex.hpp line 49.
P_A takes a reading of now at line 56.
P_A finds that (now >= abs_time) at line 58 and is sent packing with a return value of false.
P_A arrives back in boost/interprocess/sync/emulation/interprocess_condition.hpp on line 163.
P_A gets to line 171 and finds lock is false. He panics! He sets timed_out to true and unlock_enter_mut to true, but in his haste to break out of evil Dr. while(1)'s clutches, he forgets to atomically decrement m_num_waiters!
Manic laughter can be heard behind him as he tries in vain to acquire the lock on line 214.
"You fool! You fell into my trap!", shouts Dr. while(1). "Process B grabbed that very lock and attempted to free you again! He is at line 56 of this very header file, waiting for a call from you that will never come, and he's holding your precious lock! Your deadlock is complete! HAHAHAHAHAHAH!!"
Ahem.
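
To make the sequence above easier to follow, here is a toy, single-threaded model of it. This is only a sketch under stated assumptions, not the Boost code: ToyMutex, waiters_left_after_late_relock, and check_toy_model are hypothetical names, the clock is replaced by plain integers, and the only behavior carried over is the early return when now >= abs_time and the missing decrement on that path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the mutex: like the timed_lock described above,
// it gives up immediately when the deadline has already been reached.
struct ToyMutex {
    bool locked = false;
    bool timed_lock(std::int64_t now, std::int64_t abs_time) {
        if (now >= abs_time) return false;  // deadline reached: do not even try
        if (locked) return false;           // toy model: no spinning
        locked = true;
        return true;
    }
    void unlock() { locked = false; }
};

// Replays the path from the story: P_A has been notified, but the deadline
// arrives before it re-locks. Returns the waiter count left behind.
// `apply_fix` switches on the extra decrement proposed in this report.
std::uint32_t waiters_left_after_late_relock(bool apply_fix) {
    std::atomic<std::uint32_t> num_waiters{0};
    ToyMutex check_mut;
    const std::int64_t abs_time = 100;       // arbitrary toy deadline
    const std::int64_t now_at_relock = 100;  // the deadline has just arrived

    ++num_waiters;  // P_A enters the wait
    // ... the notification arrives and the spin loop exits ...
    if (!check_mut.timed_lock(now_at_relock, abs_time)) {
        if (apply_fix) --num_waiters;  // the proposed extra line
        // timed_out = true; unlock_enter_mut = true; break;
        return num_waiters.load();
    }
    --num_waiters;  // normal notified exit
    check_mut.unlock();
    return num_waiters.load();
}

// Usage: without the extra decrement one waiter is leaked, with it none is.
void check_toy_model() {
    assert(waiters_left_after_late_relock(false) == 1u);
    assert(waiters_left_after_late_relock(true) == 0u);
}
```

In the model, the leaked count is what a notifier would keep waiting on: it believes one waiter is still inside the wait when nobody is.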
I do not know the proper way to fix this problem.
My straightforward idea is to add the atomic_dec32 call at line 172 of boost/interprocess/sync/emulation/interprocess_condition.hpp, so that the block reads:
   if(!lock){
      detail::atomic_dec32(const_cast<boost::uint32_t*>(&m_num_waiters));
      timed_out = true;
      unlock_enter_mut = true;
      break;
   }
However, the comment included with the block in question:
   //Notification occurred, we will lock the checking interprocess_mutex so that
   //if a notify_one notification occurs, only one thread can exit
...makes me wary.
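
The idea behind the extra decrement is that every exit from the wait has to undo the increment made on entry. One way to sketch that invariant is a scope guard. This is a swapped-in technique for illustration, not the one-line patch above and not how Boost.Interprocess is written: WaiterGuard, waiters_left_after_guarded_wait, and check_guard are hypothetical names, std::atomic stands in for detail::atomic_dec32, and the real do_timed_wait may need the decrement at a specific point that a destructor cannot guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Boost.Interprocess): increments a waiter
// counter on construction and decrements it on destruction, so every exit
// path out of the enclosing scope, including an early one, undoes it.
class WaiterGuard {
public:
    explicit WaiterGuard(std::atomic<std::uint32_t>& count) : count_(count) {
        count_.fetch_add(1);
    }
    ~WaiterGuard() { count_.fetch_sub(1); }
    WaiterGuard(const WaiterGuard&) = delete;
    WaiterGuard& operator=(const WaiterGuard&) = delete;

private:
    std::atomic<std::uint32_t>& count_;
};

// Toy wait: `relock_succeeds` stands in for the result of the timed lock.
// Returns true when the wait reports a timeout.
bool guarded_wait(std::atomic<std::uint32_t>& num_waiters, bool relock_succeeds) {
    WaiterGuard guard(num_waiters);
    if (!relock_succeeds) {
        return true;  // early exit: the guard still decrements
    }
    return false;     // notified exit: the guard decrements here too
}

// Returns the waiter count left behind after one guarded wait.
std::uint32_t waiters_left_after_guarded_wait(bool relock_succeeds) {
    std::atomic<std::uint32_t> num_waiters{0};
    guarded_wait(num_waiters, relock_succeeds);
    return num_waiters.load();
}

// Usage: neither exit path leaks a waiter.
void check_guard() {
    assert(waiters_left_after_guarded_wait(false) == 0u);
    assert(waiters_left_after_guarded_wait(true) == 0u);
}
```

Whether a decrement on this path is safe with respect to the notify_one logic described in the quoted comment is exactly the open question, and this sketch does not answer it.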
One thing seems certain to me: there's something not quite right here. I could increase my timeout window to lower the chances of this occurring, but that is not a solution. That is masking a problem.
Anyone got a take on this? Am I missing something obvious? That whole block with the comment really makes me suspicious.