From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2020-05-14 10:41:45


Dear Experts,

Can we improve how interprocess mutexes and condition variables
behave on process termination?

Currently, if a process terminates (e.g. it crashes, or you press
ctrl-C), the Boost.Interprocess docs say nothing, as far as I can
see, about what happens to the mutexes it holds and the conditions
it is waiting on. In practice it seems that mutexes that were
locked remain locked, and other processes then deadlock when they
try to lock them. (I'm using Linux.) A few thoughts:

* If a process were only reading the shared state, then it would
be appropriate for the mutex to be unlocked on termination.

* If a process were modifying the shared state, then it would be
wrong to unconditionally unlock the mutex. So it would be useful
to distinguish between reader and writer locks, even if we're not
implementing a single-writer/multiple-reader mutex.

* The system could be made more robust by blocking signals while
a mutex is locked. This doesn't help with crashes, e.g. segfaults,
but it would help with ctrl-C.

* It may be useful to cause all processes to terminate if one of
them terminates with a mutex held for writing, either immediately
or as soon as they try to lock the same mutex. Perhaps also to
delete the presumed-corrupted shared memory segment.

* PTHREAD_MUTEX_ROBUST might be part of the solution. That seems
to require the non-crashed process to do the clean-up, i.e. we
would need to record whether the crashed process was reading or
writing, and react appropriately.
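A minimal sketch of the PTHREAD_MUTEX_ROBUST idea combined with the reader/writer distinction, written against raw POSIX calls on Linux rather than any Boost.Interprocess API. The `SharedState` layout, the `writer_active` flag and all function names are hypothetical. A forked child dies while holding the lock; the survivor gets EOWNERDEAD and either repairs the mutex (the dead owner was only reading) or deliberately leaves it inconsistent, so that after unlocking every later lock attempt fails with ENOTRECOVERABLE, roughly the "fail as soon as they try to lock the same mutex" option.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical shared-segment layout: a robust mutex plus a flag that the
// lock holder sets before it starts modifying the shared data.
struct SharedState {
    pthread_mutex_t mutex;
    int writer_active;  // 1 while the holder may be mid-modification
    int value;          // stand-in for the real shared data
};

enum class LockResult {
    Ok,                // normal acquisition
    RecoveredReader,   // owner died while only reading; mutex repaired
    OwnerDiedWriting,  // owner died mid-write; lock held, data suspect
    Unrecoverable      // an earlier owner died mid-write; lock NOT held
};

void init_state(SharedState* s) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&s->mutex, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);
    s->writer_active = 0;
    s->value = 0;
}

LockResult lock_state(SharedState* s) {
    int rc = pthread_mutex_lock(&s->mutex);
    if (rc == 0) return LockResult::Ok;
    if (rc == ENOTRECOVERABLE) return LockResult::Unrecoverable;
    if (rc == EOWNERDEAD) {
        if (s->writer_active) {
            // Deliberately do NOT call pthread_mutex_consistent(): once this
            // process unlocks, the mutex is permanently unusable and every
            // later pthread_mutex_lock() fails with ENOTRECOVERABLE.
            return LockResult::OwnerDiedWriting;
        }
        // The dead owner was only reading, so the data is intact.
        pthread_mutex_consistent(&s->mutex);
        return LockResult::RecoveredReader;
    }
    std::abort();  // other error codes are not handled in this sketch
}

// Fork a child that takes the lock, optionally marks itself as a writer,
// and exits without unlocking, simulating a crash or ctrl-C.
void die_holding_lock(SharedState* s, bool as_writer) {
    pid_t pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(&s->mutex);
        if (as_writer) {
            s->writer_active = 1;
            s->value = 42;  // a "half-finished" update
        }
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
}

struct DemoResult { LockResult after_reader, after_writer, afterwards; };

DemoResult run_demo() {
    void* mem = mmap(nullptr, sizeof(SharedState), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    auto* s = static_cast<SharedState*>(mem);
    init_state(s);

    DemoResult r{};
    die_holding_lock(s, false);
    r.after_reader = lock_state(s);
    pthread_mutex_unlock(&s->mutex);

    die_holding_lock(s, true);
    r.after_writer = lock_state(s);
    pthread_mutex_unlock(&s->mutex);  // leaves the mutex unrecoverable

    r.afterwards = lock_state(s);     // fails without acquiring the lock
    munmap(mem, sizeof(SharedState));
    return r;
}
```

Calling `run_demo()` from `main` (compiled with `-pthread`) should report `RecoveredReader`, then `OwnerDiedWriting`, then `Unrecoverable`. In real use a writer would set `writer_active = 1` after locking and clear it just before unlocking; the `OwnerDiedWriting` branch is also where a process could decide to delete the presumed-corrupted segment.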

I'm less clear about what happens to condition variables, but it
does seem that terminating a process while it is waiting on a
condition can cause other processes to deadlock. Perhaps the wait
conceptually returns during termination, re-locking the mutex,
which then stays locked.

I ran into this with a simple diagnostic program that just dumps
some shared-memory data structures and waits on a condition in a
loop. I run it for a while and then press ctrl-C. And yes: some
time after I disconnect the diagnostic program, the system
crashes... the worst sort of bug!
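For the ctrl-C case, the signal-blocking idea from the list above might look like the following minimal sketch. `ScopedSignalBlock`, `with_lock_signals_blocked` and the demo are hypothetical names, and `std::mutex` stands in for an interprocess mutex. SIGINT and SIGTERM are blocked before the lock is taken and unblocked only after it is released, so a ctrl-C that arrives inside the critical section stays pending and is delivered once the lock is gone. The demo installs a SIGINT handler only so the deferral can be observed.

```cpp
#include <pthread.h>
#include <signal.h>
#include <cassert>
#include <mutex>
#include <utility>

// RAII guard: blocks SIGINT and SIGTERM for the calling thread and restores
// the previous signal mask on destruction. A signal arriving in between
// stays pending and is delivered once the mask is restored.
class ScopedSignalBlock {
public:
    ScopedSignalBlock() {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        sigaddset(&set, SIGTERM);
        int rc = pthread_sigmask(SIG_BLOCK, &set, &old_);
        assert(rc == 0);
        (void)rc;
    }
    ~ScopedSignalBlock() { pthread_sigmask(SIG_SETMASK, &old_, nullptr); }
    ScopedSignalBlock(const ScopedSignalBlock&) = delete;
    ScopedSignalBlock& operator=(const ScopedSignalBlock&) = delete;

private:
    sigset_t old_;
};

// Runs f() with m locked and SIGINT/SIGTERM blocked. The guard is declared
// first, so signals are blocked before locking and unblocked after unlocking.
template <typename Mutex, typename F>
void with_lock_signals_blocked(Mutex& m, F&& f) {
    ScopedSignalBlock block;
    std::lock_guard<Mutex> lock(m);
    std::forward<F>(f)();
}

static volatile sig_atomic_t g_sigint_seen = 0;
extern "C" void on_sigint(int) { g_sigint_seen = 1; }

// Simulates ctrl-C arriving inside the critical section. Returns whether the
// handler had run {inside the section, after the section}.
std::pair<bool, bool> demo_deferred_sigint() {
    struct sigaction sa{};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, nullptr);

    std::mutex m;  // stand-in for an interprocess mutex
    bool seen_inside = true;
    with_lock_signals_blocked(m, [&] {
        raise(SIGINT);  // blocked, so it only becomes pending
        seen_inside = (g_sigint_seen != 0);
    });
    return {seen_inside, g_sigint_seen != 0};
}
```

One trade-off: if the guard also spans a condition wait, ctrl-C is deferred until the wait returns and the lock is released. And, as noted above, this does nothing for segfaults, nor for SIGKILL, which cannot be blocked.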

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk