From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2020-05-14 17:43:41


On 2020-05-14 13:41, Phil Endecott via Boost wrote:
> Dear Experts,
>
> Can we improve how interprocess mutexes and condition variables
> behave on process termination?
>
> Currently if a process terminates (i.e. it crashes, or you press
> ctrl-C), the interprocess docs say nothing as far as I can see
> about what happens to locked mutexes and awaited conditions.  In
> practice it seems that mutexes that were locked remain locked,
> and other processes will deadlock.  (I'm using Linux.)  A few
> thoughts:
>
> * If a process were only reading the shared state, then it would
> be appropriate for the mutex to be unlocked on termination.
>
> * If a process were modifying the shared state, then it would be
> wrong to unconditionally unlock the mutex.  So it would be useful
> to distinguish between reader and writer locks, even if we're not
> implementing a single-writer/multiple-reader mutex.
>
> * The system could be made more robust by blocking signals while
> a mutex is locked.  This doesn't help with crashes, e.g. segfaults,
> but it would help with ctrl-C.

Catching signals is a good idea regardless of IPC and locking mutexes.
As long as there is a moment when your application holds some valuable
data or some state (e.g. a network connection) that needs to be properly
saved or cleaned up on exit, you have to implement proper signal
handling and graceful program termination.
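
A minimal sketch of what such handling could look like, assuming a POSIX system. The names `install_termination_handlers` and `stop_requested` are made up for illustration and are not part of Boost.Interprocess. The handler only sets a flag; the main loop is expected to poll it at a point where no interprocess mutex is held, release its resources and exit normally.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Written by the signal handler, polled by the main loop.
// volatile sig_atomic_t is the type that is safe to write from a handler.
static volatile std::sig_atomic_t g_stop_requested = 0;

// The handler does nothing except record the request.
extern "C" void on_termination_signal(int signo) {
    assert(signo == SIGINT || signo == SIGTERM);
    g_stop_requested = 1;
}

// Installs the handler for SIGINT (ctrl-C) and SIGTERM.
// Returns false if either sigaction() call fails.
bool install_termination_handlers() {
    struct sigaction sa {};
    sa.sa_handler = on_termination_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, nullptr) == 0 &&
           sigaction(SIGTERM, &sa, nullptr) == 0;
}

// True once a termination signal has been delivered.
bool stop_requested() {
    return g_stop_requested != 0;
}
```

A worker loop would then look roughly like `while (!stop_requested()) { lock; update shared state; unlock; }`, so that ctrl-C never interrupts a critical section halfway. As noted in the quoted text, this does nothing for crashes such as segfaults.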

> * It may be useful to cause all processes to terminate if one of
> them terminates with a mutex held for writing, either immediately
> or as soon as they try to lock the same mutex.  Perhaps also to
> delete the presumed-corrupted shared memory segment.
>
> * PTHREAD_MUTEX_ROBUST might be part of the solution.  That seems
> to require the non-crashed process to do clean up, i.e. we would
> need to record whether the crashed process were reading or writing
> and react appropriately.

You can't do that reliably because the crashed process could have
crashed between locking the mutex and indicating its intentions. For
another process to be able to restart or roll back a failed operation,
that operation has to be implemented in a lock-free fashion, so that
each step is atomic. At this point mutexes become redundant.

In my experience, the only sensible reaction to an abandoned operation
(regardless of how you detect the abandoned state) is to scrap it and
either abort or start over in a new shared memory segment.
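
For reference, a sketch of how detection via PTHREAD_MUTEX_ROBUST could be combined with the "scrap it" policy, assuming a POSIX system with robust mutex support. `init_robust_mutex`, `lock_or_detect_abandonment` and `LockResult` are hypothetical names. On EOWNERDEAD the code deliberately does not call `pthread_mutex_consistent()`; unlocking in that state makes the mutex permanently unusable, so every other process gets ENOTRECOVERABLE and can also give up on the segment.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Initializes a robust, process-shared mutex in caller-provided storage
// (normally a location inside the shared memory segment).
bool init_robust_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return false;
    const bool ok =
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0 &&
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) == 0 &&
        pthread_mutex_init(m, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    return ok;
}

enum class LockResult { Acquired, Abandoned, Failed };

// Locks the mutex. If a previous owner died while holding it, the shared
// state is treated as corrupt: no repair is attempted.
LockResult lock_or_detect_abandonment(pthread_mutex_t* m) {
    assert(m != nullptr);
    const int rc = pthread_mutex_lock(m);
    if (rc == 0) return LockResult::Acquired;
    if (rc == EOWNERDEAD) {
        // We own the lock now, but the protected data is suspect.
        // Unlocking without pthread_mutex_consistent() marks the mutex
        // as not recoverable for everyone.
        pthread_mutex_unlock(m);
        return LockResult::Abandoned;
    }
    if (rc == ENOTRECOVERABLE) {
        // Someone else already observed the abandonment.
        return LockResult::Abandoned;
    }
    return LockResult::Failed;
}
```

A caller that gets `Abandoned` would stop using the segment and either abort or create a fresh one; only `Acquired` means the mutex is held and must be unlocked later.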

> I'm less clear about what happens to condition variables, but it
> does seem that perhaps terminating a process while it is waiting
> on a condition will cause other processes to deadlock.  Perhaps
> the wait conceptually returns and the mutex is re-locked during
> termination.

AFAIR, pthread_cond_t uses a non-robust mutex internally, which means
that condition variables are basically useless when you need robust
semantics.

If you need robust condition-variable-like behavior, I think your best
bet is to use futexes directly.
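
One possible shape for this, as a Linux-only sketch and not a complete robust design: a 32-bit sequence counter placed in shared memory, with waiters sleeping on it via the raw `futex` system call. `futex_event`, `wait_for_change` and `notify_all` are hypothetical names. The sketch assumes that `std::atomic<std::uint32_t>` has the same layout as a plain 32-bit integer, which holds on common Linux toolchains. Because no mutex is involved, a waiter that dies leaves nothing locked, and the timeout gives surviving waiters a chance to check whether their peers are still alive.

```cpp
#include <atomic>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sequence counter intended to live in a shared memory segment.
struct futex_event {
    std::atomic<std::uint32_t> seq{0};
};
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "futex word must be a plain 32-bit integer");

// Thin wrapper over the raw system call; glibc has no futex() function.
// The non-private operations are used so that it works across processes.
static long futex_call(std::atomic<std::uint32_t>* addr, int op,
                       std::uint32_t val, const timespec* timeout) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(addr),
                   op, val, timeout, nullptr, 0);
}

// Blocks until ev.seq differs from `seen` or the timeout expires.
// Returns true if the counter changed. The timeout is relative and is
// restarted after a signal or spurious wakeup, so it is a lower bound.
bool wait_for_change(futex_event& ev, std::uint32_t seen, long timeout_ms) {
    const timespec ts{static_cast<time_t>(timeout_ms / 1000),
                      (timeout_ms % 1000) * 1000000L};
    while (ev.seq.load(std::memory_order_acquire) == seen) {
        const long rc = futex_call(&ev.seq, FUTEX_WAIT, seen, &ts);
        if (rc == -1 && errno == ETIMEDOUT) {
            return ev.seq.load(std::memory_order_acquire) != seen;
        }
        // EAGAIN (value already changed), EINTR or a wakeup: re-check.
    }
    return true;
}

// Publishes a change and wakes every waiter.
void notify_all(futex_event& ev) {
    ev.seq.fetch_add(1, std::memory_order_release);
    futex_call(&ev.seq, FUTEX_WAKE, INT_MAX, nullptr);
}
```

A waiter reads `seq`, checks its predicate on the shared data, and calls `wait_for_change` with the value it read; a notifier updates the data and then calls `notify_all`. The shared data itself still has to be updated in a way that tolerates a writer dying mid-update.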

