From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2020-05-14 17:44:52
On 2020-05-14 20:43, Andrey Semashev wrote:
> On 2020-05-14 13:41, Phil Endecott via Boost wrote:
>> Dear Experts,
>>
>> Can we improve how interprocess mutexes and condition variables
>> behave on process termination?
>>
>> Currently if a process terminates (i.e. it crashes, or you press
>> ctrl-C), the interprocess docs say nothing as far as I can see
>> about what happens to locked mutexes and awaited conditions. In
>> practice it seems that mutexes that were locked remain locked,
>> and other processes will deadlock. (I'm using Linux.) A few
>> thoughts:
>>
>> * If a process were only reading the shared state, then it would
>> be appropriate for the mutex to be unlocked on termination.
>>
>> * If a process were modifying the shared state, then it would be
>> wrong to unconditionally unlock the mutex. So it would be useful
>> to distinguish between reader and writer locks, even if we're not
>> implementing a single-writer/multiple-reader mutex.
>>
>> * The system could be made more robust by blocking signals while
>> a mutex is locked. This doesn't help with crashes, e.g. segfaults,
>> but it would help with ctrl-C.
>
> Catching signals is a good idea regardless of IPC and locking mutexes.
> As long as there is a moment when your application holds some valuable
> data or some state (e.g. a network connection) that needs to be properly
> saved or cleaned up on exit, you have to implement proper signal
> handling and graceful program termination.
To be clear, I don't mean that Boost.Interprocess should be dealing with
signals. The user's application should.
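
For illustration, a minimal sketch of application-level signal handling,
assuming POSIX sigaction. The names install_termination_handler and
stop_requested are hypothetical; this is not part of Boost.Interprocess.
The handler only sets a flag, so any lock held at that moment is released
by normal scope exit before the program shuts down.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Written by the handler, read by the main loop. volatile sig_atomic_t is
// the object type that is portably safe to modify from a signal handler.
static volatile std::sig_atomic_t g_stop_requested = 0;

// The handler does nothing except record the request.
extern "C" void on_termination_signal(int) { g_stop_requested = 1; }

// Installs the handler for one signal (e.g. SIGINT or SIGTERM).
// Returns true on success.
bool install_termination_handler(int signo) {
    assert(signo > 0);
    struct sigaction sa {};
    sa.sa_handler = on_termination_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Polled by the application between critical sections.
bool stop_requested() { return g_stop_requested != 0; }
```

The application would call install_termination_handler(SIGINT) and
install_termination_handler(SIGTERM) at startup and check stop_requested()
between units of work, outside of any locked region. As noted above, this
does nothing for crashes such as segfaults.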
>> * It may be useful to cause all processes to terminate if one of
>> them terminates with a mutex held for writing, either immediately
>> or as soon as they try to lock the same mutex. Perhaps also to
>> delete the presumed-corrupted shared memory segment.
>>
>> * PTHREAD_MUTEX_ROBUST might be part of the solution. That seems
>> to require the non-crashed process to do clean up, i.e. we would
>> need to record whether the crashed process were reading or writing
>> and react appropriately.
>
> You can't do that reliably because the crashed process could have
> crashed between locking the mutex and indicating its intentions. For
> another process to be able to restart or roll back a failed operation,
> that operation has to be implemented in a lock-free fashion, so that
> each step is atomic. At this point mutexes become redundant.
>
> In my experience, the only sensible reaction to an abandoned operation
> (regardless of how you detect the abandoned state) is to scrap it and
> abort or start over in a new shared memory segment.
>
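
A sketch of how an abandoned lock can be detected with PTHREAD_MUTEX_ROBUST
and then scrapped rather than repaired, assuming a POSIX.1-2008 pthread
implementation. LockResult, init_robust_mutex, lock_robust and
mark_unrecoverable are hypothetical names; in real use the mutex would live
inside the shared memory segment.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

enum class LockResult { Acquired, OwnerDied, Failed };

// Initialises a process-shared, robust mutex in caller-supplied memory.
bool init_robust_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return false;
    bool ok =
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0 &&
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) == 0 &&
        pthread_mutex_init(m, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    return ok;
}

// Locks the mutex and reports whether the previous owner died holding it.
LockResult lock_robust(pthread_mutex_t* m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0) return LockResult::Acquired;
    // EOWNERDEAD: we now hold the lock, but the protected data may be
    // half-modified. We cannot tell what the dead owner was doing.
    if (rc == EOWNERDEAD) return LockResult::OwnerDied;
    return LockResult::Failed;  // e.g. ENOTRECOVERABLE
}

// Called after OwnerDied. Unlocking WITHOUT pthread_mutex_consistent()
// leaves the mutex permanently unusable, so every later locker gets
// ENOTRECOVERABLE and knows the segment has been abandoned.
void mark_unrecoverable(pthread_mutex_t* m) { pthread_mutex_unlock(m); }
```

A process that sees OwnerDied or Failed would then stop using the segment
and either abort or start over in a new one, which is the policy described
above. Calling pthread_mutex_consistent() instead would only be appropriate
if the protected state could be proven intact.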
>> I'm less clear about what happens to condition variables, but it
>> does seem that perhaps terminating a process while it is waiting
>> on a condition will cause other processes to deadlock. Perhaps
>> the wait conceptually returns and the mutex is re-locked during
>> termination.
>
> AFAIR, pthread_cond_t uses a non-robust mutex internally, which means
> that condition variables are basically useless when you need robust
> semantics.
>
> If you need robust, condition-variable-like behavior, I think your
> best bet is to use futexes directly.
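
A Linux-only sketch of what using futexes directly could look like: a
sequence counter that waiters sleep on and notifiers bump. Event,
wait_for_change and notify_all are hypothetical names, and the counter is
assumed to be placed in shared memory (FUTEX_PRIVATE_FLAG is deliberately
not used, so the futex works across processes). No lock is held while
waiting, so a waiter that dies leaves nothing locked behind.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "futex words must be plain 32-bit integers");

// glibc has no futex() wrapper, so the raw system call is used.
// Sleeps only if *addr still equals `expected`; otherwise fails with EAGAIN.
static long futex_wait(std::atomic<std::uint32_t>* addr,
                       std::uint32_t expected) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(addr),
                   FUTEX_WAIT, expected, nullptr, nullptr, 0);
}

// Wakes up to `count` waiters; returns how many were woken.
static long futex_wake(std::atomic<std::uint32_t>* addr, int count) {
    assert(count > 0);
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(addr),
                   FUTEX_WAKE, count, nullptr, nullptr, 0);
}

struct Event {
    std::atomic<std::uint32_t> seq{0};  // bumped on every notification
};

// Blocks until seq differs from `seen`, then returns the new value.
std::uint32_t wait_for_change(Event& e, std::uint32_t seen) {
    std::uint32_t cur;
    while ((cur = e.seq.load(std::memory_order_acquire)) == seen) {
        // EINTR, EAGAIN and spurious wakeups all fall through to the
        // re-check in the loop condition.
        futex_wait(&e.seq, seen);
    }
    return cur;
}

// Publishes a change and wakes every process sleeping on the counter.
void notify_all(Event& e) {
    e.seq.fetch_add(1, std::memory_order_release);
    futex_wake(&e.seq, INT_MAX);
}
```

A waiter reads seq, checks its own predicate on the shared data, and calls
wait_for_change with the value it read. The shared data itself still has to
be updated atomically for this to be safe without a mutex.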