Subject: Re: [boost] Interprocess mutex & condition variable at process termination
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2017-02-15 18:24:13
On 02/15/17 20:42, Phil Endecott via Boost wrote:
> Dear Experts,
>
> I've just been surprised by the behaviour of the interprocess
> mutex and condition variable on abnormal process termination, i.e.
> they are not automatically released.
>
> Google tells me that I'm not the first to be surprised by this; there
> have been previous posts here, stack overflow questions etc.
>
> One often-valid observation is that if a process crashes - or
> otherwise terminates without executing its destructors - while it
> holds a lock on a shared data structure then the data is probably
> now corrupt, so unlocking the mutex that protects it is not very
> useful. I think there is an important case where that does not
> apply - when the process that crashes is only reading the shared
> data. In my case, I had written a "monitor" utility that loops
> forever, waiting on a shared condition, taking the corresponding
> mutex, and then dumping the shared data to stdout. I had been
> running this and stopping it by pressing ctrl-C and it had not
> occurred to me that this might not work as I expected. My
> attempt at debugging using this utility was making my problems worse,
> not better! Modifying this code to run destructors on ctrl-C is
> non-trivial.
>
> I am aware that the SysV shared semaphore is able to undo on
> process termination (see SEM_UNDO in man semop), and I had assumed
> that Boost.Interprocess was using this or something like it. I
> now see that it is using pthreads, which I didn't even realise
> could work between processes, and I don't think this API has
> any way to specify process termination behaviour.
There is a way to handle this case (robust mutexes), but the API is not
universally supported:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html
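A minimal sketch of the robust mutex protocol described at that link. It assumes Linux/glibc, which provides the POSIX.1-2008 `pthread_mutexattr_setrobust()` and `pthread_mutex_consistent()` calls. The function names are hypothetical. The demo function mimics the Ctrl-C case from the question: a child process takes the lock and exits without releasing it.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Initialise a process-shared, robust mutex in place. The memory must be
// shared between the processes (mmap with MAP_SHARED, shm_open, ...).
int init_robust_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Lock the mutex. EOWNERDEAD means the lock WAS acquired, but its previous
// owner died while holding it, so the protected data may be inconsistent.
// After checking or repairing the data, the mutex must be marked consistent;
// unlocking it without doing so makes it permanently unusable
// (later lock attempts fail with ENOTRECOVERABLE).
int lock_robust_mutex(pthread_mutex_t* m, bool* owner_died) {
    *owner_died = false;
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        *owner_died = true;
        // ... validate or repair the shared data here ...
        rc = pthread_mutex_consistent(m);
    }
    return rc;
}

// Hypothetical demo of the Ctrl-C scenario: a child process takes the lock
// and exits without unlocking. Returns true if the parent still acquires
// the mutex and is told that the previous owner died.
bool demo_owner_death() {
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    auto* m = static_cast<pthread_mutex_t*>(mem);
    if (init_robust_mutex(m) != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return false;
    }

    pid_t pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);  // "crash": no unlock, no destructors
    }
    bool owner_died = false;
    int rc = -1;
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
        rc = lock_robust_mutex(m, &owner_died);
        if (rc == 0) pthread_mutex_unlock(m);
    }
    pthread_mutex_destroy(m);
    munmap(mem, sizeof(pthread_mutex_t));
    return rc == 0 && owner_died;
}
```

The important point is that `EOWNERDEAD` is not a failure. The caller owns the lock and is expected to decide whether the shared data is still usable.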
If that API is not supported on your platform, you may want to avoid
locking the mutex without a timeout, i.e. failing to acquire the mutex
within a given time should be considered an indication that it has been
abandoned in the locked state.
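A sketch of the timeout workaround using `pthread_mutex_timedlock()`, assuming an ordinary (non-robust) process-shared mutex in anonymous shared memory. The names `lock_with_timeout` and `demo_abandoned_mutex` are hypothetical. The 100 ms threshold is arbitrary; a real value has to fit the application.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum class lock_result { acquired, presumed_abandoned, error };

// Try to lock `m` for at most `timeout_ms` milliseconds. A timeout is taken
// to mean that the owner terminated while holding the lock. The threshold
// is application-specific: too short and a slow but live owner looks dead.
lock_result lock_with_timeout(pthread_mutex_t* m, long timeout_ms) {
    // pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline.
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == 0) return lock_result::acquired;
    if (rc == ETIMEDOUT) return lock_result::presumed_abandoned;
    return lock_result::error;
}

// Hypothetical demo: an ordinary (non-robust) process-shared mutex whose
// owner exits while holding it stays locked forever; only the timeout
// tells the waiter that something is wrong.
bool demo_abandoned_mutex() {
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    auto* m = static_cast<pthread_mutex_t*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);  // exits while holding the lock
    }
    bool abandoned = false;
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
        abandoned = lock_with_timeout(m, 100) == lock_result::presumed_abandoned;
    }
    // The mutex may still be held by a dead owner, so it is not destroyed;
    // the mapping is simply released.
    munmap(mem, sizeof(pthread_mutex_t));
    return abandoned;
}
```

Note that a timeout only detects the problem. The mutex itself cannot be reclaimed, so the application still needs some out-of-band way to rebuild or replace the shared segment.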
In general, synchronization primitives that reside in shared memory
(such as pthread mutexes or Boost.Interprocess mutexes) should be
considered vulnerable to (a) corruption and (b) becoming unusable (e.g.
left locked indefinitely) because of a misbehaving user process. That is
rather obvious considering that such primitives typically do not own
any other resources, such as handles to kernel objects or file
descriptors, and as such "don't exist" for the kernel (consequently, the
kernel cannot release them on process termination). The robust mutexes
I referenced above are an exception to that general rule.
Named primitives, such as SysV semaphores, are typically better protected
because there is at least a file descriptor or some other kernel-side
handle that corresponds to the name, and there is usually a limited API
to interact with the primitive (i.e. you usually don't have direct access
to the primitive's data).
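For comparison, here is a sketch of a lock built on a SysV semaphore with `SEM_UNDO`, the mechanism mentioned in the question. It assumes Linux, where the caller must define `union semun` itself. For every `SEM_UNDO` operation, the kernel records a per-process adjustment and reverts it when the process exits, however it exits. The names `create_sem_lock`, `sysv_lock`, `sysv_try_lock`, `sysv_unlock` and `demo_sem_undo` are hypothetical. Unlike a mutex, a semaphore has no owner, and this sketch offers nothing comparable to a condition variable.

```cpp
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// On Linux the caller has to define this union for semctl(SETVAL).
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

// Create a private set with one semaphore initialised to 1 ("unlocked").
// Returns the semaphore id or -1.
int create_sem_lock() {
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0) return -1;
    semun arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

// Every operation uses SEM_UNDO, so the kernel keeps a per-process
// adjustment and reverts it when the process exits, however it exits.
// Unlock must use SEM_UNDO too; otherwise the adjustment recorded by the
// lock would still be applied at exit and push the value above 1.
static int sem_adjust(int id, int delta, int extra_flags) {
    sembuf op{};
    op.sem_num = 0;
    op.sem_op = static_cast<short>(delta);
    op.sem_flg = static_cast<short>(SEM_UNDO | extra_flags);
    return semop(id, &op, 1);
}

int sysv_lock(int id)     { return sem_adjust(id, -1, 0); }
int sysv_try_lock(int id) { return sem_adjust(id, -1, IPC_NOWAIT); }  // -1/EAGAIN if held
int sysv_unlock(int id)   { return sem_adjust(id, +1, 0); }

// Hypothetical demo: the child takes the lock and exits without releasing
// it; the kernel undoes the decrement, so the parent's non-blocking
// attempt succeeds.
bool demo_sem_undo() {
    int id = create_sem_lock();
    if (id < 0) return false;
    pid_t pid = fork();
    if (pid == 0) {
        sysv_lock(id);
        _exit(0);  // no unlock
    }
    bool ok = false;
    if (pid > 0) {
        waitpid(pid, nullptr, 0);
        ok = sysv_try_lock(id) == 0;
        if (ok) sysv_unlock(id);
    }
    semctl(id, 0, IPC_RMID);
    return ok;
}
```

The release on exit comes at a cost. A process that dies mid-update leaves the protected data as it was, and the next locker has no way of knowing that anything happened. A robust mutex, by contrast, reports `EOWNERDEAD`.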
There are a number of named synchronization primitives in
Boost.Interprocess, although I don't think they provide an "auto unlock
on process termination" feature.
> Anyway, I'd like to suggest that the interprocess docs should
> make some mention of the behaviour of the synchronisation
> primitives on process termination, e.g. somewhere near the
> beginning of
> http://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes
>
> I may now try to implement some primitives that use semop() and
> unlock automatically. I haven't yet looked at what's involved to
> implement a condition variable on top of a semaphore, so I may not
> get very far! Has anyone else ever tried this?
If you want (more or less) reliable interprocess synchronization, you
will currently have to implement it yourself. There are a number of
compromises to make along the way. For instance, the pthread robust mutex
API does not quite fit the traditional C++ mutex interface, so one has
to improvise. In the absence of robust mutexes, the timeout workaround is
not universally applicable, and the timeout value itself is, obviously,
case-specific. Also, most of these APIs are not fully portable (at least
not between Windows and POSIX-compatible systems), so you end up with
OS-specific branches.
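Here is one possible way to improvise around the mismatch, sketched for a POSIX system with robust mutexes. The hypothetical adapter `robust_mutex_ref` satisfies the C++ Lockable requirements, so `std::lock_guard` and `std::unique_lock` work with it. Since `void lock()` has no way to report that the previous owner died, the adapter runs a caller-supplied recovery callback before marking the mutex consistent. This is only one possible design; it is not necessarily what Boost.Log does.

```cpp
#include <cerrno>
#include <functional>
#include <mutex>  // std::lock_guard / std::unique_lock work with the adapter
#include <pthread.h>
#include <system_error>
#include <utility>

// Hypothetical adapter around a robust pthread mutex that was already
// initialised (PTHREAD_MUTEX_ROBUST, usually PTHREAD_PROCESS_SHARED) in
// shared memory. It meets the Lockable requirements, so standard guards
// work. Because `void lock()` cannot say "you got the lock, but the data
// may be broken", an owner-died result runs the recovery callback first.
class robust_mutex_ref {
public:
    robust_mutex_ref(pthread_mutex_t* m, std::function<void()> recover)
        : m_(m), recover_(std::move(recover)) {}

    void lock() {
        int rc = pthread_mutex_lock(m_);
        if (rc == EOWNERDEAD) {
            handle_owner_dead();
        } else if (rc != 0) {
            throw std::system_error(rc, std::generic_category(), "pthread_mutex_lock");
        }
    }

    bool try_lock() {
        int rc = pthread_mutex_trylock(m_);
        if (rc == 0) return true;
        if (rc == EBUSY) return false;
        if (rc == EOWNERDEAD) {
            handle_owner_dead();
            return true;
        }
        throw std::system_error(rc, std::generic_category(), "pthread_mutex_trylock");
    }

    void unlock() { pthread_mutex_unlock(m_); }

private:
    // Called with the mutex held after its previous owner died.
    void handle_owner_dead() {
        try {
            if (recover_) recover_();
        } catch (...) {
            pthread_mutex_unlock(m_);  // not made consistent: now ENOTRECOVERABLE
            throw;
        }
        int rc = pthread_mutex_consistent(m_);
        if (rc != 0) {
            pthread_mutex_unlock(m_);
            throw std::system_error(rc, std::generic_category(), "pthread_mutex_consistent");
        }
    }

    pthread_mutex_t* m_;
    std::function<void()> recover_;
};
```

Usage would look like `robust_mutex_ref m(shared_mutex_ptr, [&] { rebuild_index(); }); std::lock_guard<robust_mutex_ref> g(m);`, where `rebuild_index` stands in for whatever repair the application needs. Another option is to throw from `lock()` on `EOWNERDEAD`. However, the caller would then hold a lock that no guard owns, which is easy to leak.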
I did implement this in a few of my projects. One example is Boost.Log,
where I opportunistically use robust mutexes:
https://github.com/boostorg/log/blob/develop/src/posix/ipc_sync_wrappers.hpp
https://github.com/boostorg/log/blob/develop/src/posix/ipc_reliable_message_queue.cpp
As you can see, the Windows implementation is quite different:
https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.hpp
https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.cpp
https://github.com/boostorg/log/blob/develop/src/windows/ipc_reliable_message_queue.cpp
The best solution to these problems, however, is to avoid locks
altogether and use lock-free algorithms in such a way that the data in
shared memory is valid at every point and can be handled even if a
process terminates in the middle of an update.
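The following is a minimal illustration of the idea. It assumes a platform where `std::atomic<std::uint64_t>` is always lock-free (checked with C++17's `is_always_lock_free`) and, in practice, usable in memory shared between processes. Note that the C++ standard only recommends that lock-free atomics be address-free. A counter is about the simplest lock-free structure there is; real shared data such as queues or ring buffers needs considerably more care. `count_in_children` is a hypothetical name.

```cpp
#include <atomic>
#include <cstdint>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Only lock-free atomics are used: a non-lock-free std::atomic may be
// implemented with a hidden lock, which would bring the original problem
// back (and need not work across processes at all).
using shared_counter = std::atomic<std::uint64_t>;
static_assert(shared_counter::is_always_lock_free,
              "this sketch needs a lock-free 64-bit atomic");

// Hypothetical demo: `workers` child processes each add 1 to a counter in
// shared memory `n` times, without any lock. Each fetch_add is a single
// atomic read-modify-write, so a process killed at any point leaves either
// the old or the new value in memory, never a torn one, and nothing stays
// "locked". Returns the final value seen by the parent.
std::uint64_t count_in_children(int workers, int n) {
    void* mem = mmap(nullptr, sizeof(shared_counter), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 0;
    auto* counter = new (mem) shared_counter(0);

    for (int w = 0; w < workers; ++w) {
        if (fork() == 0) {
            for (int i = 0; i < n; ++i)
                counter->fetch_add(1, std::memory_order_relaxed);
            _exit(0);
        }
    }
    while (wait(nullptr) > 0) {
        // reap all children
    }
    std::uint64_t result = counter->load();
    munmap(mem, sizeof(shared_counter));
    return result;
}
```

With four workers doing 1000 increments each, the parent should read 4000. If one of the workers were killed part-way through, the value would simply be smaller, but it would still be a valid count that other processes can keep using.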
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk