

Subject: [Boost-users] [interprocess] Sharing data in a peer-to-peer fashion
From: Brett Gmoser (bgmoser_at_[hidden])
Date: 2010-03-05 11:37:29


Hello,

I'm working on what seems to be a fairly interesting problem, and I'm
looking for any Interprocess experts who can lend some advice. After
reading all of the documentation for Interprocess, it seems that all of
the examples use a "one parent, many children" model. My problem is
that I am working with a "many children, no parent" model. My processes
are spawned by the web server (and FastCGI), so there is no easy way to
modify that code to manage the interprocess data.

At least one instance of my program is meant to stay alive forever, so
I'm not very concerned about deleting the interprocess data at any time.
My problem, however, is the mutexes and the stale locks that result if
one of the processes crashes or exits abnormally. It seems that once an
instance of my application crashes, no other instance can ever acquire
the shared mutex, because the crashed process still holds the lock.

I've thought of a few different ideas to solve this problem:

  * Using an interprocess shared_ptr for the mutex and data, with a
custom deleter to remove all instances once the last application is
exiting and the use count is zero. However, this suffers from the same
problem: it seems the use count is never decremented when the
application exits abnormally (for example, kill -9, a crash, or CTRL+C).

  * Using a "heartbeat" type system to keep track of processes. My
idea was to do something like this: keep an interprocess associative
array of process IDs mapped to last heartbeat time, with each process
updating its own heartbeat. At the same time, every other process
(since there is no one parent process) must keep track of the
heartbeats and remove from the array any process that has not sent a
heartbeat in a while. The problem is this: what about the mutex
for the associative array, and the locks that another application might
hold on it? I wind up back where I started. The stale
locks are destroyed if the mutex is removed (via
boost::interprocess::named_mutex::remove), and I can detect the stale
lock pretty reliably with a timed try lock (wait 30 seconds or so to
acquire the lock; if it still cannot be acquired, there is obviously a
problem). But what about thread safety, since I now have to remove the
mutex and re-create it while other processes might wind up doing the
same thing? And what if those other processes are also trying to obtain
a lock on the mutex at the same time I'm removing and re-creating it?
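The bookkeeping half of the heartbeat idea can be sketched on its own. This is a single-process model, assuming `std::map` as the associative array and `int` as a stand-in for the platform's process ID type; `HeartbeatTable`, `beat`, and `reap_stale` are hypothetical names. In the real design the map would have to live in shared memory and every access would need the interprocess mutex, which is exactly the part the question is about, so that locking is deliberately left out here. Times are passed in explicitly so the behavior is deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

using Clock = std::chrono::steady_clock;
using Pid = int;  // hypothetical stand-in for the real process ID type

// Single-process model of the shared heartbeat table.
// NOT safe for concurrent use: the real version needs the interprocess mutex.
class HeartbeatTable {
public:
    // Called periodically by each process to record that it is still alive.
    void beat(Pid pid, Clock::time_point now) { last_beat_[pid] = now; }

    // Removes and returns every process whose last heartbeat is older than max_age.
    std::vector<Pid> reap_stale(Clock::time_point now, Clock::duration max_age) {
        std::vector<Pid> stale;
        for (auto it = last_beat_.begin(); it != last_beat_.end();) {
            if (now - it->second > max_age) {
                stale.push_back(it->first);
                it = last_beat_.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
        return stale;
    }

    std::size_t size() const { return last_beat_.size(); }

private:
    std::map<Pid, Clock::time_point> last_beat_;
};
```

Because every peer runs `reap_stale`, two peers may try to drop the same entry; as long as both calls happen under the shared mutex that is harmless, since the second peer simply no longer finds it.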

Those are pretty much the only ideas I've had. Does anybody have
anything better? Maybe some of you have tackled a similar problem?

Thanks!

