
Boost Users :

From: Zeljko Vrba (zvrba_at_[hidden])
Date: 2008-04-13 10:31:38


On Sun, Apr 13, 2008 at 10:46:42AM +0200, Ion Gaztañaga wrote:
>
> Ok. You need to register all processes attached to one particular
> segment somewhere. This imposes some reliability problems
> because a process might crash when doing another task than using the
> segment.
>
Yes. I used a separate "bootstrap [shm] segment" to hold all global
bookkeeping data. Actually, for this particular purpose, you don't
even need it: you can use the SHM segment itself to hold a list of
processes attached to it. (Unfortunately, there's no POSIX API to
get the list of processes attached to a particular SHM segment, most
probably because such information is volatile and potentially already
worthless by the time you get to use it.)

Regarding reliability: a process can crash at any time for any reason;
introducing error-free SHM grow code (however it's implemented) will
neither make the program crash more or less frequently nor introduce a
new failure mode.

What _can_ happen, though, is that a process crashes and remains
registered as having the segment attached. The same problem also occurs
when SIGSEGV is handled to do the remapping. Since a dead process's PID
may be reused by an unrelated process, sending an asynchronous
notification to that PID may do unpredictable things: most likely,
terminate the process [the default action for most signals].

So the reason for handling SIGSEGV and other fatal signals would NOT be
to remap segments, BUT to deregister the process from the SHM manager
before terminating it. This is again only a half-solution, because the
process may be terminated by other signals that it doesn't handle, and
most definitely by SIGKILL, which can't be caught.

A potential solution would be to give all cooperating processes a
common parent controller. When a child process dies, it remains in
the zombie state, and since the parent will be coded NOT to call wait()
right away, and not to exit until all children have exited (SIGCHLD),
the dead child's PID cannot be reused. The parent controller can then
deregister the process from its SHM segments, and finally wait() for it
after everything has been cleaned up. Where (at which level of
complexity) to stop depends on the needs, but the solution _can_ be made
very reliable and portable.

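(An extremely simple solution that doesn't require a controller process
and won't kill random processes: just run the cooperating processes under
a dedicated user ID. An unprivileged process can only signal processes
running under the same user ID, so a stray notification cannot reach an
unrelated process that happens to have inherited a recycled PID.)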

>
> An asynchronous notification via signal does not carry enough context
> (sigval from sigqueue only stores an int or void*) to notify which
>
That would be enough with a bootstrap segment that contains a list of
all SHM segments managed by Boost. Then you just send the offset/pointer
into this segment.

>
> And if that does not discourage you from implementing this, there is not
> much you can do inside a signal handler. You can see a list here:
>
> http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_04.html#tag_02_04
>
> This means that you can't call mmap from a signal handler. You can't
> remap memory asynchronously according to POSIX. It's possible that some OSs
> support that.
>
Good point. You still have two choices:

  1. Send a message through a POSIX message queue with SIGEV_THREAD
     notification (see mq_notify()), a possibility you ignored; the
     notification function then runs in its own thread, not in a signal
     handler.
  2. Have a dedicated signal + dedicated thread in each process to catch it
     (see sigwait()). [All other threads shall block this signal.]

>
> If remapping is possible, a more correct and robust mechanism could be
>
Correct according to which specification?

> catching SIGSEGV from processes that have not updated their memory
> mappings and doing some remapping with some global growable segment list
> stored in a singleton (this has problems when using dlls). Less
> interprocess communication means more reliability.
>
Far from it: that very same URL says the following:

"The behavior of a process is undefined after it returns normally from a
signal-catching function for a [XSI] SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal
that was not generated by kill(), [RTS] sigqueue(), or raise()."

This applies to what you have just proposed. Furthermore, this avenue will
lead you into a mess of platform-specific code: just look at GNU libsigsegv.

Anyway, a line has to be drawn somewhere: perfection is the worst enemy
of good enough. Why should a library guarantee its correct operation when
the client program violates its preconditions?

It's a tradeoff between being clean and having stronger preconditions (my
approach) and relying on undefined behavior with weaker preconditions (trying
to compensate for broken programs).

A question: let's say that you have a situation like this:

[SHM segment] [unmapped memory]
                  ^
                  X

A program generates SIGSEGV at address X. How are you going to design a *robust*
mechanism that can distinguish the following two cases:

  - true SIGSEGV (access through corrupt pointer)
  - SIGSEGV that should grow the segment?

Note that there's a race condition: a program might make a truly invalid access
through a corrupt pointer, but by the time you've looked up the address and
found the nearest SHM segment, *another* process might already have grown the
segment. Thus, instead of terminating the process, you will grow the faulting
process's SHM mapping and let it go berserk over valid data.

Protecting the signal handler with a mutex/semaphore isn't enough: you'd need
a way to *atomically* enter the signal handler and acquire a mutex/semaphore.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net