

Subject: [boost] [interprocess] More robust message_queue and interprocess_condition?
From: Ross MacGregor (gordonrossmacgregor_at_[hidden])
Date: 2011-04-14 19:11:24

In reply to: Sergei Politov: "[boost] [interprocess] message_queue hangs when another process dies"

Sergei Politov <spolitov <at> gmail.com> writes:
>[interprocess] message_queue hangs when another process dies
>
> Suppose we have 2 processes, one sends messages to queue, another reads
> them.
> When reading process dies (for instance using End process) during
> message_queue.receive the another process hangs in send.

I've run into an apparently old problem with message_queue (see post above from
two years ago) and I am wondering if there isn't a fairly simple solution. It
wouldn't be perfect but would be far better behavior than we have now.

Please let me know if this looks like a good idea for inclusion into the
interprocess library.

The problem is that the message_queue send operation blocks forever when the
receiving process has been abnormally terminated. The send makes an
interprocess_condition::notify_one call, and interprocess_condition::notify
executes the statement: "m_enter_mut.lock()". This lock never succeeds because
the dead process still owns the mutex, so the send call cannot complete.

My solutions to the problem lie within the interprocess_condition class, as
this is really the source of the problem.

Solution 1: Fixed timeout notify
--------------------------------

Change the mutex lock call in interprocess_condition::notify to a timed_lock
call using a fixed timeout value. This feature could be enabled/disabled and
the timeout value configured through use of preprocessor symbols.

Replace:

  inline void interprocess_condition::notify(boost::uint32_t command)
  {
      m_enter_mut.lock();

With:

  inline void interprocess_condition::notify(boost::uint32_t command)
  {
  #ifdef ENABLE_BOOST_INTERPROCESS_TIMEOUT
      boost::posix_time::ptime expires
        = boost::posix_time::microsec_clock::universal_time() +
          boost::posix_time::milliseconds(BOOST_INTERPROCESS_TIMEOUT_MS);
      if (!m_enter_mut.timed_lock(expires))
        throw timeout_exception();
  #else
      m_enter_mut.lock();
  #endif

This allows an exception to be thrown if notify waits too long at the mutex.
This may be adequate for most applications, since I don't see a good reason for
this to block for very long. This change will of course affect anything using
interprocess_condition, which could be seen as a good thing or a bad thing.
Good in that anything using it, like message_queue for instance, will
immediately get improved functionality: the message_queue send will now throw
a timeout_exception without any code changes! However, it may be seen as a bad
thing because the thrown exception may be unexpected behavior (although
assuming that no exception will ever be thrown is not wise).
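
To make the idea concrete, here is a rough, self-contained sketch of Solution 1.
It is not the interprocess code: std::timed_mutex stands in for the
interprocess mutex, condition_sketch and kNotifyTimeout are made-up names, and
a stalled thread plays the role of the dead process.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical exception type, named after the one used in the post.
struct timeout_exception : std::runtime_error {
    timeout_exception() : std::runtime_error("timed out locking m_enter_mut") {}
};

// Stand-in for the proposed BOOST_INTERPROCESS_TIMEOUT_MS symbol (assumption).
constexpr std::chrono::milliseconds kNotifyTimeout{50};

// Toy model of interprocess_condition: only the entry mutex is represented.
class condition_sketch {
public:
    void notify() {
        // Bounded wait replaces the unconditional m_enter_mut.lock().
        if (!m_enter_mut.try_lock_for(kNotifyTimeout))
            throw timeout_exception();
        std::lock_guard<std::timed_mutex> guard(m_enter_mut, std::adopt_lock);
        ++m_notifications;  // placeholder for the real notification work
    }
    int notifications() const { return m_notifications; }
    std::timed_mutex& enter_mutex() { return m_enter_mut; }

private:
    std::timed_mutex m_enter_mut;
    int m_notifications = 0;
};

// Calls notify() while a peer thread holds the entry mutex, imitating a
// receiver that died while owning it. Returns true if notify() threw.
bool notify_times_out_while_peer_holds_lock() {
    condition_sketch cond;
    std::promise<void> locked, release;
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();
    std::thread peer([&] {
        cond.enter_mutex().lock();
        locked.set_value();
        release_f.wait();  // keep the mutex until the caller has given up
        cond.enter_mutex().unlock();
    });
    locked_f.wait();
    bool timed_out = false;
    try {
        cond.notify();
    } catch (const timeout_exception&) {
        timed_out = true;
    }
    release.set_value();
    peer.join();
    return timed_out;
}
```

With the peer stalled, notify() gives up after kNotifyTimeout instead of
hanging; with nobody holding the mutex it behaves as before.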

Solution 2: Notify with timeout
-------------------------------

We introduce an interprocess_condition::notify that specifies the time to wait
for notification to complete.

Add:

  inline void interprocess_condition::notify(
      boost::uint32_t command,
      const boost::posix_time::ptime &abs_time)
  {
     if (!m_enter_mut.timed_lock(abs_time))
       throw timeout_exception();

This solution is much the same as the first, but it introduces new methods to
provide the functionality. Its advantages are that the caller controls the
timeout value and that existing functionality is unchanged. Its disadvantage is
that software wanting this feature would need to be rewritten. For example, the
message_queue send & try_send functions could take additional timeout values.
One issue I am having with this solution is: why would I want to use the old
notify API? It seems the new methods would deprecate the old ones and create
mild confusion.

Actually looking closely at the message_queue API, this presents some
challenges:

   // We can add the timeout here, no problem.
   void send (
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time); // <-- new timeout value

   // The nature of this method is not to block,
   // so adding a timeout value here is counterintuitive.
   // But this is exactly what we need to do because it blocks
   // in our exceptional case.
   bool try_send(
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time); // <-- new timeout value

   // Here we probably need to use the existing timeout for
   // the timed_notify call.
   bool timed_send(
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time); // <-- existing timeout value

I was originally thinking this was the best solution, but now after looking at
the details, solution 1 is looking more appealing.
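
For comparison, a rough sketch of how Solution 2 could look, under the same
assumptions as before: std::timed_mutex and std::chrono::steady_clock stand in
for the interprocess mutex and boost::posix_time::ptime, and condition_sketch
and timed_send_sketch are made-up names. It shows a timed_send-style caller
handing its existing deadline down to the new notify overload.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>

using clock_type = std::chrono::steady_clock;  // swapped in for ptime

// Hypothetical exception type, named after the one used in the post.
struct timeout_exception : std::runtime_error {
    timeout_exception() : std::runtime_error("timed out locking m_enter_mut") {}
};

// Toy model of interprocess_condition: only the entry mutex is represented.
class condition_sketch {
public:
    // Existing behaviour, unchanged: waits without bound.
    void notify() {
        std::lock_guard<std::timed_mutex> guard(m_enter_mut);
        ++m_notifications;
    }
    // Proposed overload: gives up once abs_time has passed.
    void notify(const clock_type::time_point& abs_time) {
        if (!m_enter_mut.try_lock_until(abs_time))
            throw timeout_exception();
        std::lock_guard<std::timed_mutex> guard(m_enter_mut, std::adopt_lock);
        ++m_notifications;  // placeholder for the real notification work
    }
    int notifications() const { return m_notifications; }
    std::timed_mutex& enter_mutex() { return m_enter_mut; }

private:
    std::timed_mutex m_enter_mut;
    int m_notifications = 0;
};

// Hypothetical timed_send-style caller: the deadline it already has is
// reused for the notify step. Copying the message is omitted.
void timed_send_sketch(condition_sketch& cond,
                       const clock_type::time_point& abs_time) {
    cond.notify(abs_time);  // may throw timeout_exception
}

// Runs timed_send_sketch() while a peer thread holds the entry mutex.
// Returns true if the call threw timeout_exception.
bool timed_send_times_out_while_peer_holds_lock(
        std::chrono::milliseconds budget) {
    condition_sketch cond;
    std::promise<void> locked, release;
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();
    std::thread peer([&] {
        cond.enter_mutex().lock();
        locked.set_value();
        release_f.wait();  // keep the mutex until the caller has given up
        cond.enter_mutex().unlock();
    });
    locked_f.wait();
    bool timed_out = false;
    try {
        timed_send_sketch(cond, clock_type::now() + budget);
    } catch (const timeout_exception&) {
        timed_out = true;
    }
    release.set_value();
    peer.join();
    return timed_out;
}
```

Note that the old notify() is still there and still blocks, which is exactly
the API confusion mentioned above.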

Solution 3: Try notify
----------------------

This solution pushes the waiting code back to the caller. The advantage is that
this solution does not block at any time, but its usage will be more
complicated.

Add:

  inline bool interprocess_condition::try_notify(
      boost::uint32_t command)
  {
     if (!m_enter_mut.try_lock())
       return false;

I'm not sure I love this solution, but it could be used to create a
better-behaved message_queue::try_send. That version would be difficult to use,
though, and I'm afraid not very popular (i.e. try_send returning false because
it can't acquire the mutex right away).
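
To show what "pushing the waiting back to the caller" might mean in practice,
here is a rough sketch. Again this is not the interprocess code: std::mutex
stands in for the interprocess mutex, and condition_sketch and
notify_with_retries are made-up names. The retry policy (attempt count and
pause) is the caller's own choice.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Toy model of interprocess_condition: only the entry mutex is represented.
class condition_sketch {
public:
    // Never blocks: returns false if the entry mutex is not free right now.
    bool try_notify() {
        std::unique_lock<std::mutex> lock(m_enter_mut, std::try_to_lock);
        if (!lock.owns_lock())
            return false;
        ++m_notifications;  // placeholder for the real notification work
        return true;
    }
    int notifications() const { return m_notifications; }
    std::mutex& enter_mutex() { return m_enter_mut; }

private:
    std::mutex m_enter_mut;
    int m_notifications = 0;
};

// Caller-side waiting: retry a few times with a pause in between, then
// report failure so the caller can decide what to do about the peer.
bool notify_with_retries(condition_sketch& cond, int max_attempts,
                         std::chrono::milliseconds pause) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (cond.try_notify())
            return true;
        std::this_thread::sleep_for(pause);
    }
    return false;
}

// Runs the retry loop while a peer thread holds the entry mutex, imitating
// a dead owner. Returns true if the loop gave up without notifying.
bool retries_give_up_while_peer_holds_lock() {
    condition_sketch cond;
    std::promise<void> locked, release;
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();
    std::thread peer([&] {
        cond.enter_mutex().lock();
        locked.set_value();
        release_f.wait();  // keep the mutex until the caller has given up
        cond.enter_mutex().unlock();
    });
    locked_f.wait();
    bool notified =
        notify_with_retries(cond, 3, std::chrono::milliseconds(5));
    release.set_value();
    peer.join();
    return !notified;
}
```

Every caller would need a loop like notify_with_retries, which is the extra
complexity I am worried about.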
