
From: danl_miller (danl_miller_at_[hidden])
Date: 2002-03-13 23:48:24


--- In boost_at_y..., "dmoore99atwork" <dmoore_at_a...> wrote:
> --- In boost_at_y..., "danl_miller" <danl_miller_at_b...> wrote:
> > Note in this topic
> > that the population is not inherently fixed-size [just like the
> > Dorito's ads with Jay Leno a decade ago: "They'll make more!"]
> > (although for efficiency reasons, the message-queue---the
> > stereotypical use of the producer/consumer pattern in real-time
> > systems---would impose a fixed size in real-time systems so that
> > an upper-bound is established to guarantee availability of
> > resources as the real-time embedded software executes without
> > restart for months or years or decades).
>
> I think this is where my question is coming from, and I don't mean
> to be difficult, but -how- do you enforce the upper-bound on the
> message-queue size using only a classic "Dijkstra" semaphore? If
> the production is running ahead of consumption, the warehouse will
> be full of Doritos!!! :-) It just seems to me that to provide the
> ability for the producer to block when this upper limit is reached,
> then that's more like a condition variable to me. Semaphores seem a
> natural fit for controlling consumption, but not for capping
> production. You'd need a reverse-semaphore or something like
> that...
>
> I get the distinct impression that I'm still missing something
> here. Please help.
>
> Dave

  There are two choices of what to do when a producer attempts to
push a would-be (n+1)th element into an n-element message-queue. One
can either emphasize nonlossy interthread communication or emphasize
a nonblocking architecture.

  In the practice of embedded real-time systems, the emphasis is
usually on nonblocking architecture. The viewpoint is that it is far
better to lose one microscopic unit of work than to permit one defect
in the system (e.g., inefficiency in the processing of the units of
work by the receiving thread(s) pending on the queue) to have
cascading effects which in turn could have further cascading effects
which in turn could bring down the entire macroscopic system. In
this case the would-be (n+1)th element is silently discarded (except
that message-queues usually represent an overflow standing-condition
somehow, to indicate that the message-queue is too small and that
overflow occurred).
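  A minimal sketch of this discard-on-overflow policy (all names here
are illustrative, not from any real library): a push onto a full
queue never blocks; it drops the element and latches an overflow
standing-condition that can be inspected later.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Lossy bounded message-queue sketch: try_push on a full queue
// silently discards the element and records that overflow occurred,
// rather than blocking the producer.
template <typename T>
class lossy_queue {
    mutable std::mutex m;
    std::deque<T> q;
    std::size_t cap;
    bool overflowed = false;  // standing-condition: queue was too small
public:
    explicit lossy_queue(std::size_t capacity) : cap(capacity) {}

    bool try_push(const T& v) {
        std::lock_guard<std::mutex> lk(m);
        if (q.size() >= cap) {   // would-be (n+1)th element
            overflowed = true;   // note the overflow...
            return false;        // ...and silently discard; never block
        }
        q.push_back(v);
        return true;
    }

    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lk(m);
        if (q.empty()) return false;
        out = q.front();
        q.pop_front();
        return true;
    }

    bool overflow_occurred() const {
        std::lock_guard<std::mutex> lk(m);
        return overflowed;
    }
};
```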

  Alternatively, one could apply back-pressure on the producer,
blocking the producer from pushing one more work-request until at
least one consumer consumes a work-request. This blocking of the
producer in isolation appears to be a good idea, so that
work-requests are not lost due to message-queue overflow. But
problems tend to arise when considering the macroscopic system.
Often there is not merely one producing thread pushing work-requests
into one message-queue which is pended on by one consuming thread.
Instead there are often a dozen (or even multiple dozens) of threads
(or message-queue-scheduled thread-pools) participating in a
dependency tree, directed acyclic graph (DAG), or even a cyclic
graph. In the tree & DAG cases, a downstream bug/inefficiency/defect
can cause at least degradation (and at most total failure) of the
upstream tree/DAG as blocking back-pressure is applied upstream. As
deleterious as that might be, the real nightmare is the cyclic-graph
case, where a cycle of threads works on a topic in assembly-line
fashion, each performing its added-value work on bits of work which
pass around the cycle, such as the work performed by each state in a
state-machine/finite-state-automaton (FSA). A cascading blocking
back-pressure mechanism can cause the entire cycle of threads to at
least exhibit drastically reduced concurrency (as the entire cycle
waits for the troubled operation on a single downstream consuming
thread to complete) or at worst completely lock up forever, as every
thread waits to push a unit-of-work to some other thread which in
turn is waiting to push a unit-of-work, and so forth. Even if the
aberrant system state eventually self-corrects out of this amount of
cascading blocking back-pressure, the software system as a whole
might have failed to accomplish some of its primary intended
functions (e.g., the air traffic controller's image of the airplanes
froze up, showing the airplanes as they were 30 seconds ago), or
might have missed a real-time deadline (e.g., the protection-switch
between two redundant fiber-optic cables after a backhoe cut one of
the two did not occur soon enough, causing 144,000 telephone
conversations to be disconnected).

  Generally when working with message-queues which overflow in a lossy
way, the units of work pushed into them are categorized at
engineering-time into two categories:
  1) those units of work which individually can be lost, such as
because there is some higher-layer recovery mechanism or because a
single individual unit of work can occasionally be lost without
affecting the integrity of the overall system (e.g., a missing
10-millisecond portion of audio, where the prior 10 ms can be
replayed to fill the missing gap)
  2) those units of work which absolutely cannot be lost. In a
lossy-overflow message-queue asynchronous MT architecture, each
thread working with these mission-critical units of work would retain
a remembrance of each unit of work. There would be some expression
of acknowledgement or completion by one or more downstream threads
for each of these remembered mission-critical units of work. That
acknowledgement would directly or indirectly make its way back to the
remembrance data-structure/thread. The unit of work would then be
considered a success and would typically be removed from the
data-structure remembering partially-completed work which was
submitted for downstream processing. If no acknowledgement ever came
back before some deadline (measured in elapsed real-time or some
other metric) for a remembrance of partially-completed
submitted-for-downstream-processing work, then that unit of work
would be considered in need of resubmission. (By the way, these
units of work would need to be idempotent, which means that multiple
performances of a request r for a unit of work yield the same result
as a single performance of r (i.e., duplicates are okay), because
maybe, instead of the unit of work being lost, the acknowledgement
was lost after the work was in fact successfully accomplished,
unbeknownst to the producing thread retaining the remembrance.) The
unit of work needing resubmission would again be pushed onto the
message-queue, and eventually the mission-critical unit of work would
be successfully accomplished on the first or second or third or nth
try.
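  The remembrance/acknowledgement scheme can be sketched in a
deliberately single-threaded form (all names here are hypothetical):
the producer remembers each submitted unit until it is acknowledged,
anything still remembered at the deadline is resubmitted, and the
consumer detects duplicates by id so that a re-performed unit of work
is a no-op.

```cpp
#include <map>
#include <set>

// Producer side: remember each mission-critical unit of work until
// an acknowledgement arrives; whatever is still remembered at the
// deadline needs resubmission.
struct Producer {
    std::map<int, int> remembered;  // id -> payload, awaiting ack
    void submit(int id, int payload) { remembered[id] = payload; }
    void on_ack(int id) { remembered.erase(id); }
    std::map<int, int> needing_resubmission() const { return remembered; }
};

// Consumer side: idempotent processing. Because a lost *ack* is
// indistinguishable (to the producer) from lost *work*, duplicates
// must be tolerated: a second performance of request r yields the
// same result as the first.
struct IdempotentConsumer {
    std::set<int> done;  // ids already processed
    int total = 0;       // stands in for the real work's effect
    bool process(int id, int payload) {
        if (done.insert(id).second)  // duplicate? then no-op
            total += payload;
        return true;                 // acknowledge even duplicates
    }
};
```

A typical lost-ack round trip: the producer submits unit 1, the
consumer performs it, but the acknowledgement is lost; at the
deadline the producer resubmits unit 1, the consumer recognizes the
duplicate and merely re-acknowledges, and the producer finally
forgets the remembrance.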


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk