Boost logo

Boost :

From: Kevin S. Van Horn (kevin.vanhorn_at_[hidden])
Date: 2001-11-11 14:09:35

I finally managed to eke out enough free time to finish reviewing and
commenting on the documentation for the Boost threads library (as a prelude to
attempting some formal correctness proofs.) Here are my remarks. They
include a mixture of comments on design, exposition, wording, and sometimes


Fairness and strongness

Missing from the documentation is a discussion of the three levels of
increasingly-strong guarantees for mutexes, which parallel the same categories
for semaphores:

1. Weak mutexes. When a thread unlocks a weak mutex and there are other
threads blocked waiting on the mutex, the mutex may enter the unlocked state
for a short period of time until some thread locks it again. The locking
thread may not have been one of the threads waiting at the moment the mutex
was unlocked.

Consider the following scenario with a weak mutex. Thread A is in a loop
where it repeatedly acquires the mutex, does something, then releases the

  while (1) {
    <processing outside of the critical section>;
    boost::mutex::scoped_lock l(mtx);
    <critical section>;

If there are one or more other threads waiting to acquire the mutex, thread A
may monopolize the mutex, starving the other threads so that they *never*
acquire the mutex. This can occur because thread A may reacquire the mutex
after releasing it, before one of the waiting threads manages to acquire it.

Weak mutexes are nearly useless. As far as I can tell, it is impossible to
build starvation-free synchronization mechanisms on top of them. As one
example, consider trying to build semaphores on top of weak mutexes. The
acess to the integer giving the semaphore count must be protected by a mutex.
If this is a weak mutex, then a signal/up_count/V operation may block
forever if there is heavy activity on the semaphore!

*** I have looked over the POSIX implementations of condition variables and
the various Mutex types, and several of these are weak mutexes. ***

2. Strong mutexes. Suppose that a thread A holds a lock on the mutex while
there are other threads blocked trying to acquire the mutex. When A "unlocks"
the mutex, it does not actually enter an unlocked state; instead, the mutex
remains locked, but one of the blocked threads becomes unblocked and acquires
the mutex. This prevents any one thread from monopolizing the mutex.
However, it is not sufficient to prevent starvation: two threads in a
lock/unlock loop such as presented in the discussion of weak mutexes can
starve a third by playing a game of "keep away". That is, when thread A
releases the mutex, thread B always acquires it, and when thread B releases
the mutex, thread A always acquires it, leaving thread C forever waiting to
acquire the mutex.

3. Fair mutexes. A fair mutex is a strong mutex with one additional
guarantee: as long as every thread that acquires the mutex eventually releases
it, any thread that is blocked on the mutex will eventually require it. Note
that mutexes that use FIFO scheduling are fair, but a mutex can be fair
without using FIFO scheduling.

I seem to recall that one can implement fair semaphores on top of strong
semaphores, though I do not recall the algorithm. I'm not sure whether or not
one can implement a fair mutex on top of strong mutexes. Note the important
difference between binary semaphores and mutexes: any thread can unlock a
locked binary semaphore, but only the locking thread can unlock a locked
mutex. This added capability of binary semaphores is very useful for ensuring
lack of starvation in synchronization algorithms.

As with mutexes, condition objects can also be strong or fair. (The
definition of the notify_one() and notify_all() operations rules out weak
condition objects.) Suppose that a thread A is blocked in a wait() on a
condition object. If the condition object is fair, then A can only remain
blocked forever because of a lack of notify() calls on the condition object,
and not because A always gets passed over in favor of some other thread to
unblock on a notify_one() call. The Boost.Threads documentation needs to
specify whether or not condition objects are fair in this sense. Also, the
documentation should make clear that there are two kinds of blocking in a
wait() operation: a thread may be blocked on the condition object itself, or
it may have been unblocked and then subsequently blocked trying to reacquire
the mutex.

If at all possible, and if the implementation cost is not too high, all
mutexes and condition objects should be fair. Otherwise you run the risk of
users having mysterious, intermittent and irreproducible thread "lock-ups."
If making the mutexes or condition variables fair adds a significant
implementation cost, then Boost.Threads should have both efficient-but-unfair
and fair-but-less-efficient mutexes.

Handling of programmer errors

There are many places where the documentation specifies that exceptions *will*
be thrown if some precondition is not met. Do you really want to guarantee
this, or do you just want to say that the exception *may* be thrown? At debug
time you do not want exceptions thrown for what amounts to programmer error
(e.g., unlocking a mutex that you do not have locked), as this throws away a
lot of useful information as the stack unwinds. At debug time it is much
preferable for the program to dump core so that you can examine the core dump
with a debugger and have full information on the context of the error. I
recognize that a core dump is unacceptable for some production environments,
so it would be nice to be able to choose between the core-dump and exception
options via a compile-time flag.

In addition, I would suggest that even when you do throw an exception, the
exception object should include the file and line number. (See discussion
below for overview.html.)

Atomic reads and writes

I think a discussion is needed of why even reads and writes of variables of
primitive types may not be atomic, and that there is *one* type for which
reads and simple writes are guaranteed atomic (sig_atomic_t).



Quote: "Priority failures (priority inversion, infinite overtaking,
starvation, etc.)"

Comment: The programmer cannot do anything about priority inversion; that has
to be addressed at the OS level. I don't believe that "infinite overtaking"
and "starvation" are defined anywhere. I'm not sure that "priority failure"
is an appropriate term to describe starvation. Absence of starvation just
means that a thread waiting its turn to do something does, eventually, get its
turn; the concept is meaningful even in the absence of any notion of thread

Quote: "Every multithreaded program must be designed carefully to avoid race
conditions and deadlock."

Comment: I would add starvation to the list of things to avoid. This is a
commonly-overlooked error in multithreaded code.

Quote: "Common requirements for all Boost.Threads components."

For implementors they are requirements, but for users (the great majority of
those who will be reading the document) they are guarantees / specifications.


Quote: Under "Flexiblity" it says, "When functionality might be compromised by
the desire to keep the interface safe, Boost.Threads has been designed to
provide the functionality, but to make it's use prohibitive for general use."

Comment: I have no idea what that final clause means. Also, "it's" means "it
is"; the possessive of "it" is "its." (You write "whose", not "who's"; "his",
not "he's"; "theirs", not "their's; and so forth.)

Quote: "Recently there has been a lot of research in other concepts, such as
in 'Communicating Sequential Processes.'"

Comment: CSP is not new. It wasn't new in in 1985, when I first learned about


Quote: "1. Are lock objects thread-safe?"

Comment: The hyphen is necessary only when the adjective phrase "thread safe"
immediately precedes a noun or noun phrase. For example, "f() is a
thread-safe function." The hyphen indicates grouping: "thread" modifies
"safe", and not "safe function".

Quote: "They are meant to be short lived objects..."

Comment: Should be "short-lived" objects: "short" modifies "lived", not "lived

Quote: "One easy way of doing this [locking mutexes in a standard order] is to
use each mutex's address to determine the order..."

Comment: This is not portable. The standard only allows comparisons other
than == and != between pointers that point into or just past the end of the
same object. The example code for item 5, using "<" and ">" to compare
mutex pointers, is not portable for this reason. Interestingly enough,
however, I seem to recall that function objects of type std::less<T *> are
guaranteed to provide a complete ordering.

Quote: The example code for item 5 includes the lines "m_value =
other.m_value;" and "return m_value;", which are protected by mutex locks.

Comment: It's worth mentioning why the protection is necessary: with the
exception of the sig_atomic_t type, you don't know know whether reading or
writing a value of even a primitive type can be done as a single atomic action
(single memory read/write.)

Quote: [Under item 6] "The internal state of mutex types could have been made
mutable, with all lock calls made via const functions, but this does a poor
job of documenting the actual semantics."

Comment: In fact, such a const declaration would be outright incorrect, since
the logical state of a locked mutex clearly differs from the logical state of
an unlocked mutex!

Quote: "Semaphore was removed as too error prone. The same effect can be
achieved with greater safety by the combination of a mutex and a condition

Comment: This is not so easy to do correctly if you wish to avoid


Quote: "Thread State." [Followed by definitions of "Ready" and "Running".

Comment: You are getting into scheduler implementation details that have no
place in a behavioral specification. The usual way of reasoning about
concurrent programs is that you assume nothing about the relative speeds of
different threads, and that arbitrarily long (but finite) delays may occur at
any point in the execution of a thread. The change of state from Running to
Ready and then eventually back to Running again looks just like an arbitrary
pause in the execution of the thread.

Quote: [Under "Race Condition"] "The result of a race condition may be a bit
pattern which isn't even a valid value for the data type."

Comment: It's worth mentioning why this is so (see previous comment on

Quote: "A priority failure (such as priority inversion or infinite

Comment: Need to define priority inversion and infinite overtaking. Also need
to define and discuss starvation.


Quote: "However, computer science research has shown that use of these
primitives is difficult since there's no way to mathematically prove that a
usage pattern is correct, meaning it doesn't result in race conditions or

Comment: The above statement is false. CS researchers have been
proving the correctness of concurrent algorithms using semaphores, mutexes, or
even just reads and writes of shared variables with busy waiting (e.g., the
Bakery Algorithm) for decades. It would be correct, however, to say that
doing so is *difficult* for low-level synchronization constructs.

Quote: "Though it's possible to create a lock object outside of block scope
and to share it between threads to do so would not be a typical usage."

Comment: I would make a stronger statement: "to do so would very likely be an
error." Also, you need a comma after "between threads."

Quote: [Under "Rationale for Non-Copyable Thread Type".] "... could provide
more efficient reference management then wrappers."

Comment: "then" (order) should be "than" (comparison.)

Quote: "Analysis of whether or not these arguments would hold true don't
appear to bear them out."

Comment: "Analysis" is the simple subject, so "don't" should be "doesn't."

Quote: "...but thread_ref no longer suffers because of it's reference

Comment: "it's" should be "its."


There is a systematic misuse of "concept" and "model" in this document. To
- A concept is a set of requirements on a type.
- Mutex is a concept.
- A type T is a model of concept C if T satisfies all of the requirements of
- The type boost::mutex is a model of Mutex.
- Actual objects are not models of concepts nor of classes.

The phrase "mutex model" is repeatedly used when "mutex" or "mutex object"
(object whose type is a model of Mutex) is the correct term. Sometimes "mutex
concept" is used when "mutex object" would be correct.

Quote: "A mutex (short for mutual-exclusion) concept serializes access..."

Comment: Concepts don't *do* anything; they're just collections of type
requirements. You mean "mutex" or "mutex object".

Quote: "A model that implements Mutex and its refinements has two states:"

Comment: Concept models are types, and types don't have state; objects do.
Suggested rewrite: "A mutex (object whose type models Mutex) has two
states..." The "and its refinements" is redundant, since any type that
models a refinement of Mutex also models Mutex itself.

Quote: "mutex model object" [Appears in many places.]

Comment: Replace with "mutex" or "mutex object."

Quote: "mutex model" [Appears in many places; e.g., "unlock a mutex model."]

Comment: Replace with "mutex" or "mutex object."

Quote: "With this pattern, a lock concept is employed where the lock model's
constructor locks the associated mutex..."

Comment: "lock concept" and "lock model" should be "lock object."

Quote: "... mutex model ..."

Comment: Should be "model of Mutex" or "Mutex model", as "Mutex," not "mutex,"
is the name of the concept.

Quote: "With an unspecified locking strategy, when a thread attempts to
acquire a lock on a mutex model for which the thread already owns a lock the
operation results in undefined behavior. When a mutex model has an
unspecified locking strategy the programmer must assume that the mutex model
instead uses an unchecked strategy."

Comment: The model *may* use an unchecked strategy. If the strategy is
unspecified, the programmer shouldn't assume that that the thread is
*guaranteed* to deadlock; "undefined behavior" means that anything can happen,
including deadlock, aborting the program, or the computer going "Star Trek"
and exploding in a shower of sparks :-).

Scheduling policies: What is the difference between "Undefined" and
"Unspecified"? Under "Priority Driven", it might be worth mentioning the
usual way of handling priorities without starving low-priority threads: use
aging, where a thread's priority increases the longer it waits, then goes back
to its original level after it is serviced.

Quote: [Under "Scheduling Policies," "Undefined"] "Users should assume that
low-priority threads may wait indefinitely, and that threads of the same
priority acquire the lock in essentially random order."

Comment: Does "indefinitely" mean "forever", or just "an unspecified but
finite amount of time"? See also general notes on weak vs. strong vs. fair

Quote: [In Mutex concept, for expression "M::scoped_lock".] "A type meeting
the ScopedLock requirements.

Comment: Suggest "A model of ScopedLock." Similar comment applies to TryMutex
and TimedMutex.

Quote: [In TryMutex concept.] "A TryMutex must meet the Mutex requirements."

Comment: Suggest "TryMutex is a refinement of Mutex." Similar comment applies
to TimedMutex.

Quote: "Boost.Threads currently supplies six classes which model mutex

Comment: Suggest "Boost.Threads currently supplies six models of Mutex." In
the table underneath this, suggest replacing "Classes Modeling the Concept"
with "Models."


Quote: "The mutex, try_mutex and timed_mutex classes are full featured models
of the Mutex, TryMutex, and TimedMutex concepts."

Comment: Somewhat redundant. Suggest "The mutex, try_mutex, and timed_mutex
classes are models of Mutex, TryMutex, and TimedMutex respectively."

Table in Introduction: There is a column labeled "Implementation defined Lock
Type." This should be removed as an implementation detail on which library
users should not rely. Take a look at the description of the standard library
containers; do they tell you the "implementation defined iterator type" for
each container class?

Quote: "The mutex, try_mutex and timed_mutex classes use an Unspecified
locking strategy."

Comment: This isn't really a strategy; it's a refusal to specify the
strategy. Suggest "The mutex, try_mutex and timed_mutex locking strategies
are unspecified."

Quote: "Programmers should assume that threads waiting for a lock on objects
of these types acquire the lock in a random order, even though the specific
behavior for a given platform may be different."

Comment: "Random" has probabilistic connotations. Suggest "Programmers should
make no assumption as to the order in which waiting threads acquire a lock."
One exception: for strong mutexes, programmers can assume that when a thread
holding a mutex releases it and there are threads blocked on the mutex, one of
the waiting threads acquires the lock immediately. (At least, logically this
is what happens.)

Quote: "Class mutex Synopsis"

Comment: It's worth mentioning that mutex is a model of Mutex and NonCopyable,
and that the class provides no additional facilities beyond the requirements
of those two concepts. Similar comment applies to try_mutex and timed_mutex.


The comments for mutex.html apply here also.

Quote: "Attempts to unlock them by a thread that does not own a lock on them
will result in a lock_error exception being thrown."

Comment: Why not "undefined behavior"? Do you really mean to *guarantee* that
a lock_error exception will be thrown? During debugging, I would prefer an
assert() that causes a core dump over throwing an exception, as the latter
loses all information as to the context in which this programming error


As with mutex_concepts.html, this document systematically misuses "concept"
and "model." The phrase "lock model" should, in most cases, be replaced by
"lock object."

Quote: "The lock concepts provide exception safe means for locking and
unlocking a mutex model."

Comment: Concepts are specifications, not mechanisms. Suggest "Lock objects
provide ... a mutex."

Quote: "Instances of lock models are meant to be short lived."

Comment: Suggest "Lock objects are meant to be short lived."

Quote: "Lock models must maintain state to indicate whether or not they've
been locked and this state is not protected by any synchronization concepts."

Comment: Concepts are not mechanism, and thus cannot protect. Suggest "Lock
objects maintain state to indicate whether they've been locked, and this state
is not protected by any synchronization mechanism."

Quote: [Lock requirements table.]
  (&clk)->operator const void *()

Comment: I would suggest
  Expression Effects
  clk Convertible to void *. Returns nonzero if ...

Quote: [Lock requirements table.]
  Expression Effects
  clk.locked() Returns a bool, (&clk)->operator const void *() != 0

Comment: I would suggest
  Expression Effects
  clk.locked() clk != (void *)0

Quote: [Lock requirements table.]
  Expression Effects
  lk.lock() Throws a lock error if locked(). If the associated mutex
               is already locked by some other thread, places the current
               thread in the Blocked state until the associated mutex is
               unlocked, after which the current thread is placed in the
               Ready state, eventually to be returned to the running state.

Comment: Do you mean to guarantee that an exception will be thrown? When
debugging, I would rather have an assert() that causes a core dump, so that I
can investigate the context in which the programming error occurred.
Secondly, the wording seems to imply that when the thread holding the lock on
a mutex releases it, *any and all* threads blocked on the mutex get
unblocked. Finally, the discussion of a Ready state brings in thread
scheduler implementation details that should be of no concern to the

Quote: [Lock requirements table, entry for "lk.unlock()."] "If !locked(),
throws lock_error..."

Comment: Do you really mean to *guarantee* that an exception is thrown?

Quote: "A ScopedLock must meet the Lock requirements."

Comment: Suggest "ScopedLock is a refinement of Lock."

ScopedLock Concept: If mutex m is of type M, isn't it required that type L and
M::scoped_lock be the same type in the expression "L lk(m);"?

ScopedTryLock Concept: Similar comments as for ScopedLock. Also, for
lk.try_lock() the table guarantees that an exception will be thrown for a
programming error.

ScopedTimedLock Concept: Same comments as for ScopedTryLock and ScopedLock.
Other comments:
- I'm uncomfortable with the use of absolute times (xtime) instead of
  durations. The xtime document refers to "the duration since the epoch
  specified by clock_type." This "clock_type" is passed to xtime_get(), but
  how does the timed lock know which clock_type value to use?
- lk.timed_lock(t) is specified to "return true if sucessful within the
  specified time t, otherwise false." You can't really guarantee this, due to
  scheduler delays, or delays in acquiring the auxiliary mutex used in
  the POSIX implementation of timed_mutex. The best you can say is that
  the thread won't wait forever, but will wait at least until time t before
  giving up.

Models table at end of document: The "Refines" column is wrong for every
entry. Every entry should have "Lock" in its "Refines" column.


I'm not sure this document is necessary or desirable. The C++ Standard
doesn't go into separate detail on C::iterator for each container class C; it
just says which concept it models, and any additional needed info is included
in the description of class C. Likewise, all of the information in this
document that users should see belongs either in the description of the
appropriate concept or in the description of the mutex class itself.

The mutex_type typedef is not mentioned in the Lock concept; it should be.

Quote: "Although this class is an implementation detail..."

Comment: Stop right there. Implementation details don't belong in
specifications, except as examples clearly labeled as possibilities only.

Quote: [For "operator const void * () const"] "If the associated mutex is
currently locked, a value convertible to true, else a value convertible to

Comment: Don't you mean "If the associated mutex is currently locked BY THIS
OBJECT..." Also, void * values are NOT convertible to bool, although they can
be used as the test of an if, while, or for, and can be used as an argument to
&&, ||, and !.

Quote: "A const void * conversion is considered safer than a conversion to

Comment: You should explain why.


See comments for scoped_lock.html.


Quote: "...a mutex object modeling a Mutex concept."

Comment: Types model concepts; objects do not.

Quote: "...a lock object modeling a Lock Concept..."

Comment: Types model concepts; objects do not.

Quote: "The mutex must be locked prior to waiting on the condition,
which is ensured by passing a lock object modeling a Lock Concept to
the object's wait functions."

Comment: This doesn't ensure that the mutex is locked, although it does ensure
that the mutex gets unlocked.

Quote: "While the thread is waiting on the condition object, the mutex
associated with the lock is unlocked."

Comment: The use of "While..." would seem to indicate that "is unlocked" is
not the passive form of "lock", but a simple statement of the state of the
mutex. That is, the sentence seems to say that the mutex remains unlocked
during the entire time that the thread is waiting on the condition object.
Suggest "Upon blocking on the condition object, the thread unlocks/releases
the mutex."

Quote: "When the thread returns from a call to one of the condition object's
wait functions, the mutex is again locked."

Comment: Does not make clear who locks the mutex, and is not clear on whether
this happens just before or just after the call. Suggest "The wait functions
reacquire the mutex just before returning."

Quote: "The tricky lock/unlock/lock sequence is performed automatically by the
condition object's wait function."

Comment: Only the unlock/lock part is performed by the wait function.

Quote: "notify_one()... Effects: If there is a thread waiting on *this, change
that thread's state to ready."

Comment: This is not completely accurate. After being unblocked, the thread
then has to reacquire the mutex, which may cause it to block again. Also, the
issue of fairness (lack of starvation) needs to be addressed: in the case
where some thread t is waiting on a condition and there are always multiple
threads waiting when a notify_one() occurs, is there any guarantee that
eventually thread t will be chosen, or may thread t *always* be passed up in
favor of another thread and wait forever on the condition? (I'm not
considering here the issue of subsequently reacquiring the mutex.)

Quote: "void wait(ScopedLock & lock); ... All effects occur in an atomic

Comment: This is not and cannot be true. The initial unlock/suspend can be
done atomically, but becoming unblocked is a two-stage process: (1) the thread
gets awakened by some other thread calling notify_one() or notify_all(); (2)
the thread reacquires the mutex. For conditions, we really need to
distinguish two separate blocking states: blocked on the condition itself, and
blocked on the mutex while attempting to reacquire the mutex.

Quote: "timed_wait ... Throws: lock_error if !lock.locked()"

Comment: Do you mean to guarantee that an excpetion is thrown?

Quote: "bool timed_wait(ScopedLock & lock, const xtime & xt);"

Comment: See my previous reservations about using an absolute time instead of
a duration.

Quote: "timed_wait ... Returns: false if xt is reached, otherwise true."

Comment: You can't really guarantee this. See my previous comments on


Quote: "This includes completion of all thread cleanup handlers,..."

Comment: What thread cleanup handlers? Boost.Threads has no such notion.

Quote: "if a thread object is destroyed without join() having first been
called, the thread of execution continues until its initial function

Comment: This sounds scary. Another possibility is to have the thread dtor
call join(). If you think the existing semantics are best, it would be nice
to have a section in rationale.html explaining your reasoning.

Quote: "thread(const boost::function0<void> & threadfunc);"

Comment: I would like to see a hyperlink here to a description of the
function0 template. I am not familiar with the function0 template, so I have
to ask: does the const prevent me from using a function object whose state is
intended to be modifiable by the newly-created thread?

Quote: "Comparison Operators..."

Comment: Gives requirements on *this (left-hand side of ==), but not on rhs.

Quote: "static void sleep(const xtime & xt);"

Comment: As previously mentioned, I have doubts about the use of absolute time
instead of duration. Again, how does sleep() know the appropriate
clock_type/epoch to use?


Quote: "void remove_thread(thread* thrd);
        Effects: Removes *this's list of managed thread objects.
        Throws: ? if thrd is not it *this's list of managed thread objects."

Comment: What is the "?" for? For effects, don't you mean "Removes thrd from
 *this's list of managed thread objects"?


Why does call_once() take a void (*func)() argument instead of a function


Quote: "Note: There is an implementation specific limit to the number of
thread specific storage objects that can be created, and this limit may be

Comment: This note is not very useful without *some* way of knowing what the
limit may be. Perhaps there should be a const static value giving the limit,
which will vary from one platform to another. It would be nice to know a
lower bound on this limit (i.e., you're guaranteed at least 5 thread-specific
pointers), and it would be nice to know the limit (or a reasonably tight lower
bound) on the most common platforms.

Quote: "Destructor... Does not destroy any data that may be stored in any
thread's thread specific storage. For this reason you should not destroy a
thread_specific_ptr object until you are certain there are no threads running
that have made use of its thread specific storage."

Comment: This sounds unsafe and prone to resource leaks. A rationale is
needed here for why you consider this better than having thread_specific_ptr
be like auto_ptr or scoped_ptr (dtor automatically deletes pointer).


Why not provide the obvious two-argument ctor for xtime? It would still be
compatible with the C version of xtime.

Quote: "xtime_get(struct xtime * xtp, int clock_type);"

Comment: It's not crystal clear what clock_type is. I gather that it's
supposed to be one of the enums in the synopsis, but this isn't explicitly
stated anywhere.


Quote: "class lock_error : public std::runtime_error"

Comment: A lock_error is not a runtime error. It is an error which it is
reasonable to expect the programmer to avoid. Since it is a programmer error,
it should derive from std::logical_error.

I would recommend that exceptions indicating programmer errors include the
file name and line number where the error was detected, to aid in debugging.

  class lock_error : public std::logical_error
    char const * file_;
    char const * line_;
    lock_error(char const * file, char const * line)
    : logic_error(string("Lock error at file: ") + file + ", line: " + line)
    { }
    char const * file() { return file_; }
    char const * line() { return line_; }

  throw lock_error(__FILE__, __LINE__); // Perhaps define a macro


You should make clear whether these macros are intended to be used as

  #ifdef MACRO_NAME



Boost list run by bdawes at, gregod at, cpdaniel at, john at