
From: John Maddock (John_Maddock_at_[hidden])
Date: 2000-06-05 06:31:34

Hi all,

I've been following the comments on multi-threading recently with interest,
as it's something that's important to me. I've put together a set of design
principles (below) that could serve as a "straw man list" starting point.


I'm going to shamelessly ape Dave Abrahams' exception guarantees here (I
hope you'll forgive me Dave!) - the idea is to present a coherent set of
thread safety descriptions so that we can all talk a common language :-)

Level 0:
At this level, using unrelated instances of an object in separate threads is
not thread-safe; the objects cannot be made thread safe even by applying
an external lock (typically this affects objects that share unprotected
non-constant global data).

Level 1:
Unrelated instances of the object can be safely used in separate threads
(by unrelated I mean "not a copy of"). Objects that meet this guarantee
have no unprotected non-constant shared global data, but may be mutable.

Level 2:
(weak guarantee)
Unrelated instances of an object can be used in separate threads.
An object may be shared between threads without external locking if all the
threads that operate on the object treat it as constant (access no mutable
data, call no non-constant member functions).
A shared object treated as non-constant by any one thread must be
protected by an external lock for all accesses to that object (const and
non-constant) by all threads.

Level 3:
(strong guarantee)
Objects in this category may be shared between threads without any external
locking being required.


In addition we can define sub-levels for startup/exit code guarantees:

Level 1/2/3-
(startup unsafe)
The lifetime of all objects in this category must be enclosed by the
lifetime of main(). Constructing an object in this category prior to the
start of main(), or destroying it after main() has returned, will (probably)
crash your code.

Level 1/2/3a
(startup single threaded)
Objects in this category assume that prior to the start of main() and after
main() has returned there is only one thread active: in other words, no
race conditions can occur during program startup/shutdown. If the object
contains its own locks then these can be initialised on demand - provided
initialisation occurs at some point before main() is called.

Level 1/2/3b
(startup multi thread safe)
Objects in this category assume that there are multiple threads active at
any point after user code starts executing. If the object has to protect
globals then the locks used must be statically initialised (before user
code executes).
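
To illustrate the difference, here is a minimal sketch assuming a
POSIX-like platform (the names lazy_lock/lazy_lock_t are hypothetical):
the 'a' and 'b' guarantees often come down to how the protecting lock
itself gets initialised.

```cpp
#include <cassert>
#include <pthread.h>

// 'b' style: the lock is an aggregate, statically initialised before
// any user code executes - safe even if multiple threads already
// exist before main() starts:
pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

// 'a' style: the lock is created on first use - fine only if that
// first use happens while a single thread is active (for example
// during dynamic initialisation, before main()):
struct lazy_lock_t
{
    lazy_lock_t() { pthread_mutex_init(&m, 0); } // runs on demand
    pthread_mutex_t m;
};

pthread_mutex_t& lazy_lock()
{
    static lazy_lock_t l; // initialised on first call
    return l.m;
}
```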


Some examples are in order at this point:

boost shared pointers are level 1b;
builtin types are level 2b.

In fact most C++ classes are level 2b, unless:
They share non-constant global data - in which case, depending upon the lock
used (and how it is initialised), the suffix moves down from 'b' to 'a'
or '-'.
They contain mutable data - in which case they move down to level 1.
They contain an object that offers a lower level guarantee.

Note that a class may contain mutable data if it has a mutable-qualified
member, or if it contains a pointer or reference to data (because
const-qualifying the object does not const-qualify the data it aliases).
Anything reference counted, along with all pimpl implementations fall into
this category (unless careful manual checking indicates otherwise).
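
As a hypothetical illustration of that last point (neither class below is a
real boost component), compare a simple value type with a reference-counted
one: two "unrelated looking" copies of the counted class secretly share
their representation, so it can offer only the level 1 guarantee.

```cpp
#include <cassert>

struct point            // level 2: no mutable or aliased data -
{                       // const access really is read only
    int x, y;
};

class counted           // level 1 only: copies share *rep, so even
{                       // const use of one copy can race with a
public:                 // writer using another copy
    counted() : rep(new int(1)) {}
    counted(const counted& other) : rep(other.rep) { ++*rep; }
    ~counted() { if(--*rep == 0) delete rep; }
    int use_count() const { return *rep; } // touches shared data
private:
    counted& operator=(const counted&);    // omitted from this sketch
    int* rep;
};
```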



At its simplest, a lock is an object that allows a section of code to be
protected so that only one thread at a time (or a restricted set of threads)
can execute that section of code. Most existing practice (including ACE)
uses a scoped guard object to protect a section of code; a typical example
would look something like this:

lock_type l;
int count;

void foo()
{
   lock_type::guard g(l);
   ++count; // protected section
}

However there are a number of possible defects with this idiom:

Issue 1:
The variable g is never used.
I'm not actually sure whether this is a problem or not, but in principle
the compiler could decide that variable g has no observable effect and
remove it altogether. However there is a variation on this idiom that
overcomes the problem, and lets us add an optional timeout to the guard:

lock_type l;
int count;

void foo()
{
   if(lock_type::guard g = l)
      ++count; // protected section
   else throw timeout_exception(); // could carry on with local processing
}

Issue 2:
Compiler instruction reordering:
As far as the compiler is concerned, operations on the lock object are
independent of those on the protected data; the compiler is therefore free
to reorder the assembler it produces so that the guard object's destructor
is over before the protected code has finished executing. However, if the
member functions of the guard object are defined as extern (not defined in
the current translation unit) then the compiler has no information on what
these functions will do, or what data they will access, and therefore
*must* emit assembler that follows the semantics of the abstract machine.
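
A sketch of the point (the file split is shown in comments; in a real build
the definitions would live in a separate translation unit): because the
compiler cannot see what lock() and unlock() do, it must assume they could
read or write count, and so cannot move the increment outside the locked
region.

```cpp
#include <cassert>

// lock.h - declarations only; the definitions are kept opaque to
// callers (shown inline further down only so this sketch compiles):
struct lock_type
{
    void lock();
    void unlock();
};

lock_type l;
int count = 0;

void foo()
{
    l.lock();
    ++count;     // cannot be hoisted above lock() or sunk below
    l.unlock();  // unlock(): their definitions are not visible here
}

// lock.cpp - in practice compiled separately:
void lock_type::lock() {}
void lock_type::unlock() {}
```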

Issue 3:
Hardware out of order execution:
Most modern hardware can now execute instructions in parallel or "out of
order". That means that when the guard object's destructor executes there
may still be instructions that have been scheduled, but not executed.
Fortunately the destructor must already contain platform specific code, so
this is the right place to deal with hardware specific issues. If the
destructor simply calls platform specific locking APIs then presumably
these already take care of this issue. Similar arguments can be applied to
the guard object's constructor.

Issue 4:
Dangling braces:
The main problem with the scoped guard object is that the extent of the
lock may not coincide with the extent of the scope; one solution is to use
"dangling braces":

void foo()
{
   {
      if(lock_type::guard g = l)
         ++count; // protected section
      else throw timeout_exception(); // could carry on with local processing
   }
   /* do something unlocked here: */
}

Here the extra braces control scope, but they are a maintenance nightmare: a
programmer new to the code may well wonder why they are present and remove
them. At best this will impair performance, at worst we may get a deadlock.
 (Thanks to the ACE docs for bringing this one to light.)
Solution: add an "acquire" method to the guard object that lets us control
the lock's duration (an analogous example would be "reset" on the smart
pointer classes):

void foo()
{
   if(lock_type::guard g = l)
      ++count; // protected section
   else throw timeout_exception(); // could carry on with local processing
   g.acquire(false); // release the lock
   /* do something unlocked here: */
   if(g.acquire(true)) // re-acquire the lock
   {
      /* do something locked here */
   }
   else throw timeout_exception(); // could carry on with local processing
}

Issue 5:
Unnamed variable bug:
One problem with the scoped variable method is that it is too easy to
forget to name the guard variable:

void foo()
{
   lock_type::guard(l); // unnamed temporary - destroyed immediately!
}

Which of course doesn't work; however the requirement to use a subsequent
"if" statement more or less knocks this one on the head.

Issue 6:
Generic thread coding:
One common idiom may be to add a "threaded" bool template parameter to
template classes: when true the object is thread safe, when false it's
thread unsafe (or offers a lower guarantee - see smart pointers for
example). Although we could provide a partial specialisation for the
thread safe version, this is a maintenance problem; better to parameterise
the lock:

template <bool threaded>
class mylock;

Now we can use mylock<false> for non-thread safe code, and let the compiler
optimize away the "do nothing" code, or mylock<true> for the thread safe
version. One alternative to this would be to provide a template wrapper
implementing this idiom for existing non-template locks.
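
A minimal sketch of the idiom (the counter class and the lock bodies are
hypothetical - a real mylock<true> would wrap a platform mutex):

```cpp
#include <cassert>

// do nothing primary version: every call compiles away entirely:
template <bool threaded>
struct mylock
{
    void lock() {}
    void unlock() {}
};

// thread safe version: a bool stands in for a real mutex here:
template <>
struct mylock<true>
{
    mylock() : locked(false) {}
    void lock() { locked = true; }
    void unlock() { locked = false; }
    bool locked;
};

// one class template serves both cases - no partial
// specialisation of counter itself is needed:
template <class T, bool threaded>
class counter
{
public:
    counter() : value(0) {}
    T increment()
    {
        guard.lock();
        T result = ++value;
        guard.unlock();
        return result;
    }
private:
    T value;
    mylock<threaded> guard;
};
```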

Issue 7:
Emulation slowdown:
There are many different kinds of locks, only some of which will be
natively available on a given platform. Although, given a small number of
basic locks, most other operations can be emulated, sometimes the
emulations can be so slow that an alternative method would be better
employed. I would like to see an "is_specialized" member of each lock that
is true only if that lock type is implemented on that platform in an
efficient manner. This allows for "compile time discovery" of the
platform's characteristics in a manner analogous to numeric_limits. Let's
suppose we have a read-write lock that is only implemented if there exists
an efficient implementation on the current platform; we can now choose (at
compile time) between a reader/writer lock if it's available, or revert to
a traditional lock if it's not.
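
A sketch of that compile time selection (the lock types, their id members
and the select_lock helper are all hypothetical):

```cpp
#include <cassert>

// hypothetical lock types, each advertising whether the current
// platform implements it efficiently:
struct rw_lock     { static const bool is_specialized = false; static const int id = 1; };
struct thread_lock { static const bool is_specialized = true;  static const int id = 2; };

// choose the reader/writer lock when it is specialised,
// otherwise fall back to a traditional lock:
template <bool use_rw> struct select_lock       { typedef thread_lock type; };
template <>            struct select_lock<true> { typedef rw_lock type; };

// on this imaginary platform rw_lock is not specialised, so
// best_lock resolves to thread_lock:
typedef select_lock<rw_lock::is_specialized>::type best_lock;
```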


Putting these constraints together, here is a tentative example of what
some arbitrary lock type might look like:

// do nothing primary template:
template <bool threaded>
class lock_type
{
public:
        struct guard
        {
                guard(const lock_type&, unsigned timeout = UINT_MAX){}
                bool acquire(bool get, unsigned timeout = UINT_MAX){ return true; }
                operator void*() { return this; }
        };
};

// specialisation:
template <>
class lock_type<true>
{
public:
        static const bool is_specialized = false;
        struct guard
        {
                guard(const lock_type&, unsigned timeout = UINT_MAX);
                bool acquire(bool get, unsigned timeout = UINT_MAX);
                operator void*();
        };
        /* details */
};

Although there are a lot of possible lock types there would seem to be
three that are key and should be available on all platforms:

static lock: need not be re-entrant, but must be an aggregate type to allow
for static initialisation.
thread lock: should be re-entrant and default constructible.
process lock: should be re-entrant and sharable between processes; the
constructor should take a string to act as a name for the lock.


Although not available on all platforms, the performance gains available
from native atomic operations can not be ignored; the question is which
operations? The following class may provide a starting point for an
atomic operations traits class:
// primary template, not thread safe:
template <class T>
struct atom_traits
{
        static const bool is_specialized = false;

        // read:
        static const bool has_read = false;
        static T read(volatile T* atom)
        { return *atom; }

        // write:
        static const bool has_write = false;
        static void write(volatile T* atom, const T& value)
        { *atom = value; }

        // add:
        static const bool has_add = false;
        static T add(volatile T* atom, const T& value)
        { return *atom += value; }

        // test and set:
        static const bool has_tas = false;
        static bool tas(volatile T* atom)
        { bool result = (0 == *atom); *atom = 1; return result; }

        // test and reset:
        static const bool has_tar = false;
        static bool tar(volatile T* atom)
        { bool result = (0 != *atom); *atom = 0; return result; }

        // compare and exchange:
        static const bool has_cxc = false;
        static bool cxc(volatile T* atom, T* with, const T& value)
        {
                if(value == *atom)
                {
                        T tmp = *atom;   // std::swap won't bind to a
                        *atom = *with;   // volatile reference, so swap
                        *with = tmp;     // by hand
                        return true;
                }
                return false;
        }
};
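
As a usage sketch, a trivial spin lock can be layered on top of the
test-and-set and write operations (the traits stand-in below repeats the
non-atomic emulations, so it is only meaningful single threaded; a platform
specialisation would supply real atomics):

```cpp
#include <cassert>

// minimal stand-in for the traits above (non-atomic emulation):
template <class T>
struct atom_traits
{
    static bool tas(volatile T* atom)
    { bool result = (0 == *atom); *atom = 1; return result; }
    static void write(volatile T* atom, const T& value)
    { *atom = value; }
};

// a trivial spin lock: lock() spins until tas() reports that the
// flag was previously clear, unlock() clears the flag again:
template <class T>
class spin_lock
{
public:
    spin_lock() : flag(0) {}
    void lock()   { while(!atom_traits<T>::tas(&flag)) { /* spin */ } }
    void unlock() { atom_traits<T>::write(&flag, 0); }
private:
    volatile T flag;
};
```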


I hope this can provide some kind of starting point. I've deliberately
stayed away from issues like reader/writer locks, events, thread creation
and management etc.; likewise I've stayed away from implementation details -
most of these are already reasonably well known (in ACE for example). Even
so, I may have taken this too far too soon; whatever the case, I'm sure you
guys will let me know what you think (and what I've missed!).

All the best,

- John.

Boost list run by bdawes at, gregod at, cpdaniel at, john at