Boost logo

Boost :

From: Roland Schwarz (roland.schwarz_at_[hidden])
Date: 2006-10-29 10:44:55

Triggered by the question how to do
static initialization of a mutex, I
tried to write one that is.

The most basic initialization that
wil be performed by the compiler, is
zero initialization of static (POD)
objects. This kind of initialization
will take place before any other

Consequently the mutex is initialized
by its mere declaration. However this
does come at a price:

1) Since the mutex has no ctor, one
must not forget to initialize it when
using it as a member of some containing
object. But this is the same as with
other POD's.

2) Altough the mutex is copyable this does
not make any sense. The identifying
property of a mutex is its memory address.
Trying to lock a copied mutex will get
you in trouble. It is not possible to
protect against such usage since we have
no ctor that could be declared private.
On the other side you can make a object
that contains a mutex copyable by simply
reinitializing it to zero, which is not
possible with the current mutex, since
copying is strictly prohibited.

I appended to this mail my first attempt
to implement such a mutex . The code is
commented and a pingpong test is included
as example. So far this is only for windows.

I would be very interested to get feedback
about this idea.


// Copyright (C) 2006 Roland Schwarz
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at

#ifndef MUTEX_HPP__
#define MUTEX_HPP__

class scoped_lock;
class semaphore;

typedef scoped_lock* volatile mutex;

class scoped_lock
    scoped_lock(mutex& m, bool initially_locked = true);
    void lock();
    void unlock();
    bool is_locked;
    mutex* pmutex;
    scoped_lock* next_lock;
    scoped_lock* pred_lock;
    semaphore* volatile next_flag;
    semaphore* volatile sema_flag;
    semaphore* sema;
    void set(semaphore* volatile* pflag);
    void wait_for(semaphore* volatile* pflag);


// Copyright (C) 2006 Roland Schwarz
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at

// This is an experimental implementation of a mutex
// that is of type POD. I.e. typedef scoped_lock* volatile mutex;
// An unowned mutex has value 0. Consequently there is no need
// for dynamic initialization, which solves the static
// initialization problem.
// Furthermore it also is not necessary to e.g.
// mutex mx = 0; because static objects are zero initialized
// by default, before any other initializations take place.
// As such, availability of this mutex type makes a once
// function unnecessary.
// Memory and resource management is fully automatic. I.e.
// no leaks under normal usage. (In the basic version no cleanup
// in the atexit handler is required.) The implementation needs
// availability of some atomic primitives and semaphores
// from the operating system.
// The required data structures are all within a scoped_lock
// object. Destruction of the lock also frees up the resources.
// Typical usage:
// mutex mx;
// void foo()
// {
// scoped_lock lk(mx);
// ...
// }
// How does it work:
// The scoped_lock object contains a doubly linked list.
// The mutex is pointing to the head of this list. When
// the mutex is zero the list ist empty and the mutex
// unowned.
// Locking the mutex means putting a list entry to the
// head of the list. This is done atomically. Concurrent
// requests are queued up, and each put on a separate
// semaphore. This scheme already could be used by itself
// (see e.g mcs-locks in win32 pthreads), however it
// imposes a FIFO scheduling behaviour.
// This implementation instead uses this queue only during
// the build up, and then hands over the waiting to a
// shared semaphore, to let the system scheduler wake up
// the threads. The doubly linked list makes it possible
// to remove list items from the list in an arbitrary
// order.
// Performance:
// I did only a minimalistic "pingpong" test, which showed
// comparable performance to the current boost::mutex.
// I expect that the performance can be improved, if the
// semaphore objects are not completeley destroyed after
// temporary use, but instead put in a lock free container
// for later reuse. This however would raise the requirement
// to destroy them during atexit.
// Missing parts:
// I havent tried yet if it is possible to also provide
// try and timed lock variants.
// Altough I tried to take care of the necessary memory
// barriers for SMP, I am not convinced yet that I got it
// straight in this first attempt.
// Also I have not yet been able to prove the correctness of
// the algorithm.
#include "mutex.hpp"

// The operating system dependent part
#include <windows.h>
#include <limits.h>
#include <cassert>

template<class T>
inline T* atomic_swap_full(T* volatile * p, T* a)
        return (T*)InterlockedExchangePointer(p,a);

template<class T>
inline T* atomic_load_full(T* volatile *p)
        return (T*)InterlockedExchangeAdd((LPLONG)p,0);

template<class T>
inline bool atomic_bool_compare_and_swap_full(T* volatile *p, T* c, T* s)
        return (c == (T*)InterlockedCompareExchange((LPLONG)p,(LONG)s,(LONG)c));

template<class T>
inline T* atomic_compare_and_swap_full(T* volatile *p, T* c, T* s)
        return (T*)InterlockedCompareExchange((LPLONG)p,(LONG)s,(LONG)c);

class semaphore
    // The open and close functions could be extended
    // to put the closed items onto a lock free
    // container instead.
    static semaphore* open(int count = 0) {
        return new semaphore(count);
    static void close(semaphore* sema)
        delete sema;
    void post() {
        ReleaseSemaphore(sem, 1, NULL);
    void wait() {
        WaitForSingleObject(sem, INFINITE);
    semaphore(int count) {
        sem = CreateSemaphore(NULL, count, LONG_MAX, NULL);
    ~semaphore() {
    HANDLE sem;

// The following should be operating system independent
scoped_lock::scoped_lock(mutex& m, bool initially_locked)
: is_locked(false)
    pmutex = &m;
    if (initially_locked)

    if (is_locked)

void scoped_lock::lock()
    next_lock = 0;
    pred_lock = 0;
        sema_flag = 0;
        next_flag = 0;

    scoped_lock* pred = atomic_swap_full(pmutex, this);

    if (0 != pred) {
        pred_lock = pred;
        pred->next_lock = this;
        // pass on the shared semaphore
        sema = pred->sema;
    else {
        // create the shared semaphore
        sema = semaphore::open();
    is_locked = true;

void scoped_lock::unlock()
        scoped_lock* n = atomic_load_full(&next_lock);
    if (0 == n) {
        if (0 != pred_lock) {
            pred_lock->next_lock = 0;
            pred_lock->next_flag = 0;
        if (atomic_bool_compare_and_swap_full(pmutex,this,pred_lock)) {
            if (0 == pred_lock) {
                // We have no predecessor and are the last
                // to hold this semaphore. Sucessors wil get
                // a fresh semaphore for this mutex.
                // Release it.
            else {
                // We have been removed from the head, but have
                // a predecessor to wake up.
            is_locked = false;
                n = atomic_load_full(&next_lock);
    else {
    if (0 != pred_lock)
        pred_lock->next_lock = n;
    n->pred_lock = pred_lock;
    is_locked = false;

void scoped_lock::set(semaphore* volatile* pflag)
    // If unflagged the value is 0.
    // The special value -1 is getting used when set is called
    // before a wait takes place. Since this is an illegal memory
    // address be careful not to call set again without having
    // reset the flag in between!
    semaphore* f = atomic_compare_and_swap_full(pflag,(semaphore*)0,(semaphore*)-1);
    if (0 != f) {

void scoped_lock::wait_for(semaphore* volatile* pflag)
    if (0 == atomic_load_full(pflag)) {
        semaphore* f = semaphore::open();
        if (atomic_bool_compare_and_swap_full(pflag,(semaphore*)0,f)) {

// Copyright (C) 2006 Roland Schwarz
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at

#include "mutex.hpp"

#include <iostream>
#include <ctime>
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>
#include <boost/bind.hpp>

// set to 0 if you want to test against boost::mutex
#if 1
typedef mutex test_mutex;
typedef scoped_lock test_lock;
typedef boost::mutex test_mutex;
typedef boost::mutex::scoped_lock test_lock;

test_mutex mtx0;
test_mutex mtx1;
test_mutex mtx2;
test_mutex mtx3;

boost::mutex smx;
boost::condition cs;
bool start = false;

void pingpong(long int turns)
    test_lock lk0(mtx0, false);
    test_lock lk1(mtx1, true);
    test_lock lk2(mtx2, false);
    test_lock lk3(mtx3, true);
    boost::mutex::scoped_lock lkstart(smx);
    start = true;

    for (long int i=0; i<turns; ++i) {

// this needed about 30 sec on my machine
const long int N = 1000000;

int main(int argc, char* argv[])
    // pingpong timing test

    test_lock lk0(mtx0, true);
    test_lock lk1(mtx1, false);
    test_lock lk2(mtx2, true);
    test_lock lk3(mtx3, false);
    boost::mutex::scoped_lock lkstart(smx);
    boost::thread thd(boost::bind(pingpong, N));
    while (!start) cs.wait(lkstart);

    std::time_t t0 = std::time(0);

    for (long int i=0; i<N; ++i) {
    std::time_t t1 = std::time(0);

    std::cout << "time: " << t1-t0 << std::endl;

    return 0;

Boost list run by bdawes at, gregod at, cpdaniel at, john at