Boost logo

Boost :

Subject: Re: [boost] [thread] Can Boost.Thread use Boost.Atomic without falling on a compatibility issue?
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2013-01-13 09:57:35

Le 12/01/13 18:29, Andrey Semashev a écrit :
> On Saturday 12 January 2013 17:49:38 Vicente J. Botet Escriba wrote:
>> Le 12/01/13 17:09, Andrey Semashev a écrit :
>>> On Saturday 12 January 2013 15:48:21 Vicente J. Botet Escriba wrote:
>>>> Le 12/01/13 14:26, Andrey Semashev a écrit :
>>>>> On Saturday 12 January 2013 12:08:10 Vicente J. Botet Escriba wrote:
>>>>>> Le 12/01/13 11:51, Andrey Semashev a écrit :
>>>>>>> Anyway, can Boost.Thread be modified in such a way so that
>>>>>>> Boost.Atomic
>>>>>>> use is not exposed to the user? E.g. so that call_once invokes a
>>>>>>> compiled
>>>>>>> function implemented within Boost.Thread library that uses
>>>>>>> Boost.Atomic
>>>>>>> to modify the once flag.
>>>>>> yes, this will be great. I don't know Boost.Atomic details to try to do
>>>>>> this. Andrey do you mind to provide a patch that doesn't needs to link
>>>>>> with boost_atomic?
>>>>> I attached the patch (for posix only). It appeared a bit hacky and I'm
>>>>> not
>>>>> sure if you're ok with it.
>>>> Thanks for the quick patch.
>>>> Shouldn't the patch concern only boost/thread/pthread/once.hpp and
>>>> libs/thread/src/pthread/once.hpp?
>>>> Could you send the resulting files also?
>>> There is no libs/thread/src/pthread/once.hpp file as far as I can see.
>> RIght. I meant
>> libs/thread/src/pthread/once.cpp
> It is patched.
>>> The
>>> boost/thread/pthread/once.hpp file is no longer needed and can be removed
>>> (as well as boost/thread/win32/once.hpp when win32 version is
>>> implemented). The complete public code is platform-independent and
>>> resides in
>>> boost/thread/once.hpp after the patch is applied.
>> IIRC the problem was on the Posix implementation, so a specific patch
>> for the pthread files will be desirable.
> Ok, I intended to port both Windows and POSIX implementations but we could do
> it just for POSIX variant.
> I attached the updated patch to the ticket. The Jamfile will also have to be
> updated to add the dependency on Boost.Atomic. However, the Jamfile in
> Boost.Thread is rather complicated so I didn't do that.
>> Anyway, could you send the resulting files to make easier the review?
> The files are attached. These should be boost/thread/posix/once.hpp and
> libs/thread/src/posix/once.cpp, other files are intact.

I have studied your patch and I like how you have separated the test on
whether the function has been called, the commit and the rollback. I
believe I will adopt this separation.
I have studied also the current algorithm implementation and the
algorithm on which it was based on [1].

The efficiency of the algorithm in [1], depends on, I quote "

    There is a way to access thread-specific data or thread-local
    storage. The technique is useful only if such an access is faster
    than a memory barrier. For the disassembly above, the compiler used
    its ability to access thread-local storage using variables declared
    with __thread, so the thread-local access is a normal "mov"
    instruction with a different segment register.
    There exists an integral type T (such as sig_atomic_t) such that
    loads and stores of a variable of type T are atomic. It is not
    possible to read from such a variable any value that was not written
    to the variable, even if multiple reads and writes occur on many
    processors. Another way to say this is that variables of type T do
    not exhibit word tearing when accessed with full-width accesses.
    The number of pthread_once_t variables in a programme is bounded and
    can be counted in an instance of type T without overflow. If T is
    32-bits, this implementation requires that the number of
    pthread_once_t variables in a programme not exceed 2**31-3. (Recall
    that pthread_once_t variables are intended to have static storage
    class, so it would be remarkable for such a huge number to exist.)
    Accesses to a single variable V of type T by a single thread X
    appear to occur in programme order. For example, if V is initially
    0, then X writes 1, and then 2 to V, no thread (including but not
    limited to X) can read a value from V and then subsequently read a
    lower value from V. (Notice that this does not prevent arbitrary
    load and store reordering; it constrains ordering only between
    actions on a single memory location. This assumption is reasonable
    on all architectures that I currently know about. I suspect that the
    Java and CLR memory models require this assumption also.)

The current implementation uses pthread_getspecific/pthread_setspecific
which I'm not sure "is faster than a memory barrier".
The type T is unsigned long which doesn't satisfy [atomicity] and
[monotonicity] requirements on all platforms.

The Boost.Thread should be changed so that the current algorithm is used
only when the requirements are meet (this will mean at least to change
the implementation of the thread specific data and that the datatype is
adapted to the platform). So, to be portable we need an alternative
algorithm, which could be the one you (Andrey) proposed in the patch. As
the patch uses Boost.Atomic, this alternative will be really efficient
only when atomic<uchar_t> is lock free.

libc++ implements it using an algorithm that is close to the double
checked locking. It reuse the mutex needed to wait on the condition
variable to protect the whole data and in addition making a non
protected test on the flag state (see ********)

template<class _Callable> void call_once(once_flag& __flag, _Callable
     if (__flag.__state_ != ~0ul) // **********
         __call_once_param<_Callable> __p(__func);
         __call_once(__flag.__state_, &__p,
&__call_once_proxy<_Callable>); // Here the flag is tested again with
the mutex locked.

This algorithm is efficient but I suspect that this algorithm is not
always correct for the same reasons the Boost.Thread current algorithm
is not. But on which platforms the libc++ implementation is correct?
Howard, could you comment if you are reading this post.

For the time been I will play with the Andrey, the Howard (libc++)
implementation and update the current implementation for the thread
specific data. I will add some compiler flags that will direct to one of
these implementations and make the best choice depending on the

Many thank Andrey,

[2] and

P.S. It is surprising that gcc implementation in libstdc++v3 needs to
initialize a lot of data before using __gthread_once and seems to don't
manage the case when the user function throw an exception.

   extern "C"
     void __once_proxy()
       function<void()> __once_call = std::move(__once_functor);
       if (unique_lock<mutex>* __lock = __get_once_functor_lock_ptr())
         // caller is using new ABI and provided lock ptr
         __get_once_functor_lock_ptr() = 0;
         __get_once_functor_lock().unlock(); // global lock

   /// call_once
   template<typename _Callable, typename... _Args>
     call_once(once_flag& __once, _Callable&& __f, _Args&&... __args)
       auto __bound_functor =
       __once_callable = &__bound_functor;
       __once_call = &__once_call_impl<decltype(__bound_functor)>;
       unique_lock<mutex> __functor_lock(__get_once_mutex());
       auto __callable = std::__bind_simple(std::forward<_Callable>(__f),
       __once_functor = [&]() { __callable(); };

       int __e = __gthread_once(&(__once._M_once), &__once_proxy);

       if (__functor_lock)

       if (__e)

Boost list run by bdawes at, gregod at, cpdaniel at, john at