Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
From: Thomas Jarosch (thomas.jarosch_at_[hidden])
Date: 2010-11-18 05:50:55


Hello Tony,

On Thursday, 18. November 2010 05:13:03 Gottlob Frege wrote:
> The CPU (not just the compiler, but the CPU, memory subsystem, etc)
> can reorder that as:
>
> First Thread:
> flag = function_complete_flag_value;
> important_data = new ...;
>
> Second Thread:
> register temp = important_data; // start memory read early, so
> that we don't wait for reads to complete
> if (flag == function_complete_flag_value)
> use_important_data(temp);
>
> See the problem? important_data may be read before it is ready.

Thanks for the detailed explanation! As you pointed out, the second thread
will mess up without the barrier. Time to fix my code :)
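
A minimal sketch of what the fixed pair of threads could look like,
assuming std::atomic from C++0x is available. The names publish(),
consume(), run_publish_consume() and init_done are made up for
illustration; this is not the Boost.Thread code:

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for the variables discussed above.
static int* important_data = nullptr;
static std::atomic<bool> init_done{false};

// First thread: finish the write, then publish the flag with release
// semantics so the data write cannot be reordered after the flag write.
void publish() {
    important_data = new int(42);  // never freed in this sketch
    init_done.store(true, std::memory_order_release);
}

// Second thread: the acquire load of the flag keeps the read of
// important_data from being started before the flag check.
int consume() {
    while (!init_done.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return *important_data;
}

// Runs one reader and one writer and returns what the reader saw.
int run_publish_consume() {
    int seen = 0;
    std::thread reader([&seen] { seen = consume(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    return seen;
}
```

The release store pairs with the acquire load: once the reader sees
the flag set, the write to important_data is visible to it as well.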

One more small thing: IMHO the write reordering in the first
thread can't happen because of the memory barrier created
by InterlockedExchange(). The first thread translates to this:

    important_data = new ...; // the init call

    lock // create memory barrier:
         // - Don't allow reordering
         //   from below the barrier
         // - Finish all outstanding writes

    flag = function_complete_flag_value;

> > Also one more (silly?) question: "flag" is not a volatile variable.
> > Does boost::detail::interlocked_read_acquire() make sure the value
> > doesn't get cached inside a register? IMHO we lock the mutex
> > and we might still hold a cached, old value of "flag" in a register.
> > -> Do we need an interlocked read here, too?
> > Or mark the flag type "volatile"?
>
> volatile is almost useless in threaded programming. It is typically
> both insufficient for threads (ie no memory barrier) and at the same
> time superfluous - when inside a mutex - as the mutex handles the
> barrier for you.

True that.

> volatile only helps with the compiler - it doesn't control what the
> CPU might then do to your instructions (like reorder them, do
> speculative execution, etc), so it doesn't help much with threads
> running on separate CPUs. And MS's version of volatile (which _does_
> enforce memory barriers) is non-standard. So don't use it in portable
> code. ie don't use it at all.

Let me illustrate it a bit more:

- register temp = interlocked_read(&flag) // fetch from mem location xyz
  -> Init flag not set, so execute init code:

  - create mutex (also creates memory barrier)

  - Another thread (thread 2) has already entered the mutex,
    executed the init code and done an interlocked
    write of "flag". Then it leaves the mutex
    so thread 1 can continue.

  - Thread 1 re-reads the flag without interlocked read or volatile:
    The compiler recognizes it's the same memory location xyz
    and uses the cached value from "register temp".
    So we would redo the initialization.

    Don't we need an interlocked read here, too?

    Or does the mutex/memory barrier ensure the compiler
    isn't allowed to do register caching?
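
One way to sidestep the question, sketched here under the assumption
that C++0x std::atomic and std::mutex can be used: make the flag
atomic, so the re-read inside the mutex is an atomic load and cannot
be served from a stale register copy. call_once_sketch(), count_init()
and run_many_threads() are hypothetical names, not the actual
boost::call_once() implementation:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

static std::atomic<bool> flag{false};
static std::mutex init_mutex;
static int init_calls = 0;  // only modified while init_mutex is held

void call_once_sketch(void (*init)()) {
    // Fast path: acquire load, no mutex needed once init is done.
    if (flag.load(std::memory_order_acquire)) {
        return;
    }
    std::lock_guard<std::mutex> guard(init_mutex);
    // Re-read under the mutex. This is an atomic load, so the compiler
    // must fetch the value again; the mutex orders it against the
    // store made by whichever thread ran init() first.
    if (!flag.load(std::memory_order_relaxed)) {
        init();
        flag.store(true, std::memory_order_release);
    }
}

void count_init() { ++init_calls; }

// Starts n threads that all race to initialize; returns how many
// times the init function actually ran.
int run_many_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back(call_once_sketch, &count_init);
    }
    for (auto& t : threads) {
        t.join();
    }
    return init_calls;
}
```

With this shape the second check never depends on what the compiler
decides to keep in a register for a plain variable.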

> P.S. I will hopefully be doing another talk on this stuff at BoostCon
> in May - you should go!

Nice! Too bad "www.boostcon.com" currently issues
a "500 - Internal Server Error" :o)

Best regards,
Thomas Jarosch

