
Boost Users :

Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
From: Gottlob Frege (gottlobfrege_at_[hidden])
Date: 2010-11-17 23:13:03


On Wed, Nov 17, 2010 at 11:15 AM, Thomas Jarosch
<thomas.jarosch_at_[hidden]> wrote:
> Hello,
>
> here's a small optimization idea for boost::call_once() on Win32.
>
> It should be safe to do a non-interlocked read of the flag like this:
> ----------------------------------------------
> if(flag!=function_complete_flag_value &&
>  ::boost::detail::interlocked_read_acquire(&flag)!=function_complete_flag_value)
> ----------------------------------------------
>
> If our non-interlocked read would see something different than
> "function_complete_flag_value", we still do the interlocked read, too.
> For normal operation inside a thread safe singleton,
> this saves us from always issuing a memory barrier.
>
> Any flaws with this approach?
>

Yes.

Your thread will see flag correctly, but the "important data" being
protected/initialized by the once might not be seen correctly.

Typically the code essentially boils down to (if you imagine the
call_once being "inlined", since to the CPU, it is all "inline"):

First Thread in ("wins", thus does the init)
   important_data = new ...; // the init call, protected by call_once()
   flag = function_complete_flag_value;

Second Thread:
   if (flag == function_complete_flag_value) // inside call_once()
       use_important_data(important_data); // outside call_once()

The CPU (not just the compiler, but the CPU, memory subsystem, etc)
can reorder that as:

First Thread:
   flag = function_complete_flag_value;
   important_data = new ...;

Second Thread:
   // start memory read early, so that we don't wait for reads to complete
   register temp = important_data;
   if (flag == function_complete_flag_value)
       use_important_data(temp);

See the problem? important_data may be read before it is ready.

You need a memory barrier *in each thread* to ensure neither
reordering happens. In particular, the memory barrier in the
interlocked_read_acquire(&flag) prevents the reordering in the Second
Thread.
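
To make the pairing concrete, here is a minimal sketch in C++11 terms. It
assumes std::atomic is available, which postdates the Boost code discussed
in this thread, and it is not how boost::call_once() is actually
implemented. The names ImportantData, init_done and run_demo are made up
for illustration. The release store plays the role of the barrier in the
First Thread, and the acquire load plays the role of
interlocked_read_acquire(&flag) in the Second Thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the names used above; not Boost's code.
struct ImportantData { int value; };

ImportantData* important_data = nullptr;  // plain, non-atomic payload
std::atomic<bool> init_done{false};       // stands in for "flag"

void first_thread() {
    important_data = new ImportantData{42};  // the init call
    // Release: the write above may not be reordered after this store.
    init_done.store(true, std::memory_order_release);
}

int second_thread() {
    // Acquire: the read of important_data below may not be moved
    // ahead of this load.
    while (!init_done.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    assert(important_data != nullptr);
    return important_data->value;
}

// Runs both threads once and returns what the second thread saw.
int run_demo() {
    int seen = 0;
    std::thread writer(first_thread);
    std::thread reader([&seen] { seen = second_thread(); });
    writer.join();
    reader.join();
    delete important_data;   // clean up so the demo does not leak
    important_data = nullptr;
    init_done.store(false);
    return seen;
}
```

Dropping either half (a plain store in first_thread, or a plain read in
second_thread) brings back the corresponding reordering shown above.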

>
> Also one more (silly?) question: "flag" is not a volatile variable.
> Does boost::detail::interlocked_read_acquire() make sure the value
> doesn't get cached inside a register? IMHO we lock the mutex
> and we might still hold a cached, old value of "flag" in a register.
> -> Do we need an interlocked read here, too?
>   Or mark the flag type "volatile"?
>

volatile is almost useless in threaded programming. It is typically
both insufficient for threads (i.e. no memory barrier) and at the same
time superfluous - when inside a mutex - as the mutex handles the
barrier for you.

volatile only helps with the compiler - it doesn't control what the
CPU might then do to your instructions (like reorder them, do
speculative execution, etc), so it doesn't help much with threads
running on separate CPUs. And MS's version of volatile (which _does_
enforce memory barriers) is non-standard. So don't use it in portable
code, i.e. don't use it at all.
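
As a rough illustration of the "superfluous inside a mutex" point, here is
a small sketch, assuming C++11's std::mutex and std::thread rather than
the Boost.Thread types of the time. The names shared_counter,
counter_mutex, add_many and run_counter_demo are made up for this example.
The shared variable is a plain int with no volatile, and the lock/unlock
pair supplies the ordering.

```cpp
#include <mutex>
#include <thread>
#include <vector>

int shared_counter = 0;    // plain int: no volatile needed
std::mutex counter_mutex;  // guards shared_counter

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        // Locking and unlocking act as the barriers, so every thread
        // sees the latest value of shared_counter inside the lock.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++shared_counter;
    }
}

// Starts four workers that each add 10000, then returns the total.
int run_counter_demo() {
    shared_counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back(add_many, 10000);
    }
    for (auto& w : workers) {
        w.join();
    }
    std::lock_guard<std::mutex> lock(counter_mutex);
    return shared_counter;
}
```

Marking shared_counter volatile here would change nothing about
correctness; taking the mutex away would break it, volatile or not.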

> (http://msdn.microsoft.com/en-us/magazine/cc163405.aspx#S3
>  look at figure 7 and the text below)
>
>
> Best regards,
> Thomas Jarosch
>

Tony

P.S. I will hopefully be doing another talk on this stuff at BoostCon
in May - you should go!

