On 1/23/07, Ovanes Markarian <om_boost@keywallet.com> wrote:
Actually I read all of your and Tony's points, and maybe I was misunderstood.


You were not misunderstood at all.  I've gone down the same road as you. More than once.  With various techniques, including this create_getter vs non_creating_getter idea.

My first question is:

If a mutex does not guarantee thread safety, what then?


It only guarantees thread safety when it is used for ALL accesses of the shared variables, not just for the writes.  You need it for both reads and writes.  Not just because the shared variable may change 'while' you are reading it, but because it may already have changed and your processor hasn't 'seen' those changes yet, even though it has seen other changes that happened 'before' the shared variable changed.  This is the seeming paradox of DCLP and modern CPU architecture.
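For comparison, the version that actually has that guarantee is the boring one - lock on every access, reads included.  A minimal sketch, assuming a Singleton class shaped like yours (a static pInstance plus a static boost::mutex s_m):

#include <boost/thread/mutex.hpp>

Singleton* Singleton::instance()
{
    // The lock is taken on EVERY call, reads included.  The unlock by the
    // thread that created the object, paired with the lock taken here, is
    // what forces this thread to see a fully constructed Singleton rather
    // than just a non-NULL pointer.
    boost::mutex::scoped_lock lock(s_m);
    if (Singleton::pInstance == NULL)
        Singleton::pInstance = new Singleton;
    return Singleton::pInstance;
}

Slow, because every caller pays for the lock, but it never hands out a half-visible object.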



//creating getter

Singleton* Singleton::creating_singleton_getter()
{
        boost::mutex::scoped_lock   lock(s_m); // always acquired when the function is
                                               // entered; all other calls to this function
                                               // block, so it is not possible to enter
                                               // this function twice if the lock is active
        if(Singleton::pInstance == NULL)
                Singleton::pInstance = new Singleton; // does not matter how these steps are
                                                      // executed and reordered by the compiler,
                                                      // since the function can only be entered
                                                      // when s_m is unlocked
        Singleton::getter = &non_creating_getter;     // this is still guarded by the
                                                      // locked mutex!!!


No, getter is not 'still guarded'.  As soon as it is set, another thread can start using non_creating_getter.  What if the compiler DID reorder the instructions, like this:

        Singleton::getter = &non_creating_getter;  // line 1
        if(Singleton::pInstance == NULL)
                Singleton::pInstance = new Singleton;  // line 2

Certainly in this case, between line 1 and line 2, another thread could come in and start using non_creating_getter too early.
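(For reference, since it isn't shown in your mail, I'm assuming non_creating_getter is just the trivial unlocked read:

// assumed shape of the fast path - no lock, no barrier, just the read
Singleton* Singleton::non_creating_getter()
{
    return Singleton::pInstance;   // if getter was published too early, this can return
                                   // NULL or a not-yet-constructed object
}

If that assumption is wrong, correct me, but everything below hinges on that read being unguarded.)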

Now, imagine that it wasn't the compiler that reordered the lines, but instead the processor (ie using speculative execution).  Or not the processor, but the memory bus.  That's what happens.  They will still appear in order for the one processor, but not necessarily for another processor.  Worse, it depends on the platform, so this bug is not yet very visible, and that's why we have so much code relying on it working.  So much that I'm surprised that chip makers even consider allowing the reordering to happen - I would expect it to break too much code.

Similarly, by the way, the pointer pInstance can even be seen to be set before the bytes of the Singleton that it points to are seen to be written!

} //mutex unlock
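That 'pointer seen before the bytes' problem is the classic DCLP trap: conceptually 'pInstance = new Singleton' is three separate steps, and the compiler / CPU / memory system is allowed to make the last one visible to another processor first.  Roughly (the helper function is just for illustration):

#include <new>   // placement new, for illustration only

void publish_instance()                                    // hypothetical helper
{
    void*      raw = operator new(sizeof(Singleton));      // 1. allocate raw storage
    Singleton* p   = new (raw) Singleton;                  // 2. run the constructor
    Singleton::pInstance = p;                              // 3. publish the pointer
}
// If another processor observes step 3 before it observes the writes made
// in step 2, it gets a pointer to an object whose bytes it hasn't 'seen' yet.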


So let's tighten the mutex boundary:

Singleton* Singleton::creating_singleton_getter()
{
       {
              boost::mutex::scoped_lock   lock(s_m);   // acquire mutex here

              if(Singleton::pInstance==NULL)
                      Singleton::pInstance = new Singleton;
       }

       Singleton::getter = &non_creating_getter;
       return Singleton::pInstance;
}

Now the mutex is unlocked before getter is set - this puts a write barrier between the 2 instructions - which means that THIS processor (and its memory-write queue) will NOT move the write of getter ahead of the writes done under the lock.  In effect, it flushes the memory-write-request queue before getter is written (or, more accurately, before the request to write the global memory for getter is placed in the write queue).

And this is where it gets fuzzy for me - from my understanding, it requires the other processor (where some other thread is running) to queue up 2 read requests:
  - 'read getter please'
  - 'read the bytes of the new Singleton'
and then have those requests reordered.

The oddity is why the second request would be in the queue before the first request was answered - ie the second request *depends* on the answer of the first.  I can only imagine this happening for 2 reasons:
   - speculative execution - the CPU could see that it was 'probably' going to read pInstance regardless of getter (which seems more plausible in the traditional DCLP case where getter is just a flag that is then checked in an if, so the CPU can easily look ahead).
   - the CPU (or memory controller) had recently read and cached the memory where pInstance points, and didn't feel a need to re-read it (ie there were no obvious dependencies and/or no reason that the memory should be different since the last time it read or wrote that memory).  Basically, the idea here is that the CPU, as a single CPU, is consistent - it is only inconsistent in the presence of other CPUs, and it depends on the architecture whether those inconsistencies are allowed to exist or not.

And this is where/when you need to start asking on comp.programming.threads, but I suspect they'll tell you (with better detail and understanding) the same thing - it just doesn't work without a read barrier on the other threads.
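To make the point concrete, the fast path itself would need something like this - read_memory_barrier() here is a stand-in, NOT a real Boost or standard call, for whatever platform-specific read/acquire barrier is available:

// sketch only - read_memory_barrier() is a hypothetical placeholder
Singleton* Singleton::non_creating_getter()
{
    Singleton* p = Singleton::pInstance;
    read_memory_barrier();   // stop later reads (of *p, by the caller) from being
                             // satisfied with data fetched before p was read
    return p;
}

Without that barrier (or the acquire you get from locking the mutex), the reading thread has no ordering promise at all.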
 

So the point is: As long as Singleton::instance is called from multiple
threads and these are not created from global vars before main is called,
this code should be thread safe.


I'm not sure what you are saying about before main, etc.   If you are just concerned about creating_getter being initially set properly, I agree you are probably OK, since it is static initialization.  My only concern there would be, as mentioned, with Singletons inside DLLs / shared libraries - I don't think loading shared libraries is thread safe under linux (which boggles my mind, but that's what I've heard).  And the standard doesn't say anything about shared libraries.

The scenario is like this:

Threads:

  A              B              C              D
instance       instance       instance                    // only one of A, B or C gets to
                                                           // create the instance; the others wait
                                              instance    // if the creating get was successful,
                                                           // D calls the lightweight version
                                                           // of the getter

The scenario is that D reads the 'new' getter, but still manages to read the 'old' (uninitialized) Singleton, because of crazy modern memory architectures.

Static class variables are guaranteed to be initialized before main is
entered:
C++ standard 9.4.2 states:
...
Static data members are initialized and destroyed exactly like non-local
objects (3.6.2, 3.6.3).
...

3.6.2 states:
...
Objects with static storage duration (3.7.1) shall be zero-initialized (8.5)
before any other initialization
takes place. Zero-initialization and initialization with a constant
expression are collectively called static
initialization; all other initialization is dynamic initialization.
...

So I assume, that initialization of getter with address of a (static) class
function is a constant expression and therefore is not a dynamic
initialization.


OK.
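Concretely, I'd expect the definitions to look roughly like this (the GetterFn typedef is my guess - your declarations aren't shown):

typedef Singleton* (*GetterFn)();

Singleton*   Singleton::pInstance = NULL;    // zero-initialized: static initialization
GetterFn     Singleton::getter    = &Singleton::creating_singleton_getter;
                                             // address constant expression: static initialization
boost::mutex Singleton::s_m;                 // constructor has to run: dynamic initialization

So pInstance and getter get their values before any dynamic initialization; s_m, on the other hand, is only constructed during dynamic initialization, which is one more reason for your caveat about threads started from globals before main.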
 

(Please see 5.19 of a standard especially:
...
Other expressions are considered constant-expressions only for the purpose
of non-local static object
initialization (3.6.2). Such constant expressions shall evaluate to one of
the following:
...
- an address constant expression,
...
An address constant expression is a pointer to an lvalue designating an
object of static storage duration, a
string literal (2.13.4), or a function.)



Therefore there should be a guarantee that the Singleton static members are
initialized before main is entered. The locked mutex guarantees that only
one thread, on one processor, will enter the function at a time. Isn't
it so?


Yep, only one thread gets into the guarded part of creating_singleton_getter, but non_creating_getter might still be seen and used too early.

Thanks for your ideas and answers.

Best Regards,
Ovanes



I hope it makes sense - it didn't make much sense to me the first 10 times.  You might also want to try comp.programming.threads - it has been discussed there a few times.

Tony.