Subject: Re: [Boost-users] shared_ptr and weak_ptr concurrency
From: John Dlugosz (JDlugosz_at_[hidden])
Date: 2009-09-04 19:51:55


> I can see why it is not atomic in general.
> (although I still think the documentation should be changed. I don't think
> very many people understand that statement as "the c++ standard doesn't
> guarantee atomicity for builtin types, so shared_ptr isn't either", but as
> "I can do with shared_ptr anything I can do with an 'int' on my platform".)

Having just had an issue with documentation myself on another thread, I agree that it is spartan and not illustrative in nature.

 
> but I'm still not convinced that there's a lock required in my case, which was:
> "writing to an expired weak_ptr while multiple readers are trying to lock() it
> seems safe to me. (in the current implementation)"

I think it matters whether you are simply dereferencing (as long as px is one value or the other, you'll take it) or copying into another smart pointer object (which must get px and pn in sync to work correctly).
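
To make the "px and pn in sync" point concrete, here is a minimal single-threaded sketch. It uses std::shared_ptr and std::weak_ptr as stand-ins for the Boost classes (an assumption on my part), and the function names are hypothetical. It only shows what a correct copy through lock() must look like in each state; it does not demonstrate or rule out the race being discussed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: lock() on an expired weak_ptr must yield an empty
// shared_ptr, i.e. the pointer and the count agree that there is nothing.
bool lock_on_expired_is_empty() {
    std::weak_ptr<int> w;
    {
        std::shared_ptr<int> s = std::make_shared<int>(42);
        w = s;
    }   // s is destroyed here, so w is now expired
    std::shared_ptr<int> copy = w.lock();
    return w.expired() && !copy && copy.use_count() == 0;
}

// Hypothetical helper: while an owner is alive, lock() must yield a copy
// whose pointer and count both refer to the same live object.
int lock_on_live_reads_value() {
    std::shared_ptr<int> s = std::make_shared<int>(42);
    std::weak_ptr<int> w = s;
    std::shared_ptr<int> copy = w.lock();
    assert(copy.use_count() == 2);   // s and copy share ownership
    return copy ? *copy : -1;
}
```

A reader that only dereferences needs one consistent value; a reader that copies needs both halves to match, which is what the two helpers check.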

It's my recent experience in my current work that carefully considering use cases is what allows for ultra-high-performance code. But these assumptions also make it brittle against future maintenance and changes to the program, so it's important to understand and document them.

> if we assume that the reads/writes are not reordered by the compiler (which I
> think is true because assigning to pn acquires a mutex or does something
> equivalent on lock-free platforms which should act as a memory barrier),

The compiler is free to rearrange non-volatile reads and writes, and with inlining can get pretty creative with that. Just coding "do all this stuff to the structure, AND THEN assign a pointer to that completed structure" is a known pitfall. Even if the pointer itself is declared volatile, the contents can still be written after the "final" pointer assignment.
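
For comparison, here is a sketch of the "fill in the structure, then publish the pointer" pattern written with C++11 std::atomic and release/acquire ordering. That is a different technique from the volatile approach discussed in this message, and it postdates this thread. Config, g_storage, g_published, publish and try_consume_sum are all hypothetical names. The two functions are meant to run on different threads; the sketch itself does not start any.

```cpp
#include <atomic>
#include <cassert>

struct Config {   // hypothetical payload
    int a;
    int b;
};

static Config g_storage;
static std::atomic<Config*> g_published{nullptr};

// Writer side: complete the structure first, then publish its address.
void publish() {
    assert(g_published.load(std::memory_order_relaxed) == nullptr);  // publish once
    g_storage.a = 1;
    g_storage.b = 2;
    // Release store: the two writes above may not be moved after it,
    // neither by the compiler nor by the CPU.
    g_published.store(&g_storage, std::memory_order_release);
}

// Reader side: returns -1 if nothing has been published yet.
int try_consume_sum() {
    // Acquire load: pairs with the release store in publish().
    Config* p = g_published.load(std::memory_order_acquire);
    if (p == nullptr) {
        return -1;
    }
    return p->a + p->b;   // sees the completed structure
}
```

A reader that sees the non-null pointer is then guaranteed to see the finished contents, which a plain or volatile pointer assignment does not promise by itself.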

Looking at the calls to inc and dec involved, I think (it's hard to follow) it ends up calling the Win32 API function. Oh, but you didn't say what platform you are on. In the past, I've seen compilers surprise me by keeping things in registers even across function calls, as it assumed that something declared locally and never apparently having its address taken could not be known anywhere else. Well, it was wrong <g>. I don't know to what extent the compiler may take liberties in assuming that an imported function might know an alias to some variables of yours. But a smart compiler *could* re-arrange things. Adding compiler-specific decorations to the functions is a way to improve performance, so it might very well "know" that the function only uses its parameters and they don't alias anything (Microsoft has several ways of promising that). Point is, if it's not declared volatile, the compiler MAY re-arrange it, even across function calls.

The compiler re-arranging access to variables, holding them in registers and writing them back later, etc. is a separate issue from what the platform does once it hits the "mov" instruction targeting that memory location. CPU memory fences are distinct from compiler memory fences. You must use both.

So... make both writes go to volatile variables, so the compiler will perform them promptly and not reverse them. You can use reference casts to make "just this write" volatile.
Meanwhile, I know that on x86/x64, writes take effect in the order in which they are issued (it's when you mix reads and writes that things get interesting).
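
A minimal sketch of the reference-cast idea, assuming a simplified Pair struct in place of the real px/pn members (Pair, volatile_store and assign_px_then_pn are hypothetical names). The compiler must not reorder volatile accesses relative to each other. This says nothing about what the CPU does afterwards, and it does not make the writes atomic.

```cpp
#include <cassert>

struct Pair {   // hypothetical stand-in for the px/pn pair
    int* px;
    long pn;
};

// Store through a volatile-qualified reference, so that this one write is
// a volatile access even though the member itself is not declared volatile.
template <class T>
inline void volatile_store(T& target, T value) {
    const_cast<volatile T&>(target) = value;
}

// Writes px first and pn second; the compiler has to keep the two volatile
// stores in this order.
void assign_px_then_pn(Pair& dst, int* px, long pn) {
    assert(px != 0);   // this sketch only handles non-null pointers
    volatile_store(dst.px, px);
    volatile_store(dst.pn, pn);
}
```

Note that both arguments of volatile_store must have exactly the same type, so pass 1L rather than 1 when storing into a long.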

Furthermore, in this example, the shown operator= is only for
    #if defined(__BORLANDC__) || defined(__GNUC__)
and it normally uses the generated assignment operator. I don't think that the standard requires the members to be assigned in any particular order (but I'd have to check to be sure). But, since neither variable is volatile, it could rearrange at will. In particular, the expanded inlined pn assignment contains several statements, and all those combined with the assignment to px are fair game to re-arrange to maximize throughput and avoid memory bottlenecks.

To make sure it works, add an explicit operator= that's like the one shown for BORLANDC and GNUC, but aliases both px and pn to volatile variables. Ah, but then you'll have trouble with the function call, so add 'volatile' to that function, and so it goes. You might also use the compiler-specific features to prevent code movement. For Microsoft, that would be the intrinsic pseudo-function _WriteBarrier(). Put it between the two statements, and you know it will emit code for the px assignment first in the final machine code.
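
Roughly what that could look like, as a sketch only. Counted is a hypothetical stand-in for the real class, and COMPILER_WRITE_BARRIER is a macro name I made up for this example. On Microsoft compilers it maps to _WriteBarrier() from <intrin.h>, which Microsoft has since deprecated in favor of the std::atomic facilities. On GCC-compatible compilers it maps to an empty asm statement with a "memory" clobber, which is the usual compiler-only barrier there. Neither form emits a CPU fence.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_WRITE_BARRIER() _WriteBarrier()
#elif defined(__GNUC__)
#define COMPILER_WRITE_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#error "no compiler barrier known for this compiler"
#endif

struct Counted {   // hypothetical stand-in for shared_ptr's px/pn pair
    int* px;
    long pn;

    Counted& operator=(const Counted& r) {
        px = r.px;
        COMPILER_WRITE_BARRIER();   // compiler may not move the stores across this
        pn = r.pn;
        return *this;
    }
};

// Small self-check: the copy ends up with both members of the source.
void check_assign() {
    int x = 5;
    Counted a = { &x, 3 };
    Counted b = { 0, 0 };
    b = a;
    assert(b.px == &x && b.pn == 3);
}
```

The barrier only pins the order in the generated code; whether other processors observe the stores in that order is still up to the platform.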

--John


