Date: Fri, 5 May 2006 14:09:33 -0600
From: Ray Whitmer < ray@personallegal.net>
Subject: Re: [Boost-users] Thread-safe smart pointers, hashtables
without locking
To: boost-users@lists.boost.org
Message-ID: < 0D5E3BD4-9A5D-4CE2-9040-35500773C190@personallegal.net>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

> The algorithm makes sense, and if all these claims hold then it
> should work, I think. I'm still concerned about step (2), however,
> if only because of the assumption that what you write as a load/
> store in C++ maps to a load/store in ML. Will it? Always? I ask as
> a non-rhetorical question: does the C++ standard *guarantee* it?
> Because if it doesn't then it seems like a potential problem, but
> maybe I'm being too careful.

I'd also say the idea of moving a pointer out of the way, and then using the flip-gate-count-thing to wait for any 'potentially-using' readers to finish, definitely has 'potential'. You would need to show detailed code to really be sure. And not to us - to the guys on comp.programming.threads.

If I could answer in the affirmative, I could prove it works and not
have to talk to anyone.  I can make a bunch of lesser claims, that of
course fail to add up to the strong claim but make me feel better:

3. I have looked at the output of many C compilers and have never
seen a pointer assignment take multiple instructions.  The exception
might be if someone used a poorly-written memcpy-like operations to
transfer the contents of a structure byte by byte.  A well-written
one takes the respective alignments into account and transfers
aligned 32-bit quantities one-by-one.  We are talking about code in a
controlled Hashtable class, it would have to be the "C" compiler
choosing to transfer a class consisting of a single address field
using something other than a single instruction.  It doesn't sound
logical to me, but it wouldn't surprise me completely to see it
happen, but I would report it as a compiler bug and expect it to be
addressed even though it is not in the standard.

As long as the address of the pointer is properly aligned, it will be atomic. The language doesn't guarantee that (the language doesn't talk about atomic or threads at all - at least not yet), but the processors + compilers tend to guarantee it. HOWEVER! *even if it is atomic*, on a multi-cpu machine each cpu might see memory differently - you need memory barriers to ensure the other cpus see your new pointer value correctly. Think of cpus and memory like harddrive controllers that reorder reads and writes to lessen seeks - the cpu does the same sort of thing with memory - and in doing so, it may reorder the write of your pointer to 'happen' before you expected (because it was near other memory it was writing) and/or read the pointer later than expected (because it optimized other reads into a 'bundle' before the read of your pointer).

Anyhow, if you haven't already, take a look at memory barriers, and, as a canonical example, the broken 'double checked locking pattern'. You can't do lock free without understanding this stuff.

4. These issues have to be addressed with the multi-processor cores
becoming more prevalent because they are basic to lock-free
processing, or can C++ accept significant inferiority to Java with
respect to lock-free multiprocessing?

see above. Much of it doesn't come for free.

For what it is worth, if anyone else thinks the solution is worth a
generic contribution, or is of the opposite opinion that it would not
be generic enough, I would be interested in hearing. I suppose this
sort of question should go to the developers list. I think it would
be much more difficult to implement a table of shared_ptr than it is
for intrusive_ptr, because assigning a shared_ptr is much less likely
to be uninterrupted, leaving other threads with a broken visible
state. There is clearly more detail to discuss as well.
Ray Whitmer

Your wording of 'much less likely' is scary, and I think similar wording made James apprehensive as well. There is no grey - it is black or white, thread safe or not. And you are correct, you can't make { reading the location of a shared_ptr + incrementing its ref-count } atomic, it least not on most processors.

I'm not sure of the exact implementation of intrusive_ptr, but assuming it is like most, you can make

ip1 = ip2;

atomic IF ip2 is 'stable', ie guaranteed to not be changing during the assignment. ie ip1 can be read and/or written by other threads "while" the assignment is happening, and you will either get the before state of ip1, or the after state.

I don't think the boost implementation of intrusive_ptr's operator=() is atomic, but it could be made to be.