Boost logo

Boost :

From: Chris Thomasson (cristom_at_[hidden])
Date: 2006-10-21 23:30:55


"Peter Dimov" <pdimov_at_[hidden]> wrote in message
news:000601c6f582$10c92890$6507a8c0_at_pdimov2...
> Chris Thomasson wrote:
>> The basic technique for 'copying' is as follows:

[...]

> I thought that that might be the case. Won't you need to undo the XADD
> before unlocking and retrying, though? That, or use CAS.

I could use CAS, however I wanted to try an come up with an algorithm that
can avoid CAS...

> The decrement thread may not care, but if another increment thread gets in
> first, bad
> things happen; or am I missing something else?

This is where the reload-and-compare logic comes into play. It basically
boils down to a simple coherent lock-based snapshot algorithm. Load before
lock, reload after lock, compare, if equal you know that the previous load
is coherent with the loaded value that you grabbed under the protection of
the lock, if not unlock and retry... This will detect ALL swaps from shared
locations that jump in just before we lock; I guess its kind of similar
logic to the double check in SMR... If SMR did not perform that
double-check, that it would be broken beyond repair... Just like my
algorithm would. The snapshot logic does not touch any part of a refcount
object. It operates on shared locations that contain pointers to refcount
object.

It is not possible for a increment thread to load a pointer to refcount
object, after it drops to zero, except in the condition I described earlier
wrt ABA occurs; the algorithm is compatible with ABA.

A refcount object drop to zero condition means that there are no shared
locations that contain any pointers to it. Any increment threads that have
loaded pointers, will fail when they reload-and-compare. If one gets
through, and their compare succeeds, it doesn't matter because it has
already locked the associated spinlock, so it has exclusive access to the
refcount. Since its reload and compare succeeded, that means it happened
before any decrement thread got to execute; swaps happen before decrements
(e.g., swap shared loc with 0, dec the old ptr). A decrement thread will
always lock the spinlock before it calls the destructor, so it will wait for
the one that got through the compare logic...

One thing I forgot to mention in my last post, nit-picking, your algorithm
sketch invoked the user-provided destructor in the context of the
critical-section provided by the lock that is associated with the count ptr.
Can't call function pointers while you hold a lock? ;)

Humm... I have to admit that if I used CAS, the algorithm logic would be
easier to understand...

;)

>> IMHO, anytime you can reduce the number of atomic operations and/or
>> memory
>> barriers is a plus...
>
> Maybe... Given a choice between the two, I would prefer to somehow
> eliminate
> the spinlock in 'copy', though. :-)

You can get rid of it by using PDR... I have a hybrid version of my
algorithm that uses it. Things get tricky here because of all of the RCU,
SMR patents, not to mention my vZOOM patent... I think Boost could make use
of one of the following collectors:

http://groups.google.com/group/comp.programming.threads/msg/f443b38cf7bbca8a
http://groups.google.com/group/comp.programming.threads/msg/a4a796e25a157ca1

One of them uses DWCAS... I have workaround, you can make use of offset
trick explained here:

http://groups.google.com/group/comp.arch/msg/a3ebfe80363a4399

The other one uses CAS, however it operated on fixed number of threads. I
also have a workaround using an offset trick, or another technique that
allows for X number of threads to acquire proxy reference without
blocking...

This is good because reads and writes can be lock-free, and reads and writes
are no longer mutually excludable... Humm... I don't have docs on the
algorithm... Wonder if I should post it; its experimental...


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk