
From: Chris Thomasson (cristom_at_[hidden])
Date: 2006-10-21 21:33:37


"Peter Dimov" <pdimov_at_[hidden]> wrote in message
news:002a01c6f498$3e7a0bf0$6507a8c0_at_pdimov2...
> Chris Thomasson wrote:
>> I think my scheme could possibly be more efficient

[...]

>> Humm... I would have to see a sketch of the algorithm you have in mind
>> Peter...

> Basically... using copy/replace on a shared_ptr would make it behave
> atomically ('global') and using the other ops would make it non-atomic
> ('local'/semi-local).

[...]

> As I understand your scheme, you take a spinlock in your equivalent of the
> copy operation (in atomic_add_load_strong).

Yes.

> However you are able to avoid
> the write lock in the replace operation by deferring it until the refcount
> reaches zero,

Exactly.

> something like (best guess):

[...]

> except I don't see the "re-check for zero" part in your assembly, so I
> might
> be missing something; how do you protect against that scenario?

Here is some more detailed pseudo-code I posted in response to Joe Seigh:

http://groups.google.com/group/comp.programming.threads/msg/224aa9d097f4300e

The basic technique for 'copying' is as follows:

rc* copy(rc **sloc, int count) {
  1: load ptr from *sloc
  if ptr is null goto 2
  lock spinlock associated with ptr
  re-load ptr from *sloc
  compare w/ previous load
  if not equal unlock and goto 1
  XADD ptr's refs with count
  * if the previous value was less than 1 unlock and goto 1
  unlock
  2: return ptr
}

It is also good to keep the following in mind:

*: If the previous value was less than 1, that means we detected a drop-to-zero
condition on 'ptr'. Keep in mind that we are still holding the spinlock
associated with 'ptr'. The decrement thread always locks before it destroys, so
it will have to wait for us. It also means that all of the shared locations
that previously contained a pointer value equal to 'ptr' are now either null or
have **changed to another value; we fail and try again.
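
To make that concrete, here is a rough sketch of the copy side in
C++-with-std::atomic notation. The 'rc' layout, the hashed spinlock table, and
the memory orderings are just illustrative placeholders, not my actual
implementation; the only point is the load / lock / re-load-and-compare /
XADD / re-check-for-zero sequence:

#include <atomic>
#include <cstdint>

// Hypothetical reference-counted header; the real layout may differ.
struct rc {
    std::atomic<long> refs;
    void (*dtor)(rc*);      // invoked exactly once, after the drop to zero
};

// Hypothetical hashed spinlock table: every 'rc*' maps to one of N locks.
struct spinlock {
    std::atomic<bool> locked{false};
    void lock()   { while (locked.exchange(true, std::memory_order_acquire)) {} }
    void unlock() { locked.store(false, std::memory_order_release); }
};

inline spinlock& spinlock_for(rc* ptr) {
    static spinlock table[64];
    return table[(reinterpret_cast<std::uintptr_t>(ptr) >> 4) % 64];
}

// copy: acquire 'count' references to whatever *sloc currently points at.
rc* copy(std::atomic<rc*>* sloc, long count) {
    for (;;) {
        rc* ptr = sloc->load(std::memory_order_acquire);    // 1: load ptr from *sloc
        if (ptr == nullptr)
            return nullptr;                                  // 2: nothing to copy

        spinlock& lk = spinlock_for(ptr);
        lk.lock();                                           // lock spinlock associated with ptr

        if (sloc->load(std::memory_order_acquire) != ptr) {  // re-load and compare
            lk.unlock();                                     // location changed: retry
            continue;
        }

        long prev = ptr->refs.fetch_add(count, std::memory_order_acq_rel); // XADD
        if (prev < 1) {
            // (*) drop-to-zero already happened; the decrement thread is
            // waiting on this very spinlock and will destroy 'ptr' once we
            // release it, so the stale increment is harmless. Fail and retry.
            lk.unlock();
            continue;
        }

        lk.unlock();
        return ptr;                                          // 2: success
    }
}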

The decrement logic looks like this:

bool dec(rc *ptr, int count) {
  XADD ptr's refs with negated count;
  if new value is greater than 0 then return false;
  D1: lock spinlock associated with ptr;
  unlock spinlock associated with ptr;
  call ptr dtor;
  return true;
}

If D1 is reached, we have to lock the spinlock associated with 'ptr'. This must
be done because there could be increment threads trying to increment; if any of
them holds the lock, we have to wait until it fails-and-unlocks. An increment
thread that already holds the lock will fail because it will notice that the
value it incremented was less than 1 (the drop to zero has already happened).
An increment thread that has loaded a pointer value equal to 'ptr' and is
waiting on, or just about to take, the same spinlock will also fail; its
re-load and compare will ensure that.

**: Suppose an increment thread has loaded a pointer value equal to 'ptr' and
is waiting on, or just about to take, the same spinlock, while the decrement
thread runs the dtor, the application reuses 'ptr', and the reused object is
swapped into the exact location the increment thread loaded its pointer from.
All is still well: the increment thread's lock-and-compare will succeed, and
the increment will attempt to proceed against the new object. The algorithm is
ABA-proof.
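
Continuing the same illustrative 'rc' / 'spinlock_for' declarations from the
copy sketch above, the decrement side would look roughly like this; again, only
a sketch of the D1 lock-then-unlock handshake, not my actual code:

// dec: release 'count' references; returns true if this call destroyed 'ptr'.
// Re-uses the hypothetical rc / spinlock / spinlock_for declarations from the
// copy sketch above.
bool dec(rc* ptr, long count) {
    long prev = ptr->refs.fetch_sub(count, std::memory_order_acq_rel); // XADD with -count
    if (prev - count > 0)
        return false;                       // other references still exist

    // D1: lock-then-unlock the spinlock associated with 'ptr'. Any increment
    // thread that already holds the lock will observe "previous value < 1",
    // fail, and unlock; any increment thread about to take the lock will fail
    // its re-load-and-compare. After this handshake nobody can still be
    // acquiring 'ptr', so it is safe to destroy it.
    spinlock& lk = spinlock_for(ptr);
    lk.lock();
    lk.unlock();

    ptr->dtor(ptr);
    return true;
}

A reader/writer pair would then use these as, for example,
rc* r = copy(&shared_slot, 1); ... if (r) dec(r, 1);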

My documentation is almost complete; it should help clear things up. Did my
description address some of your initial concerns?

> I can't easily do that for shared_ptr since its swap is not atomic
> (shared_ptr is double-wide) but it could be possible...

Yes. IIRC, you mentioned that you could use DWCAS to modify a shared location
that contains a shared_ptr. However, as we know, DWCAS would introduce a rather
annoying portability problem into your algorithm:

http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5

;^)
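
To illustrate the 'double-wide' point: a shared_ptr value is effectively two
words (the object pointer plus a pointer to the count), so atomically replacing
one in place needs a double-width CAS. Here is a tiny stand-in example in
std::atomic notation; the 'counted_ptr' layout is only a mock-up, and whether
the atomic below is actually lock-free on a given target is exactly the
portability issue:

#include <atomic>
#include <cstdio>

// A shared_ptr value is effectively two words: the object pointer and a
// pointer to the count/control block. 'counted_ptr' is only a stand-in to
// show the size, not shared_ptr's real layout.
struct counted_ptr {
    void* object;
    void* count;
};

int main() {
    std::atomic<counted_ptr> slot{ counted_ptr{ nullptr, nullptr } };

    // Replacing the whole value atomically needs a double-width CAS
    // (e.g. CMPXCHG16B on x86-64, exclusive-pair instructions elsewhere).
    // Where no such instruction exists, the library falls back to a lock,
    // which is the portability problem referenced above.
    counted_ptr expected = slot.load();
    counted_ptr desired  = { nullptr, nullptr };
    slot.compare_exchange_strong(expected, desired);

    std::printf("double-width atomic is lock-free: %d\n",
                static_cast<int>(slot.is_lock_free()));
    return 0;
}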

> but I'm not sure whether it's worth it, since in a typical application the
> copy ops will
> outnumber the replace ops, so avoiding only the write lock may not be that
> important.

IMHO, any time you can reduce the number of atomic operations and/or memory
barriers, it's a plus...

:^)

