Boost logo

Boost :

Subject: Re: [boost] [lockfree::fifo] Review
From: Helge Bahmann (hcb_at_[hidden])
Date: 2009-12-09 05:45:31


On Tue, 8 Dec 2009, Andy Venikov wrote:
>>> Third, and the biggest limitation, is the algorithm's assumption that you
>>> can store a pointer to an element plus an ABA tag in a contiguous memory
>>> space that can be passed to a CAS instruction. I believe not all platforms
>>> provide a DCAS instruction, that's why you're trying to work around this
>>> issue by using "packed pointers", i.e. storing the ABA tag in the part of
>>> the pointer that presumably is never used. The problem with that, however,
>>> is that some libc implementations of malloc/free do use all the bits in
>>> the pointer for their own needs, so you can't assume that you can meddle
>>> with them.
>>
>> puh, can you elaborate on this? i haven't been aware of any use of the
>> unused address space bits.
>
>
> Yes. Recently I did an experiment on 32 bit 2.6 linux using SUSE 9.1 and libc
> that came with it. (libc version was 2.3.3-97).

So far I have not yet met any 64-bit platform that provides 64 bits of
(user-space!) address space (the same cannot be said of 32-bit platforms),
so I would say that any use of spare bits on 64 bit is safe (for
user-space!).

FWIW there are other ways to avoid ABA than tagging pointers, and without
DCAS on 32-bit, see e.g. this sample implementation of a lock-free stack:

http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp

>>> Fourth, on platforms where CAS is implemented using LL/SC (such as PowerPC
>>> and MIPS), it is possible to livelock. The problem with LL/SC on most
>>> platforms is that the write granulum for marking LL's address as "dirty"
>>> isn't just the memory pointed to by LL, but the whole cache line where the
>>> memory resides. On these platforms it is important to make sure that each
>>> "node" of the queue is located on a separate cache line.

just to make this clear... there are two types of "live-lock"s with ll/sc:

a) one thread accessing memory between ll/sc
b) two threads racing in a tight loop on ll/sc, continually aborting the
other's operation

If ll/sc is encapsulated in CAS, than a) is impossible

There is nothing you can do about b) without changing the CAS
implementation -- and there is generally no reason to even attempt that:
With a window of usually two-three instructions between ll/sc, the rate of
failure (i.e. repetition required due to interference) is surprisingly low
even with two threads racing in tight ll/sc loops.

Moving things to separate cache lines is a useful optimization (even
without ll/sc), but nothing that is required.

> You know, it's an interesting question whether CAS really implies a memory
> barrier or not. Obviously, you've implemented it that way, but I think it not
> need be that way. I think there's really no "standard" on CAS so we need to
> go with what is commonly accepted. It may be another question that would be
> good to ask at c.p.t
> If someone can answer this question, please chime in.

There are valid uses for CAS with and without various types of barriers.
See the explicit memory_order specification in my atomic operations
library.

Using correct ordering constraints is unfortunately tricky -- use "too
strong" ones, and performance will look ridiculous compared to locking.
Use "took weak" ones, and your implementation will spuriously fail in
ways that are nearly impossible to diagnose.

Best regards,
Helge


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk