Boost logo

Boost :

Subject: Re: [boost] [lockfree::fifo] Review
From: Andy Venikov (avenikov_at_[hidden])
Date: 2009-12-08 16:14:25

Tim Blechmann wrote:

>> Third, and the biggest limitation, is the algorithm's assumption that
>> you can store a pointer to an element plus an ABA tag in a contiguous
>> memory space that can be passed to a CAS instruction. I believe not all
>> platforms provide a DCAS instruction, that's why you're trying to work
>> around this issue by using "packed pointers", i.e. storing the ABA tag
>> in the part of the pointer that presumably is never used. The problem
>> with that, however, is that some libc implementations of malloc/free do
>> use all the bits in the pointer for their own needs, so you can't assume
>> that you can meddle with them.
> puh, can you elaborate on this? i haven't been aware of any use of the
> unused address space bits.

Yes. Recently I did an experiment on 32 bit 2.6 linux using SUSE 9.1 and
libc that came with it. (libc version was 2.3.3-97). I was doing all
sorts of allocations and looking at the returned pointer. What I found
was that the only the two low bits were always the same (obviously
because malloc has to return aligned data). All other bits could be
different depending on the size of the requested allocation. I think
they were also depended on the thread that was requesting the
allocation, but I'm not sure on this point. So, other than the 2 lower
bits, you can't assume you can use any more. And 2 bits seems kinda low
for an ABA tag.

>> My experience with this algorithm shows that the safest approach is to
>> use a pre-allocated array for the queue and replace pointers with
>> indexes into the array. Assuming that the size of an index is
>> sufficiently lower than the size of a pointer, you can use the remaining
>> bits to store the ABA tag. This alleviates both point 2 and 3. This does
>> mean however, that the queue can't have an unbound size, the size has to
>> be specified during the queue instantiation.
> that would greatly increase the portability ... it would be even
> possible to use e.g. 16bit indices and 16bit tags on 32bit platforms
> without dcas

Yup, that's the idea. Originally though, I said that the queue's size
will have to be bounded. I've been thinking about it for a while and I
think there's a way to keep it growing dynamically. I'll bounce some
ideas around on c.p.t newsgroup and will come back to you shortly.

>> Fourth, on platforms where CAS is implemented using LL/SC (such as
>> PowerPC and MIPS), it is possible to livelock. The problem with LL/SC on
>> most platforms is that the write granulum for marking LL's address as
>> "dirty" isn't just the memory pointed to by LL, but the whole cache line
>> where the memory resides. On these platforms it is important to make
>> sure that each "node" of the queue is located on a separate cache line.
> interesting point! i should somehow handle size and alignment of the
> nodes on these platforms

>> 1. You're missing write barrier after a call to alloc_node().
>> Rational: alloc_node sets the "next" pointer of the newly allocated node
>> to NULL. That is a store to shared memory. Setting next pointer to NULL
>> has to be observable to other threads BEFORE they observe the inserted
>> node. If you don't insert a write barrier, it is possible for other
>> threads to see the new element in the list before element's next pointer
>> is initialized with NULL.
> i was assuming, that cas issues a full memory barrier (at least my cas
> implementation does), so i don't think, this would be an issue.

You know, it's an interesting question whether CAS really implies a
memory barrier or not. Obviously, you've implemented it that way, but I
think it not need be that way. I think there's really no "standard" on
CAS so we need to go with what is commonly accepted. It may be another
question that would be good to ask at c.p.t
If someone can answer this question, please chime in.


> i am currently porting the data structures to the boost.atomic proposal,
> which provides better language support to specify the memory order
> constraints. the code is available from a separate branch of my
> repository. i would be curious, if you could have a look at that
> implementation as well.

How can I get to your latest code?

> thanks a lot for your review!
> cheers, tim


Boost list run by bdawes at, gregod at, cpdaniel at, john at