Subject: Re: [boost] [lockfree::fifo] Review
From: Tim Blechmann (tim_at_[hidden])
Date: 2009-12-05 09:27:14
thanks for your comments!
> First of all I think that having a library of lock-free
> algorithms/containers would be greatly beneficial to a great number of
> boost users. On the other hand, having implemented several lock-free
> algorithms, I know that it is notoriously hard to get them right. We'll
> need all the help we can get from knowledgeable people in the field. The
> implementation needs to be thoroughly reviewed. Let me give the first
> attempt at that.
i know how hard it is to get lock-free data structures right, and several
issues have already been resolved by people reviewing my code ...
> Second, this algorithm makes certain assumptions about the ability to
> access memory that has been freed. The algorithm can try to read memory
> that has been de-allocated. (See further down for description of how).
> Several debug implementations of malloc/free store every allocated
> element on one page and when the element is freed, the page is marked
> protected for both reading and writing. That means that this algorithm
> can only work with a subset of custom allocators. Your "free_list" for
> example can't work with it since it's possible for a node to be freed
> when you call "deallocate". The "caching_freelist" and "static_freelist"
> on the other hand work just fine because they never return the memory
> back to the system.
the free_list class should not be used for the stack/fifo classes (i
should remove it to avoid confusion). i chose the freelist approach,
since hazard pointers and pass-the-buck are (or may be) covered by patents.
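to illustrate why the caching freelists are safe for this algorithm: a
caching freelist recycles nodes internally and never returns memory to the
system, so a thread still holding a stale pointer can dereference it without
faulting. a minimal single-threaded sketch of the recycling idea (names are
illustrative, not the library's actual API):

```cpp
#include <cstddef>
#include <new>

// sketch of a caching freelist: deallocate() pushes the slot onto an
// internal list instead of calling delete, so the memory stays mapped
// for the lifetime of the freelist. not thread-safe; it only shows the
// recycling behaviour discussed above.
template <typename T>
class caching_freelist
{
    union node {
        node * next;                              // link while on the free list
        alignas(T) unsigned char storage[sizeof(T)]; // payload while in use
    };
    node * head_ = nullptr;

public:
    // hands out raw storage for a T, reusing a recycled slot if possible
    T * allocate()
    {
        if (head_) {
            node * n = head_;
            head_ = n->next;
            return reinterpret_cast<T*>(n);
        }
        return reinterpret_cast<T*>(new node);
    }

    // recycle the slot; the memory is never returned to the system
    void deallocate(T * p)
    {
        node * n = reinterpret_cast<node*>(p);
        n->next = head_;
        head_ = n;
    }
};
```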
> Third, and the biggest limitation, is the algorithm's assumption that
> you can store a pointer to an element plus an ABA tag in a contiguous
> memory space that can be passed to a CAS instruction. I believe not all
> platforms provide a DCAS instruction, that's why you're trying to work
> around this issue by using "packed pointers", i.e. storing the ABA tag
> in the part of the pointer that presumably is never used. The problem
> with that, however, is that some libc implementations of malloc/free do
> use all the bits in the pointer for their own needs, so you can't assume
> that you can meddle with them.
puh, can you elaborate on this? i haven't been aware of any malloc
implementation making use of the unused address-space bits.
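for reference, the packed-pointer scheme under discussion could be sketched
like this (a hypothetical tagged_ptr; it assumes user-space pointers fit in
the low 48 bits, as on current x86-64 -- which is exactly the assumption
that may break):

```cpp
#include <cstdint>

// sketch: ABA tag stored in the top 16 bits of a 64-bit word, pointer in
// the low 48 bits. this is a platform assumption, not a guarantee.
class tagged_ptr
{
    std::uint64_t bits_ = 0;
    static constexpr std::uint64_t ptr_mask = (1ull << 48) - 1;

public:
    tagged_ptr() = default;
    tagged_ptr(void * p, std::uint16_t tag)
        : bits_((reinterpret_cast<std::uint64_t>(p) & ptr_mask)
                | (std::uint64_t(tag) << 48)) {}

    void * get_ptr() const
    { return reinterpret_cast<void*>(bits_ & ptr_mask); }

    std::uint16_t get_tag() const
    { return std::uint16_t(bits_ >> 48); }
};
```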
> My experience with this algorithm shows that the safest approach is to
> use a pre-allocated array for the queue and replace pointers with
> indexes into the array. Assuming that the size of an index is
> sufficiently lower than the size of a pointer, you can use the remaining
> bits to store the ABA tag. This alleviates both point 2 and 3. This does
> mean however, that the queue can't have an unbound size, the size has to
> be specified during the queue instantiation.
that would greatly increase portability ... it would even be possible to
use e.g. 16-bit indices and 16-bit tags on 32-bit platforms.
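a minimal sketch of that packing (illustrative names): a 16-bit slot index
into the preallocated array plus a 16-bit ABA tag fit into one 32-bit word,
which any platform with a plain 32-bit CAS can update atomically:

```cpp
#include <cstdint>

// sketch: index-plus-tag packed into a single 32-bit word, so no DCAS or
// pointer-bit tricks are needed; the node array is assumed preallocated
// and at most 65536 entries.
struct tagged_index
{
    std::uint32_t bits;

    static tagged_index make(std::uint16_t index, std::uint16_t tag)
    { return { std::uint32_t(index) | (std::uint32_t(tag) << 16) }; }

    std::uint16_t index() const { return std::uint16_t(bits & 0xffff); }
    std::uint16_t tag()   const { return std::uint16_t(bits >> 16); }
};
```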
> Fourth, on platforms where CAS is implemented using LL/SC (such as
> PowerPC and MIPS), it is possible to livelock. The problem with LL/SC on
> most platforms is that the write granularity for marking LL's address as
> "dirty" isn't just the memory pointed to by LL, but the whole cache line
> where the memory resides. On these platforms it is important to make
> sure that each "node" of the queue is located on a separate cache line.
interesting point! i should handle the size and alignment of the nodes on
these platforms.
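a sketch of what that could look like (assuming a 64-byte cache line, which
would have to be detected per platform): padding each node to a whole cache
line so an LL/SC reservation on one node is not invalidated by writes to a
neighbouring node.

```cpp
#include <cstddef>

// assumed cache-line size; real code would detect this per platform
constexpr std::size_t cache_line_size = 64;

// each node is aligned to and padded out to a full cache line, so
// concurrent writes to adjacent nodes cannot clear each other's
// LL/SC reservation (whose granularity is the cache line).
struct alignas(cache_line_size) node
{
    node * next;
    int    data;
};

static_assert(sizeof(node) % cache_line_size == 0,
              "each node occupies whole cache lines");
```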
> Here's the code of your enqueue:
> bool enqueue(T const & t)
> {
>     node * n = alloc_node(t);
>     if (n == NULL)
>         return false;
>     for (;;)
>     {
>         atomic_node_ptr tail (tail_);            //1
>         read_memory_barrier();                   //2
>         atomic_node_ptr next (tail->next);       //3
>         if (likely(tail == tail_)) {             //4
>             if (next.get_ptr() == 0) {           //5
>                 if (tail->next.cas(next, n)) {   //6
>                     tail_.cas(tail, n);
>                     return true;
>                 }
>             } else
>                 tail_.cas(tail, next.get_ptr());
>         }
>     }
> }
> 1. You're missing a write barrier after the call to alloc_node().
> Rationale: alloc_node sets the "next" pointer of the newly allocated node
> to NULL. That is a store to shared memory. Setting next pointer to NULL
> has to be observable to other threads BEFORE they observe the inserted
> node. If you don't insert a write barrier, it is possible for other
> threads to see the new element in the list before element's next pointer
> is initialized with NULL.
i was assuming that cas issues a full memory barrier (at least my cas
implementation does), so i don't think this would be an issue.
> 2. You've misplaced the read_barrier.
> Rationale: Line 4 checks the validity of the load done on line 3. That
> means that the read_barrier should be between lines 3 and 4. If you
> don't have a read_barrier after loading *tail->next, then this load can
> happen after the load of tail_ on line 4, which defies the purpose of
> line 4.
> You still need to have a barrier between lines 1 and 3, but here only a
> data dependency barrier will suffice, which is a weaker form of a read
> barrier and on almost all platforms is a no-op since data dependent loads
> can be handled by processors automatically.
thanks for this explanation ...
> 3. Your tail_ and head_ should be volatile. Furthermore, the getters for
> your tagged_ptr (like get_ptr, operator->() and operator* ) should
> return pointers/references to volatile types.
> Rationale: If your tail_ is not volatile, then the compiler is free to
> load tail_ just once and line 4 effectively becomes a no-op.
this was already pointed out a few days ago and is fixed in my repository.
> 1. Again, you need to put read_barrier between lines 4 and 5. Only data
> dependency barrier will suffice at line 2.
> 2. You need to add write barrier after line 7 for the same reason as in
> the enqueue.
same as above
> Hope this was descriptive,
i am currently porting the data structures to the boost.atomic proposal,
which provides better language support for specifying memory order
constraints. the code is available in a separate branch of my repository.
i would be curious to hear your thoughts on that implementation as well.
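as a sketch of the direction (using the std::atomic interface, which the
boost.atomic proposal closely mirrors; names are illustrative, not the
actual fifo code): a CAS with release ordering on success publishes the
node's initialised fields, addressing the missing-write-barrier point above
without a standalone barrier.

```cpp
#include <atomic>

// illustrative node with an atomic link, initialised to null before
// the node is published
struct node
{
    std::atomic<node*> next { nullptr };
    int data { 0 };
};

// try to link n after the current tail: release ordering on success
// guarantees that n's initialised fields are visible to any thread
// that subsequently observes the link; relaxed suffices on failure.
inline bool try_link(std::atomic<node*> & tail_next, node * n)
{
    node * expected = nullptr;
    return tail_next.compare_exchange_strong(
        expected, n,
        std::memory_order_release,
        std::memory_order_relaxed);
}
```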
thanks a lot for your review!
--
tim_at_[hidden]
http://tim.klingt.org

There's no such entity as "most people". These are generalities. All
generalities are meaningless. You've got to pin it down to a specific
person doing a specific thing at a specific time and space.
  William S. Burroughs
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk