Boost logo

Boost :

Subject: Re: [boost] [lockfree::fifo] Review
From: Andy Venikov (avenikov_at_[hidden])
Date: 2009-12-11 10:01:58


Tim Blechmann wrote:

Also, Tim, it Chris Thomasson found another problem with the current
code. In non pointer packing solution your head and tail are
multi-words. You can't load multi-word atomically without some special
instructions. Here's his comment:

Chris M. Thomasson wrote
> AFAICT, in the code by Tim Blechmann, the anchor is typedef'ed as:
> ____________________________________________________________
> typedef tagged_ptr<node> atomic_node_ptr;
> ____________________________________________________________
>
>
>
>
> and `tagged_ptr<node>' is double-wide when
> `BOOST_LOCKFREE_PTR_COMPRESSION' is not defined. Tim loads the head/tail
> anchor in his implementation of the MS algorithm like:
> ____________________________________________________________
> atomic_node_ptr tail (tail_);
> ____________________________________________________________
>
>
>
>
> and the copy constructor of the double-wide version of `tagged_ptr' is:
> ____________________________________________________________
> /** copy constructor */
> tagged_ptr(tagged_ptr const & p)//: ptr(0), tag(0)
> {
> set(p);
> }
> ____________________________________________________________
>
>
>
>
> and the `set()' procedure is:
> ____________________________________________________________
> void set(tagged_ptr const & p)
> {
> ptr = p.ptr;
> tag = p.tag;
> }
> ____________________________________________________________
>
>
>
>
> Which will not be atomic on a 32-bit system. AFAICT, this breaks the
> algorithm according to the MS queue pseudo-code. So, Tim should probably
> change this behavior and call `atomic_set()' instead. Of course this
> will only make the performance much worse! You could use the `MOVQ'
> instruction on the x86, but that would use MMX. This is why I pointed
> you to the blocking queue which does not need to do this. Yes, it's
> blocking but it's a MUCH simpler algorithm and has far better
> performance characteristics.

<snip>
> i just went through the replies ... (maybe i should upcase some parts in
> the documentation, that the implementation focus on WORST CASE, not
> AVERAGE CASE performance ... people keep complaining that the stack/fifo
> may be outperformed be blocking algorithms, which is both true and
> irrelevant for me, as these implementations are soft real-time safe (and
> could be made hard real-time safe).

Agreed.
Are you planning to augment your library with algorithms for different
needs? Like MPSC, SPMC and so forth? I think that could be useful,
because it looks like implementing the most generic MPMC case won't give
you the best results.

>
> as for the aba tag ... increasing the tag is not necessary in the
> enqueue operation (chris thomasson made a valid point here), but then
> 16bit would give 2**16 different tags. of course a tag overflow is
> possible, but not very likely ...

I guess it all depends on the algorithm and the actual memory manger
used. For example, the ABA problem is more likely to appear in a simple
LIFO stack than in M-S's FIFO queue. The generation count just decreases
the chance. As I understand it, if 1/N is the probability of ABA in any
given algorithm, then using the generation count of M bits makes the
probability 1/(N * x^M). Now, coming up with the N for any given
algorithm is what's hard to do. I guess, you could say that if M is
sufficiently high, then N is negligible.

>
> by definition ll/sc are immune to aba problems, but implementing cas via
> ll/sc, one loses this feature ... personally i would prefer to have
> language support for ll/sc transactions instead of aba-prone cas ...
>
> most cas-architectures provide dcas, while most ll/sc architectures
> shouldn't use cas emulation, but ll/sc-style transactions directly, as
> it is by definition aba immune ... too bad, c++0x doesn't provide
> language support for ll/sc, but only for cas :/

Totally agree - when I was implementing this algorithm, I had two
different versions - one CAS-based with ABA tag and the other LL/SC
based without the tag. The problem with LL/SC on some platforms however
is sometimes not solvable. On PowerPC 7447, for example, the LL is
invalidated when ANY thread writes to ANY memory location, not just to
the same cache line.... ugh...

>
> tim
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk