Boost logo

Boost :

Subject: Re: [boost] [lockfree::fifo] Review
From: Chris M. Thomasson (cristom_at_[hidden])
Date: 2009-12-18 06:52:11


"Helge Bahmann" <hcb_at_[hidden]> wrote in message
news:alpine.DEB.1.10.0912101115460.456_at_m65s28.vlinux.de...
> On Wed, 9 Dec 2009, Anthony Williams wrote:
>
>>> So far I have not yet met any 64-bit platform that provides 64 bits of
>>> (user-space!) address space (the same cannot be said of 32-bit
>>> platforms), so I would say that any use of spare bits on 64 bit is
>>> safe (for user-space!).
>>
>> As Chris also noted, I would be wary of using them for ABA checking
>> though: it's easy to overflow a 16-bit counter quite quickly, and that's
>> all you've got even if the OS only uses 48 of your 64 bits for address
>> space.
>
> This concern is certainly valid for basically any algorithm that uses
> "generation counters" or similar constructs for disambiguation (it could
> be argued that even 32 bits may not be enough). Using spare bits at all
> should however be considered a valid technique.
>
>>> FWIW there are other ways to avoid ABA than tagging pointers, and
>>> without DCAS on 32-bit, see e.g. this sample implementation of a
>>> lock-free stack:
>>>
>>> http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp
>>
>> Yup. Incidentally, you don't even need to count the nodes in push,
>> provided you're careful when reclaiming nodes in pop.
>
> hm, I'm not sure what you mean, I don't think I am counting anything but
> the number of threads in the critical section?
>
>> The downside of this technique is that if the traffic is too heavy then
>> the ref count never reaches zero, so the nodes can never be reclaimed.
>
> Right, and although I would argue that such a highly contended hotspot is
> a design flaw in itself,

You have created a proxy garbage collector that reminds me of one I did a
long time ago which also suffered from this exact same problem. I was using
it for memory reclamation:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3ded0e3c5496eaa0/cee38f0e7ff17940

it went something like, membars aside for a moment:
_____________________________________________________________
struct node
{
    struct node* next;
};

struct proxy
{
    struct node* head; // = NULL
    uint32_t count; // = 0
};

void
proxy_acquire(struct proxy* const self)
{
    struct proxy cmp, xchg;

    cmp.head = self->head;
    cmp.count = self->count;

    do
    {
        xchg.head = cmp.head;
        xchg.count = cmp.count + 1;
    }

    while (! DWCAS(self, &cmp, &xchg));
}

struct node*
proxy_release(struct proxy* const self)
{
    struct proxy cmp, xchg;

    cmp.head = self->head;
    cmp.count = self->count;

    do
    {
        xchg.count = cmp.count - 1;

        if (! xchg.count)
        {
            xchg.head = NULL;
        }

        else
        {
            xchg.head = cmp.head;
        }
    }

    while (! DWCAS(self, &cmp, &xchg));

    if (! xchg.count)
    {
        return cmp.head;
    }

    return NULL;
}

struct node*
proxy_defer(struct proxy* const self,
            struct node* node)
{
    struct proxy cmp, xchg;

    cmp.head = self->head;
    cmp.count = self->count;

    do
    {
        node->next = cmp.head;

        xchg.count = cmp.count;

        if (! xchg.count)
        {
            xchg.head = NULL;
        }

        else
        {
            xchg.head = node;
        }
    }

    while (! DWCAS(self, &cmp, &xchg));

    if (! xchg.count)
    {
        return node;
    }

    return NULL;
}
_____________________________________________________________

This solves memory reclamation, but will pile up nodes during times of
sustained load...

> there are ways to do reliable reclaim even in that case (e.g. that allow
> reclaiming an element after the last thread that could actually have
> "seen" it leaves the critical section

Yes. My next incarnation of the proxy collector above was coded to
explicitly handle this situation. It simply breaks off collector objects
during deferment and backlinks them all together:

http://webpages.charter.net/appcore/misc/pc_sample_h_v1.html

> -- can be done with checkpoints similar RCU, so it can even be more
> efficient than the atomic counter I used), yet I am not sure this is
> really worth it -- I would like to make an entirely different point that
> has been nagging me:

I don't think it's worth using RCU just to get around ABA in a lock-free
stack. Heck, it does not even fit in with the general usage pattern which
does not expect a lot of writes. Humm, well, of course you can take
tremendous pressure off of RCU by combining it with version counters and
only using them to determine when to defer nodes. For instance, you don't
need to defer a node into RCU until a version counter is just about to
rollover.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk