Subject: Re: [boost] [lockfree::fifo] Review
From: Chris M. Thomasson (cristom_at_[hidden])
Date: 2010-01-18 16:02:07


"Helge Bahmann" <hcb_at_[hidden]> wrote in message
news:201001181953.33044.hcb_at_chaoticmind.net...
> On Saturday 16 January 2010 03:15:44, Chris M. Thomasson wrote:
>> FWIW Helge, here is a new proxy garbage collector algorithm of mine that
>> I am currently experimenting with:
>>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
>
> interesting concept, but AFAICT it shares two undesirable properties with
> my implementation: Reclamation can be delayed indefinitely if
>
> - one thread is "stuck" within the critical section (e.g. scheduled away)

Well, if a thread actually deadlocks or "permanently livelocks" within the
critical section, then no reclamation can occur.

> - threads interlock such that at least one thread is always within the
> critical section

> Right?

No. My proxy algorithm is N = 2, while yours is N = 1, where N is the number
of collector objects. This means that you only have a _single_ reference
count. During periods of load, this single reference count might never reach
zero, and the garbage will keep piling up.

In my case, I can atomically switch collector objects and reset the
reference count, thus guaranteeing that the old one will drop to zero exactly
when the last thread accessing it releases its reference. This is because
all new threads will take references to the new collector. Here is the key
area in my algorithm; take note of the following procedure in the code:

http://cpt.pastebin.com/f537f2c7c

__________________________________________________________________
void prv_quiesce_begin(collector& c)
{
    // Try to begin the quiescence process.
    if (! m_quiesce.exchange(true, std::memory_order_acquire))
    {
        // advance the current collector and grab the old one.
128:    size_t i = ((&c - m_collectors) + 1) & 1;

130:    unsigned int old = m_current.exchange(i, std::memory_order_acq_rel);

        // decode the reference count (upper bits, in steps of 0x10).
        unsigned int refs = old & 0xFFFFFFF0U;

        // verify the previous collector index (low 4 bits).
        RL_ASSERT((old & 0xFU) == (&c - m_collectors));

        // increment and generate an odd reference count, transferring
        // the outstanding references onto the old collector.
        c.m_count.fetch_add(refs + 0x10U, std::memory_order_release);
    }
}
__________________________________________________________________

Line `128' simply determines the index of the next collector object. Line
`130' atomically resets the main reference count to zero _and_ swaps in the
next collector index. Now, all new threads will only be able to gain
references to the collector that was just swapped in, which means that it
will be impossible for them to gain access to the old collector. Therefore,
the old collector will drop to zero. I guess this is similar to using split
counters to implement RCU. However, AFAICT my algorithm is unique and
completely different from Paul McKenney's split counter RCU solution.
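
For context, here is a minimal, self-contained sketch of how the reader side
of such an N = 2 scheme might look, using the same word encoding as the
excerpt above (low 4 bits select the collector, each reference adds 0x10).
The names `proxy', `prv_acquire' and `prv_release' are hypothetical, and the
real code behind the pastebin link differs in details; in particular, the
zero-detection logic driven by the `refs + 0x10U' bias is omitted here:

__________________________________________________________________
#include <atomic>

struct collector
{
    // outstanding references, in steps of 0x10; the deferred nodes
    // awaiting reclamation would also live here.
    std::atomic<unsigned int> m_count { 0 };
};

struct proxy
{
    collector m_collectors[2];

    // packed word: upper bits = reference count (steps of 0x10),
    // low 4 bits = index of the current collector.
    std::atomic<unsigned int> m_current { 0 };

    collector& prv_acquire()
    {
        // a single atomic RMW bumps the packed count and reads the
        // current index, so a new thread can only ever obtain a
        // reference to the collector that is current at this instant.
        unsigned int old = m_current.fetch_add(0x10U, std::memory_order_acquire);
        return m_collectors[old & 0xFU];
    }

    void prv_release(collector& c)
    {
        // retire the reference against the collector's own count; once
        // prv_quiesce_begin has transferred the packed count into
        // c.m_count, the swapped-out collector reaches zero exactly
        // when its last reader releases.
        c.m_count.fetch_sub(0x10U, std::memory_order_release);
    }
};
__________________________________________________________________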

> It's not that I consider this problematic (there is no reason I would ever
> want to share a data structure between threads if I know that the above
> can pose a problem).

It can be problematic for various usage patterns. For instance, suppose I
wanted to use an N = 1 collector to solve the reader/writer problem (e.g.,
readers iterating over a data-structure while writers are concurrently
pushing/popping items). Now, I fully expect reads to be very frequent in
such scenarios. This would not work well for an N = 1 collector because if
reader threads were constantly accessing the data-structure then the
reference count would probably never reach zero. However, with an N = 2
collector I can guarantee that new reader threads would not affect the
reference count of the collector swapped out in line `130'. This is a case
in which N = 2 is the key difference between our implementations.
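
To illustrate the pattern, here is a hedged sketch of what such a reader
might look like against the hypothetical `prv_acquire'/`prv_release'
interface sketched above. The `node' layout is invented for the example; a
writer would hand unlinked nodes to the collector instead of deleting them
immediately:

__________________________________________________________________
struct node
{
    std::atomic<node*> next;
    int value;
};

void reader(proxy& p, std::atomic<node*>& head)
{
    // pin the current collector for the duration of the traversal.
    collector& c = p.prv_acquire();

    for (node* n = head.load(std::memory_order_acquire); n != nullptr;
         n = n->next.load(std::memory_order_acquire))
    {
        // safe to dereference `n': a concurrent writer defers any
        // unlinked node to the collector, and reclamation cannot run
        // until this reference is released.
    }

    // unpin; once the last reference on a swapped-out collector drops,
    // its deferred nodes can be reclaimed.
    p.prv_release(c);
}
__________________________________________________________________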

>> The algorithm is wait-free in practice on systems with native
>> fetch-and-add and swap instructions like x86-32/64. If the architecture
>> forces you to implement fetch-and-add with a damn LL/SC or CAS loop,
>> well, then it's not really wait-free now is it? No, I am afraid it's
>> lock-free, or whatever semantics the LL/SC is which could even be
>> obstruction-free. Sad...
>
> Depends how it is implemented -- LL/SC is wait-free iff other reads to the
> location do not invalidate the reservation (PowerPC is in that category).

Here are some fairly interesting posts from Andy Glew (former Computer
Architect at Intel) wrt LL/SC:

http://groups.google.com/group/comp.arch/msg/18f0d50aa3c9a148

http://groups.google.com/group/comp.arch/msg/5d1ff313519005f8
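
To make the quoted point concrete, here is a minimal sketch (my own
illustration, not code from either implementation) of fetch-and-add emulated
with a CAS loop; the retry loop is exactly what demotes the operation from
wait-free to lock-free:

__________________________________________________________________
#include <atomic>

unsigned int emulated_fetch_add(std::atomic<unsigned int>& a, unsigned int v)
{
    unsigned int expected = a.load(std::memory_order_relaxed);

    // a native instruction (e.g., x86 LOCK XADD) completes in a bounded
    // number of steps; this loop can lose the race to other threads an
    // unbounded number of times, so it is only lock-free.
    while (! a.compare_exchange_weak(expected, expected + v,
                                     std::memory_order_acq_rel,
                                     std::memory_order_relaxed))
    {
        // `expected' was refreshed with the current value; retry.
    }

    return expected;
}
__________________________________________________________________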

> I strongly suspect that CAS on x86 is actually just a micro-coded LL/SC
> anyway...

I think they take locks to get the job done. Andy Glew would be the guy to
ask. So, I should probably post a query over in `comp.arch' where all the
hardware gurus hang out.

;^)

