
Subject: Re: [boost] [lockfree::fifo] Review
From: Chris M. Thomasson (cristom_at_[hidden])
Date: 2010-01-20 18:51:31


"Helge Bahmann" <hcb_at_[hidden]> wrote in message
news:alpine.DEB.1.10.1001191553030.25332_at_m65s28.vlinux.de...
> On Mon, 18 Jan 2010, Chris M. Thomasson wrote:
>
>> "Helge Bahmann" <hcb_at_[hidden]> wrote in message
>> news:201001181953.33044.hcb_at_chaoticmind.net...
>>> Am Saturday 16 January 2010 03:15:44 schrieb Chris M. Thomasson:
>>>> FWIW Helge, here is a new proxy garbage collector algorithm of mine
>>>> that I
>>>> am currently experimenting with:
>>>>
>>>> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
>>>
>>> interesting concept, but AFAICT it shares two undesirable properties
>>> with my
>>> implementation: Reclamation can be delayed indefinitely if
>>>
>>> - one thread is "stuck" within the critical section (e.g. scheduled
>>> away)
>>
>> Well, if a thread actually deadlocks or "permanently livelocks" within
>> the critical section, then no reclamation can occur.
>
> this means that reclamation (and by extension writing with bounded amount
> of memory) is not wait-free (this is not a problem I think, just pointing
> out)

Yes; very good point Helge. I simply _failed_ to clarify that the wait-free
properties come from the fact that I am not using loops (e.g., a CAS loop) to
acquire/release a reference to a collector object. This has been a point of
interest in the past over on `comp.programming.threads'. Previous collectors
used a CAS loop to acquire and/or release a reference. This does not work out
too well for a reader-writer pattern because only one reader thread can gain
access while the others fail their CAS and have to try again. This lock-free
property can result in fairly poor performance for readers. Therefore, we
have been coming up with wait-free techniques that allow readers to gain
access without looping in software.
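To make the contrast concrete, here is a minimal sketch of the two
acquisition styles; the collector layout and function names are my own
illustration, not the actual proxy collector code. Counts step by 2 so the
low bit can be reserved to flag an in-progress collection cycle:

```cpp
#include <atomic>

struct collector {
    std::atomic<unsigned> refs{0};
};

// Wait-free: a single unconditional fetch_add. Every reader completes
// in exactly one atomic RMW, regardless of contention.
void acquire_wait_free(collector& c) {
    c.refs.fetch_add(2, std::memory_order_acquire);
}

// Lock-free CAS loop (the older style): under contention only one
// reader's CAS succeeds per round; the losers must retry.
void acquire_cas_loop(collector& c) {
    unsigned old = c.refs.load(std::memory_order_relaxed);
    while (! c.refs.compare_exchange_weak(
               old, old + 2,
               std::memory_order_acquire,
               std::memory_order_relaxed)) {
        // failed readers loop here and try again
    }
}
```

Both end up with the same count; the difference is purely in the progress
guarantee for contended readers.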

>>> - threads interlock such that at least one thread is always within the
>>> critical section
>>
>>> Right?
>>
>> No. My proxy algorithm is N = 2, while yours is N = 1 where N is the
>> number of collector objects. This means that you only have a _single_
>> reference count. During periods of load, this single reference count
>> might never reach zero and the garbage will keep piling up.
>
> [snip explanation]
>
> Okay thanks for your explanation, I did not understand your code correctly
> the first time, but now things are much more clear.

That's great! :^)

> The concept is very nice indeed, and I agree that N=1 and N=2 makes quite
> a difference. I *think* that your approach is much more valuable as a
> building block for "object-local RCU" for data structures with long-lived
> read access than ABA prevention for producer/consumer data structures
> (where the read-side critical section is very short).

I think I agree with that statement.

> Thanks for sharing the code,

You are welcome! If you can think of any optimizations I would LOVE to hear
all about them. One thing I can think of would be to remove the need for
threads to actually destroy objects. This work could be moved to a separate
thread, or the objects could simply be stuck in a cache to be reused. For
instance, if the `proxy::release(...)' procedure were turned into a function
that returned a pointer to a stack of nodes in a quiescent state, then the
calling thread could decide what to do with them. I think that this would
make the algorithm more flexible.
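A rough sketch of what that signature change could look like; the `node` and
`collector` layout here are my own assumptions for illustration, not the
actual implementation:

```cpp
#include <cstddef>

struct node {
    node* next;
};

struct collector {
    node* deferred = nullptr; // garbage deferred during this cycle
};

// Was `void release(collector&)`; instead, the quiescent nodes come
// back to the caller as an intrusive singly-linked stack (nullptr
// when there are none), and the collector forgets them.
node* release(collector& c) {
    node* quiescent = c.deferred;
    c.deferred = nullptr;
    return quiescent;
}

// Example caller policy: just walk and count. A real caller might
// delete the nodes, push them onto a free cache, or hand the whole
// stack to a dedicated reclamation thread.
std::size_t drain(node* head) {
    std::size_t n = 0;
    for (; head; head = head->next)
        ++n;
    return n;
}
```

The point is that the reclamation *policy* moves out of the collector and
into the caller.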

Please note that this is only an initial implementation and it's more of a
proof-of-concept than anything. There are probably many improvements that
can be made...

;^)

> I can already think of an application for this in my own code :)

Cool. IMVHO, the proxy collector is fairly generic in that you can use it to
solve memory reclamation in many different types of non-blocking algorithms
and *usage patterns. I think that if Boost does include a non-blocking
library, then something along the lines of the collector algorithm should
probably be included.

[*] - I am experimenting with the collector using the following usage
pattern. Imagine a database server that gets very many sustained queries on
a regular basis. Now, reader threads receive search requests and then
execute the queries on a dynamic shared data-structure. Here is high-level
pseudo-code of how I am constructing the logic of the reader threads:
__________________________________________________________________
#define SYNC_INTERVAL 256
#define DEFER_THRESHOLD 128

typedef proxy<DEFER_THRESHOLD> proxy_type;

static proxy_type g_proxy;

void reader_thread()
{
    proxy_type::collector* c = &g_proxy.acquire();

    for (unsigned i = 1 ;; ++i)
    {
        query_type* q = try_to_get_a_query_request();

        if (! q)
        {
            /* about to block; release the collector so we do not
               hold up reclamation while we wait */
            g_proxy.release(*c);

            q = wait_for_a_query_request();

            c = &g_proxy.acquire();
        }

        execute_query(q);

        /* periodically cycle the reference so collection can advance */
        if (! (i % SYNC_INTERVAL))
        {
            g_proxy.release(*c);

            c = &g_proxy.acquire();
        }
    }

    g_proxy.release(*c);
}
__________________________________________________________________

Instead of acquiring/releasing a collector object for every single query,
this technique amortizes the number of times a reader thread needs to
acquire/release a collector.

Now, I can think of a possible optimization/enhancement to the proxy
algorithm itself. Take note that an odd reference count denotes that a
collection cycle is in progress. Therefore, if a reader thread can simply
test whether the reference count of its acquired collector object is even,
then it does not actually "need" to release a reference to it. If the count
is odd, then a release is necessary in order to move the current collection
process along. This would allow the reader threads' acquire/release pattern
to be amortized across the actual garbage deferment threshold. I think I am
going to alter the algorithm in order to support the following usage
pattern:
__________________________________________________________________
#define DEFER_THRESHOLD 256

typedef proxy<DEFER_THRESHOLD> proxy_type;

static proxy_type g_proxy;

void reader_thread()
{
    proxy_type::collector* c = &g_proxy.acquire();

    for (;;)
    {
        query_type* q = try_to_get_a_query_request();

        if (! q)
        {
            g_proxy.release(*c);

            q = wait_for_a_query_request();

            c = &g_proxy.acquire();
        }

        execute_query(q);

        /* only cycle the reference when a collection is actually
           in progress (odd reference count) */
        if (c->is_collecting())
        {
            g_proxy.release(*c);

            c = &g_proxy.acquire();
        }
    }

    g_proxy.release(*c);
}
__________________________________________________________________

This would allow reader threads to churn along during periods of load with
fairly low overhead. I say this because the only time the readers would have
to release their collector object and re-acquire it is when they must
explicitly wait for query requests, or when the garbage threshold has been
reached and a collection cycle was initiated by a writer thread. There are
some fairly promising enhancements I think I can make to the collector, and
that is one of them.

Humm...


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk