Boost logo

Boost :

Subject: Re: [boost] [lockfree] Faster ring-buffer queue implementation
From: Alexander Krizhanovsky (ak_at_[hidden])
Date: 2014-01-16 04:06:09


On 01/16/2014 08:04 AM, Gavin Lambert wrote:
> On 16/01/2014 07:32, Quoth Alexander Krizhanovsky:
>> Otherwise I'm wondering if it has sense to integrate the queue
>> implementation into boost::lockfree?
>>
>> The algorithm description is available at
>> http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer
>>
>
> I've only had a quick glance at it, but it looks less general-purpose
> -- it appears that for each queue instance, you need to predefine an
> upper bound on the number of producer and consumer threads and assign
> each thread a unique id within that bound.
>
> (Also, given the apparent locality of the thread-id, I don't like that
> it is stored and accessed as a thread-global; this would complicate
> use of multiple queues from one thread.

The queue requires thread local pointers to head and tail (the same
approach is used in LMAX Disruptor). Since each thread can have access
to many queues, then the local pointers are implemented as an array of
ThrPos structures and the array is a member of particular queue
instance. However, to quickly access the queue we need small consequent
thread identifiers.

For example if we have two threads with identifiers 0 and 1 and two
queues Q0 and Q1, then the thread will access their local pointers
through Q0.thr_p_[0] and Q0.thr_p_[1] for first queue and Q1.thr_p_[0]
and Q1.thr_p_[1] for the second queue. This can be continued to
arbitrary number of threads and queues. We only need to know on the
queue construction how many threads will be using it.

> And interleaving the per-thread head/tail indexes seems vulnerable to
> false sharing.)

False sharing is bad due to costly requests for ownership message of
cache coherence protocol. tail and head pointers are modified only by
the owning thread which is preferably should be binded to particular CPU
(however, OS scheduler doesn't reschedule a thread to other CPU without
need). Other threads/CPUs are only reads tail/head pointers of the thread.

There is sense to align ThrPos to cache line, so different CPUs won't
update the same cache lines. I tried this, but didn't get intelligible
performance improvement on my laptop (2 cores), 12 cores/2 processors
and 40 cores/4 processors servers. Probably, there is sense to make more
benchmarks....

>
> There are still many cases where manually assigning thread ids is
> achievable, so something like this might be useful to have.
Yes, I understand that current thread ids are bit ugly and it's good to
replace them by nicer interfaces. It seems boost::thread and std::thread
don't provide small consecuent thread ids... Any ideas how it's possible
to adjust the interface?

> But it couldn't be a replacement for the more general queue.

Yes, definitely agree. The ring buffer queue with all its limitations
was designed for classical work queue, not as a general fast queue.

>
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

-- 
Alexander Krizhanovsky
NatSys Lab. (http://natsys-lab.com)
tel: +7 (916) 717-3899, +7 (499) 747-6304
e-mail: ak_at_[hidden]

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk