Subject: Re: [boost] [lockfree::fifo] Review
From: Tim Blechmann (tim_at_[hidden])
Date: 2009-12-21 04:51:44
On 12/20/2009 08:44 PM, Gottlob Frege wrote:
> On Sun, Dec 20, 2009 at 11:17 AM, Tim Blechmann <tim_at_[hidden]> wrote:
>> On 12/20/2009 04:57 PM, Chris M. Thomasson wrote:
>>> "Tim Blechmann" <tim_at_[hidden]> wrote in message
>>>>> Well, IMO, it should perform better because the producer and consumer
>>>>> are not thrashing each other wrt the head and tail indexes.
>>>> the performance is almost the same.
>>> Interesting. Thanks for profiling it. Have you tried aligning everything on
>>> cache line boundaries? I would try to ensure that the buffer is aligned and
>>> padded along with the head and tail variables. For 64-byte cache line and
>>> 32-bit pointers you could do:
> How about we go through the ring buffer by steps of 2^n - 1 such that
> each next element is on a separate cache line? ie instead of
> m_head = (m_head == T_depth - 1) ? 0 : (m_head + 1);
> we do
> m_head = (m_head + 7) % T_depth;
> You still use each slot, just in a different order. You calculate 'n'
> to be whatever you need based on the cell size, as long as the
> resultant step size is coprime to T_depth.
> I'm not sure if the false-sharing avoidance would be worth the cost of
> using up more cache lines. Probably depends on how full the queue is,
yes, i would guess, the performance characteristics differ depending on
the number of elements in the buffer. also, the current implementation
can easily be adapted to enqueue/dequeue multiple objects efficiently ...
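
for reference, a minimal sketch of the stride idea quoted above. it is not code from the library under review: stride_order and visits_every_slot are hypothetical helpers, and the depths and steps in the usage note are only example values. it checks that advancing the index by a fixed step modulo the depth touches every slot exactly once, which holds when the step and the depth share no common factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the order in which slots are visited when the
// index starts at 0 and advances by `step` modulo `depth`.
std::vector<std::size_t> stride_order(std::size_t depth, std::size_t step)
{
    assert(depth > 0);
    std::vector<std::size_t> order;
    order.reserve(depth);
    std::size_t idx = 0;
    for (std::size_t i = 0; i < depth; ++i) {
        order.push_back(idx);
        idx = (idx + step) % depth;   // same update as m_head = (m_head + 7) % T_depth
    }
    return order;
}

// Hypothetical helper: true if one pass of `depth` steps visits every
// slot exactly once, i.e. no slot repeats before the walk wraps around.
bool visits_every_slot(std::size_t depth, std::size_t step)
{
    std::vector<bool> seen(depth, false);
    for (std::size_t idx : stride_order(depth, step)) {
        if (seen[idx])
            return false;             // slot reused too early: some slots are never reached
        seen[idx] = true;
    }
    return true;
}
```

with a depth of 16 and a step of 7 every slot is reached, while a depth of 14 and a step of 7 only alternates between slots 0 and 7, so the step has to be chosen against the actual buffer depth.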
--
tim_at_[hidden]
http://tim.klingt.org

All we composers really have to work with is time and sound - and
sometimes I'm not even sure about sound.
  Morton Feldman