Subject: [boost] [thread] Customizing barrier for improved performance
From: Belcourt, Kenneth (kbelco_at_[hidden])
Date: 2010-06-06 14:53:25


I notice that the thread barrier class is fairly large (128 bytes on
Darwin with Intel 11.1).

         mutex m_mutex;
         condition_variable m_cond;
         unsigned int m_threshold;
         unsigned int m_count;
         unsigned int m_generation;

and sort of slow for my application (parallel iterative solvers for
sparse linear systems). Many iterative algorithms have both serial
and parallel sections within a single iteration and, for larger
algorithms, this can result in numerous (on the order of 10)
rendezvous points per iteration. In cursory testing I've found that
a barrier implemented with atomics is a bit faster than a mutex-based
barrier (though I recognize that an atomic spin-based implementation
can potentially hang if running on a single Intel core with
hyper-threading enabled).
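
For reference, here is a minimal sketch of how the five members listed
above typically cooperate in a generation-counting wait(). It is an
illustration under stated assumptions, not a copy of the Boost source:
it uses std::mutex and std::condition_variable as stand-ins for the
Boost types, and the names mutex_barrier and mutex_barrier_violations
are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a mutex/condition-variable barrier (hypothetical class name).
class mutex_barrier {
public:
    explicit mutex_barrier(unsigned int count)
        : m_threshold(count), m_count(count), m_generation(0) {
        assert(count > 0);
    }

    // Blocks until m_threshold threads have called wait().
    // Returns true only for the thread that arrives last.
    bool wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        const unsigned int gen = m_generation;
        if (--m_count == 0) {
            ++m_generation;         // open the barrier for this round
            m_count = m_threshold;  // re-arm for the next round
            m_cond.notify_all();
            return true;
        }
        // The generation check guards against spurious wakeups.
        m_cond.wait(lock, [&] { return gen != m_generation; });
        return false;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    unsigned int m_threshold;
    unsigned int m_count;
    unsigned int m_generation;
};

// Hypothetical check: counts how often a thread leaves the barrier
// before every thread has arrived for that round.
int mutex_barrier_violations(unsigned int threads, unsigned int rounds) {
    mutex_barrier barrier(threads);
    std::atomic<unsigned int> arrived(0);
    std::atomic<int> violations(0);
    std::vector<std::thread> pool;
    for (unsigned int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned int r = 0; r < rounds; ++r) {
                arrived.fetch_add(1);
                barrier.wait();
                if (arrived.load() < (r + 1) * threads)
                    violations.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

Every call takes the mutex, and every waiter but the last sleeps on the
condition variable, which is where the cost at frequent rendezvous
points comes from.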

I've attached a simple atomic-based implementation built on Intel TBB's
atomic, though it's easily convertible to Boost.Atomic when the time
comes. This implementation just ping-pongs a counter, alternating
between incrementing and decrementing it on successive calls.
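
The attachment is not reproduced here. The following is a minimal
sketch of one way such a ping-pong counter barrier could be written,
not the attached code. It assumes std::atomic in place of the TBB
atomic, and it adds a shared direction flag (m_going_up) so that a slow
waiter cannot miss the moment the counter reaches its target. The class
and function names, the flag, and the yield() in the spin loop are all
assumptions of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of a spin barrier whose counter alternates between counting
// up to the threshold and counting back down to zero (hypothetical).
class pingpong_barrier {
public:
    explicit pingpong_barrier(int threshold)
        : m_threshold(threshold), m_count(0), m_going_up(true) {
        assert(threshold > 0);
    }

    // Returns true only for the thread that completes the rendezvous.
    bool wait() {
        // The direction cannot change until this thread has arrived,
        // so reading it on entry is safe.
        const bool up = m_going_up.load();
        const bool last = up
            ? m_count.fetch_add(1) == m_threshold - 1  // reached threshold
            : m_count.fetch_sub(1) == 1;               // reached zero
        if (last) {
            m_going_up.store(!up);  // release the waiters, reverse direction
            return true;
        }
        // Spin until the last arriver flips the direction. yield() is a
        // mitigation for oversubscribed cores, not a guarantee.
        while (m_going_up.load() == up)
            std::this_thread::yield();
        return false;
    }

private:
    const int m_threshold;
    std::atomic<int> m_count;
    std::atomic<bool> m_going_up;
};

// Hypothetical check: counts how often a thread leaves the barrier
// before every thread has arrived for that round.
int pingpong_barrier_violations(int threads, int rounds) {
    pingpong_barrier barrier(threads);
    std::atomic<int> arrived(0);
    std::atomic<int> violations(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                arrived.fetch_add(1);
                barrier.wait();
                if (arrived.load() < (r + 1) * threads)
                    violations.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

The sketch uses the default sequentially consistent ordering for
simplicity; weaker orderings may be possible but would need careful
review.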

Does anyone know if there are plans to extend barrier so that a user
could select a different implementation (like an atomic-based one)?
For some applications this could be a very useful extension.


-- Noel
