Subject: [boost] [thread] Customizing barrier for improved performance
From: Belcourt, Kenneth (kbelco_at_[hidden])
Date: 2010-06-06 14:53:25
Hi,
I notice that the thread barrier class is fairly large (128 bytes on
Darwin with Intel 11.1):

    private:
        mutex m_mutex;
        condition_variable m_cond;
        unsigned int m_threshold;
        unsigned int m_count;
        unsigned int m_generation;

It is also sort of slow for my application (parallel iterative solvers of
sparse linear systems). Many iterative algorithms have both serial
and parallel sections during a single iteration and, for larger
algorithms, this can result in numerous (on the order of 10) rendezvous
points during each iteration. During cursory testing I've found that
a barrier implemented with atomics is a bit faster than a mutex-based
barrier (though I recognize that an atomic spin-based implementation
can potentially hang if running on a single Intel core with
hyper-threading enabled).
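
For reference, here is a rough sketch of how a mutex-based barrier with
those five members typically works. It is written against the standard
library rather than Boost, the class name mutex_barrier and the helper
count_leaders are made up for illustration, and it should not be read as
the actual Boost.Thread source.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class; the members mirror the list quoted above.
class mutex_barrier {
public:
    explicit mutex_barrier(unsigned int count)
        : m_threshold(count), m_count(count), m_generation(0) {}

    // Returns true for exactly one thread per rendezvous (the last to arrive).
    bool wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        const unsigned int gen = m_generation;
        if (--m_count == 0) {
            // Last arrival: start a new generation and release the waiters.
            ++m_generation;
            m_count = m_threshold;
            m_cond.notify_all();
            return true;
        }
        // Sleep until the generation changes; the loop absorbs spurious wakeups.
        while (gen == m_generation) m_cond.wait(lock);
        return false;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    unsigned int m_threshold;
    unsigned int m_count;
    unsigned int m_generation;
};

// Hypothetical demo helper: runs `rounds` rendezvous on `n` threads and
// returns how many times wait() returned true in total.
unsigned int count_leaders(unsigned int n, unsigned int rounds) {
    mutex_barrier barrier(n);
    std::atomic<unsigned int> leaders(0);
    std::vector<std::thread> threads;
    for (unsigned int t = 0; t < n; ++t) {
        threads.emplace_back([&] {
            for (unsigned int r = 0; r < rounds; ++r) {
                if (barrier.wait()) leaders.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();
    return leaders.load();
}
```

Every rendezvous takes the lock once per thread, and all but the last
arrival block on the condition variable, which is where the cost comes
from when an iteration has many short parallel sections.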
I've attached a simple atomic-based implementation built on Intel TBB's
atomic, though it's easily convertible to Boost.Atomic when the time
comes. This implementation just ping-pongs a counter, alternating
between incrementing and decrementing it each time it's called.
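
The attachment is not reproduced here, so the following is only a sketch
of the ping-pong idea, not the attached code. It assumes std::atomic in
place of tbb::atomic, and the names spin_barrier and rounds_stay_in_step
are hypothetical. One addition beyond the description above: waiters spin
on a separate direction flag rather than on the counter value, because a
fast thread can re-enter the barrier and move the counter again before a
slow waiter has seen it reach its target.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin barrier: the counter runs 0 -> N on one rendezvous
// and N -> 0 on the next.
class spin_barrier {
public:
    explicit spin_barrier(unsigned int count)
        : m_threshold(count), m_count(0), m_down(false) {}

    void wait() {
        // The direction is stable on entry: it cannot flip again until
        // this thread has arrived at the current rendezvous.
        const bool down = m_down.load();
        const bool last = down
            ? (m_count.fetch_sub(1) == 1)                // counter reached 0
            : (m_count.fetch_add(1) + 1 == m_threshold); // counter reached N
        if (last) {
            m_down.store(!down);  // release the waiters, reverse direction
        } else {
            // Spin until the last arrival flips the direction. yield() is
            // a courtesy for oversubscribed or hyper-threaded cores.
            while (m_down.load() == down) std::this_thread::yield();
        }
    }

private:
    const unsigned int m_threshold;
    std::atomic<unsigned int> m_count;
    std::atomic<bool> m_down;
};

// Hypothetical demo helper: returns true if no thread ever got past
// rendezvous r before all n threads had arrived at it.
bool rounds_stay_in_step(unsigned int n, unsigned int rounds) {
    spin_barrier barrier(n);
    std::atomic<unsigned int> arrived(0);
    std::atomic<bool> in_step(true);
    std::vector<std::thread> threads;
    for (unsigned int t = 0; t < n; ++t) {
        threads.emplace_back([&] {
            for (unsigned int r = 0; r < rounds; ++r) {
                arrived.fetch_add(1);
                barrier.wait();
                if (arrived.load() < (r + 1) * n) in_step.store(false);
            }
        });
    }
    for (auto& th : threads) th.join();
    return in_step.load();
}
```

The sketch uses the default sequentially consistent ordering throughout
for simplicity. Weaker orderings would probably be enough, but that has
not been checked here.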
Does anyone know if there are plans to extend barrier so that a user
could select a different implementation (like an atomic-based one)?
For some applications this could be a very useful extension.
Thanks.
-- Noel