On 30.03.2009, at 14:08, Anthony Williams wrote:
Thanks for alerting me to this thread, Peter.
Oliver Abert <abert@uni-koblenz.de> writes:
On 29.03.2009, at 19:36, Peter Dimov wrote:
Oliver Abert:
Hi Everyone,
I am using Boost Threads (1.38) as my threading library and I also use
thread_specific_ptr to store a small amount of data per thread
(I think currently it is about 5 different pointer values per
thread). Technically everything works fine, but I am having a
performance problem on Mac OS X. On Linux the code runs 10
times faster than on Mac OS. If I use pthreads on Mac OS I get
identical performance to the Linux version. Both versions are
running on the same machine, each using 8 threads.
What does your profiler say?
About 80% of the time is spent in __spin_lock, which in turn was called
by pthread_once. If I use only one thread (instead of 8) the
percentage goes down to 2.5%, which is still a bit much for my
taste.
pthread_once is called by the thread_specific_ptr code to ensure that
the TLS key it uses has been allocated and is valid. It's a real
pain if that is too slow.
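To make the pattern described above concrete, here is a minimal sketch of a TLS accessor that guards its key with pthread_once on every access. This is not the Boost source: the names g_key_once, g_key, create_key, slow_get and slow_set are made up for illustration, and the real thread_specific_ptr implementation may be structured differently.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this only illustrates the pattern.
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;
static pthread_key_t  g_key;

// Runs exactly once per process, no matter how many threads race to it.
static void create_key()
{
    int rc = pthread_key_create(&g_key, NULL);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings in NDEBUG builds
}

// Every read goes through pthread_once before the actual lookup.
void* slow_get()
{
    pthread_once(&g_key_once, create_key);
    return pthread_getspecific(g_key);
}

// Every write goes through pthread_once as well.
void slow_set(void* p)
{
    pthread_once(&g_key_once, create_key);
    int rc = pthread_setspecific(g_key, p);
    assert(rc == 0);
    (void)rc;
}
```

If pthread_once takes a lock even after initialization has completed, as the profile above suggests may be the case on Mac OS X, then every call to slow_get() contends on that lock when several threads run it in a hot loop.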
Yes, I understand that so far, but there seems to be a more
serious problem. Is it possible that there is some unintended mutex
lock? It seems like exactly that is happening. Maybe it is
related to the static variables, which might get mutexed
automatically? I heard there is a bug with the Apple gcc 4.0.1
regarding statics, but this morning I also tried the Intel 11.0
compiler with the same disappointing results. What makes me wonder
is that the same code runs just fine on Linux.
Some more background information: the problem is definitely caused by
calls to get() of the thread-specific pointer. I am using it in a relatively
hot section of my code. Profiling is not so helpful, because there are
a bunch of unknown libraries in between my call and the pthread_once
call. And yes, I also used a debug build of Boost, but I still have no clue
what is happening in between.
Could you show the code that accesses the thread_specific_ptr?
Okay, the call is simply:
HierarchyTraverser *ht = RenderThread::hierarchyTraverser();
(there is no Boost-related code before or after that call)
where that function is:
inline HierarchyTraverser* RenderThread::hierarchyTraverser()
{
#ifdef BOOST
    return reinterpret_cast<HierarchyTraverser*>(mHierarchyTraverser.get());
#else
    return reinterpret_cast<HierarchyTraverser*>(pthread_getspecific(mHierarchyTraverser));
#endif
}
and the mHierarchyTraverser is of type
static boost::thread_specific_ptr<unsigned long int> mHierarchyTraverser;
Hope that helps, but as you can see it's basically pretty unspectacular.
Oliver
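
Since the accessor above sits in a hot section, one possible mitigation (a sketch only, not something proposed in this thread) is to fetch the per-thread pointer once and reuse it inside the loop, so the cost of get() is paid once per traversal instead of once per iteration. The HierarchyTraverser struct, its visited field and the traverse function below are hypothetical stand-ins; the real class is not shown in the post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real class, which is not shown in the post.
struct HierarchyTraverser { unsigned long visited; };

// Hypothetical hot loop. get_traverser is any callable returning the
// per-thread pointer, e.g. a wrapper around RenderThread::hierarchyTraverser().
template <typename Getter>
unsigned long traverse(Getter get_traverser, std::size_t iterations)
{
    // Single thread-specific lookup, hoisted out of the loop.
    HierarchyTraverser* ht = get_traverser();
    assert(ht != NULL);

    for (std::size_t i = 0; i < iterations; ++i)
        ++ht->visited;  // placeholder for the real per-iteration work

    return ht->visited;
}
```

This only helps if the pointer stored for the thread does not change during the loop; it works around the slow lookup rather than explaining it.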