Subject: Re: [Boost-users] Thread local storage
From: Oliver Abert (abert_at_[hidden])
Date: 2009-03-30 10:12:11


On 30.03.2009, at 14:08, Anthony Williams wrote:

> Oliver Abert <abert_at_[hidden]> writes:
>
>>> Thanks for alerting me to this thread Peter.
>>>
>>> Oliver Abert <abert_at_[hidden]> writes:
>>>
>>>> On 29.03.2009, at 19:36, Peter Dimov wrote:
>>>>
>>>>> Oliver Abert:
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I am using Boost Threads (1.38) as my threading library, and I
>>>>>> also use thread_specific_ptr to store a small amount of data per
>>>>>> thread (I think currently it is about 5 different pointer values
>>>>>> per thread). Technically everything works fine, but I am having
>>>>>> a performance problem on Mac OS X. On Linux the code runs 10
>>>>>> times faster than on Mac OS. If I use pthreads on Mac OS I get
>>>>>> performance identical to the Linux version. Both versions are
>>>>>> running on the same machine, using 8 threads each.
>>>>>
>>>>> What does your profiler say?
>>>>
>>>> About 80% of the time is spent in __spin_lock, which in turn was
>>>> called by pthread_once. If I use only one thread (instead of 8) the
>>>> percentage goes down to 2.5%, which is still a bit much for my
>>>> taste.
>>>
>>> pthread_once is called by the thread_specific_ptr code to ensure that
>>> the TLS key it uses has been allocated and is valid. It's a real pain
>>> if that is too slow.
>>
>> Yes, I understand that so far, but there seems to be a more serious
>> problem. Is it possible that there is some unintended mutex lock?
>> It seems like exactly that is happening. Maybe it is related to the
>> static variables, which might get mutexed automatically? I heard
>> there is a bug with the Apple gcc 4.0.1 regarding statics, but this
>> morning I also tried the Intel 11.0 compiler, with the same
>> disappointing results. What makes me wonder is that the same code
>> runs just fine on Linux.
>>
>> Some more background information: the problem is definitely caused
>> by calls to get() of the shared pointer. I am using it in a
>> relatively hot section of my code. Profiling is not so helpful,
>> because there are a bunch of unknown libraries in between my call and
>> the pthread_once call. And yes, I also used a debug build of Boost;
>> I have no clue what is happening in between.
>
> Could you show the code that accesses the thread_specific_ptr?

Okay, the call is simply:

HierarchyTraverser *ht = RenderThread::hierarchyTraverser();

(there is nothing Boost-related before or after that call), where
that function is:

inline HierarchyTraverser* RenderThread::hierarchyTraverser()
{
#ifdef BOOST
        return reinterpret_cast<HierarchyTraverser*>(mHierarchyTraverser.get());
#else
        return reinterpret_cast<HierarchyTraverser*>(pthread_getspecific(mHierarchyTraverser));
#endif
}

and mHierarchyTraverser is of type
  static boost::thread_specific_ptr<unsigned long int> mHierarchyTraverser;

Hope that helps, but as you can see it's basically pretty unspectacular.
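
For anyone following the thread, the difference between the two
branches can be sketched with plain pthreads. This is only a minimal
sketch, assuming (per Anthony's description quoted above) that the
checked path runs pthread_once before every lookup. All names in the
sketch namespace are hypothetical; this is not Boost's actual
implementation.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical names throughout; an illustration of the access pattern
// discussed in this thread, not the code inside Boost.Thread.
namespace sketch {

pthread_once_t key_once = PTHREAD_ONCE_INIT;
pthread_key_t  key;

// Runs once per process, however many threads race to trigger it.
void create_key()
{
    int rc = pthread_key_create(&key, 0); // no destructor in this sketch
    assert(rc == 0);
    (void)rc; // silence the unused-variable warning when NDEBUG is set
}

// Checked access: every call goes through pthread_once before the lookup.
void* checked_get()
{
    pthread_once(&key_once, create_key);
    return pthread_getspecific(key);
}

// Checked store: makes sure the key exists, then sets this thread's value.
void checked_set(void* p)
{
    pthread_once(&key_once, create_key);
    pthread_setspecific(key, p);
}

// Direct access: only valid once the key is already known to exist.
void* direct_get()
{
    return pthread_getspecific(key);
}

} // namespace sketch
```

In a hot loop, checked_get() pays for the pthread_once call on every
access, while direct_get() corresponds to the raw pthread_getspecific
branch in the #else case. Whether pthread_once itself is expensive
depends on the platform's implementation.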

Oliver

>
> Anthony
> --
> Author of C++ Concurrency in Action | http://www.manning.com/williams
> just::thread C++0x thread library | http://www.stdthread.co.uk
> Just Software Solutions Ltd | http://www.justsoftwaresolutions.co.uk
> 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No.
> 5478976


