On 20.04.2009, at 17:34, Steven Watanabe wrote:

AMDG

Oliver Abert wrote:
Two or three weeks ago I reported a problem concerning very slow access to thread-specific storage pointers on Mac OS in multithreaded environments. With eight threads, performance was reduced by about 80%. I should add that I make more than a million calls to the thread-specific storage per second, so you won't notice the problem with only a few calls.

In the meantime I have not found a clean solution, but I have found the cause, so the maintainer(s) or anybody smarter than me can think about a clean solution. The problem originates in pthread/once.cpp, in the function get_once_per_thread_epoch():

       boost::uintmax_t& get_once_per_thread_epoch()
       {
           BOOST_VERIFY(!pthread_once(&epoch_tss_key_flag,create_epoch_tss_key));
           void* data=pthread_getspecific(epoch_tss_key);
           if(!data)
           {
               data=malloc(sizeof(boost::uintmax_t));
               BOOST_VERIFY(!pthread_setspecific(epoch_tss_key,data));
               *static_cast<boost::uintmax_t*>(data)=UINTMAX_C(~0);
           }
           return *static_cast<boost::uintmax_t*>(data);
       }

On Mac OS X the first BOOST_VERIFY causes a fully executed call to pthread_once each time, which in turn uses mutexes to lock something. This is not the case on Windows and Linux, however, where performance is as expected. My "solution" to this problem was simply to comment the line out. As far as I understand the usage of BOOST_VERIFY, it is only an assertion and not required to run the code properly. This then gave me identical performance on all three platforms. I also tried different compilers, as I was told Apple gcc 4.0.1 had a problem with statics... but the results were the same with the Intel compiler.

The pthread_once call is necessary for the code to function correctly.
BOOST_VERIFY differs from BOOST_ASSERT in that it always
evaluates its argument.
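
A minimal sketch of that difference, assuming C++ and using hypothetical MY_ASSERT / MY_VERIFY macros that only mimic the documented behaviour (they are not the Boost definitions). The function init_like_call() is likewise made up; it stands in for a call such as pthread_once that returns 0 on success and has a side effect.

```cpp
#include <cassert>

// Hypothetical stand-ins that mimic the documented semantics of
// BOOST_ASSERT and BOOST_VERIFY; these are NOT the Boost definitions.
#ifdef NDEBUG
#  define MY_ASSERT(expr) ((void)0)        // expression is not evaluated
#  define MY_VERIFY(expr) ((void)(expr))   // expression is still evaluated
#else
#  define MY_ASSERT(expr) assert(expr)
#  define MY_VERIFY(expr) assert(expr)
#endif

static int calls = 0;

// Made-up function: returns 0 on success (like pthread_once) and
// counts how often it has been invoked.
int init_like_call()
{
    ++calls;
    return 0;
}

// Wraps the side-effecting call once in each macro and returns how
// many times the call was actually executed.
int count_evaluations()
{
    calls = 0;
    MY_VERIFY(!init_like_call());
    MY_ASSERT(!init_like_call());
    return calls;   // 2 in a debug build, 1 when NDEBUG is defined
}
```

In other words, removing the check from a "verify"-style macro in a release build still leaves the call itself in place, which is why commenting out the whole line changes behaviour and not just diagnostics.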

Okay, I understand. However, for whatever reason my code runs just fine without it, and very stably as well. The question then is why pthread_once(&epoch_tss_key_flag,create_epoch_tss_key) takes so much time here while it does not on Windows and Linux. It seems that it uses locks, since one thread has native performance, two threads are only 1.6 times faster, while eight threads are less than half as fast as a single thread.
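
For anyone who wants to reproduce the scaling behaviour in isolation, here is a rough sketch of a micro-benchmark, assuming a C++11 compiler and POSIX threads. All names (time_pthread_once, worker, init_once, calls_per_thread) are made up for this example and the call count is arbitrary; it only measures repeated pthread_once calls and says nothing about the rest of Boost.Thread.

```cpp
#include <pthread.h>
#include <cassert>
#include <chrono>
#include <vector>

namespace {

pthread_once_t once_flag = PTHREAD_ONCE_INIT;
int init_runs = 0;                      // modified only inside init_once

void init_once() { ++init_runs; }

const long calls_per_thread = 1000000;  // arbitrary value for illustration

// Each worker calls pthread_once over and over; only the very first
// call (across all threads) should actually run init_once.
void* worker(void*)
{
    for (long i = 0; i < calls_per_thread; ++i)
        pthread_once(&once_flag, init_once);
    return nullptr;
}

} // namespace

// Starts thread_count threads that all hammer pthread_once and returns
// the wall-clock time in seconds until every thread has finished.
double time_pthread_once(int thread_count)
{
    std::vector<pthread_t> threads(thread_count);
    auto start = std::chrono::steady_clock::now();
    for (auto& t : threads) {
        int rc = pthread_create(&t, nullptr, worker, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    for (auto& t : threads)
        pthread_join(t, nullptr);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing time_pthread_once(1), time_pthread_once(2) and time_pthread_once(8) on each platform should show whether the already-initialised path of pthread_once is cheap there or whether the threads contend on a lock.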


As far as I understand the Boost license, I am allowed to patch Boost and distribute the compiled dynamic link library with my own software. I further understand that I do not need to distribute the patched source code as well. Is that correct?

Yes.

Thanks for the information.

Best regards,

Oliver


---------------

Dipl.-Inform. Oliver Abert             Email: abert@uni-koblenz.de
Institut für Computervisualistik     Phone: +49 261 287-2770
Universität Koblenz                       Fax  : +49 261 287-2735
Postfach 20 16 02                         Room : B213, Building B
56070 Koblenz