Two or three weeks ago I reported a problem with very slow access to thread-specific storage pointers on Mac OS in multithreaded environments. With eight threads, performance dropped by about 80%. I should add that I make more than a million calls to the thread-specific storage per second, so you won't notice the problem with only a few calls.
In the meantime I have not found a clean solution, but I have found the cause, so the maintainer(s) or anybody smarter than me can think about a clean fix. The problem lies in pthread/once.cpp, in the function get_once_per_thread_epoch():
    boost::uintmax_t& get_once_per_thread_epoch()
    {
        BOOST_VERIFY(!pthread_once(&epoch_tss_key_flag,create_epoch_tss_key));
        void* data=pthread_getspecific(epoch_tss_key);
        if(!data)
        {
            data=malloc(sizeof(boost::uintmax_t));
            BOOST_VERIFY(!pthread_setspecific(epoch_tss_key,data));
            *static_cast<boost::uintmax_t*>(data)=UINTMAX_C(~0);
        }
        return *static_cast<boost::uintmax_t*>(data);
    }
On Mac OS X the pthread_once call inside the first BOOST_VERIFY is fully executed every time, and it in turn uses mutexes to lock something. This is not the case on Windows and Linux, where performance is as expected. My "solution" to this problem was simply to comment the line out. As far as I understand BOOST_VERIFY, it is only an assertion and is not required for the code to run properly. This gave me identical performance on all three platforms. I also tried different compilers, as I was told that Apple gcc 4.0.1 had a problem with statics, but the results were the same with the Intel compiler.
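One caveat on commenting the line out: according to the Boost.Assert documentation, BOOST_VERIFY still evaluates its argument when NDEBUG is defined, so removing the line also removes the pthread_once call itself, and create_epoch_tss_key would no longer be run from this function. A less invasive alternative might be to keep the call but guard it with a cheap flag check, so that pthread_once is only reached until the key exists. The following is only a rough sketch of that idea, not the maintainers' fix: the flag epoch_tss_key_ready, the sketch_ prefixes, the namespace, and the use of std::atomic and std::abort in place of BOOST_VERIFY are all my own assumptions, and I have not measured it on Mac OS X.

```cpp
#include <pthread.h>
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for the statics in pthread/once.cpp.
static pthread_key_t epoch_tss_key;
static pthread_once_t epoch_tss_key_flag = PTHREAD_ONCE_INIT;
// Assumption: an extra flag that becomes true once the key has been created.
static std::atomic<bool> epoch_tss_key_ready(false);

extern "C" {
    // Frees the per-thread epoch when a thread exits.
    static void sketch_delete_epoch_tss_data(void* data)
    {
        std::free(data);
    }

    // Runs exactly once (via pthread_once) and publishes the key.
    static void sketch_create_epoch_tss_key()
    {
        if (pthread_key_create(&epoch_tss_key, sketch_delete_epoch_tss_data) != 0)
            std::abort();
        epoch_tss_key_ready.store(true, std::memory_order_release);
    }
}

namespace sketch {

std::uintmax_t& get_once_per_thread_epoch()
{
    // Fast path: one atomic load instead of a full pthread_once call.
    if (!epoch_tss_key_ready.load(std::memory_order_acquire))
    {
        if (pthread_once(&epoch_tss_key_flag, sketch_create_epoch_tss_key) != 0)
            std::abort();
    }
    void* data = pthread_getspecific(epoch_tss_key);
    if (!data)
    {
        data = std::malloc(sizeof(std::uintmax_t));
        if (!data || pthread_setspecific(epoch_tss_key, data) != 0)
            std::abort();
        // All bits set, the same value as UINTMAX_C(~0) in the original.
        *static_cast<std::uintmax_t*>(data) = ~std::uintmax_t(0);
    }
    return *static_cast<std::uintmax_t*>(data);
}

} // namespace sketch
```

The key is still created through pthread_once, so initialization stays race-free; the flag only lets later calls skip it. Whether an atomic load is actually cheaper than Apple's pthread_once would need to be checked on the affected machine.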
As far as I understand the Boost license, I am allowed to patch Boost and distribute the compiled dynamic link library with my own software. I further understand that I do not need to distribute the patched source code as well. Is that correct?
Best regards,
Oliver