I have a question regarding the implementation of set_current_thread_data, which looks like this:
 
void set_current_thread_data(detail::thread_data_base* new_data)
{
  boost::call_once(current_thread_tls_init_flag,create_current_thread_tls_key);
  BOOST_VERIFY(TlsSetValue(current_thread_tls_key,new_data));
}
 
Why doesn't the thread_specific_ptr constructor call create_current_thread_tls_key, so that the call_once can be removed?
 
The reason I ask is that I saw a significant performance increase when moving from boost::thread_specific_ptr to a custom wrapper around the Win32 TLS functions (TlsAlloc, ...) in my experimental implementation of hazard pointers. The test performed an equal number of thread_specific_ptr::get() calls and pops from a lock-free stack (including calling delete).
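
To make the idea concrete, here is a minimal sketch of the kind of wrapper I mean, assuming the TLS slot is allocated eagerly in the constructor so that get() and set() need no call_once. The class name tls_slot is hypothetical and not part of Boost. The POSIX branch is only there so the sketch also compiles outside Windows. Unlike thread_specific_ptr, it does not run any cleanup function at thread exit.

```cpp
#include <cassert>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <pthread.h>
#endif

// Hypothetical wrapper (not Boost code): the slot is allocated once in the
// constructor, so the hot path is a bare TlsGetValue/TlsSetValue call.
class tls_slot
{
public:
    tls_slot()
    {
#ifdef _WIN32
        key_ = TlsAlloc();
        assert(key_ != TLS_OUT_OF_INDEXES);
#else
        const int rc = pthread_key_create(&key_, 0); // no destructor callback
        assert(rc == 0);
        (void)rc;
#endif
    }

    ~tls_slot()
    {
        // Releases the slot only; per-thread values are not deleted here.
#ifdef _WIN32
        TlsFree(key_);
#else
        pthread_key_delete(key_);
#endif
    }

    // Returns the calling thread's value, or a null pointer if none was set.
    void* get() const
    {
#ifdef _WIN32
        return TlsGetValue(key_);
#else
        return pthread_getspecific(key_);
#endif
    }

    // Stores a value for the calling thread only.
    void set(void* p)
    {
#ifdef _WIN32
        const BOOL ok = TlsSetValue(key_, p);
        assert(ok);
        (void)ok;
#else
        const int rc = pthread_setspecific(key_, p);
        assert(rc == 0);
        (void)rc;
#endif
    }

private:
    // Non-copyable: a copy would free the same slot twice.
    tls_slot(const tls_slot&);
    tls_slot& operator=(const tls_slot&);

#ifdef _WIN32
    DWORD key_;
#else
    pthread_key_t key_;
#endif
};
```

The trade-off is that the slot has to be constructed before any thread uses it, which is the ordering problem that call_once solves for a library-internal key.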