Apologies if this question gets asked ad nauseam on this list, but why is the shared_mutex implementation not a wrapper around pthread_rwlock on pthreads-based systems? I observe significantly better performance under concurrent reader access with pthread_rwlock than when acquiring a reader lock on shared_mutex. The difference is most pronounced on OS X, where the cost of a contended pthread_mutex is extremely high: a short-hold, high-utilization, mostly-read shared_mutex on that platform effectively serializes access.
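
For concreteness, here is a minimal sketch of the kind of wrapper I have in mind. The class name rwlock_shared_mutex is hypothetical, and the sketch assumes only exclusive and shared locking are needed, so timed waits and anything else beyond lock()/lock_shared() are left out. It is an illustration of the idea, not a proposed patch.

```cpp
#include <pthread.h>
#include <system_error>

// Hypothetical wrapper: exposes lock()/lock_shared()-style members
// on top of a POSIX read-write lock.
class rwlock_shared_mutex {
public:
    rwlock_shared_mutex() {
        check(pthread_rwlock_init(&lock_, nullptr), "pthread_rwlock_init");
    }
    ~rwlock_shared_mutex() { pthread_rwlock_destroy(&lock_); }

    rwlock_shared_mutex(const rwlock_shared_mutex&) = delete;
    rwlock_shared_mutex& operator=(const rwlock_shared_mutex&) = delete;

    // Exclusive (writer) ownership.
    void lock() { check(pthread_rwlock_wrlock(&lock_), "pthread_rwlock_wrlock"); }
    bool try_lock() { return pthread_rwlock_trywrlock(&lock_) == 0; }
    void unlock() { pthread_rwlock_unlock(&lock_); }

    // Shared (reader) ownership.
    void lock_shared() { check(pthread_rwlock_rdlock(&lock_), "pthread_rwlock_rdlock"); }
    bool try_lock_shared() { return pthread_rwlock_tryrdlock(&lock_) == 0; }
    void unlock_shared() { pthread_rwlock_unlock(&lock_); }

private:
    // Turn a non-zero pthreads return code into an exception.
    static void check(int rc, const char* what) {
        if (rc != 0) throw std::system_error(rc, std::generic_category(), what);
    }

    pthread_rwlock_t lock_;
};
```

Because the member names match the usual shared-mutex interface, a type like this could in principle be dropped in behind the existing lock guards, with reader acquisition going straight to pthread_rwlock_rdlock.
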
Best,
--nate