Boost logo

Boost :

Subject: [boost] Performance of boost::mutex and boost::mutex::scoped_lock
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-06-10 11:19:30


I have been doing some profiling using VTune recently to investigate the
cause of some underperforming multi-threaded code, and it's possible that
I'm not interpreting these results correctly, but if I am it's a little
surprising. This isn't the best multi-threaded design in the world, but
it's basically a naive implementation of a producer consumer that uses a
deque as the production queue, a boost::mutex as the lock, and a
boost::condition as a signal to wake up when there's stuff in the queue
(queue comment about boost::circular_buffer, which this code really should
be using instead). There is LOTS of context switching going on, basically I
read a 4KB chunk of data from the filesystem, put it in the queue, the other
threads takes the 4KB of data, does something with it, and this repeats for
a really long time. Since it doesn't take a whole lot of time to read 4KB
of data from the disk, you can tell that there are going to be zillions of
context switches and contention going on.

So on to the point. Obviously most of the time is spent in
boost::detail::basic_timed_mutex::lock, but what's surprising is the
breakdown of time inside the lock function:

(Keep in mind this is not timed over a terribly long sample, hence the small
numbers, but it is over thousands of actual locks, and ultimately this
locking alone makes up a noticeable portion of the sample duration)

boost::detail::basic_timed_mutex.lock
Total Time: 1,032,486 microseconds (1.032 seconds)
Total Wait Time: 505,884 microseconds (.505 seconds)

It appears that only half the time is actually spent inside the lock, which
is pretty crazy. The rest is spent doing computations, which doesn't seem
right for a lock. I drilled down further and came up with this:

boost::detail.get_system_time_sentinel
Total Time: 266,657 microseconds
Total Wait Time: 264 microseconds

So that accounts for about half of the computational time. The rest is in a
Windows system call somewhere that I was not able to drill into.

I don't know about the details of date_time library, or much about the
details of thread library either for that matter, but why is a time
necessary here at all? This is eventually resolving down to a
boost::detail::basic_timed_mutex, but there's nothing timed about the
operations I've done. I simply did the following:

boost::mutex m;
boost::mutex::scoped_lock lock(m);

//...

This should ultimately result in an infinite timeout being specified to the
system call, so all of this computation seems completely unnecessary. It
seems like a scoped_lock should completely bypass anything having to do with
date_time, and perhaps even normal timed locks should come up with a more
efficient strategy.

Thoughts?


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk