|
Boost : |
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-02-09 12:26:04
Peter Dimov wrote:
> Phil Endecott:
>> Peter's comment about evil spin-waits in lock() is interesting as I was
>> considering implementing something like that. I wonder about this
>> instead:
>>
>> void mutex::lock() {
>> if (atomic-attempt-to-lock) { // uncontended
>> return;
>> };
>> // first wait, try yielding since it's fast:
>> sched_yield();
>> if (atomic-attempt-to-lock) { // got lock on 2nd attempt
>> return;
>> }
>> // We failed again. Maybe there are lots of threads waiting; in that
>> // case it will be more efficient to futex_wait() since the kernel will
>> // then only wake us when our lock has been woken.
>> futex_wait(....);
>> }
>
> This isn't very good, either. Consider a scenario where your time slice is
> 10ms, but the mutex is unlocked 3ms after you attempt a lock.
So some other thread gets to do some (useful) work for 7 ms. But I
take your point.
> In my opinion, mutex::lock should never do fancy stuff behind the user's
> back in an attempt to achieve "better performance". If a
> trylock/sched_yield/lock pattern is better for a specific workload, it can
> be implemented in user space. If it isn't, there is no way to undo the
> damage.
Yes, I agree emphatically with this. I was pleased, for example, to
see that Howard's implementation of the multiple locking algorithm
doesn't need access to any "internals" to work efficiently. The
approach that I've been experimenting with is to supply a template
parameter to the mutex class that determines its behaviour, i.e.
template <typename WaitWake>
class mutex {
public:
void lock(); // tries to lock with an atomic operation, and
// calls WaitWake.wait() if it fails.
void unlock(); // calls WaitWake.wake().
// etc. - an N2447-like interface
};
The "WaitWake" concept is something with a futex-like interface:
class Futex {
int& var;
public:
Futex(int& var_);
void wait(int expected); // returns immediately if var!=expected,
// else blocks until wake() is called
void wake();
// etc.
};
I originally factored this out so that I could substitute a yield loop
or spinlock using these degenerate classes:
class Yield {
public:
Yield(int& var_) {}
void wait(int expected) { ::sched_yield(); }
void wake() {}
};
class Spin {
public:
Spin(int& var_) {}
void wait(int expected) {}
void wake() {}
};
Since then I've written a condition-variable class that also takes a
"WaitWake" template parameter.
But there are a couple of things that I can't do with this design:
- The yield-loop and spin-loop are compatible, in the sense that
different locks on the same mutex could use different implementations
(based on the expectation of blocking in each case) and it would still
work. The futex could be made compatible. So perhaps the WaitWake
should be a parameter of the lock, not of the mutex.
- But whether a lock is better off spinning or yielding is actually a
function of what the _other_ thread that currently holds the lock is
doing, not what this thread is going to do. So maybe we want a scheme
like this (pseudo-code):
mutex m;
...
{
lock l(m, 1e-6);
var++; // lock is only held for a few nanoseconds
}
...
{
lock l(m, 10);
var = result_of_http_request(); // lock is held for seconds
}
class lock {
mutex& m;
public:
lock(mutex& m_, float likely_hold_time):
m(m_)
{
if (m.likely_hold_time < 1e-3) {
m.spin_lock();
} else {
m.yield_lock();
}
m.likely_hold_time = likely_hold_time;
}
~lock() {
m.unlock();
}
.....
};
My thought process gets about this far and then I think "this is too
complicated". Simpler is better.
- My current scheme also doesn't allow for the sort of spin-then-wait
or yield-then-wait design at the top of this message. There are lots
of variations on this like "if (m.lock_holder_cpu != this_cpu) { spin
for a bit }".
In practice, my real code doesn't need most of these features; I have
maybe a dozen threads, not thousands, and lock contention seems to be
very rare. I got a worthwhile speedup by bypassing glibc's
pthread_mutex and pthread_condition implementation and going straight
to futexes (for the fast uncontended case), and I try to do as much
work as possible between context switches. If we had a set of
concurrency benchmarks available then I'd investigate a bit further.
Regards,
Phil.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk