
From: Sid Sacek (ssacek_at_[hidden])
Date: 2007-09-23 19:49:29


Hi Phil,

I had a chance to think about this issue with spin-locks a bunch more.

Okay, while I agree with you that the probability of a thread being
pre-empted in the middle of the assembly lock routine is small, it can
still happen, and depending on the application it may happen noticeably
often.

Knowing that, the programmer should be given the option of whether or not
to use the spin-locking mechanism in their code. I'm certain that most
applications will be fine using the locks in this manner, but there will be
cases where the possibility of wasting CPU cycles is unacceptable. It all
depends on the application and how often it performs locking.

I know that I personally would want the option of being able to turn that
feature on and off at my discretion, and I'm sure that others would too.

If you make it so that ARM coders are always forced to use the spin-locks,
they may abandon boost::shared_ptr altogether and come up with their own
implementations.

If I had a say in this matter, I would request that the underlying locking
mechanism be configurable.
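To sketch what I mean (this is purely illustrative, not an existing Boost
interface; the macro name and class names here are hypothetical, and I'm
using C++ standard-library locks as stand-ins for the real platform-specific
implementations), the choice could be a compile-time policy:

```cpp
#include <atomic>
#include <mutex>

// Hypothetical compile-time switch: define SP_USE_SPINLOCK to select
// the spin-lock policy instead of a full mutex.

class spin_lock {
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
  void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) {} }
  void unlock() { flag_.clear(std::memory_order_release); }
};

class mutex_lock {
  std::mutex m_;
public:
  void lock()   { m_.lock(); }
  void unlock() { m_.unlock(); }
};

#ifdef SP_USE_SPINLOCK
typedef spin_lock count_lock;   // programmer opted in to spinning
#else
typedef mutex_lock count_lock;  // default: sleep instead of spin
#endif

// Example: a reference count guarded by whichever policy was selected.
struct ref_count {
  long value = 0;
  count_lock lock_;
  long increment() { lock_.lock(); long v = ++value; lock_.unlock(); return v; }
};
```

Either policy presents the same lock()/unlock() interface, so the
reference-counting code itself doesn't change; only the build flag does.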

Regards,
-Sid Sacek

-----Original Message-----
From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]]
On Behalf Of Phil Endecott
Sent: Sunday, September 16, 2007 12:16 PM
To: boost_at_[hidden]
Subject: Re: [boost] Shared pointer atomic assembler for ARM?

Sid Sacek wrote:
> I don't like the spin-lock approach for the atomic variables.
>
> If the mutex is already locked by another thread, spinning just wastes CPU
> cycles on a single-processor machine until the time-slice for the current
> thread comes to an end. That's usually a 10-millisecond waste of CPU in the
> worst-case scenario.

Hi Sid,

Yes, but because the variable is locked for just a handful of
instructions during the atomic increment/decrement, the probability of
being pre-empted is small. So spinning is rare, and the average
slowdown is small. It is more important to optimise the common case
where the variable is not locked. In particular, spinning is much
better than using pthreads, which is what boost::shared_ptr does at
present on ARM Linux; the function call overhead alone will dwarf the
time spent spinning. It also makes each shared pointer larger, since it
must store the mutex as well.
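For reference, the spinning variant has the same shape as the yielding code
further down, just without the yield. Here is a portable sketch of that idea
(my stand-in, not the actual shared_ptr code): the GCC
__sync_lock_test_and_set builtin plays the role of the ARM swp instruction,
and -1 is the sentinel "locked" value.

```cpp
// Spin until we swap out a real (non-sentinel) value.
static inline long atomic_read_and_lock_spin(long& mem)
{
  long val;
  do {
    // Stand-in for "swp val, -1, [&mem]": atomically store -1 and
    // return the previous contents.
    val = __sync_lock_test_and_set(&mem, -1L);
  } while (val == -1);  // -1 means another thread holds the lock
  return val;
}

static inline void atomic_unlock(long& mem, long val)
{
  __sync_synchronize();  // make prior writes visible before releasing
  mem = val;             // storing a non-sentinel value releases the lock
}
```

Because the critical section is only a few instructions long, the loop
almost always succeeds on the first swap.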

To quantify this, I've timed a simple single-threaded loop that does
100 million shared pointer increments and decrements. With pthreads it
takes 180 seconds, while with my spinning implementation it takes 22
seconds. That's about an additional microsecond per atomic operation.
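The measurement itself is just a tight loop of paired increments and
decrements. A portable sketch of its shape (my illustration, using a
std::mutex for the pthread path and a lock-free std::atomic standing in for
the swp path; absolute numbers will of course differ from my ARM figures):

```cpp
#include <atomic>
#include <chrono>
#include <mutex>

// Time n increment/decrement pairs under a mutex (the pthread path).
// Returns elapsed milliseconds; count_out must come back as zero.
inline long long time_mutex_pairs(long n, long& count_out) {
  long count = 0;
  std::mutex m;
  auto t0 = std::chrono::steady_clock::now();
  for (long i = 0; i < n; ++i) {
    m.lock(); ++count; m.unlock();
    m.lock(); --count; m.unlock();
  }
  auto t1 = std::chrono::steady_clock::now();
  count_out = count;
  return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}

// The same loop with a lock-free atomic, standing in for the swp path.
inline long long time_atomic_pairs(long n, long& count_out) {
  std::atomic<long> count(0);
  auto t0 = std::chrono::steady_clock::now();
  for (long i = 0; i < n; ++i) { ++count; --count; }
  auto t1 = std::chrono::steady_clock::now();
  count_out = count.load();
  return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}
```

Dividing the difference in elapsed time by the number of operations gives
the per-operation overhead quoted above.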
Comparing that with the 10 ms wasted by spinning when an atomic operation
is interrupted, the break-even point is when one in every 10 000 atomic
operations is interrupted by a thread that accesses the same variable. On
my hardware, the window during which the variable holds the sentinel value
is about 15 ns, so with scheduling at 100 times per second the probability
of interrupting an atomic operation is about one in 666 667. That's a much
lower probability than one in 10 000, so the spinning solution is better.
[Do let me know if you can see any flaws in my logic.]
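The arithmetic can be checked directly (the constants below are just the
figures quoted above: a 10 ms time-slice, ~1 us extra cost per pthread
operation, and a ~15 ns sentinel window):

```cpp
#include <cmath>

const double slice_s  = 10e-3;  // worst-case time-slice lost to spinning
const double extra_s  = 1e-6;   // extra cost per atomic op with pthreads
const double window_s = 15e-9;  // window holding the sentinel value

// Spinning only loses if more than one op in (slice / extra) is
// interrupted: 10 ms / 1 us = 10 000.
inline double break_even_ops() { return slice_s / extra_s; }

// With a 100 Hz scheduler, an op is pre-empted inside the 15 ns window
// roughly once per (slice / window) ops: 10 ms / 15 ns ~ 666 667.
inline double interrupt_odds() { return slice_s / window_s; }
```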

An alternative is to simply yield when the thread is found to be locked:

#include <sched.h>

static inline long atomic_read_and_lock(long& mem)
{
  long val;
  do {
    // Atomically swap the sentinel (-1, meaning "locked") into mem and
    // return the previous contents.  Equivalent to:
    //   val = mem; mem = -1;
    asm volatile ("swp\t%0, %1, [%2]"
                  : "=&r" (val)
                  : "r" (-1),
                    "r" (&mem)
                  : "memory");
    if (val != -1) {
      break;            // we got the real value; mem is now locked
    }
    sched_yield();      // another thread holds the lock: give up the CPU
  } while (1);
  return val;
}

In the typical case this executes no more instructions than the spin
code, and my test program runs at the same speed. But it does increase
the code size, which matters to me on my embedded system with not much
cache. It is also OS-specific.

The benefits of each approach will depend on the workload, i.e. how
often shared pointers are actually used by more than one thread, and I
suggest benchmarking your application with each of the alternatives.

Regards,

Phil.
