From: Boris Gubenko (Boris.Gubenko_at_[hidden])
Date: 2007-10-19 13:29:33


Markus Schoepflin wrote:
>> void foo (volatile int *mem, int val) { *mem = val; }
>
> .globl __7foo__FPVii
> .ent __7foo__FPVii
> 0000 __7foo__FPVii:
> .frame $sp, 0, $26
> .prologue 0
> .context full
> 0000 trapb
> 0004 stl val, (r16)
> 0008 ret (r26)
> .end __7foo__FPVii
>
> Here the compiler generates a trap barrier followed by a store instruction.
> As of chapter 5.2.2 of the Alpha architecture handbook, the access is
> guaranteed to be performed in a single atomic operation.

I'm not sure what trapb has to do with the issue at hand: trapb is not a
memory barrier. According to the "Tru64 Unix Assembly Language Programmer's
Guide", trapb "Guarantees that all previous *arithmetic* [emphasis mine]
instructions are completed, without incurring any arithmetic traps,
before any instructions after the trapb instruction are issued."
Besides, I'm not sure the compiler is guaranteed to always
generate trapb in a function like foo() above.
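
To illustrate the distinction between an atomic store and a memory barrier,
here is a minimal sketch. It assumes C++11 <atomic>, which postdates this
thread, and the names Mailbox, publish and try_consume are hypothetical. The
store to the flag is atomic by itself; the explicit fences are what order it
relative to the ordinary payload access. On Alpha the memory barrier
instruction is mb, not trapb; how a given compiler maps these fences to
instructions is toolchain dependent.

```cpp
#include <atomic>

// Hypothetical example type: ordinary data guarded by an atomic flag.
struct Mailbox {
    int payload = 0;            // plain, non-atomic data
    std::atomic<int> ready{0};  // flag updated with a single atomic store
};

// Writer: the release fence orders the payload write before the flag store.
void publish(Mailbox& box, int value) {
    box.payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(1, std::memory_order_relaxed);
}

// Reader: the acquire fence orders the flag load before the payload read.
bool try_consume(Mailbox& box, int& out) {
    if (box.ready.load(std::memory_order_relaxed) == 0) return false;
    std::atomic_thread_fence(std::memory_order_acquire);
    out = box.payload;
    return true;
}
```

Without the fences the flag store would still be a single atomic operation,
but nothing would order it with respect to the payload.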

>> void bar (volatile int *mem, int val) { __ATOMIC_EXCH_LONG(mem, val); }
>
> .globl __7bar__FPVii
> .ent __7bar__FPVii
> 0010 __7bar__FPVii:
> .frame $sp, 0, $26
> .prologue 0
> 0010 L$2:
> .context full
> 0010 mov val, r0
> 0014 ldl_l r1, (r16)
> 0018 stl_c r0, (r16)
> 001C unop
> 0020 beq r0, L$2
> 0024 ret (r26)
>
> Here the compiler generates a 'load locked' and 'store conditionally'
> sequence, wrapped by a loop repeated until the load/store has succeeded. I
> don't see why this should give me any advantage over the previous, when all
> I want is an atomic store, and I am not interested in the previous value.
> Could you please tell me?

I did not pay attention to the fact that atomic_write32() is a void
function. If it returned the previous value of the memory location being
updated, then interlocked memory instructions would be necessary. For a void
function, a plain 'stl' is probably fine, meaning that your implementation
of atomic_write32() is fine. One advantage of using __ATOMIC_EXCH_LONG
that I can see is that it enforces proper alignment of its first argument (by
aborting the process if it is not properly aligned).
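
The alignment point can be sketched in portable code. This is only an
illustration, assuming C++11 <atomic>; is_aligned32 and checked_write32 are
hypothetical names, not the builtin and not anything in atomic.hpp. It shows
a void store that aborts on a misaligned address, which is the behavior
described above for __ATOMIC_EXCH_LONG.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is naturally aligned for a 32-bit access.
bool is_aligned32(const volatile void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(std::int32_t) == 0;
}

// Hypothetical checked store: abort the process on a misaligned address,
// otherwise perform one atomic store. Nothing is returned, so no
// read-modify-write sequence is needed.
void checked_write32(std::atomic<std::int32_t>* mem, std::int32_t val) {
    if (!is_aligned32(mem)) std::abort();
    mem->store(val, std::memory_order_relaxed);
}
```

A properly declared std::atomic object is always suitably aligned, so the
check only matters when the pointer comes from a cast or a raw buffer.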

Still, looking at atomic.hpp, I'm not sure why on some other architectures
atomic_write32() is implemented using special instructions like:

  winapi::interlocked_exchange((volatile long*)mem, val);
or
  atomic_xchg32(mem, val);
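
For comparison, the two styles of write can be sketched as follows, assuming
C++11 <atomic>. The function names are hypothetical and this is not the code
in atomic.hpp. An exchange is a read-modify-write, so the interlocked
sequence is performed even when the caller discards the previous value; a
store does not read the location at all. Whether the other ports chose the
exchange form for ordering reasons is not stated in this thread.

```cpp
#include <atomic>
#include <cstdint>

// Read-modify-write: stores val and returns the value that was there before.
std::int32_t exchange32(std::atomic<std::int32_t>& mem, std::int32_t val) {
    return mem.exchange(val);  // sequentially consistent by default
}

// A "write" built on exchange: the previous value is read, then discarded.
void write32_via_exchange(std::atomic<std::int32_t>& mem, std::int32_t val) {
    (void)mem.exchange(val);
}

// A "write" built on a plain atomic store: the old value is never read.
void write32_via_store(std::atomic<std::int32_t>& mem, std::int32_t val) {
    mem.store(val);
}
```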

> Also, I now have two more questions, which you can probably answer:
>
> 1) Why is a trap barrier created in the first case, but not in the second?
>
> 2) According to the Alpha architecture handbook, branch prediction predicts
> backward branches to be taken, and it is recommended not to implement the
> load/store like above. (See documentation for STx_C, chapter 4.2.5.) Is
> this no longer true?

Unfortunately, I cannot answer either of these questions; hopefully somebody
more knowledgeable in this area will. As for trapb, as I said before, I
don't think it has anything to do with the issue.

> Thank you for your help,
> Markus

Thank you for the interesting discussion and for all your efforts. Much
appreciated.

Boris

P.S. I'm disconnecting shortly (leaving for the Rhode Island marathon, to be
held tomorrow, and won't have access to a computer until Sunday).

