From: Alexander Terekhov (terekhov_at_[hidden])
Date: 2007-03-23 12:57:15


Anthony Williams wrote:
>
> "Peter Dimov" <pdimov_at_[hidden]> writes:
>
> > On x86 all loads already have acquire semantics by default, and all stores
> > have release semantics.

On Itanium, sure.

<quote source=Intel Itanium Architecture Software Developer's Manual>

6.3.4 Memory Ordering Interactions

IA-32 instructions are mapped into the Itanium memory ordering model as
follows:

- All IA-32 stores have release semantics

- All IA-32 loads have acquire semantics

- All IA-32 read-modify-write or lock instructions have release and
  acquire semantics (fully fenced).

</quote>
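
To make "release" and "acquire" concrete at the source level, here is a minimal sketch written against the std::atomic interface. That interface is my choice for illustration and does not come from the manual quoted above; the names payload, ready and run_message_passing are likewise made up. The release store publishes the payload, and the acquire load that observes the flag is then guaranteed to see it. Whether a given compiler emits plain moves for these operations on x86 is an assumption about the toolchain, not something the quote states.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                 // ordinary (non-atomic) data
static std::atomic<bool> ready(false);  // flag guarding the payload

// Runs one producer/consumer round and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // store with release semantics
    });
    std::thread consumer([&seen] {
        // Load with acquire semantics: spin until the flag is observed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the acquire load that saw the flag
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If either ordering were weakened to relaxed, the read of payload would no longer be ordered after the producer's write, and the program would contain a data race.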

>
> Not according to the intel specs. 25366818.pdf (IA32 software developers
> manual volume 3A), section 7.7.2:

The thing is that native x86 doesn't have an officially defined memory
model (the Itanium mapping may well be stronger than native x86).

Note that what you quote below was written for testers with scopes on
the "system bus".

>
> "1. Reads can be carried out speculatively and in any order."

However...

http://www.well.com/~aleks/CompOsPlan9/0005.html

<quote author=an architect at Intel>

The PPro does speculative and out-of-order loads. However,
it has a mechanism called the "memory order buffer" to ensure
that the above memory ordering model is not violated. Load
and store instructions do not get retired until the processor
can prove there are no memory ordering violations in the actual
order of execution that was used. Stores do not get sent to
memory until they are ready to be retired. If the processor
detects a memory ordering violation, it discards all unretired
operations (including the offending memory operation) and
restarts execution at the oldest unretired instruction.

</quote>

Consider also:

Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Two techniques to
enhance the performance of memory consistency models. In Proceedings of
the 1991 International Conference on Parallel Processing (Vol. I
Architecture), pages I-355-364, August 1991.

<quote>

The speculative-load buffer provides the detection mechanism by signaling
when the speculated result is incorrect. The buffer works as follows.
Loads that are retired from the reservation station are put into the
buffer in addition to being issued to the memory system. There are four
fields per entry (as shown in Figure 4): load address, acq, done, and
store tag. The load address field holds the physical address for the load.
The acq field is set if the load is considered an acquire access. For SC,
all loads are treated as acquires. The done field is set when the load is
performed. If the consistency constraints require the load to be delayed
for a previous store, the store tag uniquely identifies that store. A
null store tag specifies that the load depends on no previous stores.
When a store completes, its corresponding tag in the speculative-load
buffer is nullified if present. Entries are retired in a FIFO manner. Two
conditions need to be satisfied before an entry at the head of the buffer
is retired. First, the store tag field should equal null. Second, the
done field should be set if the acq field is set. Therefore, for SC, an
entry remains in the buffer until all previous load and store accesses
complete and the load access it refers to completes. Appendix A describes
how an atomic read-modify-write can be incorporated in the above
implementation.

We now describe the detection mechanism. The following coherence
transactions are monitored by the speculative-load buffer: invalidations
(or ownership requests), updates, and replacements. The load addresses
in the buffer are associatively checked for a match with the address of
such transactions. Multiple matches are possible. We assume the match
closest to the head of the buffer is reported. A match in the buffer for
an address that is being invalidated or updated signals the possibility
of an incorrect speculation. A match for an address that is being
replaced signifies that future coherence transactions for that address
will not be sent to the processor. In either case, the speculated value
for the load is assumed to be incorrect. Guaranteeing the constraints
for release consistency can be done in a similar way to SC. The
conventional way to provide RC is to delay a release access until its
previous accesses complete and to delay accesses following an acquire
until the acquire completes. Let us first consider delays for stores.
The mechanism that provides precise interrupts by holding back store
accesses in the store buffer is sufficient for guaranteeing that stores
are delayed for the previous acquire. Although the mechanism described
is stricter than what RC requires, the conservative implementation is
required for providing precise interrupts. The same mechanism also
guarantees that a release (which is simply a special store access) is
delayed for previous load accesses. To guarantee a release is also
delayed for previous store accesses, the store buffer delays the issue
of the release operation until all previously issued stores are
complete. In contrast to SC, however, ordinary stores are issued in a
pipelined manner.

</quote>
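
The buffer described in the quote can be modelled in a few lines. What follows is only a toy sketch under my own assumptions: every type and member name is hypothetical, store tags are plain integers, and what the hardware does after a match (reissue, rollback) is left out entirely. It covers the four fields per entry, the two retirement conditions, and the associative address check.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of the speculative-load buffer; all names are hypothetical.
const int kNoStore = -1;  // stands in for the "null" store tag

struct LoadEntry {
    std::uint64_t addr;  // physical address of the load
    bool acq;            // load is treated as an acquire (always true for SC)
    bool done;           // load has been performed
    int store_tag;       // previous store the load must wait for, or kNoStore
};

class SpeculativeLoadBuffer {
public:
    // A load leaving the reservation station is entered at the tail.
    void issue(std::uint64_t addr, bool acq, int store_tag) {
        entries_.push_back(LoadEntry{addr, acq, false, store_tag});
    }

    // The load at position idx (0 = head) has been performed.
    void mark_done(std::size_t idx) { entries_.at(idx).done = true; }

    // A store completed: nullify its tag wherever it appears.
    void store_completed(int tag) {
        for (LoadEntry& e : entries_)
            if (e.store_tag == tag) e.store_tag = kNoStore;
    }

    // Retire from the head (FIFO) while both conditions hold: the store tag
    // is null, and done is set if acq is set. Returns the number retired.
    std::size_t retire() {
        std::size_t n = 0;
        while (!entries_.empty()) {
            const LoadEntry& h = entries_.front();
            if (h.store_tag != kNoStore || (h.acq && !h.done)) break;
            entries_.pop_front();
            ++n;
        }
        return n;
    }

    // An invalidation, update, or replacement arrived for addr. Returns the
    // index of the match closest to the head, or -1 if nothing matches.
    // A match means the speculated value must be assumed incorrect.
    int coherence_event(std::uint64_t addr) const {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (entries_[i].addr == addr) return static_cast<int>(i);
        return -1;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::deque<LoadEntry> entries_;
};
```

With acq set on every entry this behaves like the SC case in the quote: a load performed out of order stays in the buffer, and stays exposed to coherence_event, until everything ahead of it has retired.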

and, also somewhat related:

http://www.cs.wisc.edu/~cain/pubs/micro01_correct_vp.pdf

regards,
alexander.

