From: Matt Hurd (matt.hurd_at_[hidden])
Date: 2004-09-20 05:55:59


Thanks for the interest Tom,

<tomas.puverle_at_[hidden]> wrote:
> (first post to this forum, so be nice guys! :)
>
> > For example, on ia32 32 bit aligned ops are atomic
>
> But you will also need to express the constraint that the data be
> aligned?

Yep. At present I just assume the appropriate compiler alignment
option, which isn't really good enough, but it works for me because I
use 8-byte alignment, the default on the Intel/MSVC compilers. It will
also break if you otherwise create a non-aligned object. In practice,
though, needs_lock<T> works in many situations for me when used
wisely. The performance benefit of avoiding such locks can be
dramatic, and automating the choice is handy, especially as it adds to
the self-documenting nature of the code.

Without such an assumption you would need to check whether the address
of the object modulo 4 or 8 is 0. That is a different kind of check,
as it depends on the object rather than the type, which is more than a
compile-time trait can handle.
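
A minimal sketch of that per-object check, for illustration only. The
name is_aligned is made up here, and alignas (C++11) is used just to
give the example a buffer whose alignment is known.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when p is a multiple of `alignment` bytes
// (for example 4 or 8). `alignment` must be non-zero.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// A buffer whose first byte is known to sit on an 8-byte boundary.
alignas(8) static unsigned char buffer[16];
```

With this buffer, buffer + 4 passes the 4-byte check but fails the
8-byte one, which is exactly the per-object information a type trait
cannot see.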

Thus need_lock<T> would only be indicative of capability unless
alignment is assumed. Perhaps a preprocessor check (or some such) for
appropriate alignment could be used as well, but this would still not
guarantee correctness; it would just help sidestep a common pitfall.
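
To make the idea concrete, here is a minimal sketch of what such a
trait might look like. It is not the actual implementation: it assumes
natural alignment, uses std::is_scalar from C++11 <type_traits>
(Boost.TypeTraits offers an equivalent), and relies on a hypothetical
SKETCH_ATOMIC_BYTES macro for the widest access the platform performs
atomically.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical knob: widest naturally aligned load/store assumed atomic.
// 4 is the conservative ia32 value; a build for a platform with atomic
// 8-byte accesses could define it as 8.
#ifndef SKETCH_ATOMIC_BYTES
#define SKETCH_ATOMIC_BYTES 4
#endif

// A type needs a lock unless it is a scalar that fits in one atomic access.
// This says nothing about alignment of a particular object.
template <typename T>
struct needs_lock {
    static const bool value =
        !(std::is_scalar<T>::value &&
          sizeof(T) <= static_cast<std::size_t>(SKETCH_ATOMIC_BYTES));
};

// Example aggregate: two ints cannot be read or written as one unit here.
struct point { int x; int y; };

static_assert(!needs_lock<char>::value, "a single byte needs no lock");
static_assert(needs_lock<point>::value, "aggregates always need a lock");
```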
 
> > , on ia64
> > it is 64 bit.
>
> However, remember that the fact that a given data type can be written in
> a single operation != visibility across all processors in an MP system.

True.

The usefulness depends on how you use it. For example, for many
size_t, float and int setters and getters on ia32 it is just dandy.
However, as you suggest, care must be taken: if setting or getting
the property may interfere with the incomplete state of another
concurrent transaction, then you're in dangerous waters. I usually
find this not to be the case, but when saving such cycles you need to
be aware of the atomicity, consistency and isolation aspects. Objects
with complex operations should always lock completely unless you are
very sure of what you are doing.

For a class which consists of a simple bunch of properties, these
concerns typically don't exist.
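
A minimal sketch of how a single property might pick its locking at
compile time. The property class and the 4-byte threshold are
assumptions made for illustration, and std::mutex is C++11
(a Boost.Thread mutex would play the same role). Note that the
unlocked variant leans entirely on the hardware guarantee for aligned
accesses; the C++11 standard formally treats unsynchronized concurrent
access to a plain object as a data race, so strictly standard code
would use std::atomic<T> instead.

```cpp
#include <mutex>
#include <type_traits>

// Assumption: aligned scalars of up to 4 bytes are read and written atomically.
template <typename T>
struct needs_lock
    : std::integral_constant<bool,
          !(std::is_scalar<T>::value && sizeof(T) <= 4)> {};

// Locked variant: every get and set takes the mutex.
template <typename T, bool Lock = needs_lock<T>::value>
class property {
public:
    static const bool locked = true;
    property() : value_() {}
    explicit property(const T& v) : value_(v) {}
    T get() const { std::lock_guard<std::mutex> guard(mutex_); return value_; }
    void set(const T& v) { std::lock_guard<std::mutex> guard(mutex_); value_ = v; }
private:
    mutable std::mutex mutex_;
    T value_;
};

// Unlocked variant: plain load and store, relying on the platform.
template <typename T>
class property<T, false> {
public:
    static const bool locked = false;
    property() : value_() {}
    explicit property(const T& v) : value_(v) {}
    T get() const { return value_; }
    void set(const T& v) { value_ = v; }
private:
    T value_;
};
```

A class made of such members keeps the locking decision next to each
property, which is where the self-documenting aspect comes from.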

> > doubles on ia32 need locking to be atomic, on
> > ia64 they don't.
>
> I think that the later models of ia32 can write up to 8 bytes at a time.

You're quite right. From the excerpt below it appears that the Pentium
and above can; only the 486/386 seem to miss out. Allowing atomic
doubles and 64-bit ints with need_lock<T> is much nicer, as long as
your platform supports it.

Which raises the question: should Boost have specific architecture
flags for libs such as Boost.Thread?

This is the chapter and verse from Intel's ia32 software developer's manual (vol 3):
____________________________________
<quote>
7.1.1. Guaranteed Atomic Operations
The Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors
guarantee that the following basic memory operations will always be
carried out atomically:
  • reading or writing a byte
  • reading or writing a word aligned on a 16-bit boundary
  • reading or writing a doubleword aligned on a 32-bit boundary

The Pentium 4, Intel Xeon, and P6 family, and Pentium processors
guarantee that the following additional memory operations will always
be carried out atomically:
  • reading or writing a quadword aligned on a 64-bit boundary
  • 16-bit accesses to uncached memory locations that fit within a
    32-bit data bus

The P6 family processors guarantee that the following additional
memory operation will always be carried out atomically:
  • unaligned 16-, 32-, and 64-bit accesses to cached memory that fit
    within a 32-byte cache line

Accesses to cacheable memory that are split across bus widths, cache
lines, and page boundaries are not guaranteed to be atomic by the
Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors.
The Pentium 4, Intel Xeon, and P6 family processors provide bus
control signals that permit external memory subsystems to make split
accesses atomic; however, nonaligned data accesses will seriously
impact the performance of the processor and should be avoided.
</quote>
____________________________________
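
One way such architecture flags might be derived from the excerpt, as
a rough sketch only. SKETCH_MAX_ATOMIC_BYTES and size_is_atomic are
hypothetical names; the tests use the usual GCC and MSVC predefined
macros; the x86-64 branch is an addition that the excerpt does not
cover; and anything unrecognised falls back to "lock everything".

```cpp
#include <cstddef>

#if defined(__ia64__) || defined(_M_IA64) || \
    defined(__x86_64__) || defined(_M_X64)
#  define SKETCH_MAX_ATOMIC_BYTES 8  // 64-bit targets: aligned quadword
#elif defined(__i586__) || defined(__i686__) || \
      (defined(_M_IX86) && _M_IX86 >= 500)
#  define SKETCH_MAX_ATOMIC_BYTES 8  // Pentium and later: aligned quadword
#elif defined(__i386__) || defined(_M_IX86)
#  define SKETCH_MAX_ATOMIC_BYTES 4  // conservative ia32: aligned doubleword
#else
#  define SKETCH_MAX_ATOMIC_BYTES 0  // unknown platform: assume nothing
#endif

// Hypothetical helper: can a naturally aligned object of n bytes be read
// or written in one atomic access on this target?
inline bool size_is_atomic(std::size_t n) {
    return n != 0 && n <= static_cast<std::size_t>(SKETCH_MAX_ATOMIC_BYTES);
}
```

The ia32 branches are deliberately conservative: a compiler that does
not advertise a Pentium-class target lands on the doubleword limit.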

> > Any suggestions on best practice for this?
>
> I think you might want to send this post to comp.programming.threads too
> - you may get some interesting feedback. I think Alexander Terekhov
> has some class wrapper for atomic types that I am sure he'd be keen to
> discuss, but unfortunately I've never had the time to review it.
>

I think the traditional view of atomic ops is a different type of
atomicity from what I'm discussing here. Atomic ops do something to a
location. Here I'm just talking about getting or setting a value
rather than operating on it. There is no exchange or operation
consistency taking place, just a guarantee that you can read or write
this many bytes consistently.
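
The distinction can be sketched with C++11's std::atomic, which
postdates this discussion and is used here purely as an illustration.
The relaxed load and store correspond to the plain get and set
discussed here, while fetch_add and exchange are read-modify-write
operations in the traditional sense. The counter type is hypothetical.

```cpp
#include <atomic>

struct counter {
    std::atomic<int> value;
    counter() : value(0) {}

    // Plain get and set: each is one indivisible read or write, with no
    // ordering guarantees relative to other memory operations.
    int get() const { return value.load(std::memory_order_relaxed); }
    void set(int v) { value.store(v, std::memory_order_relaxed); }

    // Read-modify-write: the read, the update and the write happen as
    // one indivisible step. Returns the new value.
    int increment() {
        return value.fetch_add(1, std::memory_order_relaxed) + 1;
    }

    // Exchange: stores v and returns the value it replaced.
    int swap(int v) { return value.exchange(v, std::memory_order_relaxed); }
};
```

Writing set(get() + 1) would not be equivalent to increment(): another
thread could change the value between the read and the write.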

There has also been talk of a memory model for C++ on std.c++
recently. A causal syntax for required ordering would be useful, but
getting beyond sequence points to a memory model with fencing or
whatever is going to be a hard slog. We are probably better off making
memory-fence-like primitives available through a portable mechanism.
The problem with that is that some platforms might offer 15 different
types and another might offer 3, as in an example that was given. No
easy answers there...

> Tom
Thanks Tom.

Any view on the ifdef, separate header, API wrapping tradeoffs for such code?

Regards,

Matt Hurd
matthurd_at_[hidden]
www.hurd.com.au

