
From: Alexander Terekhov (terekhov_at_[hidden])
Date: 2001-08-03 12:37:51


> In particular, we need a library to see what changes
> are needed to the core language to make concurrency work: Beman
> has already outlined a synchronisation model in his paper.
> The key issue involves sharing data: if a thread writes to
> some shared memory, how can another thread read that data?
> When are the processor caches flushed?

processor caches/cache coherence in general have nothing
to do with memory synchronization (memory ordering guarantees).

http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps

5.2.1 Cache Coherence and Sequential Consistency
Several definitions for cache coherence (also referred to
as cache consistency) exist in the literature. The strongest
definitions treat the term virtually as a synonym for
sequential consistency. Other definitions impose
extremely relaxed ordering guarantees.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
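
to make this concrete: a machine's caches can be perfectly
coherent and its loads and stores can still appear reordered to
other processors. here is the classic "store buffering" litmus
test as a small runnable C/pthreads sketch (my own illustration,
not from the paper; it is deliberately racy):

   #include <pthread.h>
   #include <stdio.h>

   /* two shared locations, both initially zero */
   static int x, y;
   static int r1, r2;

   static void *thread1(void *arg) { (void)arg; x = 1; r1 = y; return NULL; }
   static void *thread2(void *arg) { (void)arg; y = 1; r2 = x; return NULL; }

   int main(void)
   {
       pthread_t a, b;
       pthread_create(&a, NULL, thread1, NULL);
       pthread_create(&b, NULL, thread2, NULL);
       pthread_join(a, NULL);
       pthread_join(b, NULL);
       /* under sequential consistency at least one store must
          precede the other thread's load, so r1 == 0 && r2 == 0
          is impossible; a weakly ordered machine (or a reordering
          compiler) may produce exactly that outcome, without any
          cache-coherence violation at all */
       printf("r1 = %d, r2 = %d\n", r1, r2);
       return 0;
   }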

also, memory visibility (synchronization) is hardly the
"key issue". POSIX already resolves it without any changes
to the "core language":

A.4.10 Memory Synchronization

In older multi-processors, access to memory by the processors was
strictly multiplexed. This meant that a processor executing program
code interrogates or modifies memory in the order specified by the
code and that all the memory operations of all the processors in the
system appear to happen in some global order, though the operation
histories of different processors are interleaved arbitrarily. The
memory operations of such machines are said to be sequentially
consistent. In this environment, threads can synchronize using
ordinary memory operations. For example, a producer thread and a
consumer thread can synchronize access to a circular data buffer as
follows:

   int rdptr = 0;
   int wrptr = 0;
   data_t buf[BUFSIZE];

   Thread 1:
       while (work_to_do) {
           int next;
           buf[wrptr] = produce();
           next = (wrptr + 1) % BUFSIZE;
           while (rdptr == next)
               ;
           wrptr = next;
       }

   Thread 2:
       while (work_to_do) {
           while (rdptr == wrptr)
               ;
           consume(buf[rdptr]);
           rdptr = (rdptr + 1) % BUFSIZE;
       }

In modern multi-processors, these conditions are relaxed to achieve
greater performance. If one processor stores values in location A and
then location B, then other processors loading data from location B
and then location A may see the new value of B but the old value of
A. The memory operations of such machines are said to be weakly
ordered. On these machines, the circular buffer technique shown in
the example will fail because the consumer may see the new value of
wrptr but the old value of the data in the buffer. In such machines,
synchronization can only be achieved through the use of special
instructions that enforce an order on memory operations. Most
high-level language compilers only generate ordinary memory
operations to take advantage of the increased performance. They
usually cannot determine when memory operation order is important and
generate the special ordering instructions. Instead, they rely on the
programmer to use synchronization primitives correctly to ensure that
modifications to a location in memory are ordered with respect to
modifications and/or access to the same location in other threads.
Access to read-only data need not be synchronized. The resulting
program is said to be data race-free.

Synchronization is still important even when accessing a single
primitive variable (for example, an integer). On machines where the
integer may not be aligned to the bus data width or be larger than
the data width, a single memory load may require multiple memory
cycles. This means that it may be possible for some parts of the
integer to have an old value while other parts have a newer value. On
some processor architectures this cannot happen, but portable
programs cannot rely on this.

In summary, a portable multi-threaded program, or a multi-process
program that shares writable memory between processes, has to use the
synchronization primitives to synchronize data access. It cannot rely
on modifications to memory being observed by other threads in the
order written in the application or even on modification of a single
variable being seen atomically.

Conforming applications may only use the functions listed to
synchronize threads of control with respect to memory access. There
are many other candidates for functions that might also be used.
Examples are: signal sending and reception, or pipe writing and
reading. In general, any function that allows one thread of control
to wait for an action caused by another thread of control is a
candidate. IEEE Std 1003.1-200x does not require these additional
functions to synchronize memory access since this would imply the
following:

· All these functions would have to be recognized by advanced
  compilation systems so that memory operations and calls to these
  functions are not reordered by optimization.

· All these functions would potentially have to have memory
  synchronization instructions added, depending on the particular
  machine.

· The additional functions complicate the model of how memory is
  synchronized and make automatic data race detection techniques
  impractical.

Formal definitions of the memory model were rejected as unreadable by
the vast majority of programmers. In addition, most of the formal
work in the literature has concentrated on the memory as provided by
the hardware as opposed to the application programmer through the
compiler and runtime system. It was believed that a simple statement
intuitive to most programmers would be most effective. IEEE Std
1003.1-200x defines functions that can be used to synchronize access
to memory, but it leaves open exactly how one relates those functions
to the semantics of each function as specified elsewhere in IEEE Std
1003.1-200x.

IEEE Std 1003.1-200x also does not make a formal specification of the
partial ordering in time that the functions can impose, as that is
implied in the description of the semantics of each function. It
simply states that the programmer has to ensure that modifications do
not occur "simultaneously" with other access to a memory location.

4.10 Memory Synchronization

Applications shall ensure that access to any memory location by more
than one thread of control (threads or processes) is restricted such
that no thread of control can read or modify a memory location while
another thread of control may be modifying it. Such access is
restricted using functions that synchronize thread execution and also
synchronize memory with respect to other threads. The following
functions synchronize memory with respect to other threads:

   fork()
   pthread_barrier_wait()
   pthread_cond_broadcast()
   pthread_cond_signal()
   pthread_cond_timedwait()
   pthread_cond_wait()
   pthread_create()
   pthread_join()
   pthread_mutex_lock()
   pthread_mutex_timedlock()
   pthread_mutex_trylock()
   pthread_mutex_unlock()
   pthread_spin_lock()
   pthread_spin_trylock()
   pthread_spin_unlock()
   pthread_rwlock_rdlock()
   pthread_rwlock_timedrdlock()
   pthread_rwlock_timedwrlock()
   pthread_rwlock_tryrdlock()
   pthread_rwlock_trywrlock()
   pthread_rwlock_unlock()
   pthread_rwlock_wrlock()
   sem_post()
   sem_trywait()
   sem_wait()
   wait()
   waitpid()

Unless explicitly stated otherwise, if one of the above functions
returns an error, it is unspecified whether the invocation causes
memory to be synchronized.

Applications may allow more than one thread of control to read a
memory location simultaneously.
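
to connect the two quotes: the circular buffer from the rationale
can be repaired with pthread_mutex_lock()/pthread_mutex_unlock()
from the list above. a sketch under the example's assumptions
(data_t, BUFSIZE, produce()/consume() are from the quoted example;
the mutex-based rewrite is my own illustration, not text from the
standard):

   #include <pthread.h>

   #define BUFSIZE 16
   typedef int data_t;   /* stand-in for the example's payload type */

   static data_t buf[BUFSIZE];
   static int rdptr, wrptr;
   static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

   /* producer: every access to buf, rdptr, and wrptr happens
      between lock and unlock, so a consumer that later locks the
      mutex is guaranteed to see the stored datum no later than it
      sees the advanced wrptr */
   void produce_one(data_t d)
   {
       for (;;) {
           pthread_mutex_lock(&mtx);
           int next = (wrptr + 1) % BUFSIZE;
           if (next != rdptr) {            /* not full */
               buf[wrptr] = d;
               wrptr = next;
               pthread_mutex_unlock(&mtx);
               return;
           }
           pthread_mutex_unlock(&mtx);     /* full: release and retry */
       }
   }

   /* consumer: symmetric; returns 0 if the buffer was empty */
   int consume_one(data_t *out)
   {
       pthread_mutex_lock(&mtx);
       if (rdptr == wrptr) {               /* empty */
           pthread_mutex_unlock(&mtx);
           return 0;
       }
       *out = buf[rdptr];
       rdptr = (rdptr + 1) % BUFSIZE;
       pthread_mutex_unlock(&mtx);
       return 1;
   }

a production version would block on a condition variable rather than
spin on the lock, but the memory-synchronization property is the
point here, not the scheduling.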

http://www.primenet.com/~jakubik/mpsafe/MultiprocessorSafe.pdf
(point 4 below is incorrect, I believe; synchronization works via
the mutex, not the CV)

Here is what we can assume is safe to do when using POSIX (quoting
[Butenhof]):

1. Whatever memory values a thread can see when it creates a new
   thread can also be seen by the new thread once it starts. Any data
   written to memory after the new thread is created may not
   necessarily be seen by the new thread, even if the write occurs
   before the thread starts.

2. Whatever memory values a thread can see when it unlocks a mutex
   (leaves a synchronized method or block in Java), either directly
   or by waiting on a condition variable (calling wait in Java), can
   also be seen by any thread that later locks the same mutex. Again,
   data written after the mutex is unlocked may not necessarily be
   seen by the thread that locks the mutex, even if the write occurs
   before the lock.

3. Whatever memory values a thread can see when it terminates, either
   by cancellation, returning from its run method, or exiting, can
   also be seen by the thread that joins with the terminated thread
   by calling join on that thread. And, of course, data written after
   the thread terminates may not necessarily be seen by the thread
   that joins, even if the write occurs before the join.

4. Whatever memory values a thread can see when it signals or
   broadcasts a condition variable (calling notify in Java) can also
   be seen by any thread that is awakened by that signal or
   broadcast. And, one more time, data written after the signal or
   broadcast may not necessarily be seen by the thread that wakes up,
   even if it occurs before it awakens.
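
in code, the pattern behind points 2 and 4 looks like this (my own
sketch; ready and data_value are hypothetical names). the waiter
always re-acquires the mutex before it reads, so it is the mutex
hand-off, not the signal itself, that carries the visibility
guarantee:

   #include <pthread.h>

   static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
   static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
   static int ready;        /* predicate, protected by mtx */
   static int data_value;   /* payload, protected by mtx */

   void publisher(int v)
   {
       pthread_mutex_lock(&mtx);
       data_value = v;              /* written under the lock */
       ready = 1;
       pthread_cond_signal(&cv);
       pthread_mutex_unlock(&mtx);
   }

   int waiter(void)
   {
       int v;
       pthread_mutex_lock(&mtx);
       while (!ready)               /* guards against spurious wakeups */
           pthread_cond_wait(&cv, &mtx);  /* atomically unlocks, waits,
                                             and re-locks on wakeup */
       v = data_value;              /* visible: we now hold the mutex
                                       the publisher released after
                                       writing the payload */
       pthread_mutex_unlock(&mtx);
       return v;
   }

since a correct waiter touches shared data only while holding the
mutex, the mutex rule (point 2) already delivers everything point 4
tries to promise; that is the sense in which synchronization works
via the mutex and not the CV.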

regards,
alexander.

