From: John Max Skaller (skaller_at_[hidden])
Date: 2001-08-05 16:07:51
Alexander Terekhov wrote:
> > The key issue involves sharing data: if a thread writes to
> > some shared memory, how can another thread read that data?
> > When are the processor caches flushed?
>
> processor caches/cache coherence in general has nothing
> to do with memory synchronization (memory ordering guarantees).
Is it not a simple case of it? If caches are all
flushed, and common memory is only accessed at this point,
then the threads will get a consistent memory image.
Of course that is much stronger than required: only
common memory needs to be flushed, and in some cases,
only the relationship between locations as seen by
some processor needs to be consistent (not necessarily
'up to date').
> http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps
Thanks.
> 5.2.1 Cache Coherence and Sequential Consistency
> Several definitions for cache coherence (also referred to
> as cache consistency) exist in the literature. The strongest
> definitions treat the term virtually as a synonym for
> sequential consistency. Other definitions impose
> extremely relaxed ordering guarantees.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Right.
> also, memory visibility (synchronization) is hardly the
> "key issue". POSIX already resolves that issue without
> any changes with respect to "core language":
Wrong. POSIX may have a solution, but I point out
two things. First, it is a C solution. It is KNOWN that
POSIX does NOT answer all C++ questions. In particular,
on thread cancellation, it isn't specified if destructors
run. And secondly, we're NOT talking about POSIX.
When I said that synchronisation is the key issue,
I didn't mean to imply that technical solutions didn't
exist. I sure hope they do! The point is that we're talking
about introducing concepts and requirements into the abstract
C++ machine, so that it is possible to even state the requirements.
In other words, it's been done before (there is existing practice
to rely on), but we have to do it again anyhow.
> 1388 A.4.10 Memory Synchronization
> 1441 Conforming applications may only use the functions listed to
> synchronize threads of control
> 1442 with respect to memory access.
Exactly. And we do NOT have a POSIX implementation,
so we will need to make corresponding decisions, and then
modify the abstract machine model so it is even possible
to describe the requirements on the library.
Please look at the C++ abstract machine/conformance
model as described in the C++ Standard, and suggest how to
modify the wording so as to allow specifications for the
threading library to be written. Normative specifications
for the library can't use terminology like 'thread'
or 'synchronisation point' until these terms are introduced
into this model. This work is complicated by a requirement
to minimise changes to the bulk of the C++ Standard document.
I observed when reading the wording that there is
an _implicit_ assumption that events occur in some sequence,
even if the exact sequence is not fully specified.
This is not going to work in the presence of threads.
Instead, I think that 'for each thread' there is some sequence
of events, and that for the whole program there is a sequence
of event SETS, and the Standard places requirements on that:
thread 1 thread 2 program
e1 {e1}
e2 {e2}
e3 e3' {e3,e3'}
The compliance tester can only observe SOME of the event sets
for the whole program. The requirements that a compiler must meet
must therefore be specified in terms of allowable observable
event SETS, and we must work backwards from that to state
what the abstract requirements are on the behaviour of threads
of control executing translated code.
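A minimal C++ sketch of the event-set idea follows. It is an illustration under stated assumptions, not wording for the Standard: events are plain strings with unique names, and `ThreadTrace`, `EventSet`, `consistent` and `usage_example` are hypothetical names. The only requirement it checks is that the program-level sequence of event sets preserves the order of each thread's own sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical names, for illustration only. Event names are assumed unique.
using Event = std::string;
using ThreadTrace = std::vector<Event>;  // the event sequence of one thread
using EventSet = std::set<Event>;        // events with no order among themselves

// Returns true if 'program' (a sequence of event sets) contains every event
// of 'trace', each one in a strictly later set than its predecessor.
bool consistent(const ThreadTrace& trace, const std::vector<EventSet>& program) {
    std::size_t pos = 0;  // first set that may still hold the next event
    for (const Event& e : trace) {
        while (pos < program.size() && program[pos].count(e) == 0) ++pos;
        if (pos == program.size()) return false;  // missing or out of order
        ++pos;  // the thread's next event must lie in a later set
    }
    return true;
}

// Usage: the three event sets of the diagram. Assigning e1 to thread 1 and
// e2 to thread 2 is an assumption; the diagram alone does not settle it.
void usage_example() {
    const std::vector<EventSet> program = {
        EventSet{"e1"}, EventSet{"e2"}, EventSet{"e3", "e3'"}};
    const ThreadTrace thread1 = {"e1", "e3"};
    const ThreadTrace thread2 = {"e2", "e3'"};
    assert(consistent(thread1, program));
    assert(consistent(thread2, program));
}
```

Note that e3 and e3' share one set: the program-level sequence says nothing about which of them came first.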
In particular: suppose you log events in some sequence.
Then, by examining the log, you have to be able to judge
the compiler NOT CONFORMING under some circumstances.
It is not enough to examine the sub sequence of events associated
with one of the threads in terms of the existing model,
since clearly writes to shared memory would violate that,
and, worse, you can't judge the compiler non-conforming
if the program fails to use a synchronisation technique
that the compiler is required to correctly translate.
One needs to make judgements like "thread 1 locked
a mutex here, and thread 2 waits on it, and then
they print the value of the same address, but the values
are different and they're required to be the same,
so the compiler is NOT CONFORMING".
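Such a judgement can be sketched as a check over an event log. The C++ below is only a toy under assumptions that are not part of any specification: there is a single mutex, the log is a total sequence, a `Read` entry stands for a thread printing the value at an address, and `Entry`, `Kind`, `conforms` and `usage_example` are hypothetical names. The rule applied is that a read made while holding the mutex must observe the latest write made while holding the mutex.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical log format, for illustration only.
enum class Kind { Lock, Unlock, Write, Read };

struct Entry {
    int thread;   // thread that logged the event
    Kind kind;
    int address;  // location written or read; unused for Lock/Unlock
    int value;    // value written or observed; unused for Lock/Unlock
};

// Returns false only when the log proves a fault. Accesses made without
// holding the mutex are never grounds for a verdict.
bool conforms(const std::vector<Entry>& log) {
    int holder = -1;          // thread holding the mutex, -1 if it is free
    std::map<int, int> last;  // address -> last value written under the lock
    for (const Entry& e : log) {
        switch (e.kind) {
        case Kind::Lock:
            if (holder != -1) return false;  // two holders at once
            holder = e.thread;
            break;
        case Kind::Unlock:
            if (holder == e.thread) holder = -1;
            break;
        case Kind::Write:
            if (holder == e.thread) last[e.address] = e.value;
            else last.erase(e.address);  // unsynchronised write: no judgement
            break;
        case Kind::Read:
            if (holder == e.thread) {
                auto it = last.find(e.address);
                if (it != last.end() && it->second != e.value) return false;
            }
            break;
        }
    }
    return true;
}

// Usage: thread 1 writes 42 under the lock, thread 2 later observes 7.
void usage_example() {
    const std::vector<Entry> log = {
        {1, Kind::Lock, 0, 0}, {1, Kind::Write, 100, 42}, {1, Kind::Unlock, 0, 0},
        {2, Kind::Lock, 0, 0}, {2, Kind::Read, 100, 7},   {2, Kind::Unlock, 0, 0}};
    assert(!conforms(log));  // judged NOT CONFORMING
}
```

If thread 1 had written without taking the lock, the same read would yield no verdict, since the program did not use a synchronisation technique the compiler is required to honour.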
Always remember the Standard is like a law of physics which
the experimenter (conformance tester) is judging by experiment.
It isn't enough, or even useful, to tell the programmer
what they must do. The programmer can do anything.
The question is always what the compiler is required
to do with a given piece of code, and how to judge if it
meets those requirements. Such indirect specification
of semantics is non-trivial, and very poorly represented
in many Standards documents which have been lifted from
programming guides like the ARM, which were intended to
address a different audience. That is, remember we're
NOT interested in telling the programmer how to use
the system correctly, we're interested in telling a conformance
tester how to judge a fault.
> 1456 It was believed that a simple statement intuitive to most
> 1457 programmers would be most effective.
[As an aside: I strongly dispute this kind of approach]
> 3116 threads. The following functions synchronize memory with respect to
> other threads:
> 3117 fork ()
> 3118 pthread_barrier_wait()
> 3119 pthread_cond_broadcast()
> 3120 pthread_cond_signal ()
> 3121 pthread_cond_timedwait()
> 3122 pthread_cond_wait()
> 3123 pthread_create()
> 3124 pthread_join ()
> 3125 pthread_mutex_lock()
> pthread_mutex_timedlock()
> pthread_mutex_trylock()
> pthread_mutex_unlock()
> pthread_spin_lock()
> pthread_spin_trylock()
> pthread_spin_unlock()
> pthread_rwlock_rdlock()
> pthread_rwlock_timedrdlock()
> pthread_rwlock_timedwrlock()
> pthread_rwlock_tryrdlock()
> pthread_rwlock_trywrlock()
> pthread_rwlock_unlock()
> pthread_rwlock_wrlock()
> sem_post()
> sem_trywait()
> sem_wait()
> wait()
> waitpid ()
Sounds like we should have a similar list.
> 3126 Unless explicitly stated otherwise, if one of the above functions
> returns an error, it is unspecified
> 3127 whether the invocation causes memory to be synchronized.
Right. Now note that we can't talk about
'synchronising' memory in the C++ abstract machine
until the model actually allows multiple threads of
control. I know this sounds easy to do, but I suspect
that it will be harder than writing a threading library.
That's because the threading library is written in a formal
language (C++) and can be easily reasoned about (indeed,
formal tools will allow much to be learned by mechanical
analysis -- that is, you can compile the code and expect type
errors to be detected) and tested (on existing systems with
threading support) whilst the wording of the
conformance model/abstract machine is basically an informal
natural language description of the formal core on which the
Standard rests.
> seen by any thread that later locks the same mutex. Again, data written
> after the mutex is unlocked
And this is an example of the kind of sloppy language
that leads to defect reports. Obviously, this is not meant to
apply to data written 'after' a mutex is unlocked, by a thread
not participating in the synchronisation, since in this case
the very meaning of 'after' is undefined.
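The point about 'after' can be illustrated with a small C++ sketch. The model is an assumption made for illustration, not POSIX or Standard wording: events are numbered from 0, an edge (a, b) means a is required to precede b (program order within a thread, or an unlock followed by a lock of the same mutex in another thread), and `ordered_before` and `usage_example` are hypothetical names. Two events are ordered only if a chain of such edges connects them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (a, b): a is required to precede b

// Returns true if a chain of edges leads from 'from' to 'to'.
// Events are numbered 0 .. n-1.
bool ordered_before(int from, int to, std::size_t n,
                    const std::vector<Edge>& edges) {
    std::vector<bool> seen(n, false);
    std::vector<int> stack{from};
    seen[from] = true;
    while (!stack.empty()) {
        const int cur = stack.back();
        stack.pop_back();
        for (const Edge& e : edges) {
            if (e.first != cur || seen[e.second]) continue;
            if (e.second == to) return true;
            seen[e.second] = true;
            stack.push_back(e.second);
        }
    }
    return false;
}

// Usage. Events: 0 = thread 1 writes x, 1 = thread 1 unlocks m,
// 2 = thread 2 locks m, 3 = thread 2 reads x,
// 4 = thread 3 writes x without taking part in the synchronisation.
void usage_example() {
    const std::vector<Edge> edges = {{0, 1}, {1, 2}, {2, 3}};
    assert(ordered_before(0, 3, 5, edges));   // the write precedes the read
    assert(!ordered_before(1, 4, 5, edges));  // event 4 is not 'after' the unlock
    assert(!ordered_before(4, 1, 5, edges));  // nor is it 'before' the unlock
}
```

In this model, event 4 is neither before nor after the unlock, so a statement about data written 'after' the unlock cannot apply to it.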
BTW: I'm writing all this because
1. I really really really <g> want C++ to support
concurrent programming.
2. The Standards committee SHOULD NOT accept any threads
library proposal UNTIL the abstract machine and conformance
models have been modified.
Pragmatically, a complete solution to the conformance model
isn't required to accept a threading library, but work must
have commenced on it so that there is evidence that it will
be possible to support the library.
--
John (Max) Skaller, mailto:skaller_at_[hidden]
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix http://felix.sourceforge.net
Literate Programming tool Interscript http://Interscript.sourceforge.net
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk