On Thu, Jul 23, 2015 at 12:00 AM, "class7class@163.com" <class7class@163.com> wrote:
I found that the store() method does not support several of the memory_orders mentioned in the subject, and that the other methods do not support all memory_orders either, such as:
  1. the load method does not support memory_order_release and memory_order_acq_rel
  2. the compare_exchange_strong method does not support memory_order_release and memory_order_acq_rel
  3. the compare_exchange_weak method does not support memory_order_release and memory_order_acq_rel

I really want to know why the methods of atomic<> cannot support all memory_orders.

If I need the methods of atomic<> to support all memory_orders, what can I do?

You are asking about the area of C++ that the fewest people in the world understand, namely relaxed atomics.  Before I share my limited understanding of the issue, I have a piece of advice:
  • Use mutexes like boost::mutex to ensure that variables shared among threads are never accessed concurrently, and you'll be fine.
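Here is a minimal sketch of what that looks like (I use the standard-library names so the snippet is self-contained; boost::mutex and boost::lock_guard behave the same way, and "counter"/"worker" are names I made up for illustration):

    #include <iostream>
    #include <mutex>
    #include <thread>

    long counter = 0;              // shared state
    std::mutex counter_mutex;      // protects counter (boost::mutex works the same)

    void worker()
    {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(counter_mutex);  // boost::lock_guard likewise
            ++counter;             // never touched concurrently, so no data race
        }
    }

    int main()
    {
        std::thread a(worker), b(worker);
        a.join();
        b.join();
        std::cout << counter << '\n';   // always prints 200000
    }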

If your circumstances force you to ignore the above advice (like mine, e.g., when you cannot afford the scheduling overhead of mutexes), here is another piece of advice which may save your life:
  • Make every variable that can be accessed concurrently from multiple threads (with at least one thread writing) atomic.  (This makes your program "data race free".)
  • Always use the default memory_order_seq_cst memory ordering for accesses to your atomics; never use anything else.
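For comparison, here is the same made-up counter following this second piece of advice: the shared variable becomes an atomic, and every access uses the default ordering (again with std::atomic for self-containment; boost::atomic behaves the same way):

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<long> counter(0);  // shared state, now atomic (boost::atomic<long> works too)

    void worker()
    {
        for (int i = 0; i < 100000; ++i)
            counter.fetch_add(1);  // no ordering argument: defaults to memory_order_seq_cst
    }

    int main()
    {
        std::thread a(worker), b(worker);
        a.join();
        b.join();
        std::cout << counter.load() << '\n';  // load() also defaults to seq_cst; prints 200000
    }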

If your circumstances force you to ignore this advice as well (unlike me; I am only learning it myself, since I had not heard of the second piece of advice before; you would have to be writing for old ARM (i.e., chips without the LDA and STL instructions) or for POWER to have a real reason to use relaxed atomics, and I write for neither), here is what I have learned.

tl;dr

Boost.Atomic doesn't support all the memory orderings because the C++11 standard doesn't allow all of them.  The C++ standard doesn't allow all of them because not every combination makes sense in the "memory model" defined by C++.  This is not likely to change in the near future, i.e., not unless some genius devises a novel memory model which changes the status quo.

Here "memory model" means what "loads" from memory is allowed to return.  Because of compiler and hardware optimization, writes (including atomic writes) to memory locations may be reordered, so the load doesn't always return "the last value stored".  Special hardware instructions are necessary to ensure a particular ordering that the programmer needs.  The easiest way to understand a program is that "All memory operations by all threads looks as if they are done one after another, interleaving among threads".  But it proves to be too tricky for hardware to provide any reasonable performance for this model.

The C++ memory model without relaxed atomics is essentially: "All atomic memory operations by all threads look as if they are done one after another, interleaved among the threads.  All other memory operations of each thread look as if they are completed between adjacent atomic memory operations.  But the programmer guarantees not to write a data race.  The compiler relies on this assumption in all its work, and your program can break in all sorts of mysterious ways if you violate it."  This is essentially the illusion provided by sequential consistency.  Everybody should want it.

Except some don't, because on some architectures it is essentially impossible to provide sequential consistency efficiently.  To provide sequential consistency, one has to ensure that (1) the compiler doesn't reorder the statements written by the programmer, which the compiler writer can control; and (2) the hardware doesn't reorder the instructions generated by the compiler, which is damn hard.  It usually means the compiler must insert memory fences before and after memory accesses (i.e., tell the CPU "don't return to me until everything I have done up to this point is visible to all other CPUs" or "don't return to me until everything the other CPUs have done up to this point is visible to me"), which is very slow.  It is particularly bad because on these architectures fences must be inserted not just for atomic stores, but for the supposedly fast atomic loads as well.  (Recent architectures do better because they attach the requirement to accesses of a particular memory location: the "particular memory location" limits the scope over which the "fences" must act, and this makes a big difference in performance.)

That's why relaxed atomics (i.e., memory_order_acquire, etc.) exist in the standard.  They allow those architectures to perform better than full sequential consistency in some common cases.  Because different architectures require different sorts of fences, it doesn't make sense for these orderings to mean "insert a fence after the load".  Instead, the C++ standard takes a more programmer-centric view when defining relaxed memory ordering:

  • If a thread X stores a value with release semantics into an atomic variable, and another thread Y loads that value with acquire semantics from the same atomic variable, then everything done by thread X before the releasing store is guaranteed to be visible to thread Y after the acquiring load.
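A small sketch of that guarantee is the usual message-passing pattern (the names payload/ready/producer/consumer are mine, not anything from Boost):

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);  // the atomic variable both threads agree on

    void producer()                  // thread X in the rule above
    {
        payload = 42;                                  // done before the releasing store
        ready.store(true, std::memory_order_release);  // the releasing store
    }

    void consumer()                  // thread Y in the rule above
    {
        while (!ready.load(std::memory_order_acquire)) // the acquiring load
            ;                                          // spin until it sees the stored value
        assert(payload == 42);       // guaranteed: the write to payload is visible here
    }

    int main()
    {
        std::thread x(producer), y(consumer);
        x.join();
        y.join();
    }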

There are no other semantics attached to acquire and release.  This is why acquiring stores and releasing loads are meaningless: they simply have no semantics assigned to them.  But the deeper reason for the gap is that the computer industry has not found a way to give any additional guarantee without incurring hefty performance overheads.  And at the same time, sequentially consistent atomics are not so expensive on the newer architectures.

As for what to do if you want more than the above release-acquire guarantee: Simple, just use sequentially consistent atomic operations.
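To see the difference, consider the classic "store buffer" litmus test (again a sketch with names I made up):

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> x(0), y(0);
    int r1, r2;

    void thread1()
    {
        x.store(1, std::memory_order_release);
        r1 = y.load(std::memory_order_acquire);
    }

    void thread2()
    {
        y.store(1, std::memory_order_release);
        r2 = x.load(std::memory_order_acquire);
    }

    int main()
    {
        std::thread a(thread1), b(thread2);
        a.join();
        b.join();
        // With release/acquire, the outcome r1 == 0 && r2 == 0 is allowed:
        // the two release-acquire pairs are independent and impose no single
        // global order on the four operations.  Make every access use the
        // default memory_order_seq_cst and that outcome becomes impossible.
        std::cout << "r1=" << r1 << " r2=" << r2 << '\n';
    }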

Regards,
Isaac