Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler during compile time, or to the memory ordering generated by a CPU during runtime.
In modern microprocessors, memory ordering characterizes the CPU's ability to reorder memory operations; it is a type of out-of-order execution. Memory reordering can be used to fully utilize the bus bandwidth of different types of memory, such as caches and memory banks.
On most modern uniprocessors, memory operations are not executed in the order specified by the program code. In single-threaded programs all operations appear to have been executed in the order specified, with all out-of-order execution hidden from the programmer. In multi-threaded environments, however (or when interfacing with other hardware via memory buses), this can lead to problems, because another thread or device may observe the accesses in an order the program did not intend. Memory barriers can be used to avoid these problems.
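As an illustration of how barriers address the multi-threaded case, the following is a minimal sketch in C11 of one thread handing a value to another. The names `payload`, `ready`, `producer` and `consumer` are hypothetical and chosen only for this example; each function is assumed to run on its own thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary, non-atomic data (hypothetical name) */
static atomic_bool ready;    /* publication flag, zero-initialized to false */

/* Intended for the writing thread: store the data, then raise the flag. */
void producer(int value)
{
    payload = value;
    /* Release fence: the payload store may not be reordered after the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Intended for the reading thread: wait for the flag, then read the data. */
int consumer(void)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* spin until the producer has raised the flag */
    }
    /* Acquire fence: the payload load may not be reordered before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    return payload;
}
```

Without the two fences, the compiler or the CPU would be free to make the flag visible before the payload, so the reader could see the flag set and still read stale data.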
The compiler has some freedom to reorder operations at compile time. However, this can lead to problems if the order of memory accesses matters.
See also: Memory barrier
These barriers prevent a compiler from reordering instructions at compile time; they do not prevent reordering by the CPU at run time.
The GNU inline assembler statement
asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering read and write commands around it.[1]
The C11/C++11 function
atomic_signal_fence(memory_order_acq_rel);
forbids the compiler from reordering read and write commands around it.[2]
The Intel ICC compiler provides a full compiler fence:
__memory_barrier()
The Microsoft Visual C++ compiler provides:
_ReadWriteBarrier()
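A compiler-only barrier is typically useful when both sides of the exchange run on the same thread, for example ordinary code and a signal handler. The sketch below wraps the two forms above in a hypothetical `COMPILER_BARRIER()` macro; `message`, `message_ready`, `publish_message` and `read_message_if_ready` are illustrative names, and the reader function is assumed to be called from a handler on the same thread as the writer.

```c
#include <signal.h>
#include <stdatomic.h>

/* Compiler-only barrier: neither form emits a machine instruction. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() atomic_signal_fence(memory_order_acq_rel)
#endif

static int message;                           /* data handed to the handler */
static volatile sig_atomic_t message_ready;   /* flag checked by the handler */

/* Ordinary code: write the data, then set the flag. */
void publish_message(int value)
{
    message = value;
    COMPILER_BARRIER();   /* keep the data store ahead of the flag store */
    message_ready = 1;
}

/* Handler side: return the data if the flag is set, otherwise the fallback. */
int read_message_if_ready(int fallback)
{
    if (!message_ready)
        return fallback;
    COMPILER_BARRIER();   /* keep the data load behind the flag load */
    return message;
}
```

Because no hardware fence is emitted, this sketch gives no ordering guarantee between different CPUs; for that case a hardware memory barrier is required.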
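There are several memory-consistency models for SMP systems, ranging from sequential consistency (all reads and writes are in order), through relaxed consistency (some types of reordering are allowed), to weak consistency (reads and writes are arbitrarily reordered, limited only by explicit memory barriers).
On some CPUs, atomic operations can be reordered with loads and stores, dependent loads can be reordered, and the instruction cache pipeline can be incoherent. The following table shows which kinds of reordering each architecture allows.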
Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA-64 | z/Architecture
---|---|---|---|---|---|---|---|---|---|---|---|---
Loads reordered after loads | Y | Y | Y | Y | Y | | | | Y | | Y |
Loads reordered after stores | Y | Y | Y | Y | Y | | | | Y | | Y |
Stores reordered after stores | Y | Y | Y | Y | Y | Y | | | Y | | Y |
Stores reordered after loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
Atomic reordered with loads | Y | Y | | Y | Y | | | | | | Y |
Atomic reordered with stores | Y | Y | | Y | Y | Y | | | | | Y |
Dependent loads reordered | Y | | | | | | | | | | |
Incoherent instruction cache pipeline | Y | Y | | Y | Y | Y | Y | Y | Y | | Y |
Some older x86 and AMD systems have weaker memory ordering.[9]
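Stores reordered after later loads is the one relaxation the table lists for every architecture, including x86. A common way to observe it is the "store buffering" pattern, sketched below in C11 under the assumption that the two functions run concurrently on different threads; the names `x`, `y`, `thread0_store_x_load_y` and `thread1_store_y_load_x` are hypothetical.

```c
#include <stdatomic.h>

static atomic_int x, y;   /* both zero-initialized */

/* Intended for thread 0: store to x, then load from y. */
int thread0_store_x_load_y(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full fence: the store above must become visible before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Intended for thread 1: store to y, then load from x. */
int thread1_store_y_load_x(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    /* Full fence, mirroring thread 0. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

With both fences in place, the two functions cannot both return 0. If the fences are removed, that outcome becomes possible, because each store may still be waiting in a store buffer when the following load executes.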
SPARC memory ordering modes: TSO (total store order), PSO (partial store order) and RMO (relaxed memory order).
Many architectures with SMP support have special hardware instructions for flushing reads and writes at run time.
x86, x86-64: lfence (asm), void _mm_lfence(void); sfence (asm), void _mm_sfence(void);[10] mfence (asm), void _mm_mfence(void)[11]
PowerPC: sync (asm)
MIPS: sync (asm)
Itanium: mf (asm)
POWER: dcs (asm)
ARMv7: dmb (asm), dsb (asm), isb (asm)
Some compilers support builtins that emit hardware memory barrier instructions:
GCC has __sync_synchronize.
C11 and C++11 added the atomic_thread_fence() function.
The Microsoft Visual C++ compiler has MemoryBarrier().
Sun Studio Compiler Suite has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.