Memory ordering


Memory ordering is the order of accesses to computer memory by a CPU. Memory ordering depends on both the order of the instructions generated by the compiler at compile time and the execution order of the CPU at runtime. However, memory order is of little concern outside of multithreading and memory-mapped I/O, because if the compiler or CPU changes the order of any operations, it must necessarily ensure that the reordering does not change the output of ordinary single-threaded code.
The memory order is said to be strong or sequentially consistent when either the order of operations cannot change or when such changes have no visible effect on any thread. Conversely, the memory order is called weak or relaxed when one thread cannot predict the order of operations arising from another thread. Many naïvely written parallel algorithms fail when compiled or executed with a weak memory order. The problem is most often solved by inserting memory barrier instructions into the program.
In order to fully utilize the bandwidth of different types of memory such as caches and memory banks, few compilers or CPU architectures ensure perfectly strong ordering. Among the commonly used architectures, x86-64 processors have the strongest memory order, but may still defer memory store instructions until after memory load instructions. On the other end of the spectrum, DEC Alpha processors make practically no guarantees about memory order.
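As a minimal sketch of the usual remedy, the classic producer/consumer flag pattern can be written with the C11 <stdatomic.h> facilities; the release store and acquire load act as one-sided barriers that forbid the problematic reorderings. The identifiers payload, ready, producer, and consumer are invented for this illustration:
#include <stdatomic.h>
#include <stdbool.h>

int payload;                   // ordinary data written by one thread
atomic_bool ready = false;     // flag announcing that payload is valid

void producer(void) {
    payload = 42;
    // release: no earlier write may be reordered past this store
    atomic_store_explicit(&ready, true, memory_order_release);
}

int consumer(void) {
    // acquire: no later read may be reordered before this load
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                      // spin until the producer signals
    return payload;            // guaranteed to observe 42
}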

Compile-time memory ordering

Most programming languages have some notion of a thread of execution which executes statements in a defined order. Traditional compilers translate high-level expressions to a sequence of low-level instructions relative to a program counter at the underlying machine level.
Execution effects are visible at two levels: within the program code at a high level, and at the machine level as viewed by other threads or processing elements in concurrent programming, or during debugging when using a hardware debugging aid with access to the machine state. Compile-time memory ordering concerns itself with the former and does not address these other views.

General issues of program order

Program-order effects of expression evaluation

During compilation, hardware instructions are often generated at a finer granularity than specified in the high-level code. The primary observable effect in a procedural programming language is assignment of a new value to a named variable.
sum = a + b + c;
print(sum);
The print statement follows the statement which assigns to the variable sum, and thus when the print statement references the computed variable sum it references this result as an observable effect of the prior execution sequence. As defined by the rules of program sequence, when the print function call references sum, the value of sum must be that of the most recently executed assignment to the variable sum.
At the machine level, few machines can add three numbers together in a single instruction, and so the compiler will have to translate this expression into two addition operations. If the semantics of the programming language restrict the compiler to translating the expression in left-to-right order, then the generated code will look as if the programmer had written the following statements in the original program:
sum = a + b;
sum = sum + c;
If the compiler is permitted to exploit the associative property of addition, it might instead generate:
sum = b + c;
sum = a + sum;
If the compiler is also permitted to exploit the commutative property of addition, it might instead generate:
sum = a + c;
sum = sum + b;
Note that the integer data type in most programming languages only follows the algebra of the mathematical integers in the absence of integer overflow, and that floating-point arithmetic on the floating-point data type available in most programming languages is not associative under rounding, making the effects of evaluation order visible as small differences in the computed result.
If the programmer is concerned about integer overflow or rounding effects in floating point, the same program may be coded at the original high level as follows:
sum = a + b;
sum = sum + c;
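The rounding effect is easy to demonstrate with values contrived for the purpose. A sketch, with constants chosen only to make the absorption obvious:
// 1.0e20f is so large that adding 1.0f to it is rounded away entirely
float a = 1.0e20f, b = -1.0e20f, c = 1.0f;
float left  = (a + b) + c;   // (0.0f) + 1.0f        ->  1.0f
float right = a + (b + c);   // 1.0e20f + (-1.0e20f) ->  0.0f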

Program-order effects involving function calls

Many languages treat the statement boundary as a sequence point, forcing all effects of one statement to be complete before the next statement is executed. This will force the compiler to generate code corresponding to the statement order expressed. Statements are, however, often more complicated, and may contain internal function calls.
sum = f(x) + g(x) + h(x);
At the machine level, calling a function usually involves setting up a stack frame for the function call, which involves many reads and writes to machine memory. In most compiled languages, the compiler is free to order the calls to f, g, and h as it finds convenient, resulting in large-scale changes of program memory order. In a pure functional programming language, function calls are forbidden from having side effects on the visible program state, and the difference in machine memory order due to function call ordering will be inconsequential to program semantics. In procedural languages, the functions called might have side effects, such as performing an I/O operation or updating a variable in global program scope, both of which produce visible effects within the program model.
Again, a programmer concerned with these effects can become more pedantic in expressing the original source program:
sum = f(x);
sum = sum + g(x);
sum = sum + h(x);
In programming languages where the statement boundary is defined as a sequence point, the function calls f, g, and h must now execute in that precise order.
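To see why the order is observable, consider a sketch in which the called functions update shared state; the global variable calls and the function bodies below are invented for illustration:
int calls = 0;                           // global program scope
int f(int x) { calls = calls * 10 + 1; return x; }
int g(int x) { calls = calls * 10 + 2; return x; }
int h(int x) { calls = calls * 10 + 3; return x; }
// after "sum = f(x) + g(x) + h(x);" the final value of calls records
// whichever call order the compiler chose: 123, 213, 312, ...
// after the three-statement version, calls is always 123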

Specific issues of memory order

Program-order effects involving pointer expressions

Now consider the same summation expressed with pointer indirection, in a language such as C or C++ which supports pointers:
sum = *a + *b + *c;
Evaluating the expression *x is termed "dereferencing" a pointer and involves reading from memory at the location specified by the current value of x. The effects of reading through a pointer are determined by the architecture's memory model. When reading from standard program storage, there are no side effects due to the order of memory read operations. In embedded system programming, it is very common to have memory-mapped I/O, where reads and writes to memory trigger I/O operations or changes to the processor's operational mode, both of which are highly visible side effects. For the above example, assume for now that the pointers point to regular program memory, without these side effects. The compiler is then free to reorder these reads as it sees fit, and there will be no program-visible side effects.
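For contrast, when pointers do refer to memory-mapped device registers, C programs conventionally mark them volatile, which forbids the compiler from reordering, merging, or eliminating the accesses. A sketch, with invented register addresses; note that volatile constrains only the compiler, not the hardware:
#include <stdint.h>
// hypothetical device registers at made-up addresses
volatile uint32_t *status = (volatile uint32_t *)0x40001000;
volatile uint32_t *data   = (volatile uint32_t *)0x40001004;
// both reads must be emitted, and in this order: each volatile
// access is itself an observable side effect
uint32_t s = *status;
uint32_t d = *data;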
What if the assigned value is also obtained through pointer indirection?
*sum = *a + *b + *c;
Here the language definition is unlikely to allow the compiler to break this apart as follows:
// as rewritten by the compiler
// generally forbidden
*sum = *a + *b;
*sum = *sum + *c;
This would not be viewed as efficient in most instances, and pointer writes have potential side-effects on visible machine state. Since the compiler is not allowed this particular splitting transformation, the only write to the memory location of sum must logically follow the three pointer reads in the value expression.
Suppose, however, that the programmer is concerned about the visible semantics of integer overflow and breaks the statement apart at the program level as follows:
// as directly authored by the programmer
// with aliasing concerns
*sum = *a + *b;
*sum = *sum + *c;
The first statement encodes two memory reads, which must precede the first write to *sum. The second statement encodes two memory reads which must precede the second update of *sum. This guarantees the order of the two addition operations, but potentially introduces a new problem of address aliasing: any of these pointers could potentially refer to the same memory location.
For example, assume that *c and *sum are aliased to the same memory location, and rewrite both versions of the program with *sum standing in for *c.
*sum = *a + *b + *sum;
There are no problems here. The original value of what we wrote as *c is read on the right-hand side before it is lost to the assignment, so the result is the same as in the unaliased case; the original value of *sum is also lost, but it was going to be overwritten in any case and is of no special concern.
// what the program becomes with *c and *sum aliased
*sum = *a + *b;
*sum = *sum + *sum;
Here the original value of *sum is overwritten before it is first read, and we instead obtain the algebraic equivalent of:
// algebraic equivalent of the aliased case above
*sum = (*a + *b) + (*a + *b);
which assigns an entirely different value into *sum due to the statement rearrangement.
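A concrete sketch of the failure, with invented values:
// *sum and *c alias the same location
int x = 1, y = 2, z = 5;
int *a = &x, *b = &y, *c = &z;
int *sum = &z;                      // sum aliases c
// single statement:  *sum = *a + *b + *c;   ->  z = 1 + 2 + 5 = 8
// split statements:  *sum = *a + *b;        ->  z = 3; the value 5 is lost
//                    *sum = *sum + *c;      ->  z = 3 + 3 = 6, not 8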
Because of possible aliasing effects, pointer expressions are difficult to rearrange without risking visible program effects. In the common case, there might not be any aliasing in effect, so the code appears to run normally as before. But in the edge case where aliasing is present, severe program errors can result. Even if these edge cases are entirely absent in normal execution, their mere possibility opens the door for a malicious adversary to contrive an input where aliasing exists, potentially leading to a computer security exploit.
A safe reordering of the previous program is as follows:
// declare a temporary local variable 'temp' of suitable type
temp = *a + *b;
*sum = temp + *c;
Finally consider the indirect case with added function calls:
*sum = f(*a) + g(*b);
The compiler may choose to evaluate *a and *b before either function call; it may defer the evaluation of *b until after the call to f; or it may defer the evaluation of *a until after the call to g. If the functions f and g are free from program-visible side effects, all three choices will produce a program with the same visible program effects. If the implementation of f or g contains the side effect of a pointer write subject to aliasing with pointer a or b, the three choices are liable to produce different visible program effects.
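As a final sketch, suppose f writes through a pointer that aliases b; the two permitted orders of evaluation then store different results. Everything below is invented for illustration:
int shared = 10;
int *a = &shared, *b = &shared;          // a and b alias
int f(int v) { shared = 0; return v; }   // clobbers *b as a side effect
int g(int v) { return v; }
// evaluating f(*a) first:  f(10) + g(0)  -> *sum = 10
// evaluating g(*b) first:  g(10) + f(10) -> *sum = 20
// both are permitted outcomes of  *sum = f(*a) + g(*b);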