Race condition


A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent results. It becomes a bug when one or more of the possible behaviors is undesirable.
The term race condition was already in use by 1954, for example in David A. Huffman's doctoral thesis "The synthesis of sequential switching circuits".
Race conditions can occur especially in logic circuits or multithreaded or distributed software programs. Using mutual exclusion can prevent race conditions in distributed software systems.

In electronics

A typical example of a race condition may occur when a logic gate combines signals that have traveled along different paths from the same source. The inputs to the gate can change at slightly different times in response to a change in the source signal. The output may, for a brief period, change to an unwanted state before settling back to the designed state. Certain systems can tolerate such glitches, but if this output functions as a clock signal for further systems that contain memory, for example, the system can rapidly depart from its designed behaviour.
Consider, for example, a two-input AND gate fed with the following logic: A logic signal on one input and its Boolean negation on another input can never, in theory, output a true value as. If, however, changes in the value of take longer to propagate to the second input than the first when changes from false to true then a brief period will ensue during which both inputs are true, and so the gate's output will also be true.
A practical example of a race condition can occur when logic circuitry is used to detect certain outputs of a counter. If all the bits of the counter do not change exactly simultaneously, there will be intermediate patterns that can trigger false matches.

Critical and non-critical forms

A critical race condition occurs when the order in which internal variables are changed determines the eventual state that the state machine will end up in.
A non-critical race condition occurs when the order in which internal variables are changed does not determine the eventual state that the state machine will end up in.

Static, dynamic, and essential forms

A static race condition occurs when a signal and its complement are combined.
A dynamic race condition occurs when it results in multiple transitions when only one is intended. They are due to interaction between gates. It can be eliminated by using no more than two levels of gating.
An essential race condition occurs when an input has two transitions in less than the total feedback propagation time. Sometimes they are cured using inductive delay line elements to effectively increase the time duration of an input signal.

Workarounds

Design techniques such as Karnaugh maps encourage designers to recognize and eliminate race conditions before they cause problems. Often logic redundancy can be added to eliminate some kinds of races.
As well as these problems, some logic elements can enter metastable states, which create further problems for circuit designers.

In software

A race condition can arise in software when a computer program has multiple code paths that are executing at the same time. If the multiple code paths take a different amount of time than expected, they can finish in a different order than expected, which can cause software bugs due to unanticipated behavior. A race can also occur between two programs, resulting in security issues.
Critical race conditions cause invalid execution and software bugs and often happen when processes or threads depend on some shared state. Operations upon shared states are done in critical sections that must be mutually exclusive. Failure to obey this rule can corrupt the shared state.
A data race is a type of race condition. Data races are important parts of various formal memory models. The memory model defined in the C11 and C++11 standards specify that a C or C++ program containing a data race has undefined behavior.
A race condition can be difficult to reproduce and debug because the end result is nondeterministic and depends on the relative timing between interfering threads. Problems of this nature can therefore disappear when running in debug mode, adding extra logging, or attaching a debugger. A bug that disappears like this during debugging attempts is often referred to as a "Heisenbug". It is therefore better to avoid race conditions by careful software design.

Example

Assume that two threads each increment the value of a global integer variable by 1. Ideally, the following sequence of operations would take place:
Thread 1Thread 2Integer value
0
read value0
increase value0
write back1
read value1
increase value1
write back2

In the case shown above, the final value is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:
Thread 1Thread 2Integer value
0
read value0
read value0
increase value0
increase value0
write back1
write back1

In this case, the final value is 1 instead of the expected result of 2. This occurs because here the increment operations are not mutually exclusive. Mutually exclusive operations are those that cannot be interrupted while accessing some resource such as a memory location.

Data race

Not everyone regards data races as a subset of race conditions. The precise definition of data race is specific to the formal concurrency model being used, but typically it refers to a situation where a memory operation in one thread could potentially attempt to access a memory location at the same time that a memory operation in another thread is writing to that memory location, in a context where this is dangerous. This implies that a data race is different from a race condition as it is possible to have nondeterminism due to timing even in a program without data races, for example, in a program in which all memory accesses use only atomic operations.
This can be dangerous because on many platforms, if two threads write to a memory location at the same time, it may be possible for the memory location to end up holding a value that is some arbitrary and meaningless combination of the bits representing the values that each thread was attempting to write; this could result in memory corruption if the resulting value is one that neither thread attempted to write. Similarly, if one thread reads from a location while another thread is writing to it, it may be possible for the read to return a value that is some arbitrary and meaningless combination of the bits representing the value that the memory location held before the write, and of the bits representing the value being written.
On many platforms, special memory operations are provided for simultaneous access; in such cases, typically simultaneous access using these special operations is safe, but simultaneous access using other memory operations is dangerous. Sometimes such special operations are called atomic or synchronization operations, whereas the ordinary operations are called data operations. This is probably why the term is data race; on many platforms, where there is a race condition involving only synchronization operations, such a race may be nondeterministic but otherwise safe; but a data race could lead to memory corruption or undefined behavior.

Example definitions of data races in particular concurrency models

The precise definition of data race differs across formal concurrency models. This matters because concurrent behavior is often non-intuitive and so formal reasoning is sometimes applied.
The C++ standard, in draft N4296, defines data race as follows in section 1.10.23

Two actions are potentially concurrent if
  • they are performed by different threads, or
  • they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below . Any such data race results in undefined behavior.

The parts of this definition relating to signal handlers are idiosyncratic to C++ and are not typical of definitions of data race.
The paper Detecting Data Races on Weak Memory Systems provides a different definition:
two memory operations conflict if they access the same location and at least one of them is a write operation ...
Two memory operations, x and y, in a sequentially consistent execution form a race 〈x,y〉, iff x and y conflict, and they are not ordered by the hb1 relation of the execution. The race 〈x,y〉, is a data race iff at least one of x or y is a data operation.

Here we have two memory operations accessing the same location, one of which is a write.
The hb1 relation is defined elsewhere in the paper, and is an example of a typical "happens-before" relation; intuitively, if we can prove that we are in a situation where one memory operation X is guaranteed to be executed to completion before another memory operation Y begins, then we say that "X happens-before Y". If neither "X happens-before Y" nor "Y happens-before X", then we say that X and Y are "not ordered by the hb1 relation". So, the clause "... and they are not ordered by the hb1 relation of the execution" can be intuitively translated as "... and X and Y are potentially concurrent".
The paper considers dangerous only those situations in which at least one of the memory operations is a "data operation"; in other parts of this paper, the paper also defines a class of "synchronization operations" which are safe for potentially simultaneous use, in contrast to "data operations".
The Java Language Specification provides a different definition:
Two accesses to the same variable are said to be conflicting if at least one of the accesses is a write ... When a program contains two conflicting accesses that are not ordered by a happens-before relationship, it is said to contain a data race ... a data race cannot cause incorrect behavior such as returning the wrong length for an array.

A critical difference between the C++ approach and the Java approach is that in C++, a data race is undefined behavior, whereas in Java, a data race merely affects "inter-thread actions". This means that in C++, an attempt to execute a program containing a data race could crash or could exhibit insecure or bizarre behavior, whereas in Java, an attempt to execute a program containing a data race may produce undesired concurrency behavior but is otherwise safe.