Release consistency
Release consistency is one of the synchronization-based consistency models used in concurrent programming.
Introduction
In modern parallel computing systems, memory consistency must be maintained to avoid undesirable outcomes. Strict consistency models like sequential consistency are intuitively composed but can be quite restrictive in terms of performance as they would disable instruction level parallelism which is widely applied in sequential programming. To achieve better performance, some relaxed models have been explored, with release consistency being an aggressively relaxed attempt.Release consistency vs. sequential consistency
Hardware structure and program-level effort
Sequential consistency can be achieved simply by hardware implementation, while release consistency is also based on an observation that most of the parallel programs are properly synchronized. In programming level, synchronization is applied to clearly schedule a certain memory access in one thread to occur after another. When a synchronized variable is accessed, hardware would make sure that all writes local to a processor have been propagated to all other processors and all writes from other processors are seen and gathered. In release consistency model, the action of entering and leaving a critical section are classified as acquire and release and for either case, explicit code should be put in the program showing when to do these operations''.''Conditions for sequential consistent result
In general, a distributed shared memory is release consistent if it obeys the following rules:1. Before an access to a shared variable is performed, all previous acquires by this processor must have completed.
2. Before a release is performed, all previous reads and writes by this process must have completed.
3. The acquire and release accesses must be processor consistent.
If the conditions above are met and the program is properly synchronized, the results of any execution will be exactly the same as they would have been executed following sequential consistency. In effect, accesses to shared variables are separated into atomic operation blocks by the acquire and release primitives so that races and interleaving between blocks will be prevented.
Implementations
Lock release">Lock (computer science)">Lock release
A lock release can be considered as a type of release synchronization. Assume a loop operation is performed using the code shown to the right. Two threads intend to enter a critical section and read the most recent value of, then exit the critical section. The code shows that thread 0 first acquires the lock and enters the critical section. In order to execute correctly, P1 must read the latest value of written by P0. In that case, only one thread can be in the critical section at a time. Therefore, the synchronization itself has ensured that the successful lock acquisition at P1 occurs after lock release by P0. Besides, the S2 -> S3 ordering has to be ensured, since the P0 must propagate the new value of to P1. For the same reason, S5 must occur after S4.Correctness is not affected if memory accesses following the unlock issue before the unlock complete or memory accesses prior to a lock issue after the lock acquisition. However, the code in critical section can not be issued prior to the lock acquisition is complete because mutual exclusion may not be guaranteed.
Post-wait
Post-wait synchronization is another implementation form of release consistency. As shown in the code to the right, correctness can be ensured if post operations occur only after all memory access are complete, especially the store to ‘a’. Apart from that, read operation should not be executed until the wait operation has completed. S2 acts as a release synchronization and S3 acts as an acquire synchronization. Therefore, S2 needs to prevent previous execution from occurring after it, and S3 needs to prevent any later execution from occurring before it. S2 does not need to prevent later execution from occurring before it, Likewise, S3 does not need to prevent any previous execution from occurring after it.Lazy release consistency
Lazy release consistency is a further optimization of release consistency. It assumes that the thread executing an acquire access does not need the values written by other threads until the acquire access has completed. Hence, all behavior of coherence can be delayed and timing for write propagation can be tweaked.Example
Consider the scenarios described in the image to the right. This case shows when write propagation is performed on a cache-coherent system based on the release consistency model. The variable datum is completely propagated before the propagation of datumIsReady. But the value of datum is not needed until after the acquire synchronization access in P1 and it can be propagated along with datumIsReady without harming the result of the program.The second image displays what is the case when lazy release consistency is applied. Considering this scenario, all values written ahead of the release synchronization are delayed and propagated together with the propagation of the release access itself. Hence, datum and datumIsReady are propagated together at the release point.
"TreadMarks" is an actual application of lazy release consistency.
Performance improvement over release consistency
Lazy release consistency can outperform release consistency in certain cases. If there is a system with little bandwidth between processors or it suffers badly from the higher overheads because of frequent propagation of small block of data versus infrequent large data block propagation, LRC can really help the performance.Consider a system employs a software level shared memory abstraction rather than an actual hardware implementation. In this system, write propagation is executed at a page granularity, which makes it extremely expensive to propagate a whole page when only one block in this page is modified. Therefore, write propagation is delayed until a release synchronization point is reached and the entire page will be modified at this time and the whole page will be propagated.