Paxos (computer science)
In computer science, Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.
Consensus protocols are the basis for the state machine replication approach to distributed computing, as suggested by Leslie Lamport and surveyed by Fred Schneider. State machine replication is a technique for converting an algorithm into a fault-tolerant, distributed implementation. Ad-hoc techniques may leave important cases of failures unresolved. The principled approach proposed by Lamport et al. ensures all cases are handled safely.
The Paxos protocol was first submitted in 1989 and named after a fictional legislative consensus system used on the Paxos island in Greece, where Lamport wrote that the parliament had to function "even though legislators continually wandered in and out of the parliamentary Chamber". It was later published as a journal article in 1998.
The Paxos family of protocols includes a spectrum of trade-offs between the number of processors, number of message delays before learning the agreed value, the activity level of individual participants, number of messages sent, and types of failures. Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network, Paxos guarantees safety, and the conditions that could prevent it from making progress are difficult to provoke.
Paxos is usually used where durability is required, in which the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. There is also a mechanism to drop a permanently failed replica or to add a new replica.
History
In 1988, Lynch, Dwork and Stockmeyer had demonstrated the solvability of consensus in a broad family of "partially synchronous" systems. Paxos has similarities to a protocol used for agreement in "viewstamped replication", first published by Oki and Liskov in 1988, in the context of distributed transactions. Paxos offered an elegant formalism and included one of the earliest proofs of safety for a fault-tolerant distributed consensus protocol.Reconfigurable state machines have ties to prior work on reliable group multicast protocols that support dynamic group membership e.g. Birman's work in 1985 and 1987 on the virtually synchronous gbcast protocol. gbcast is uncommon in supporting durability and addressing partitioning failures. Most reliable multicast protocols do not have these properties which are required for implementations of the state machine replication model. This point is discussed in a paper by Lamport, Malkhi and Zhou.
Paxos protocols are members of a theoretical class of solutions to a problem formalized as uniform agreement with crash failures. Lower bounds for this problem have been proved by Keidar and Shraer. Derecho, a C++ software library for cloud-scale state machine replication, offers a Paxos protocol that has been integrated with self-managed virtually synchronous membership. This protocol matches the Keidar and Shraer optimality bounds and maps efficiently to modern remote DMA datacenter hardware. It uses TCP if RDMA is not available.
Assumptions
In order to simplify the presentation of Paxos, the following assumptions and definitions are made explicit. Techniques to broaden the applicability are known in the literature, and are not covered in this article.Processors
- Processors operate at arbitrary speed.
- Processors may experience failures.
- Processors with stable storage may re-join the protocol after failures.
- Processors do not collude, lie, or otherwise attempt to subvert the protocol.
Network
- Processors can send messages to any other processor.
- Messages are sent asynchronously and may take arbitrarily long to deliver.
- Messages may be lost, reordered, or duplicated.
- Messages are delivered without corruption.
Number of processors
Safety and liveness properties
In order to guarantee safety, Paxos defines three properties and ensures the first two are always held, regardless of the pattern of failures:; Validity : Only proposed values can be chosen and learned.
; Agreement : No two distinct learners can learn different values.
; Termination : If value C has been proposed, then eventually learner L will learn some value.
Note that Paxos is not guaranteed to terminate, and thus does not have the liveness property. This is supported by the Fischer Lynch Paterson impossibility result which states that a consistency protocol can only have two of safety, liveness, and fault tolerance. As Paxos's point is to ensure fault tolerance and it guarantees safety, it cannot also guarantee liveness.
Typical deployment
In most deployments of Paxos, each participating process acts in three roles: Proposer, Acceptor and Learner. This reduces the message complexity significantly, without sacrificing correctness:By merging roles, the protocol "collapses" into an efficient client-master-replica style deployment, typical of the database community. The benefit of the Paxos protocols is the guarantee of its [|safety properties].
A typical implementation's message flow is covered in the section [|Multi-Paxos].
Basic Paxos
This protocol is the most basic of the Paxos family. Each "instance" of the basic Paxos protocol decides on a single output value. The protocol proceeds over several rounds. A successful round has 2 phases: phase 1 and phase 2. See below the description of the phases. Remember that we assume an asynchronous model, so e.g. a processor may be in one phase while another processor may be in another.Phase 1
Phase 1a: ''Prepare''
Phase 1b: ''Promise''
- If n is higher than every previous proposal number received by the Acceptor, then the Acceptor must return a message to the Proposer, indicating that the Acceptor will ignore all future proposals numbered less than or equal to n. The Promise must include the highest number among the Proposals that the Acceptor previously accepted, along with the corresponding accepted value. The first Prepare message vacuously satisfies this condition.
- If n is less than or equal to any previous proposal number received by the Acceptor, the Acceptor needn't respond and can ignore the proposal. However, for the sake of optimization, sending a denial, or negative acknowledgement, response would tell the Proposer that it can stop its attempt to create consensus with proposal n.
Phase 2
Phase 2a: ''Accept''
This Accept message should be interpreted as a "request", as in "Accept this proposal, please!".Phase 2b: ''Accepted''
Note that consensus is achieved when a majority of Acceptors accept the same identifier number. Because each identifier is unique to a Proposer and only one value may be proposed per identifier, all Acceptors that accept the same identifier thereby accept the same value. These facts result in a few counter-intuitive scenarios that do not impact correctness: Acceptors can accept multiple values, a value may achieve a majority across Acceptors only to later be changed, and Acceptors may continue to accept proposals after an identifier has achieved a majority. However, the Paxos protocol guarantees that consensus is permanent and the chosen value is immutable.When rounds fail
Paxos can be used to select a leader
Notice that a Proposer in Paxos could propose "I am the leader". Because of the agreement and validity guarantees of Paxos, if accepted by a Quorum, then the Proposer is now known to be the leader to all other nodes. This satisfies the needs of leader election because there is a single node believing it is the leader and a single node known to be the leader at all times.Graphic representation of the flow of messages in the basic Paxos
The following diagrams represent several cases/situations of the application of the Basic Paxos protocol. Some cases show how the Basic Paxos protocol copes with the failure of certain components of the distributed system.Note that the values returned in the Promise message are "null" the first time a proposal is made.
Basic Paxos without failures
In the diagram below, there is 1 Client, 1 Proposer, 3 Acceptors and 2 Learners. This diagram represents the case of a first round, which is successful.Here, V is the last of.
Error cases in basic Paxos
The simplest error cases are the failure of an Acceptor and failure of a redundant Learner. In these cases, the protocol requires no "recovery" : no additional rounds or messages are required, as shown below.Basic Paxos when an Acceptor fails
In the following diagram, one of the Acceptors in the Quorum fails, so the Quorum size becomes 2. In this case, the Basic Paxos protocol still succeeds.Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare
| | | | ! | | !! FAIL !!
| |<---------X--X | | Promise
| X--------->|->| | | Accept!
| |<---------X--X--------->|->| Accepted
|<---------------------------------X--X Response
| | | | | |