TL;DR: Raft is a protocol for distributed consensus. It is a strong-leader protocol in which the leader forces the followers to hold the same log entries as its own. Only a candidate with an up-to-date log can be elected, which guarantees that every committed entry is contained in the logs of all future leaders.
Raft is a protocol for distributed consensus. Distributed consensus is about making all participants in a distributed system agree on certain things. Usually, each participant maintains a state with variables and there are events that lead to state changes. The sequence of events forms a log.
These participants collectively form a system that serves certain purposes. We want to keep all logs in sync. This requires establishing a total order of events.
Why don't we just make one of the nodes the leader and let it decide everything? That's a good idea, but which node?
Let's say we just hardcode one node as the leader. But if that node fails, everything is stalled; no progress can be made. The system has to sacrifice availability (liveness) to achieve correctness (safety).
Let's say we have an external system that picks the next leader when the current one fails. We need to ensure that such a failover does not violate the contract between the client and the system. For example, the client expects every write acknowledged by the system to be visible to subsequent reads.
What about a semi-sync approach? The leader requires every data update to be replicated to one of the followers before acknowledging the client. If there is a leader failure, the external system needs to promote the follower with the most recent data. This is good overall but such a system doesn't solve the problem of distributed consensus:
Another problem is that such a system can only tolerate a single node failure.
Raft (paper) solves the above problems.
In Raft, a node is always in one of three states: follower, candidate, or leader.
When there's no leader or the leader fails, followers turn themselves into candidates and launch campaigns to become leaders.
Raft has a concept called term, which is a monotonically increasing number. There's at most one leader per term.
A term corresponds to a certain time period. A term always starts with an election.
Each node keeps track of the current term, updating its term when another node informs it of a higher term.
Leader election process
What if there are multiple candidates at the same time? There could be split votes and none of the candidates wins. Raft introduces a randomness factor to the election timing to reduce such conflicts. Eventually, one candidate should win.
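The election rules described here can be sketched in a few lines. This is a minimal illustration, not a full implementation: the names `Voter`, `handle_request_vote`, `wins_election`, and the 150-300 ms timeout range are assumptions chosen for the example. The "up-to-date" comparison (last log term first, then last log index) follows the rule given in the Raft paper.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Assumed range for illustration; real deployments tune this.
ELECTION_TIMEOUT_RANGE_MS = (150, 300)


def random_election_timeout(rng=random):
    """Pick a randomized timeout so nodes rarely become candidates at once."""
    low, high = ELECTION_TIMEOUT_RANGE_MS
    return rng.uniform(low, high)


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None  # candidate voted for in current_term
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_request_vote(self, term, candidate_id, cand_last_term, cand_last_index):
        """Return True if this node grants its vote to the candidate."""
        if term < self.current_term:
            return False  # request from a stale term
        if term > self.current_term:
            # A newer term starts: adopt it and forget any earlier vote.
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours.
        up_to_date = (cand_last_term, cand_last_index) >= (
            self.last_log_term,
            self.last_log_index,
        )
        # At most one vote per term, which gives at most one leader per term.
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return True
        return False


def wins_election(votes_granted, cluster_size):
    """A candidate wins once it holds votes from a strict majority."""
    return votes_granted > cluster_size // 2
```

Because each node grants at most one vote per term and a win needs a strict majority, two candidates cannot both win the same term. A split vote simply produces no winner, and the randomized timeouts make a repeat of the split unlikely.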
Term is also useful for evicting stale leaders. A leader of an old term turns itself into a follower when encountering a higher term.
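As a rough sketch of how term tracking evicts stale leaders, the snippet below shows a node reacting to the term carried by an incoming message. The class `TermState` and the method `observe_term` are hypothetical names introduced for this example. Ignoring messages from older terms is an assumption taken from the Raft paper rather than from the text above.

```python
from dataclasses import dataclass


@dataclass
class TermState:
    role: str = "follower"  # "follower", "candidate", or "leader"
    current_term: int = 0

    def observe_term(self, term):
        """Update local state from a message's term.

        Returns True if the message should be processed, False if it
        comes from an older term and should be ignored.
        """
        if term > self.current_term:
            # Someone is in a newer term: adopt it and step down.
            self.current_term = term
            self.role = "follower"
            return True
        return term == self.current_term


# Usage: a leader from term 2 hears about term 5 and steps down.
node = TermState(role="leader", current_term=2)
node.observe_term(5)
print(node.role, node.current_term)
```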
Raft is a strong-leader protocol. Log entries flow in one direction: only a leader can send log entries to followers. This simplifies the protocol. The leader forces the followers to replicate its log, overwriting any entries that conflict with its own.
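One way to picture "the leader forces the followers to duplicate its log" is the follower side of log replication. The sketch below is a simplified, assumed model: `FollowerLog` and `append_entries` are illustrative names, entries are `(term, command)` tuples, and log indexes start at 1. The consistency check on the preceding entry and the truncation of conflicting entries follow the AppendEntries rules in the Raft paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


@dataclass
class FollowerLog:
    # entries[0] holds log index 1, entries[1] holds index 2, and so on.
    entries: List[Entry] = field(default_factory=list)

    def append_entries(self, prev_index, prev_term, new_entries):
        """Apply entries sent by the leader; return False if the logs diverge
        before prev_index, so the leader can retry from an earlier point."""
        # Consistency check: we must already hold the entry just before
        # the new ones, with the same term.
        if prev_index > len(self.entries):
            return False
        if prev_index > 0 and self.entries[prev_index - 1][0] != prev_term:
            return False
        for offset, entry in enumerate(new_entries):
            pos = prev_index + offset  # 0-based position of this entry
            if pos < len(self.entries):
                if self.entries[pos][0] != entry[0]:
                    # Conflict: drop our entry and everything after it.
                    del self.entries[pos:]
                    self.entries.append(entry)
                # Same term at the same index: entry is already present.
            else:
                self.entries.append(entry)
        return True
```

Only a conflicting entry triggers truncation, so a delayed or repeated message carrying entries the follower already has leaves its log unchanged.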
Only a leader processes requests from clients. When there is an update, the leader adds a new entry to its log, tagged with its current term. An entry is considered committed (final) when the leader that created it has replicated it to a majority of the nodes.
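A leader can decide what is committed by tracking how far each follower's log matches its own. The following sketch assumes a hypothetical `match_index` mapping from follower id to the highest log index known to be stored on that follower. The function names are illustrative. The extra condition that the entry must belong to the leader's current term comes from the Raft paper.

```python
def majority_match_index(match_index, leader_last_index):
    """Highest log index that is stored on a majority of the cluster.

    match_index maps follower id -> highest replicated index; the leader
    counts itself with leader_last_index.
    """
    indexes = sorted(list(match_index.values()) + [leader_last_index], reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th largest value is held by at least `majority` nodes.
    return indexes[majority - 1]


def advance_commit_index(commit_index, match_index, log_terms, current_term):
    """Return the new commit index for the leader.

    log_terms[i] is the term of the entry at log index i + 1.
    """
    candidate = majority_match_index(match_index, len(log_terms))
    # Only entries from the leader's own term are committed by counting replicas.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


# Usage: five nodes, leader log has 5 entries with terms [1, 1, 2, 2, 2].
followers = {"b": 5, "c": 3, "d": 2, "e": 0}
print(advance_commit_index(0, followers, [1, 1, 2, 2, 2], current_term=2))
```

In this usage example, index 3 is the highest index present on three of the five nodes, and it carries the leader's current term, so the commit index advances to 3.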
Raft maintains certain properties
Leader Completeness: If an entry is committed in a given term, it is present in the logs of the leaders of all later terms.
Log Matching: If the logs on two nodes contain an entry with the same index and term, then all preceding entries in the two logs match as well.
State Machine Safety: If any server has applied a particular log entry to its state machine, then no other server may apply a different command for the same log index.
What if the leader commits a log entry and then dies immediately?
Is it possible for an entry to be replicated to the majority but never committed?