Distributed Systems·3 min read

Leader Election

Pick exactly one node to coordinate, and re-pick safely when it dies, without ever ending up with two leaders.

Plain English: some jobs need exactly one node in charge (one writer, one scheduler). Leader election is how a cluster agrees on who that is, and how it picks a new boss when the current one dies, while guaranteeing you never accidentally get two bosses (split-brain), which corrupts data.

Used in: Apache Kafka, Distributed Cache, Stock Exchange (Matching Engine)
What it is

A coordination mechanism by which a set of nodes agree on a single 'leader' responsible for some exclusive role: accepting writes, assigning work, or coordinating others. It's a core building block implemented by consensus protocols (Raft, Paxos, ZAB) and coordination services (ZooKeeper, etcd).

The problem it solves

Many tasks must be done by exactly one node: a single write primary, a single job scheduler, a single sequence generator. Hard-coding the leader makes it a single point of failure. Leader election lets the cluster choose a leader dynamically and, critically, elect a new one when the leader fails, without two nodes both believing they're in charge (split-brain), which causes divergent writes and data corruption.

How it works

Nodes detect leader failure via heartbeats. When the leader's heartbeats stop arriving within a timeout, one or more followers become candidates and start an election. Consensus protocols require a candidate to win votes from a majority quorum (⌊N/2⌋+1 of N nodes) before becoming leader. This is what prevents two leaders: each node votes at most once per term, and any two majorities share at least one node, so two candidates cannot both win the same term. Each election starts a new, higher-numbered 'term'/'epoch'; stale leaders that come back are fenced off because the other nodes reject requests carrying an older epoch number. Coordination services expose this as ephemeral nodes or leases: hold the lease (renewed via heartbeat) and you're leader; lose it and someone else takes over.
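The voting and fencing rules above can be sketched in a few lines of Python. This is a simplified, in-memory illustration rather than Raft, Paxos, or ZAB: there is no networking, no timeouts, and no log comparison. The names `Node`, `request_vote`, `accept_write`, `run_election`, and `quorum` are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass
from typing import Optional


def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


@dataclass
class Node:
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None  # candidate voted for in current_term

    def _observe_term(self, term: int) -> None:
        # Seeing a newer term starts a fresh voting round on this node.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None

    def request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term; refuse candidates from older terms."""
        if term < self.current_term:
            return False
        self._observe_term(term)
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

    def accept_write(self, leader_term: int) -> bool:
        """Fencing: reject any request stamped with an older term."""
        if leader_term < self.current_term:
            return False
        self._observe_term(leader_term)
        return True


def run_election(candidate: Node, reachable_peers: list[Node], cluster_size: int) -> bool:
    """Candidate bumps its term, votes for itself, then asks reachable peers."""
    term = candidate.current_term + 1
    candidate.current_term = term
    candidate.voted_for = candidate.name
    votes = 1 + sum(p.request_vote(candidate.name, term) for p in reachable_peers)
    # The threshold uses the full cluster size, not the number of reachable peers.
    return votes >= quorum(cluster_size)
```

In a five-node cluster split into {a, b} and {c, d, e}, `run_election(c, [d, e], 5)` can succeed with three votes, while `run_election(b, [a], 5)` cannot reach the quorum of three. If the old leader then sends a write stamped with its earlier term, `accept_write` on any node that has seen the newer term returns `False`, which is the fencing behavior described above.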

