Distribute keys across N nodes such that adding or removing a node only reshuffles ~1/N of the keys.
Add or remove a node. Watch how few keys actually move.
Each key is owned by the next node clockwise on the ring. Adding or removing a node only disturbs the keys in that node's arc, so caches and shards barely reshuffle when the cluster changes size.
Plain English: a way of dividing data across servers so that adding or removing a server only shuffles a small slice of the data, instead of forcing every server to rebuild its cache.
A hashing scheme that maps both keys and nodes onto the same ring. Picture servers as seats around a round table. A key is owned by the next seat clockwise from where it lands. Adding or removing a seat only changes ownership of the keys near it, and everything else stays put.
Regular modular hashing (key % N) seems fine until you add a node. Suddenly N becomes N+1 and almost every key gets remapped to a different node. For a cache, that means a near-total cache miss. For a sharded DB, it means a large rebalance. Both are operational nightmares. Consistent hashing reduces the churn from ~100% to ~1/N.
Hash each node id onto a fixed ring (typically 0 to 2^32). To find the owner of a key, hash the key the same way and walk clockwise until you hit a node. Adding a node means inserting it on the ring, so only keys between the new node and its anticlockwise neighbor move. In practice, each physical node is mapped to many 'virtual nodes' so the load spreads more evenly.
The whole architecture is built around the ring, where every node owns a range of the keyspace
Counters in Redis are distributed by consistent hash so adding capacity doesn't reset every user's rate-limit window
Redis timeline cache shards by user_id via consistent hashing
A Cassandra compaction storm caused read latency to spike, backing up the message fanout queue until it overflowed.
Discord's message fanout pipeline copies messages to every online member's session via a Kafka-backed queue consumed by workers reading from Cassandra. During a Cassandra compaction event, read latency on that node spiked from single-digit milliseconds to hundreds. Workers waiting on Cassandra acks started piling up. The Kafka consumer group fell behind. Lag grew faster than workers could drain it. Discord's queue had a max-lag threshold: once crossed, older events were dropped to keep the pipeline from stalling permanently. Over 1 million message-delivery events were dropped. Users in large servers saw their friends' messages but not the server's activity feed. The lesson: consumer lag needs a circuit breaker, not a silent overflow. Treat Cassandra compaction like a planned partial-degradation, not a background task.
When an interviewer asks about distributing load across cache nodes, they want to hear 'consistent hashing' and specifically that you know WHY it exists: adding a node with regular mod hashing causes a near-total cache miss storm. Mention virtual nodes to show you know the basic implementation isn't enough for even distribution. The mistake candidates make is saying 'just add more cache nodes' without explaining how keys get reassigned.