Communication·3 min read

Message Queues

Decouple producers from consumers with a durable buffer so spikes get absorbed and slow work happens asynchronously.

Try it

Tap Produce a few times fast. The consumer only drains one every 1.5s.

Producer

0/5 sent

queue empty

depth: 01 msg / 1.5s →

Consumer

0 processed

The producer and consumer never talk directly. When you burst messages, the queue absorbs them and the consumer works through the backlog at its own pace, so a traffic spike never drops work or crashes the slow consumer. If the consumer died, the messages would simply wait.

First time reading this? Start here

Plain English: instead of Service A calling Service B directly and waiting, A drops a message in a queue and moves on. B picks it up whenever it's ready. This soaks up traffic spikes, survives B being down for a while, and lets slow work happen in the background.

Used in:WhatsApp Notification System Apache Kafka

What it is

A durable, intermediary buffer that decouples producers (who enqueue messages) from consumers (who dequeue and process them). Classic brokers (RabbitMQ, SQS) deliver-then-delete per message; log-based systems (Kafka) keep an ordered, replayable log that consumers read by offset. Both let work happen asynchronously and absorb load.

The problem it solves

Synchronous coupling is fragile: if A calls B directly and B is slow or down, A blocks or fails. A queue breaks that link: A enqueues and continues, B processes at its own pace. Queues also absorb bursts (a traffic spike fills the queue instead of melting the consumer), enable async work (resize the image after the upload returns), and let you scale consumers independently of producers.

How it works

Producers publish messages to a queue or topic. The broker persists them (so they survive a crash). Consumers pull (or are pushed) messages, process them, and acknowledge. Unacked messages are redelivered (at-least-once delivery). Failed messages eventually land in a dead-letter queue. Log-based brokers like Kafka instead keep messages on disk by offset, letting multiple consumer groups read the same stream and replay history.

Why use it

Decouples producers and consumers, so either side can scale, deploy, or fail independently
Absorbs traffic spikes: the queue buffers load the consumer couldn't take synchronously
Enables async background work and natural retries via redelivery + dead-letter queues

What it costs you

At-least-once delivery is the norm, so consumers must be idempotent or you'll double-process
Adds a moving part with its own scaling, monitoring, and backpressure concerns
Ordering and exactly-once are expensive and often only per-partition; global ordering is usually a fantasy

Where it shows up in our architectures

WhatsApp →
Messages to offline users are queued per-recipient and delivered when the device reconnects
Notification System →
Send requests are enqueued and workers fan them out to providers, retrying failures and dead-lettering the dead ones
Apache Kafka →
The log-based queue itself: partitioned, replicated, replayable, consumed by independent consumer groups

Gotchas

Assume at-least-once delivery and make consumers idempotent. Exactly-once is either a lie or extremely expensive, so design for duplicates instead.
A queue is not a fix for a permanently slow consumer; it just hides the backlog. If producers outpace consumers forever, the queue grows unbounded. Monitor queue depth and consumer lag.
Ordering guarantees are usually only per-partition/per-key, not global. If you need order, partition by the key that must stay ordered (e.g. per conversation).
Always wire up a dead-letter queue. Without one, a single poison message can block a partition or get retried forever.

When this went wrong in production

Discord's message queue backs up and drops 1M+ events · 2023

Postmortem ↗

A Cassandra compaction storm caused read latency to spike, backing up the message fanout queue until it overflowed.

Discord's message fanout pipeline copies messages to every online member's session via a Kafka-backed queue consumed by workers reading from Cassandra. During a Cassandra compaction event, read latency on that node spiked from single-digit milliseconds to hundreds. Workers waiting on Cassandra acks started piling up. The Kafka consumer group fell behind. Lag grew faster than workers could drain it. Discord's queue had a max-lag threshold: once crossed, older events were dropped to keep the pipeline from stalling permanently. Over 1 million message-delivery events were dropped. Users in large servers saw their friends' messages but not the server's activity feed. The lesson: consumer lag needs a circuit breaker, not a silent overflow. Treat Cassandra compaction like a planned partial-degradation, not a background task.

All war stories →

Interview angle

Message queues come up whenever you have a slow or unreliable downstream, or whenever you need to absorb a traffic spike. The thing to say is 'this decouples the producer from the consumer and gives us natural backpressure.' Then immediately address the reliability model: at-least-once delivery means consumers must be idempotent. Candidates lose points by proposing a queue without addressing what happens when a message fails repeatedly, because without a dead-letter queue, a single bad message can block the whole consumer.

Your notes

Private to you