Decouple producers from consumers with a durable buffer so spikes get absorbed and slow work happens asynchronously.
Tap Produce a few times fast. The consumer only drains one every 1.5s.
The producer and consumer never talk directly. When you burst messages, the queue absorbs them and the consumer works through the backlog at its own pace, so a traffic spike never drops work or crashes the slow consumer. If the consumer died, the messages would simply wait.
Plain English: instead of Service A calling Service B directly and waiting, A drops a message in a queue and moves on. B picks it up whenever it's ready. This soaks up traffic spikes, survives B being down for a while, and lets slow work happen in the background.
A durable, intermediary buffer that decouples producers (who enqueue messages) from consumers (who dequeue and process them). Classic brokers (RabbitMQ, SQS) deliver-then-delete per message; log-based systems (Kafka) keep an ordered, replayable log that consumers read by offset. Both let work happen asynchronously and absorb load.
Synchronous coupling is fragile: if A calls B directly and B is slow or down, A blocks or fails. A queue breaks that link: A enqueues and continues, B processes at its own pace. Queues also absorb bursts (a traffic spike fills the queue instead of melting the consumer), enable async work (resize the image after the upload returns), and let you scale consumers independently of producers.
Producers publish messages to a queue or topic. The broker persists them (so they survive a crash). Consumers pull (or are pushed) messages, process them, and acknowledge. Unacked messages are redelivered (at-least-once delivery). Failed messages eventually land in a dead-letter queue. Log-based brokers like Kafka instead keep messages on disk by offset, letting multiple consumer groups read the same stream and replay history.
Messages to offline users are queued per-recipient and delivered when the device reconnects
Send requests are enqueued and workers fan them out to providers, retrying failures and dead-lettering the dead ones
The log-based queue itself: partitioned, replicated, replayable, consumed by independent consumer groups
A Cassandra compaction storm caused read latency to spike, backing up the message fanout queue until it overflowed.
Discord's message fanout pipeline copies messages to every online member's session via a Kafka-backed queue consumed by workers reading from Cassandra. During a Cassandra compaction event, read latency on that node spiked from single-digit milliseconds to hundreds. Workers waiting on Cassandra acks started piling up. The Kafka consumer group fell behind. Lag grew faster than workers could drain it. Discord's queue had a max-lag threshold: once crossed, older events were dropped to keep the pipeline from stalling permanently. Over 1 million message-delivery events were dropped. Users in large servers saw their friends' messages but not the server's activity feed. The lesson: consumer lag needs a circuit breaker, not a silent overflow. Treat Cassandra compaction like a planned partial-degradation, not a background task.
Message queues come up whenever you have a slow or unreliable downstream, or whenever you need to absorb a traffic spike. The thing to say is 'this decouples the producer from the consumer and gives us natural backpressure.' Then immediately address the reliability model: at-least-once delivery means consumers must be idempotent. Candidates lose points by proposing a queue without addressing what happens when a message fails repeatedly, because without a dead-letter queue, a single bad message can block the whole consumer.