What an interviewer expects you to nail down before drawing a single box.
WS send: { conversation_id, client_msg_id, body } → ack { server_msg_id, ts }
WS recv: server pushes { conversation_id, server_msg_id, sender, body, ts }
GET /api/v1/messages?since={cursor} → { messages[], next_cursor }

Send a WhatsApp message and it lands on your friend's phone in well under a second, or waits patiently if their phone is off. Doing that for billions of accounts is the core real-time messaging problem.
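The send and fetch calls listed above can be sketched with an in-memory stand-in. This is an illustration under assumptions, not the real protocol: the class name ChatServer, the use of client_msg_id to deduplicate retries, and treating the cursor as the last server_msg_id seen are all choices made for this example.

```python
from dataclasses import dataclass, field
import itertools
import time


@dataclass
class Ack:
    server_msg_id: int
    ts: float


@dataclass
class ChatServer:
    """Hypothetical in-memory stand-in for the send and fetch endpoints."""
    log: list = field(default_factory=list)    # ordered message log
    acks: dict = field(default_factory=dict)   # (sender, client_msg_id) -> Ack
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def send(self, sender, conversation_id, client_msg_id, body):
        # Assumption: a retried send reuses client_msg_id, so the server
        # returns the original ack instead of storing a duplicate.
        key = (sender, client_msg_id)
        if key in self.acks:
            return self.acks[key]
        ack = Ack(server_msg_id=next(self._ids), ts=time.time())
        self.log.append({
            "conversation_id": conversation_id,
            "server_msg_id": ack.server_msg_id,
            "sender": sender,
            "body": body,
            "ts": ack.ts,
        })
        self.acks[key] = ack
        return ack

    def messages_since(self, cursor, limit=50):
        # Assumption: the cursor is the last server_msg_id the client has seen.
        page = [m for m in self.log if m["server_msg_id"] > cursor][:limit]
        next_cursor = page[-1]["server_msg_id"] if page else cursor
        return {"messages": page, "next_cursor": next_cursor}
```

A client that times out waiting for an ack can resend with the same client_msg_id and get the same server_msg_id back, which is one common way to make sends safe to retry.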
Unlike a normal website that answers a request and then hangs up, messaging needs the connection to stay open. Each online phone holds a long-lived WebSocket to a gateway server, and a single well-tuned Linux box can keep about a million of these open at once.
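The million-connections-per-box figure turns into a rough size for the gateway tier with one division. The sketch below is back-of-the-envelope arithmetic only: the function name, the headroom factor, and the concurrent-connection count in the example call are hypothetical inputs, not numbers from any real deployment.

```python
import math


def gateways_needed(concurrent_connections, per_box=1_000_000, headroom=0.5):
    """Boxes needed if each runs at `headroom` of its connection limit.

    per_box comes from the text above; headroom is an assumed safety margin
    that leaves room for reconnect storms and failover.
    """
    usable = per_box * headroom
    return math.ceil(concurrent_connections / usable)


# Hypothetical load: 100 million phones online at the same time.
print(gateways_needed(100_000_000))  # -> 200
```

In an interview, stating the assumed peak concurrency and headroom out loud matters more than the exact result.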
Here is the path of a message. The chat service writes it to durable storage first, usually Cassandra, before it even tries to deliver it. Durability comes before delivery, so a crash after the write cannot lose a message the server has already acknowledged. The chat service then asks a presence service (Redis) which gateway your friend is connected to and pushes the message down that WebSocket.
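The ordering described above (persist, then look up presence, then push) can be sketched with plain Python containers standing in for each component. Everything here is hypothetical: a list plays the durable store, a dict plays the presence service, and nested dicts play the gateways and their open connections.

```python
class ChatService:
    """Minimal sketch of the write path with in-memory stand-ins."""

    def __init__(self):
        self.store = []       # stand-in for the durable message store
        self.presence = {}    # user_id -> gateway_id (stand-in for presence)
        self.gateways = {}    # gateway_id -> {user_id: messages pushed}

    def connect(self, user_id, gateway_id):
        # A phone opening its connection registers where it can be reached.
        self.presence[user_id] = gateway_id
        self.gateways.setdefault(gateway_id, {})[user_id] = []

    def handle_send(self, sender, recipient, body):
        msg = {"sender": sender, "recipient": recipient, "body": body}
        # Step 1: persist before any delivery attempt.
        self.store.append(msg)
        # Step 2: ask presence which gateway holds the recipient's connection.
        gateway_id = self.presence.get(recipient)
        if gateway_id is None:
            return "stored"   # recipient offline; the message is still safe
        # Step 3: push down the recipient's open connection.
        self.gateways[gateway_id][recipient].append(msg)
        return "delivered"
```

Because the append to the store happens before the presence lookup, a failure in either later step leaves the message recoverable from storage.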
What if they are offline? The message goes onto a queue (Kafka), and a push notification through APNs or FCM nudges their phone. When the phone wakes and reconnects, it pulls down everything waiting for it. This system teaches persistent-connection architecture, the gateway tier, durable message logs, presence tracking, and offline delivery, which are the building blocks of any real-time system.
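The offline path described above can be sketched the same way. This is a toy model under assumptions: a per-user deque stands in for the queue, a list of user ids stands in for push notifications sent through APNs or FCM, and the class and method names are invented for the example.

```python
from collections import defaultdict, deque


class OfflineDelivery:
    """Sketch of queue-then-drain delivery for offline recipients."""

    def __init__(self):
        self.online = set()
        self.pending = defaultdict(deque)   # stand-in for the message queue
        self.push_sent = []                 # stand-in for push notifications
        self.delivered = defaultdict(list)  # what each phone has received

    def deliver(self, recipient, msg):
        if recipient in self.online:
            self.delivered[recipient].append(msg)
            return
        # Offline: hold the message and nudge the phone to wake up.
        self.pending[recipient].append(msg)
        self.push_sent.append(recipient)

    def reconnect(self, recipient):
        # On reconnect the phone pulls everything that was waiting, in order.
        self.online.add(recipient)
        drained = list(self.pending.pop(recipient, deque()))
        self.delivered[recipient].extend(drained)
        return drained
```

Draining in arrival order keeps the conversation in the order it was sent, and the push notification carries no guarantee of its own: it only prompts the phone to reconnect and fetch.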