Chat (Slack-style): System Design

Requirements & API: Chat (Slack-style)

The first move in any interview: define requirements and sketch the API before drawing a single box.

Functional requirements

•Post a message to a channel and fan it out to every online member in real time.
•Members of one channel may be connected to different gateway boxes, so delivery has to cross boxes.
•Persist every message durably; let a reconnecting client catch up from its last cursor.
•Track presence and typing indicators per channel.

Non-functional requirements

•The design point is connection volume (~10M concurrent WebSockets), not message throughput.
•Sub-second live fan-out; real-time path may be best-effort as long as the durable store backstops it.
•Gateway tier and chat logic must scale and deploy independently, so a chat deploy can't drop every socket.
•Real-time fan-out (Pub/Sub) is at-most-once; durable persistence plus cursor-based catch-up guarantees that no acknowledged message is lost.
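That guarantee only holds if the client merges two streams: best-effort live pushes and durable catch-up reads. Below is a minimal sketch. It assumes each message carries a `msg_id` that increases monotonically within a channel, and that the cursor is the last `msg_id` read from the durable store. The source defines neither, and `ClientTimeline` is a hypothetical name. Live pushes are shown immediately and keyed by `msg_id`, but only durable reads advance the cursor.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: int  # assumed: increases monotonically within a channel
    body: str


@dataclass
class ClientTimeline:
    """Hypothetical client-side view of one channel."""

    messages: dict[int, Message] = field(default_factory=dict)
    # Advanced only by durable catch-up reads, never by live pushes.
    cursor: int = 0

    def on_live_push(self, msg: Message) -> None:
        # At-most-once: a push may never arrive. Keying by msg_id also
        # turns a duplicated push into a harmless overwrite.
        self.messages[msg.msg_id] = msg

    def on_catch_up(self, batch: list[Message], next_cursor: int) -> None:
        # Durable read (GET /api/v1/channels/{id}/messages?since={cursor})
        # fills any gap left by dropped pushes.
        for msg in batch:
            self.messages[msg.msg_id] = msg
        self.cursor = next_cursor

    def bodies(self) -> list[str]:
        return [self.messages[k].body for k in sorted(self.messages)]


# Usage: the live push for message 2 is lost, and message 3 arrives twice.
store = [Message(1, "hi"), Message(2, "lunch?"), Message(3, "sure")]
timeline = ClientTimeline()
for pushed in (store[0], store[2], store[2]):
    timeline.on_live_push(pushed)
print(timeline.bodies())  # ['hi', 'sure']
batch = [m for m in store if m.msg_id > timeline.cursor]
timeline.on_catch_up(batch, next_cursor=batch[-1].msg_id)
print(timeline.bodies())  # ['hi', 'lunch?', 'sure']
```

Tying the cursor to durable reads is the key choice. If a live push advanced it, an earlier dropped message would fall behind the cursor and never be fetched.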

API contract

WS send: { channel_id, client_msg_id, body } → ack { msg_id, ts }

Routed to the Chat Service, which validates membership and persists before fan-out.
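The ordering in that sentence is validate, then persist, then fan out. It can be sketched as a single handler. This is a minimal in-memory sketch, not the real service. `ChatService`, the injected `publish` callable, and the list standing in for Postgres are all hypothetical. Treating `client_msg_id` as an idempotency key for retried sends is an assumption about its purpose; the contract above does not spell it out.

```python
import itertools
import time
from typing import Callable


class ChatService:
    """Hypothetical in-memory Chat Service (a real one would use Postgres + Pub/Sub)."""

    def __init__(self, members: dict[int, set[str]], publish: Callable[[str, dict], None]):
        self.members = members        # channel_id -> set of member user ids
        self.publish = publish        # stand-in for a Pub/Sub publish call
        self.store: list[dict] = []   # stand-in for the durable messages table
        self.acks: dict[tuple[str, str], dict] = {}  # (sender, client_msg_id) -> ack
        self._next_id = itertools.count(1)

    def handle_send(self, sender: str, channel_id: int, client_msg_id: str, body: str) -> dict:
        # 1. Validate membership before touching storage.
        if sender not in self.members.get(channel_id, set()):
            raise PermissionError(f"{sender} is not a member of channel {channel_id}")
        # 2. Assumed idempotency: a retried send returns the original ack.
        key = (sender, client_msg_id)
        if key in self.acks:
            return self.acks[key]
        # 3. Persist first, so the message is durable before any gateway pushes it.
        msg = {"msg_id": next(self._next_id), "sender": sender, "body": body, "ts": time.time()}
        self.store.append({"channel_id": channel_id, **msg})
        ack = {"msg_id": msg["msg_id"], "ts": msg["ts"]}
        self.acks[key] = ack
        # 4. Best-effort fan-out: a failed publish is covered by reconnect catch-up.
        try:
            self.publish(f"channel:{channel_id}", msg)
        except Exception:
            pass  # a real service would log this
        return ack


# Usage: the client retries after losing the ack; one message is stored and fanned out.
published: list[tuple[str, dict]] = []
svc = ChatService({7: {"ana", "bo"}}, lambda topic, msg: published.append((topic, msg)))
first = svc.handle_send("ana", 7, "c-1", "hello")
retry = svc.handle_send("ana", 7, "c-1", "hello")
```

The ack is recorded only after the write succeeds. A client that never saw its ack can therefore resend with the same `client_msg_id` and get the original `msg_id` back, rather than creating a duplicate row.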

Pub/Sub publish: channel:{id} → { msg_id, sender, body, ts }

Every gateway holding a subscriber to that channel receives it and pushes to local sockets.
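Here is a sketch of the gateway side. It uses an in-memory `Broker` as a stand-in for Redis Pub/Sub; both classes and their method names are hypothetical. The idea it illustrates is that a gateway subscribes to `channel:{id}` once, when its first local member of that channel connects. It then pushes each published message to every local socket in that channel. As a result, the broker's work scales with gateways per channel, not users per channel.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """In-memory stand-in for Redis Pub/Sub: fire-and-forget, nothing stored."""

    def __init__(self) -> None:
        self.subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subs[topic].append(handler)

    def publish(self, topic: str, msg: dict) -> int:
        # Like Redis PUBLISH, returns how many subscribers received the message.
        handlers = self.subs.get(topic, [])
        for handler in handlers:
            handler(msg)
        return len(handlers)


class Gateway:
    """Hypothetical gateway box: holds sockets, subscribes once per channel."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.local: dict[int, set[str]] = defaultdict(set)      # channel -> local users
        self.outbox: dict[str, list[dict]] = defaultdict(list)  # user -> pushed frames

    def connect(self, user: str, channels: list[int]) -> None:
        for ch in channels:
            if not self.local[ch]:
                # First local member of this channel: subscribe this box to its topic.
                self.broker.subscribe(f"channel:{ch}", lambda msg, ch=ch: self._deliver(ch, msg))
            self.local[ch].add(user)

    def _deliver(self, ch: int, msg: dict) -> None:
        # Push to every socket on this box whose user belongs to the channel.
        for user in self.local[ch]:
            self.outbox[user].append(msg)


broker = Broker()
gw_a, gw_b = Gateway("gw-a", broker), Gateway("gw-b", broker)
gw_a.connect("ana", [7])
gw_a.connect("bo", [7])
gw_b.connect("cy", [7, 9])
receivers = broker.publish("channel:7", {"msg_id": 1, "body": "hi"})
print(receivers)  # 2: one subscription per gateway, not one per user
```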

GET /api/v1/channels/{id}/messages?since={cursor} → { messages[], next_cursor }

Reconnect catch-up from Postgres, the safety net behind at-most-once Pub/Sub.
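The catch-up read is a keyset-paginated range query. The runnable sketch below uses Python's built-in `sqlite3` in place of Postgres. The table and column names are hypothetical. The contract does not define the cursor format, so treating the cursor as the last `msg_id` returned is an assumption.

```python
import sqlite3


def fetch_since(db: sqlite3.Connection, channel_id: int, cursor: int, limit: int = 100) -> dict:
    """Keyset pagination: messages after `cursor` in this channel, oldest first."""
    rows = db.execute(
        "SELECT msg_id, sender, body, ts FROM messages "
        "WHERE channel_id = ? AND msg_id > ? ORDER BY msg_id LIMIT ?",
        (channel_id, cursor, limit),
    ).fetchall()
    messages = [dict(zip(("msg_id", "sender", "body", "ts"), row)) for row in rows]
    # The next cursor is the last id returned; an empty page leaves it unchanged.
    next_cursor = messages[-1]["msg_id"] if messages else cursor
    return {"messages": messages, "next_cursor": next_cursor}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (msg_id INTEGER PRIMARY KEY, channel_id INTEGER, "
           "sender TEXT, body TEXT, ts REAL)")
# An index on (channel_id, msg_id) keeps each page a range scan.
db.execute("CREATE INDEX idx_channel_msg ON messages (channel_id, msg_id)")
db.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
               [(1, 7, "ana", "hi", 1.0), (2, 9, "bo", "other", 2.0),
                (3, 7, "bo", "yo", 3.0), (4, 7, "cy", "hey", 4.0)])
page = fetch_since(db, channel_id=7, cursor=1, limit=1)
```

Paging with `msg_id > cursor` avoids `OFFSET`, which makes the database walk and discard every skipped row. With the index, the cost of a page does not grow with how far back the client is reading.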

About Chat (Slack-style)

Picture a busy Slack channel with a few hundred people in it. Someone types a message, and it has to show up on everyone's screen at once. That fan-out is the whole problem. A chat app is real-time messaging with a twist: instead of sending one message to one person, you send it to everyone in a room, and those people are connected to many different servers.

The surprising part is what the system is sized around. It is not message volume; most workplaces don't chat that much. It is connection volume. Every online user holds an open WebSocket the entire time the app is running, so you are building for millions of live connections, not millions of messages per second.
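A back-of-envelope calculation makes the contrast concrete. Only the 10M concurrent-socket figure comes from the requirements. Per-gateway capacity, per-socket memory, and the per-user send rate are illustrative assumptions, not measurements.

```python
import math

# Only CONCURRENT_SOCKETS comes from the requirements; the rest are assumptions.
CONCURRENT_SOCKETS = 10_000_000
SOCKETS_PER_GATEWAY = 100_000      # assumed per-box connection capacity
BYTES_PER_SOCKET = 20 * 1024       # assumed kernel + app buffers per idle socket
SENDS_PER_USER_PER_HOUR = 6        # assumed: light workplace chatter

gateways = math.ceil(CONCURRENT_SOCKETS / SOCKETS_PER_GATEWAY)
gib_per_gateway = SOCKETS_PER_GATEWAY * BYTES_PER_SOCKET / 2**30
# Sends before fan-out: each send is then pushed to every online channel member.
sends_per_sec = CONCURRENT_SOCKETS * SENDS_PER_USER_PER_HOUR / 3600

print(f"{gateways} gateways, ~{gib_per_gateway:.1f} GiB of socket buffers each")  # 100 gateways, ~1.9 GiB ...
print(f"~{sends_per_sec:,.0f} sends/sec across the whole fleet")  # ~16,667 sends/sec ...
```

Under these assumptions the send rate is modest, yet the fleet must keep every socket open the whole time. Changing any assumed constant moves the numbers, but connection count stays the dominant sizing input.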

Here is how a message travels. You post to a channel. The chat service needs to reach every member's connection, but those connections are scattered across many gateway servers. So it first persists the message, then publishes it to the channel's topic on a pub/sub layer (Redis pub/sub or a message bus). Think of it like a radio station: the chat service broadcasts once, and every gateway tuned to that channel hears the message and forwards it to the members connected to that gateway. Presence (who's online, who's typing) and read receipts ride along the same path.
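Typing indicators suit the lossy real-time path well because they are ephemeral and go stale on their own. Below is a minimal sketch using a hypothetical `TypingTracker` with an arbitrary 5-second TTL. Each typing event extends a deadline, and expired entries simply drop out, so no separate "stopped typing" message is needed. A real deployment might keep this state in Redis keys with an expiry rather than in process memory.

```python
class TypingTracker:
    """Hypothetical per-channel typing state with TTL expiry, held in memory."""

    def __init__(self, ttl_s: float = 5.0) -> None:
        self.ttl_s = ttl_s
        self.expires: dict[tuple[int, str], float] = {}  # (channel, user) -> deadline

    def on_typing(self, channel_id: int, user: str, now: float) -> None:
        # Each typing heartbeat pushes the deadline out; nothing is persisted.
        self.expires[(channel_id, user)] = now + self.ttl_s

    def typing_in(self, channel_id: int, now: float) -> list[str]:
        # Drop expired entries lazily, so a user who closes the tab just times out.
        self.expires = {k: t for k, t in self.expires.items() if t > now}
        return sorted(user for (ch, user) in self.expires if ch == channel_id)


tracker = TypingTracker(ttl_s=5.0)
tracker.on_typing(7, "ana", now=0.0)
tracker.on_typing(7, "bo", now=3.0)
print(tracker.typing_in(7, now=4.0))  # ['ana', 'bo']
print(tracker.typing_in(7, now=6.0))  # ['bo']  (ana's entry expired at t=5)
```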

One design choice pays off again and again: keep the connection tier separate from the chat logic. Then you can deploy or scale the chat service without dropping everyone's WebSocket. This design covers room-based pub/sub fan-out, the gateway tier, and why scaling for connections is a different problem from scaling for messages.