What an interviewer expects you to nail down before drawing a single box.
WS connect /docs/{doc_id} → stream of {op, rev}
send_op { doc_id, base_rev, op } → { transformed_op, new_rev }
GET /api/v1/docs/{doc_id} → { snapshot, rev, acl }
send_presence { doc_id, cursor, selection }

Five people open the same Google Doc and all start typing at once, and somehow nobody's words get lost or scrambled. Every keystroke shows up instantly on your own screen and a fraction of a second later on everyone else's. Merging concurrent edits into one consistent document while typing still feels instant is the entire challenge here.
Here is the flow in plain terms. Your browser holds the editor and an OT engine, so each keystroke is applied locally right away and then sent over a long-lived WebSocket. A WebSocket Gateway holds that connection and forwards your edit to the OT Server that owns this document. That server keeps the document's canonical sequence of operations in memory, transforms your edit against any concurrent edits, broadcasts the result to everyone else, and appends it to a durable op log.
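The server side of that flow can be sketched in a few lines. This is a minimal illustration and not a definitive implementation. It assumes insert-only ops written as (position, text) tuples, and a revision number equal to the count of committed ops. The DocServer class is a hypothetical name. The send_op method mirrors the shape of the send_op call in the API summary. Broadcasting and the durable log write are reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass
class DocServer:
    """Hypothetical in-memory owner of one document (insert-only ops)."""
    text: str = ""
    log: list = field(default_factory=list)  # committed (pos, text) inserts

    @property
    def rev(self) -> int:
        return len(self.log)  # revision = number of committed ops

    def send_op(self, base_rev: int, op: tuple) -> dict:
        pos, s = op
        # Transform against every op committed after the client's base revision.
        for other_pos, other_s in self.log[base_rev:]:
            if other_pos <= pos:  # assumption: committed ops win position ties
                pos += len(other_s)
        self.text = self.text[:pos] + s + self.text[pos:]
        # A real server would also persist this op and broadcast it here.
        self.log.append((pos, s))
        return {"transformed_op": (pos, s), "new_rev": self.rev}


# Three clients all saw revision 0 ("hello world") and edit concurrently.
server = DocServer(text="hello world")
first = server.send_op(base_rev=0, op=(5, "X"))
second = server.send_op(base_rev=0, op=(0, "Y"))
third = server.send_op(base_rev=0, op=(11, "!"))
```

The third client meant "append at the end", which was position 11 at revision 0. By the time its op arrives, two other inserts have landed, so the server shifts it to position 13 and the exclamation mark still ends up last.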
The subtle idea is Operational Transform itself. Say you type 'X' at position 10 at the same moment a colleague types 'Y' at position 10. Both edits reference 'position 10', but that position means something different once the other edit has landed. It's like two people giving directions to 'the third house on the left' after a new house has been built on the street: the count has shifted. OT rewrites the later edit so it still points at the right spot (if the server orders your 'X' first, your colleague's 'Y' is shifted to position 11), so every copy of the document converges instead of corrupting.
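The 'X' and 'Y' collision can be worked through in code. The sketch below covers only the insert-versus-insert case. Its names (Insert, apply_op, transform) and the site-id tie-break are assumptions made for illustration. Real OT engines also handle deletes and richer operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index where the text is inserted
    text: str
    site: int  # hypothetical client id, used only to break position ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_before:
        # The other insert pushed our target position to the right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "0123456789abcdef"
x = Insert(10, "X", site=1)  # you
y = Insert(10, "Y", site=2)  # your colleague

# Your replica applies x first, then the transformed y.
yours = apply_op(apply_op(doc, x), transform(y, x))
# Your colleague's replica applies y first, then the transformed x.
theirs = apply_op(apply_op(doc, y), transform(x, y))
```

Both replicas end with "0123456789XYabcdef" even though they applied the edits in opposite orders. That convergence is the property the transform exists to guarantee.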
Notice how durability and load are handled. The op log is the real document: an append-only history that lets the OT server rebuild its in-memory state after a crash and that powers undo and time-travel. To avoid replaying every edit ever made when someone opens a long document, the system saves a snapshot every N ops and replays only the deltas since, the same snapshot-plus-log trick Git and databases use. Presence (the colored cursors showing who is editing) runs as its own service so it doesn't clutter the op stream.
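A small sketch makes the snapshot-plus-delta idea concrete. It assumes insert-only ops stored as (position, text) tuples and an in-memory dictionary standing in for durable snapshot storage. DocStore and SNAPSHOT_EVERY are hypothetical names, and N is set to 3 only so the effect shows up in a short example.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 3  # the "N" in "every N ops"; tiny here for illustration


@dataclass
class DocStore:
    log: list = field(default_factory=list)  # append-only (pos, text) inserts
    snapshots: dict = field(default_factory=lambda: {0: ""})  # rev -> full text

    def append(self, pos: int, text: str) -> None:
        self.log.append((pos, text))
        rev = len(self.log)
        if rev % SNAPSHOT_EVERY == 0:
            self.snapshots[rev] = self.load(rev)[0]

    def load(self, rev=None):
        """Return (text at `rev`, number of ops replayed to build it)."""
        rev = len(self.log) if rev is None else rev
        # Start from the newest snapshot at or before the requested revision.
        base = max(r for r in self.snapshots if r <= rev)
        doc = self.snapshots[base]
        for pos, text in self.log[base:rev]:
            doc = doc[:pos] + text + doc[pos:]
        return doc, rev - base


store = DocStore()
for i, ch in enumerate("abcdefg"):
    store.append(i, ch)

latest, replayed = store.load()    # built from the snapshot at rev 6
earlier, _ = store.load(rev=4)     # time-travel: snapshot at rev 3 plus one op
```

Opening the seven-op document replays one op instead of seven. The same load path serves historical revisions, because the full log is never thrown away.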
This system teaches operational transforms and CRDTs as two answers to merging concurrent edits, why each document is sticky-routed to a single OT server for a totally-ordered op log, the snapshot-plus-delta pattern for fast loads with full history, and why the most-collaborated documents are the architectural hot spot.