TikTok (For You Feed): System Design

Requirements & API: TikTok (For You Feed)

The first move in any interview: define requirements and sketch the API before drawing a single box.

Functional requirements

• Serve a personalized For You feed (not a follower-graph feed) as an infinite scroll.
• Accept video uploads, return immediately, and transcode each video into ~5 bitrate variants asynchronously.
• Capture every view, like, share, and watch-time event to feed the ranking model.
• Regenerate each active user's candidate list every few minutes from recent activity.

Non-functional requirements

• The serving hot path must respond in under 100 ms. It is a fast list lookup, never online ML inference.
• ML inference is the real cost driver (~580K inferences/sec), so ranking runs offline at a slower cadence.
• Prefetch upcoming clips and use adaptive bitrate so scrolling never shows a loading spinner.
• Ingest billions of activity events per day via Kafka without touching the serving path; candidate lists may lag by a few minutes.

API contract

GET /api/v1/foryou?cursor={cursor}&limit=10 → { videos[], next_cursor }

Reads the precomputed candidate list from Cassandra and hydrates metadata. The hot path.
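
As a rough illustration of that hot path, here is a minimal Python sketch. It assumes the candidate list is an ordered list of video ids per user and that the cursor is just an encoded offset into that list. The dictionaries `CANDIDATES` and `METADATA` are hypothetical in-memory stand-ins for the Cassandra table and the metadata store, and the function names are illustrative only.

```python
import base64
import json
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the Cassandra candidate table
# and the video metadata store.
CANDIDATES: Dict[str, List[str]] = {
    "user_1": [f"v{i}" for i in range(25)],
}
METADATA: Dict[str, dict] = {
    f"v{i}": {"video_id": f"v{i}", "caption": f"clip {i}"} for i in range(25)
}


def encode_cursor(offset: int) -> str:
    """Pack the list offset into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"o": offset}).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    """Return the offset stored in a cursor, or 0 for the first page."""
    if not cursor:
        return 0
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["o"])


def get_for_you(user_id: str, cursor: Optional[str] = None, limit: int = 10) -> dict:
    """Serve one page: a list lookup plus metadata hydration, no model call."""
    offset = decode_cursor(cursor)
    candidate_ids = CANDIDATES.get(user_id, [])
    page_ids = candidate_ids[offset : offset + limit]
    # Hydrate: swap each id for its metadata record, skipping missing ones.
    videos = [METADATA[vid] for vid in page_ids if vid in METADATA]
    next_offset = offset + len(page_ids)
    next_cursor = encode_cursor(next_offset) if next_offset < len(candidate_ids) else None
    return {"videos": videos, "next_cursor": next_cursor}
```

Calling `get_for_you("user_1")` returns the first ten clips plus a cursor, and passing that cursor back returns the next ten. In this sketch the cursor becomes `None` once the precomputed list runs out; a real infinite scroll would need some fallback at that point, which the notes above do not specify.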

POST /api/v1/videos { upload_id, caption } → { video_id, status: 'processing' }

Returns immediately. Transcoding runs asynchronously via Kafka, and the video goes live when it finishes.
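
The accept-now, process-later pattern might look like the following Python sketch. Everything here is an assumption for illustration: an in-process `queue.Queue` stands in for the Kafka topic, a dictionary stands in for the video table, and the `"live"` status name is invented to match the phrase "goes live".

```python
import queue
import uuid
from typing import Dict

# Hypothetical stand-ins: a dict for the video table and an in-process
# queue where the real design would publish to a Kafka topic.
VIDEOS: Dict[str, dict] = {}
TRANSCODE_JOBS: queue.Queue = queue.Queue()


def create_video(upload_id: str, caption: str) -> dict:
    """Record the video as 'processing', enqueue a transcode job, return at once."""
    video_id = uuid.uuid4().hex
    VIDEOS[video_id] = {"upload_id": upload_id, "caption": caption, "status": "processing"}
    TRANSCODE_JOBS.put({"video_id": video_id, "upload_id": upload_id})
    return {"video_id": video_id, "status": "processing"}


def run_one_transcode_job() -> str:
    """Worker side: take one job, pretend to transcode, then mark the video live."""
    job = TRANSCODE_JOBS.get_nowait()
    # Real transcoding would happen here and could take minutes.
    VIDEOS[job["video_id"]]["status"] = "live"  # assumed status name
    return job["video_id"]
```

The request handler never waits on the worker. The only thing tying the two halves together is the job message, which is what lets the upload response stay fast regardless of how long transcoding takes.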

POST /api/v1/events { video_id, type, watch_ms } → 202

Firehose of view, like, share, and watch-time signals published to Kafka for the ranking pipeline.
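
A handler for this endpoint could be sketched as below. The set of allowed event types, the validation rules, and the error bodies are assumptions rather than part of the contract above, and the `EVENT_LOG` list is a hypothetical stand-in for a Kafka topic.

```python
from typing import List, Tuple

# Assumed event types, taken from the kinds of signals named in these notes.
ALLOWED_TYPES = {"view", "like", "share"}

# Hypothetical stand-in for a Kafka topic: an append-only list.
EVENT_LOG: List[dict] = []


def post_event(user_id: str, body: dict) -> Tuple[int, dict]:
    """Validate an event, append it to the log, and return (status, body)."""
    video_id = body.get("video_id")
    event_type = body.get("type")
    watch_ms = body.get("watch_ms", 0)
    if not isinstance(video_id, str) or not video_id:
        return 400, {"error": "video_id is required"}
    if event_type not in ALLOWED_TYPES:
        return 400, {"error": "unknown event type"}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(watch_ms, bool) or not isinstance(watch_ms, int) or watch_ms < 0:
        return 400, {"error": "watch_ms must be a non-negative integer"}
    EVENT_LOG.append(
        {"user_id": user_id, "video_id": video_id, "type": event_type, "watch_ms": watch_ms}
    )
    # 202 Accepted: the event is queued, not yet processed by the ranking pipeline.
    return 202, {}
```

Returning 202 rather than 200 signals that the event has only been accepted for later processing, which matches the decoupling between ingestion and ranking.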

About TikTok (For You Feed)

Open TikTok and the very first video already feels chosen for you, and so does the next one, and the one after that. There is no follow graph deciding what you see. A ranking model picks each clip from the entire catalog based on what you have watched, liked, and skipped. The surprising part is that almost none of that machine learning happens while you scroll. The scroll itself is a fast lookup, and that split is what makes the feed feel instant.

Here is the whole thing in plain steps. When you open the app, the Feed Service grabs a list of video ids that were already picked for you and stored in Cassandra, then fills in the metadata and hands back a batch. Your client immediately starts pulling those video bytes from a CDN, and it prefetches the next two or three clips so the next swipe never shows a spinner. Meanwhile every view, like, and watch-time signal you produce gets fired off to Kafka.
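
The client-side prefetch decision can be sketched in a few lines of Python. This is only an illustration under assumptions: the window of three clips follows the "two or three" figure above, and the function names and the idea of requesting the next page when fewer than a window of clips remain are hypothetical.

```python
from typing import List, Set


def videos_to_prefetch(
    feed: List[str], current_index: int, cached: Set[str], window: int = 3
) -> List[str]:
    """Return the ids of the next `window` clips that are not cached yet."""
    upcoming = feed[current_index + 1 : current_index + 1 + window]
    return [vid for vid in upcoming if vid not in cached]


def needs_next_page(feed: List[str], current_index: int, window: int = 3) -> bool:
    """True when fewer than `window` clips remain after the current one."""
    remaining = len(feed) - (current_index + 1)
    return remaining < window
```

On each swipe the client would call `videos_to_prefetch` to decide which clips to start downloading from the CDN, and `needs_next_page` to decide whether to request another batch with the current cursor before the user reaches the end of the local list.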

The heavy work runs on its own clock. A Ranking Worker wakes up every few minutes, reads your recent activity off the Kafka stream, runs the ML model, and writes a fresh candidate list back to Cassandra. By the time you scroll again, new picks are waiting. It is the same idea as a kitchen that preps ingredients before the dinner rush instead of starting from raw vegetables when each order arrives: the line stays fast even when it is busy.
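
One tick of that worker for a single user might be sketched as follows. The scoring here is a deliberately toy stand-in for the ML model: it assumes each video carries a single topic tag and ranks unseen videos by how much time and engagement the user gave that topic. The catalog, the weights, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical catalog: video id -> topic tag. A real model would use far richer features.
CATALOG: Dict[str, str] = {
    "v1": "cooking", "v2": "cooking", "v3": "dance", "v4": "dance", "v5": "pets",
}


def topic_affinity(events: Iterable[dict]) -> Dict[str, float]:
    """Turn recent events into a per-topic score (toy stand-in for the ML model)."""
    bonus = {"view": 0.0, "like": 2.0, "share": 3.0}  # assumed weights
    scores: Dict[str, float] = defaultdict(float)
    for event in events:
        topic = CATALOG.get(event["video_id"])
        if topic is None:
            continue
        # Watch time counts in seconds; explicit actions add a fixed bonus.
        scores[topic] += event.get("watch_ms", 0) / 1000.0 + bonus.get(event["type"], 0.0)
    return dict(scores)


def rank_candidates(events: List[dict], limit: int = 50) -> List[str]:
    """Score unseen catalog videos by topic affinity and return the top ids."""
    affinity = topic_affinity(events)
    seen = {event["video_id"] for event in events}
    unseen = [vid for vid in CATALOG if vid not in seen]
    # Sort by score descending, then by id so the output is deterministic.
    unseen.sort(key=lambda vid: (-affinity.get(CATALOG[vid], 0.0), vid))
    return unseen[:limit]


def refresh_user(user_id: str, events: List[dict], store: Dict[str, List[str]]) -> None:
    """One worker tick for one user: recompute and overwrite the candidate list."""
    store[user_id] = rank_candidates(events)
```

The point of the sketch is the shape of the loop, not the scoring: the expensive step runs in the background and its only output is a plain list that the serving path can read without knowing how it was produced.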

Uploads follow the same decoupling. When a creator posts, the Upload Service stores the raw file, drops a job on Kafka, and returns right away, so the upload feels instant even though a Transcode Worker spends minutes turning the video into about five bitrate variants for adaptive playback. The lesson TikTok teaches is the split between heavy offline ranking and a light online serving path, plus how a single event log can feed both a recommendation pipeline and a transcoding pipeline at billions of events a day.
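
To make the "about five bitrate variants" concrete, here is a small Python sketch of a bitrate ladder and of how a player might choose a rung. The rendition names, the bitrates, the object key layout, and the 0.8 safety factor are all assumptions for illustration; the notes above only fix the number of variants at roughly five.

```python
from typing import List, NamedTuple


class Rendition(NamedTuple):
    name: str
    kbps: int


# Assumed five-step ladder, ordered from lowest to highest bitrate.
LADDER: List[Rendition] = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1400),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]


def transcode_outputs(video_id: str) -> List[str]:
    """List the object keys a transcode job would produce, one per rendition."""
    return [f"{video_id}/{r.name}.mp4" for r in LADDER]


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> Rendition:
    """Pick the highest rendition that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    fitting = [r for r in LADDER if r.kbps <= budget]
    # Fall back to the lowest rung when even that exceeds the budget.
    return fitting[-1] if fitting else LADDER[0]
```

With these assumed numbers, a client measuring about 4000 kbps gets a budget of 3200 kbps and plays the 720p rendition, while a client on a very slow link drops to 240p instead of stalling.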