What an interviewer expects you to nail down before drawing a single box.
GET /search?q={query}&page={n} → { results: [{ url, title, snippet }], total }internal: shard.search(canonical_query, k) → top-K candidates with local scoresinternal: index(doc): crawl/index pipeline appends to the inverted indexThink about typing 'distributed systems' into a search box and getting ten good results back in a fraction of a second, chosen from roughly a hundred billion web pages. That speed at that scale is the whole problem. The index is far too big for one machine, so the work has to be spread out, and the latency budget is tiny.
Here is the whole thing in plain terms. A query frontend first normalizes what you typed: lowercase it, stem the words, fix spelling, expand synonyms, so every part of the system sees the same canonical query. Then it fans that query out to every index shard at once. Each shard holds the inverted index for a slice of the web, searches its own slice, and returns just its top matches. A ranker collects everyone's top matches, re-ranks them with global signals like PageRank and freshness, hydrates the survivors with titles and snippets from a doc store, and sends back the final ten.
Why ask every shard instead of routing to one? Because documents are split across shards by ID, not by word, so a match for your query could live anywhere. That makes every query a scatter-gather, and the total time is set by the slowest shard to answer. It is like asking a hundred librarians to each check their own room and waiting for the last one to report back, so bounding stragglers matters more than raw average speed.
The index does not build itself on the query path. A separate offline pipeline crawls the web continuously, using a Bloom filter to cheaply remember which billions of URLs it has already seen, then batches the crawled pages into new index segments that get shipped to the live shards. New pages become searchable on a predictable lag rather than instantly.
This system teaches document-sharding versus term-sharding and why hot terms force the former, the scatter-gather pattern and the straggler problem, two-stage ranking that runs expensive global scoring only on candidates the shards already filtered, and the Bloom filter as a space-for-accuracy trade in the crawler.