Search Engine: System Design

Requirements & API: Search Engine

The first move in any interview: define requirements and sketch the API before drawing a single box.

Functional requirements

•Given a text query, return the globally top-ranked matching documents.
•Normalize the query (lowercase, stem, spell-correct, synonym-expand) before searching.
•Scatter the query to every document-sharded index, gather each shard's top-K, then merge and re-rank.
•Continuously crawl and (re)index the web so results reflect new and changed documents.

Non-functional requirements

•Low query latency at huge fan-out. Total latency is roughly the slowest shard, so stragglers must be bounded.
•Index freshness. Newly crawled documents must become searchable on a predictable lag.
•Ranking quality: global signals (PageRank, freshness, personalization) applied centrally, easy to A/B.
•Massive read scale; the index doesn't fit on one machine, so sharding and parallelism are mandatory.

API contract

GET /search?q={query}&page={n} → { results: [{ url, title, snippet }], total }

The hot path. Fans out to all index shards, then the ranker merges.

internal: shard.search(canonical_query, k) → top-K candidates with local scores

Each document-sharded index returns its best K; the ranker does the rest.

internal: index(doc): crawl/index pipeline appends to the inverted index

Offline path; not on the query critical path.

About Search Engine

Think about typing 'distributed systems' into a search box and getting ten good results back in a fraction of a second, chosen from roughly a hundred billion web pages. That speed at that scale is the whole problem. The index is far too big for one machine, so the work has to be spread out, and the latency budget is tiny.

Here is the whole thing in plain terms. A query frontend first normalizes what you typed: lowercase it, stem the words, fix spelling, expand synonyms, so every part of the system sees the same canonical query. Then it fans that query out to every index shard at once. Each shard holds the inverted index for a slice of the web, searches its own slice, and returns just its top matches. A ranker collects everyone's top matches, re-ranks them with global signals like PageRank and freshness, hydrates the survivors with titles and snippets from a doc store, and sends back the final ten.

Why ask every shard instead of routing to one? Because documents are split across shards by ID, not by word, so a match for your query could live anywhere. That makes every query a scatter-gather, and the total time is set by the slowest shard to answer. It is like asking a hundred librarians to each check their own room and waiting for the last one to report back, so bounding stragglers matters more than raw average speed.

The index does not build itself on the query path. A separate offline pipeline crawls the web continuously, using a Bloom filter to cheaply remember which billions of URLs it has already seen, then batches the crawled pages into new index segments that get shipped to the live shards. New pages become searchable on a predictable lag rather than instantly.

This system teaches document-sharding versus term-sharding and why hot terms force the former, the scatter-gather pattern and the straggler problem, two-stage ranking that runs expensive global scoring only on candidates the shards already filtered, and the Bloom filter as a space-for-accuracy trade in the crawler.