Typeahead / Autocomplete: System Design

Requirements & API: Typeahead / Autocomplete

The first move in any interview: define requirements and sketch the API before drawing a single box.

Functional requirements

•Return top-K autocomplete suggestions for a given search prefix.
•Suggestions should be ranked by query frequency (global) with optional personalization.
•Update suggestions to reflect trending queries within minutes, not days.
•Support multiple locales and languages.

Non-functional requirements

•End-to-end latency under 100ms (suggestions should feel instant).
•High availability: a broken autocomplete is annoying; a broken search box is catastrophic.
•Read-heavy: billions of prefix queries per day, few writes (only from the aggregation pipeline).
•Stale suggestions (a few minutes old) are acceptable.

API contract

GET /autocomplete?q=netfl&limit=5&locale=en → { suggestions: [{ text, score }] }

The hot path. Served from Redis for ~99% of queries.

POST /ingest/queries (internal) { queries: [{ text, userId, timestamp }][] }

Internal endpoint for query log ingestion from search services.

About Typeahead / Autocomplete

The search bar on Google, Twitter, or Amazon that shows suggestions as you type is a typeahead system. It looks instantaneous, but behind it is an architecture that serves hundreds of thousands of prefix queries per second with under 100ms end-to-end latency, across a dataset of billions of historical queries.

The core data structure is a trie: a prefix tree where each node represents a character and branches to every possible next character. To find all queries matching the prefix 'netfl', you walk the trie to the node for 'l' and return the highest-frequency completions below it. In practice, a pure in-memory trie for all of Google's queries would be enormous, so the production solution is smarter: precompute the top-K completions for each prefix, store them directly in Redis, and update them from an offline data pipeline that aggregates query logs.

The result is a system that's almost entirely a cache read. The suggestion service looks up the prefix in Redis and returns a pre-ranked list in microseconds. The interesting engineering is all offline: how you aggregate query frequency, how you handle trending topics (real-time signals on top of batch frequency), and how you personalize suggestions per user.