The first move in any interview: define requirements and sketch the API before drawing a single box.
GET /autocomplete?q=netfl&limit=5&locale=en → { suggestions: [{ text, score }] }POST /ingest/queries (internal) { queries: [{ text, userId, timestamp }][] }The search bar on Google, Twitter, or Amazon that shows suggestions as you type is a typeahead system. It looks instantaneous, but behind it is an architecture that serves hundreds of thousands of prefix queries per second with under 100ms end-to-end latency, across a dataset of billions of historical queries.
The core data structure is a trie: a prefix tree where each node represents a character and branches to every possible next character. To find all queries matching the prefix 'netfl', you walk the trie to the node for 'l' and return the highest-frequency completions below it. In practice, a pure in-memory trie for all of Google's queries would be enormous, so the production solution is smarter: precompute the top-K completions for each prefix, store them directly in Redis, and update them from an offline data pipeline that aggregates query logs.
The result is a system that's almost entirely a cache read. The suggestion service looks up the prefix in Redis and returns a pre-ranked list in microseconds. The interesting engineering is all offline: how you aggregate query frequency, how you handle trending topics (real-time signals on top of batch frequency), and how you personalize suggestions per user.