Uber (Driver Matching): System Design

Requirements & API: Uber (Driver Matching)

The first move in any interview: define requirements and sketch the API before drawing a single box.

Functional requirements

•Ingest driver GPS pings (every few seconds) and keep a live spatial index of available drivers.
•On a ride request, find the K nearest available drivers, rank them, and send offers.
•Atomically reserve a driver on match so two riders can't be offered the same car.
•Track the trip lifecycle (requested → accepted → in_progress → completed) and compute fare.

Non-functional requirements

•Dispatch is latency-bound. The nearest-driver match must return in <2s while the rider waits.
•Geo lookups must be sub-millisecond, so the index lives in memory (Redis + H3 cells).
•Absorb ~250K location pings/sec; the index is rebuildable from the Kafka stream, so RAM-only is safe.
•Matching is sharded by region. Most matches are geographically local, and NYC peak is a hot shard.

API contract

POST /api/v1/rides { origin, destination } → { trip_id, status: 'searching' }

Kicks off dispatch; rider then polls or receives a push when a driver is assigned.

POST /api/v1/drivers/location { lat, lng, ts } → 202

High-volume fire-and-forget ping; published to Kafka, not written to the index synchronously.

POST /api/v1/offers/{offer_id}/accept → { trip_id }

Driver accepts; the reservation becomes a trip. Reject re-adds the driver to the pool.

About Uber (Driver Matching)

You tap Request, and within a couple of seconds the app names a driver and starts drawing their car creeping toward you on the map. To pull that off, the system has to know, right now, where every nearby available driver is, and pick the best one before you lose patience. Two forces shape the whole design: a steady stream of driver location pings coming in, and a strict latency budget on the match going out.

The supply side sends a lot of data. A million drivers at peak each send a GPS ping every few seconds, which is roughly 250,000 pings per second. Writing those straight into a database would melt it, so every ping goes onto Kafka first. From there one consumer updates the driver's position in an in-memory geo index, and another appends it to Cassandra for history and replays. Because the index can be rebuilt from the Kafka stream at any time, it is safe to keep it purely in RAM.

That geo index is what makes dispatch work. It stores online drivers by H3 hexagonal cell, so finding nearby cars becomes 'give me the drivers in these few neighboring cells' instead of measuring distance to every driver in the city. It works like the grid squares on a paper map. You do not scan the whole map to find a coffee shop, you jump to square G7 and look only there, which is why the lookup stays sub-millisecond even at rush hour.

On a ride request, the Dispatch service queries that index for the nearest available drivers, ranks them, and sends an offer over a websocket. One subtlety it has to get right is reservation: two riders can match to the same driver in the same instant, so dispatch atomically removes a driver from the pool before offering and re-adds them on a reject, which prevents double-booking. This system teaches geo-indexing with hex grids, using a queue as a shock absorber for a write firehose, region sharding for hot cities, and the race condition hiding inside real-time matching.