The first move in any interview: define requirements and sketch the API before drawing a single box.
POST /events/click { click_id, ad_id, advertiser_id, user_id, timestamp, ip, user_agent } → 202GET /aggregates?ad_id=X&start=T1&end=T2&granularity=hour → [{ window, clicks, spend }]GET /aggregates/realtime?ad_id=X&window=5m → { clicks, spend }Every time you click an ad, a system somewhere counts it. That count is the basis for billing an advertiser billions of dollars per year. The system must be accurate: under-counting loses revenue, over-counting triggers advertiser disputes and regulatory risk. It must be fast: advertisers need near-real-time dashboards to optimize their campaigns. And it must handle 100,000+ click events per second at peak.
The fundamental challenge is exactly-once aggregation at high throughput. If a click is counted twice (due to duplicate event delivery or a processing retry), the advertiser overpays. If it's dropped (due to a crashed worker), the publisher loses revenue. The design must guarantee every valid click is counted exactly once, with a clear audit trail.
Two architectural patterns compete here. The Lambda architecture runs a real-time stream processing layer (Flink, Spark Streaming) for low-latency approximate counts, and a batch processing layer (Spark, MapReduce) for accurate historical reconciliation. The two results are merged at query time. The Kappa architecture simplifies this by using only the stream layer, making the stream log replayable enough to serve as the source of truth for both real-time and historical queries.
The data model is deliberately simple: each event is a (click_id, ad_id, advertiser_id, user_id, timestamp, metadata) tuple. The aggregations are counts and spend rollups grouped by (ad_id, time_window). The complexity isn't in the data model. It's in ensuring those counts are correct under every failure mode.