Store the answer to expensive questions so you don't pay to compute it again.
The first read pays the full database cost and warms the cache. Every read after that is served from memory, often 50–100x faster. The hard part is not storing the value; it is knowing when to throw it away.
Plain English: when an answer is expensive to compute, save it somewhere fast (like memory) so the next person asking gets it instantly. The hard part isn't saving; it's knowing when to throw the saved copy away.
Caching means keeping frequently accessed data in a faster store (RAM, edge, in-process) so that subsequent requests don't have to recompute it or refetch it from the slow source of truth.
Reading from disk or computing a complex query is orders of magnitude slower than reading from memory. For read-heavy workloads, most reads can be served from cache, cutting load on the slow store and improving p99 latency. The reason it works: a few items get requested constantly while most sit untouched, the way a few songs get played millions of times while the rest are rarely heard. Keep the popular ones in memory and the slow store sees only a small fraction of the traffic.
Choose a cache pattern: cache-aside (the app reads the cache, falls through to the DB on a miss, then populates the cache), write-through (every write goes through the cache to the DB before it is acknowledged), write-behind (writes hit the cache; the DB is updated asynchronously), refresh-ahead (hot entries are proactively refreshed before they expire). Set a TTL that matches how much staleness the data can tolerate.
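The cache-aside pattern with a TTL can be sketched in a few lines. This is a minimal illustration, not a production client: `TTLCache` and `FakeDatabase` are hypothetical in-memory stand-ins for a real cache (Redis, Memcached) and a real database, and the function names are made up for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process cache with per-entry expiry (stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


class FakeDatabase:
    """Hypothetical slow source of truth; counts reads so cache hits are visible."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {"user:1": "Ada"}
        self.reads = 0

    def read(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


def cache_aside_read(cache: TTLCache, db: FakeDatabase, key: str) -> Optional[str]:
    """Check the cache first; on a miss, read the DB and populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.read(key)
    if value is not None:
        cache.set(key, value)
    return value


def write_and_invalidate(cache: TTLCache, db: FakeDatabase, key: str, value: str) -> None:
    """Write to the DB, then delete the cached copy so the next read refetches it."""
    db.write(key, value)
    cache.delete(key)
```

Deleting the cached entry on write, rather than updating it, is one common choice for cache-aside: the next read repopulates the cache from the source of truth, and the TTL acts as a backstop if an invalidation is ever missed.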
Cache-aside pattern: Read Service checks Redis, falls through to Postgres on miss
Precomputed timelines in Redis, so readers almost never hit the DB
Hot (geohash, query) pairs cached with short TTL
The whole system IS a cache
A cache misconfiguration caused a load spike that overwhelmed Slack's databases in sequence.
Slack deployed a Memcached configuration change that accidentally reduced the effective cache size. Requests that would have hit cache started hitting the database. The database absorbed the initial surge but latency crept up. Slower DB responses caused app servers to hold connections longer, exhausting their connection pools. Exhausted pools caused requests to queue. Queued requests timed out and clients retried, amplifying the load. The database load balancer fell over. Slack was effectively down for 5 hours for most users. The lesson: cache and database tiers aren't independent. When the hit rate is high, a drop of just 5-10 percentage points can mean roughly 10x database load on a busy system, because the database only ever sees the misses. Monitor cache hit rate as a first-class operational metric and have a circuit breaker for cache degradation.
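The arithmetic behind that lesson is worth working through once. The sketch below assumes the database sees exactly the reads that miss the cache; the 99% and 90% hit rates and the 100,000 reads per second are illustrative numbers, not figures from the incident.

```python
def database_reads_per_second(total_reads_per_second: float, hit_rate: float) -> float:
    """Reads that fall through to the database: the database only sees the misses."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_reads_per_second * (1.0 - hit_rate)


def load_multiplier(hit_rate_before: float, hit_rate_after: float) -> float:
    """How many times more database load results from a change in hit rate."""
    misses_before = 1.0 - hit_rate_before
    misses_after = 1.0 - hit_rate_after
    if misses_before <= 0.0:
        raise ValueError("a 100% hit rate has no baseline database load to compare against")
    return misses_after / misses_before


# Illustrative numbers: 100,000 reads/s, hit rate slipping from 99% to 90%.
before = database_reads_per_second(100_000, 0.99)  # about 1,000 reads/s reach the DB
after = database_reads_per_second(100_000, 0.90)   # about 10,000 reads/s reach the DB
multiplier = load_multiplier(0.99, 0.90)           # about 10x
```

The same 9-point drop from a lower baseline is far less dramatic: going from 50% to 41% raises database load by less than 20%. The better the cache was doing, the harder the database is hit when it degrades.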
A replication topology change made Workers KV return data that was hours old globally.
Cloudflare Workers KV is a globally distributed key-value store built on eventual consistency: writes propagate to all edges within roughly 60 seconds. During a maintenance operation, an engineer changed the replication topology, specifically which nodes a region's reads fall back to on cache miss. The change accidentally routed reads for a subset of keys to a secondary tier that had stopped receiving updates. Edge nodes across all regions started serving stale values that were hours old, not seconds old. Feature flags, A/B test configs, and auth tokens stored in KV returned wrong results for 35 minutes. The lesson: eventually consistent systems have a defined propagation bound. Any change to the replication topology must be validated against that bound. Breaking propagation doesn't produce errors; it produces silent staleness that can persist indefinitely.
Prime Day 2018 opened with Amazon's own landing page returning errors for the first 90 minutes.
Prime Day 2018 launched with a load spike Amazon had anticipated and prepared for, but not quite enough. The front-end tier scaled horizontally via auto-scaling groups. The recommendation service underneath did not: it depended on a Redis cluster sized for projected peak, not actual peak. The Redis cluster hit its connection limit within minutes of launch. Backend services queuing for Redis connections started timing out. The front-end returned errors. The recommendation service's circuit breaker was supposed to fail open (show a degraded UI without personalization), but configuration drift meant it was set to fail closed instead. Customers saw error dogs on Amazon.com for 90 minutes. The lesson: auto-scaling the frontend while leaving stateful dependencies unscaled is the most common Prime-Day-class mistake. Circuit breakers also need to be exercised in production, not just configured and forgotten.
A corrupted database update took down Azure AD MFA globally, locking millions of users out of Microsoft services.
In September 2023, a routine Azure Active Directory update introduced a corrupted data entry into the authentication service's configuration store. That store is read on every MFA request, so within minutes of deployment, MFA was failing globally. Services that depend on Azure AD for login, including Microsoft 365, Teams, Azure Portal, and Xbox, all started rejecting multi-factor auth. Because MFA was broken, engineers trying to reach the management plane to roll back had to use break-glass procedures. The update was eventually rolled back, but re-validation and cache clearing across global infrastructure took 14 hours. The lesson: configuration stores are critical path for every request. Changes to them must be validated on live traffic via canary before global rollout. Break-glass procedures must not depend on the service they're trying to fix.
A single Redis cluster used for rate limiting became a cascading single point of failure during peak dinner hours.
DoorDash used a central Redis cluster to store rate limiting counters. During a high-traffic event, the Redis cluster started showing elevated latency. Services calling the rate limiter were waiting on Redis responses and holding threads. Thread pools exhausted. Services started returning 503s. Upstream services receiving errors started retrying, amplifying the load. The failure cascaded horizontally: order placement, merchant dashboards, and driver assignment all went down because they all shared the same rate-limiter Redis cluster. This dependency didn't appear in any single team's architecture diagram. The lesson: shared infrastructure like rate limiters must be treated as SLO-critical with blast-radius isolation. If rate limiting fails, it should fail open, not block the entire request path.
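A fail-open rate limiter can be sketched as follows. This is an illustration of the principle only, not DoorDash's implementation: `CounterStoreUnavailable`, `allow_request`, and `InMemoryCounters` are hypothetical names, and the sketch assumes the client for the shared counter store raises an exception when it errors or exceeds a short timeout. Window resets are left out to keep the example small.

```python
from typing import Callable, Dict


class CounterStoreUnavailable(Exception):
    """Hypothetical error: the shared counter store failed or exceeded its time budget."""


def allow_request(client_id: str, limit: int, increment: Callable[[str], int]) -> bool:
    """Return True if the request may proceed.

    `increment` bumps and returns the client's counter for the current window.
    If the counter store is unavailable, the limiter fails open: it lets the
    request through instead of blocking the whole request path.
    """
    try:
        count = increment(client_id)
    except CounterStoreUnavailable:
        return True  # fail open: losing rate limiting is better than losing the service
    return count <= limit


class InMemoryCounters:
    """Hypothetical stand-in for the shared counter store (no window reset)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def increment(self, client_id: str) -> int:
        self._counts[client_id] = self._counts.get(client_id, 0) + 1
        return self._counts[client_id]


def broken_increment(client_id: str) -> int:
    """Simulates a counter store that is down or too slow to answer in time."""
    raise CounterStoreUnavailable("counter store timed out")
```

Failing open only helps if the call to the counter store has a tight timeout. In the incident above the store was slow rather than down, so a limiter without a timeout would still have held threads while waiting.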
When you propose caching in an interview, the follow-up will always be 'what's your invalidation strategy?' That's the real test. Know the three patterns by name (cache-aside, write-through, write-behind), be able to say which you'd pick and why, and proactively mention thundering herd: many requests missing on the same hot key at once and all hitting the database together. Candidates who say 'just cache it' without addressing invalidation or TTL strategy look like they've never debugged a cache consistency bug.
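One common answer to the thundering herd question is request coalescing (sometimes called single-flight): when many callers miss on the same key at once, only one of them loads the value and the rest wait for it. The sketch below is a minimal in-process version under stated assumptions: `SingleFlightCache` is a hypothetical name, there is no TTL or eviction, and a `None` result from the loader is treated as "not cached".

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """Hypothetical cache that lets only one caller per key run the expensive loader."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()  # guards creation of per-key locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        # Fast path: value already cached, no locking needed.
        value = self._values.get(key)
        if value is not None:
            return value
        # Slow path: callers for the same key queue up on one lock.
        with self._lock_for(key):
            # Re-check: another caller may have loaded it while we waited.
            value = self._values.get(key)
            if value is not None:
                return value
            value = self._loader(key)
            if value is not None:
                self._values[key] = value
            return value
```

With this in place, a burst of concurrent misses on one hot key produces a single load against the slow store instead of one load per caller. Across multiple processes the same idea needs a shared lock or a short-lived "loading" marker in the cache itself.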