Performance·3 min read

Caching

Store the answer to expensive questions so you don't pay to compute it again.

The first read pays the full database cost and warms the cache. Every read after is served from memory, often 50–100x faster. The hard part is not storing the value, it is knowing when to throw it away.

Plain English: when an answer is expensive to compute, save it somewhere fast (like memory) so the next person asking gets it instantly. The hard part isn't saving; it's knowing when to throw the saved copy away.

Used in: URL Shortener, Instagram Feed, Yelp, Distributed Cache

What it is

Caching keeps frequently accessed data in a faster store (RAM, an edge node, or in-process memory) so that subsequent requests don't have to recompute it or refetch it from the slow source of truth.

The problem it solves

Reading from disk or computing a complex query is orders of magnitude slower than reading from memory. For read-heavy workloads, most reads can be served from cache, cutting load on the slow store and improving p99 latency. The reason it works: a few items get requested constantly while most sit untouched, the way a few songs get played millions of times while the rest are barely touched. Keep the popular ones in memory and the slow store barely sees the traffic.
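To put a rough number on that skew, here is a minimal sketch. It assumes popularity follows a Zipf-like curve, where the key ranked r gets traffic proportional to 1/r. That curve, the key counts, and the function name are all assumptions for illustration; measure your own access pattern before sizing a cache.

```python
def traffic_covered(total_keys: int, cached_keys: int) -> float:
    """Share of reads served from cache if only the most popular keys are cached.

    Assumption: the key ranked r receives traffic proportional to 1/r.
    Real workloads may be more or less skewed than this.
    """
    weights = [1.0 / rank for rank in range(1, total_keys + 1)]
    return sum(weights[:cached_keys]) / sum(weights)


# Hypothetical workload: 1,000 distinct keys, room to cache the top 100.
coverage = traffic_covered(total_keys=1000, cached_keys=100)
print(f"Caching 10% of keys covers {coverage:.0%} of reads")
```

Under this assumed curve, caching a tenth of the keys absorbs roughly 69% of reads, which is why a small cache can take so much pressure off the slow store.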

How it works

Choose a cache pattern:
Cache-aside: the app reads the cache, falls through to the DB on a miss, and populates the cache with the result.
Write-through: every write goes to the cache and the DB together, and is acknowledged only after both succeed.
Write-behind: writes hit the cache first; the DB is updated asynchronously.
Refresh-ahead: hot entries are refreshed proactively before they expire.

Whichever pattern you choose, set a TTL that matches how much staleness the data can tolerate.
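Cache-aside with a TTL can be sketched in a few lines. This is a minimal in-memory illustration, not a production client: the `CacheAside` class, the `load_from_db` callable, and the injectable `clock` are hypothetical names introduced here, and a real deployment would more likely keep the entries in a store such as Redis.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Read-through-on-miss cache with a per-entry TTL (illustrative only)."""

    def __init__(
        self,
        load_from_db: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: fall through to the source of truth,
        # then populate the cache for the next reader.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes."""
        self._entries.pop(key, None)


# Usage with a fake database and a fake clock so expiry is easy to see.
db = {"abc123": "https://example.com"}
db_reads = []


def load_from_db(key: str) -> str:
    db_reads.append(key)
    return db[key]


now = [0.0]
cache = CacheAside(load_from_db, ttl_seconds=60, clock=lambda: now[0])
cache.get("abc123")  # miss: goes to the database and warms the cache
cache.get("abc123")  # hit: served from memory
now[0] = 61.0
cache.get("abc123")  # TTL expired: goes back to the database
print(f"hits={cache.hits} misses={cache.misses} db_reads={len(db_reads)}")
```

The hit and miss counters are there on purpose: they make the hit rate measurable from day one.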

Why use it

Large latency reduction (sub-ms vs tens of ms)
Reduces load on the backing store, often by 10x or more
Cheap to add at the edge (CDN), in-process (LRU), or as a tier (Redis)
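As an illustration of how cheap the in-process option can be, Python's standard library ships an LRU cache decorator. The `expand_short_url` function and its fake lookup are hypothetical stand-ins for an expensive database call. Note that `functools.lru_cache` evicts by recency only and has no TTL, so it suits data that rarely changes.

```python
from functools import lru_cache

db_calls = []  # records every time the "slow" lookup actually runs


@lru_cache(maxsize=1024)
def expand_short_url(code: str) -> str:
    # Stand-in for an expensive database query.
    db_calls.append(code)
    return f"https://example.com/{code}"


for code in ["a", "b", "a", "a", "b"]:
    expand_short_url(code)

info = expand_short_url.cache_info()
print(f"{info.hits} hits, {info.misses} misses, {len(db_calls)} database calls")
```

Five reads cost two database calls here. `cache_info()` also gives you the hit and miss counts for free.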

What it costs you

Cache invalidation is one of the hard problems in computer science: when does cached data go stale?
Adds a moving part: failure handling, monitoring, hit-rate tracking
Hot keys can overwhelm a single cache node (see the hot-key gotcha below)

Where it shows up in our architectures

URL Shortener: Cache-aside pattern. The Read Service checks Redis and falls through to Postgres on a miss.
Instagram Feed: Precomputed timelines in Redis, so readers almost never hit the DB.
Yelp: Hot (geohash, query) pairs cached with a short TTL.
Distributed Cache: The whole system IS a cache.

Gotchas

TTLs are your friend. Short TTLs sidestep most invalidation problems, and stale-by-a-minute is usually fine.
Cache-aside is the default. Write-through and write-behind have more failure modes than people expect.
Always measure your hit rate. A cache with a 5% hit rate is just added latency.
Hot keys are not solved by consistent hashing alone; they need explicit replication or key sharding.
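The explicit replication mentioned in the last gotcha can be sketched as follows. This is a simplified model under stated assumptions: five cache nodes, three copies per hot key, and placement on consecutive nodes via a plain modulo, which stands in for a real consistent-hash ring. The names `node_for`, `write_nodes`, and `read_node` are hypothetical.

```python
import random
import zlib

NUM_NODES = 5  # assumed cluster size
REPLICAS = 3   # copies kept for a key known to be hot (must be <= NUM_NODES)


def node_for(key: str, replica: int = 0) -> int:
    """Pick a node for one copy of a key.

    Copies land on consecutive nodes, so the copies of one key never
    share a node as long as REPLICAS <= NUM_NODES.
    """
    base = zlib.crc32(key.encode("utf-8"))
    return (base + replica) % NUM_NODES


def write_nodes(key: str) -> list:
    """Writes (and invalidations) must reach every copy."""
    return [node_for(key, replica) for replica in range(REPLICAS)]


def read_node(key: str, rng: random.Random) -> int:
    """Reads pick one copy at random, spreading load across the copies."""
    return node_for(key, rng.randrange(REPLICAS))


hot_key = "profile:celebrity"
print("copies live on nodes", write_nodes(hot_key))
print("this read goes to node", read_node(hot_key, random.Random()))
```

The trade-off is that every write or invalidation now fans out to all copies, so this is worth doing only for keys you have measured to be hot.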

When this went wrong in production

Slack's 5-hour outage from a cascading cache failure · 2022

A cache misconfiguration caused a load spike that overwhelmed Slack's databases in sequence.

Slack deployed a Memcached configuration change that accidentally reduced the effective cache size. Requests that would have hit cache started hitting the database. The database absorbed the initial surge but latency crept up. Slower DB responses caused app servers to hold connections longer, exhausting their connection pools. Exhausted pools caused requests to queue. Queued requests timed out and clients retried, amplifying the load. The database load balancer fell over. Slack was effectively down for 5 hours for most users. The lesson: cache and database tiers aren't independent. A miss rate that rises by just 5 to 10 percentage points can mean 10x the database load on a busy system: if the hit rate falls from 99% to 90%, misses go from 1% to 10% of reads. Monitor cache hit rate as a first-class operational metric and have a circuit breaker for cache degradation.
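The arithmetic behind that lesson is worth running once. The request rate and hit rates below are made-up illustrative numbers, not figures from the incident, and `db_reads_per_second` is a hypothetical helper.

```python
def db_reads_per_second(requests_per_second: float, hit_rate: float) -> float:
    """Only cache misses reach the database."""
    return requests_per_second * (1.0 - hit_rate)


# Hypothetical busy system: 100,000 reads per second.
before = db_reads_per_second(100_000, hit_rate=0.99)
after = db_reads_per_second(100_000, hit_rate=0.90)

print(f"database reads/s before: {before:,.0f}")
print(f"database reads/s after:  {after:,.0f}")
print(f"load multiplier: {after / before:.1f}x")
```

A nine-point drop in hit rate looks small on a dashboard, yet the database has to serve ten times the reads. Client retries then multiply that number again.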

Cloudflare Workers KV stale reads for 35 minutes · 2023

A replication topology change made Workers KV return data that was hours old globally.

Cloudflare Workers KV is a globally distributed key-value store built on eventual consistency: writes propagate to all edges within roughly 60 seconds. During a maintenance operation, an engineer changed the replication topology, specifically which nodes a region's reads fall back to on cache miss. The change accidentally routed reads for a subset of keys to a secondary tier that had stopped receiving updates. Edge nodes across all regions started serving stale values that were hours old, not seconds old. Feature flags, A/B test configs, and auth tokens stored in KV returned wrong results for 35 minutes. The lesson: eventually consistent systems have a defined propagation bound. Any change to the replication topology must be validated against that bound. Breaking propagation doesn't produce errors; it produces silent staleness that can persist indefinitely.

Amazon Prime Day collapses under its own launch load · 2018

Prime Day 2018 opened with Amazon's own landing page returning errors for the first 90 minutes.

Prime Day 2018 launched with a load spike Amazon had anticipated and prepared for, but not quite enough. The front-end tier scaled horizontally via auto-scaling groups. The recommendation service underneath did not: it depended on a Redis cluster sized for projected peak, not actual peak. The Redis cluster hit its connection limit within minutes of launch. Backend services queuing for Redis connections started timing out. The front-end returned errors. The recommendation service's circuit breaker was supposed to fail open (show a degraded UI without personalization), but configuration drift meant it was set to fail closed instead. Customers saw error dogs on Amazon.com for 90 minutes. The lesson: auto-scaling the frontend while leaving stateful dependencies unscaled is the most common Prime-Day-class mistake. Circuit breakers also need to be exercised in production, not just configured and forgotten.
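The degraded-UI behavior described above can be sketched as a small circuit breaker that returns a fallback instead of an error. This is a generic illustration, not Amazon's implementation: the `CircuitBreaker` class, its thresholds, and the `personalized` and `generic` functions are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """After repeated failures, skip the dependency and serve a fallback."""

    def __init__(
        self,
        failure_threshold: int,
        reset_after_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._tripped_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._tripped_at is not None:
            if self._clock() - self._tripped_at < self._reset_after:
                # Tripped: don't touch the struggling dependency at all.
                return fallback()
            # Cool-down is over: let one call probe the dependency again.
            self._tripped_at = None
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._tripped_at = self._clock()
            return fallback()  # degrade instead of surfacing the error
        self._failures = 0
        return result


# Usage: the personalized path is down, so users get generic results.
primary_calls = []


def personalized() -> list:
    primary_calls.append("recommendations")
    raise TimeoutError("recommendation cache unavailable")


def generic() -> list:
    return ["bestsellers"]


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_seconds=30, clock=lambda: now[0])
results = [breaker.call(personalized, generic) for _ in range(5)]
print(f"{len(primary_calls)} calls reached the dependency, {len(results)} pages rendered")
```

Only the first two requests reach the failing dependency; the rest are answered from the fallback without waiting on it. The fake clock makes it easy to test the cool-down path too, which is the kind of exercise the lesson asks for.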

Azure Active Directory outage: MFA breaks for 14 hours · 2023

A corrupted database update took down Azure AD MFA globally, locking millions of users out of Microsoft services.

In September 2023, a routine Azure Active Directory update introduced a corrupted data entry into the authentication service's configuration store. That store is read on every MFA request, so within minutes of deployment, MFA was failing globally. Services that depend on Azure AD for login, including Microsoft 365, Teams, Azure Portal, and Xbox, all started rejecting multi-factor auth. Because MFA was broken, engineers trying to reach the management plane to roll back had to use break-glass procedures. The update was eventually rolled back, but re-validation and cache clearing across global infrastructure took 14 hours. The lesson: configuration stores are critical path for every request. Changes to them must be validated on live traffic via canary before global rollout. Break-glass procedures must not depend on the service they're trying to fix.

DoorDash Redis cluster overload cascades to full outage · 2021

A single Redis cluster used for rate limiting became a cascading single point of failure during peak dinner hours.

DoorDash used a central Redis cluster to store rate limiting counters. During a high-traffic event, the Redis cluster started showing elevated latency. Services calling the rate limiter were waiting on Redis responses and holding threads. Thread pools exhausted. Services started returning 503s. Upstream services receiving errors started retrying, amplifying the load. The failure cascaded horizontally: order placement, merchant dashboards, and driver assignment all went down because they all shared the same rate-limiter Redis cluster. This dependency didn't appear in any single team's architecture diagram. The lesson: shared infrastructure like rate limiters must be treated as SLO-critical with blast-radius isolation. If rate limiting fails, it should fail open, not block the entire request path.
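Failing open can be as small as one wrapper around the limiter call. This sketch assumes the limiter client enforces its own short timeout and raises `TimeoutError` or `ConnectionError` when the backing store is slow or unreachable. `allow_request`, `check_limit`, and the two fake backends are hypothetical names, not DoorDash's code.

```python
from typing import Callable


def allow_request(client_id: str, check_limit: Callable[[str], bool]) -> bool:
    """Return True if the request may proceed.

    If the rate-limit backend is unavailable, let the request through
    rather than blocking the whole request path on it.
    """
    try:
        return check_limit(client_id)
    except (TimeoutError, ConnectionError):
        # Fail open: a brief loss of rate limiting is usually cheaper
        # than a full outage. Emit a metric here in a real system.
        return True


def healthy_backend(client_id: str) -> bool:
    return False  # pretend this client is over its limit


def overloaded_backend(client_id: str) -> bool:
    raise TimeoutError("rate-limit store did not answer in time")


print(allow_request("client-1", healthy_backend))     # limiter decides
print(allow_request("client-1", overloaded_backend))  # limiter is down
```

The short timeout matters as much as the fallback: without it, callers still hold threads while they wait, which is how the cascade in this story started.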

Interview angle

When you propose caching in an interview, the follow-up will almost always be 'what's your invalidation strategy?' That's the real test. Know the core patterns by name (cache-aside, write-through, write-behind, and ideally refresh-ahead), be able to say which you'd pick and why, and proactively mention the thundering herd problem, where many requests miss on the same key at once and all hit the database together. Candidates who say 'just cache it' without addressing invalidation or TTL strategy look like they've never debugged a cache consistency bug.
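If the interviewer asks how you would handle a thundering herd, one common answer is request coalescing: let a single caller load a missing key while the others wait for its result. Below is a minimal single-process sketch using per-key locks. The `SingleFlightCache` class and `slow_load` function are hypothetical, and TTLs are left out to keep the idea visible.

```python
import threading
import time
from typing import Callable, Dict


class SingleFlightCache:
    """On a miss, only one thread per key runs the loader."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def get(self, key: str) -> str:
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded the value
            # while this one was waiting for the lock.
            if key not in self._values:
                self._values[key] = self._load(key)
            return self._values[key]


load_calls = []  # records every trip to the "database"


def slow_load(key: str) -> str:
    load_calls.append(key)
    time.sleep(0.05)  # simulate a slow query
    return f"value-for-{key}"


cache = SingleFlightCache(slow_load)
threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(20)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"20 concurrent readers caused {len(load_calls)} database load(s)")
```

Twenty readers arrive on a cold key and the database sees one query. Across multiple app servers you would need a shared mechanism, such as a short-lived lock in the cache tier, to get the same effect.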
