← Concepts
Performance·3 min read

Caching

Store the answer to expensive questions so you don't pay to compute it again.

First time reading this? Start here

Plain English: when an answer is expensive to compute, save it somewhere fast (like memory) so the next person asking gets it instantly. The hard part isn't saving; it's knowing when to throw the saved copy away.

Used in:URL ShortenerInstagram FeedYelpDistributed Cache
What it is

Keeping frequently-accessed data in a faster store (RAM, edge, in-process) so subsequent requests don't have to recompute or refetch from the slow source of truth.

The problem it solves

Reading from disk or computing a complex query is orders of magnitude slower than reading from memory. For read-heavy workloads, most reads can be served from cache, cutting load on the slow store and improving p99 latency. The reason it works: a few items get requested constantly while most sit untouched, the way a few songs get played millions of times while the rest are barely touched. Keep the popular ones in memory and the slow store barely sees the traffic.

How it works

Choose a cache pattern: cache-aside (app reads cache, falls through to DB on miss, populates cache), write-through (every write goes through the cache to the DB), write-behind (writes hit cache; DB is updated async), refresh-ahead (proactively refresh hot entries before they expire). Set a TTL appropriate to staleness tolerance.

Why use it
What it costs you
Where it shows up in our architectures
Gotchas

Your notes

Private to you