Reqflow
Networking·3 min read

CDN (Content Delivery Network)

Serve static and semi-static content from edge servers physically close to the user.

A CDN keeps copies of your static files on edge servers around the world. A user in Tokyo hits a nearby edge instead of your origin in Virginia, so after the first request the file loads in milliseconds. Closer equals faster.

First time reading this? Start here

Plain English: copy your images and videos onto servers in cities all over the world, so a user in Tokyo doesn't wait for the file to come from your data center in Virginia. Closer = faster.

What it is

A global network of edge servers that cache content from your origin and serve it to nearby users. Same content, but served from milliseconds away instead of from half a world away.

The problem it solves

The speed of light is slow when your users are 8,000 km from your origin. CDNs put a cached copy of your content within ~50 ms of most users on the planet. They also offload large amounts of bandwidth from your origin, which cuts cost.
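
To put a number on the distance problem, here is a rough back-of-the-envelope sketch. It assumes light travels through optical fibre at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and that a fresh HTTPS request costs about three round trips (TCP handshake, TLS 1.3 handshake, then the request itself). Both figures are simplifying assumptions, and the 100 km edge distance is purely illustrative.

```python
# Assumption: signal speed in optical fibre, about 2/3 of c.
SPEED_IN_FIBRE_KM_PER_S = 200_000

# Assumption: TCP handshake + TLS 1.3 handshake + HTTP request/response.
ROUND_TRIPS_FOR_FRESH_HTTPS = 3


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time in ms, ignoring routing detours and queuing."""
    return distance_km * 2 * 1000 / SPEED_IN_FIBRE_KM_PER_S


far_origin_rtt = min_round_trip_ms(8_000)  # origin 8,000 km away
nearby_edge_rtt = min_round_trip_ms(100)   # illustrative edge 100 km away

print(f"origin: {far_origin_rtt:.0f} ms per round trip, "
      f"{far_origin_rtt * ROUND_TRIPS_FOR_FRESH_HTTPS:.0f} ms before the first byte")
print(f"edge:   {nearby_edge_rtt:.0f} ms per round trip, "
      f"{nearby_edge_rtt * ROUND_TRIPS_FOR_FRESH_HTTPS:.0f} ms before the first byte")
```

Real paths are longer than the straight-line distance, so treat these as lower bounds rather than predictions.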

How it works

A user requests an asset (image, video chunk, JS bundle). DNS or anycast routing sends the request to a nearby CDN POP (point of presence). If the POP has the asset cached, it serves it immediately. If not, it fetches the asset from the origin, caches it locally with a TTL, and serves it. Subsequent requests in that region hit the cache until the TTL expires.
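
The hit/miss flow can be sketched as a toy model of a single POP. The names here (`EdgeCache`, `fetch_from_origin`, the injectable `clock`) are hypothetical, and the model leaves out eviction, Cache-Control parsing, and concurrent requests. It is an illustration of the idea, not how any particular CDN implements it.

```python
import time
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy model of one POP: serve from cache if fresh, else fetch from origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (body, expires_at)
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        # Missing or expired: go to the origin and cache the result with a TTL.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


# Usage with a fake origin and a fake clock so the TTL can be stepped manually.
origin_calls: List[str] = []


def origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"contents of " + path.encode()


fake_now = [0.0]
edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])

first = edge.get("/cat.jpg")    # cold edge: goes to origin
second = edge.get("/cat.jpg")   # served from the edge
fake_now[0] = 61.0              # TTL has expired
third = edge.get("/cat.jpg")    # goes to origin again
print(first[1], second[1], third[1], len(origin_calls))
```

Only the first request and the first request after expiry reach the origin; everything in between is served locally.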

Why use it

  • Sub-100ms first-byte for cached content, for most users worldwide
  • Origin offload of 90%+ for cacheable content
  • Built-in DDoS absorption: CDNs have the capacity to absorb attacks that would overwhelm your origin
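
To make the offload figure concrete, here is a small arithmetic sketch. The request rate of 10,000 per second is an invented example, not a measurement; only cache misses reach the origin.

```python
def origin_requests_per_second(total_rps: int, hit_percent: int) -> float:
    """Requests per second that miss the edge and fall through to the origin."""
    return total_rps * (100 - hit_percent) / 100


# Assumption: 10,000 requests/second arriving at the CDN.
at_90 = origin_requests_per_second(10_000, 90)
at_95 = origin_requests_per_second(10_000, 95)
print(f"90% hit ratio -> {at_90:.0f} rps at origin")
print(f"95% hit ratio -> {at_95:.0f} rps at origin")
```

Going from a 90% to a 95% hit ratio looks like a small change, but it halves the traffic the origin has to serve.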

What it costs you

  • Per-byte egress cost (significant for video at scale, which is why Netflix built its own CDN)
  • Cache invalidation across hundreds of POPs is slower than invalidating a single cache
  • Only helps for cacheable content; personalized per-user responses don't benefit

Where it shows up in our architectures

  • Instagram Feed

    Photos and videos served from CDN edge; origin is S3

  • Netflix

    Open Connect, Netflix's own CDN, with appliances placed inside ISPs

  • TikTok

    Video chunks served from CDN; client prefetches the next 2-3 videos in parallel

Gotchas

  • CDNs serve immutable content best. Cache busting via URL fingerprinting (e.g. /app.a1b2c3.js) avoids most invalidation problems.
  • Pay attention to Cache-Control headers. The CDN obeys them; misconfigured headers mean either no caching (which defeats the point) or content that stays stale until it expires or is purged (worse).
  • For dynamic content, you can still use the CDN for TLS termination and DDoS protection even if you can't cache responses.
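
The first two gotchas can be sketched together. This is a minimal illustration, assuming a build step that renames each asset after a hash of its contents; the helper names `fingerprinted_name` and `cache_control_for` are hypothetical, and the header values are one common choice rather than the only correct one.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the extension: /app.js -> /app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def cache_control_for(is_fingerprinted: bool) -> str:
    """Pick a Cache-Control value based on whether the URL changes with the content."""
    if is_fingerprinted:
        # The URL changes whenever the bytes change, so the response can be
        # cached for a year and never needs invalidating.
        return "public, max-age=31536000, immutable"
    # Stable URLs (for example the HTML entry point) may be stored, but must
    # be revalidated with the origin before each reuse.
    return "no-cache"


old_name = fingerprinted_name("/app.js", b"console.log('v1')")
new_name = fingerprinted_name("/app.js", b"console.log('v2')")
print(old_name, cache_control_for(True))
print(new_name, cache_control_for(True))
print("/index.html", cache_control_for(False))
```

A new deploy produces a new URL, so edges holding the old file are never asked for the new one and nothing has to be purged.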

When this went wrong in production

Fastly takes down the internet · 2021

A valid customer config change triggered a latent bug that took down most of Fastly's network: 49 minutes, with a large share of the modern web dark.

Weeks earlier, Fastly had deployed a software update that introduced a latent bug, one that only a specific customer configuration pattern could trigger. When a customer eventually applied such a config, the bug fired across Fastly's global edge fleet within 12 seconds. Reddit, the NYT, Amazon, the UK Gov website: all 503ing simultaneously. Recovery took 49 minutes because the rollback procedure itself depended on healthy edge nodes. The lesson: latent bugs triggered by customer input are essentially production bombs. Canary deployments must rotate through varied customer configurations rather than always exercising the same ones, and your incident-response paths must work even when your data plane is on fire.

Cloudflare Workers KV stale reads for 35 minutes · 2023

A replication topology change made Workers KV return hours-old data for a subset of keys in every region.

Cloudflare Workers KV is a globally distributed key-value store built on eventual consistency: writes propagate to all edges within roughly 60 seconds. During a maintenance operation, an engineer changed the replication topology, specifically which nodes a region's reads fall back to on cache miss. The change accidentally routed reads for a subset of keys to a secondary tier that had stopped receiving updates. Edge nodes across all regions started serving stale values that were hours old, not seconds old. Feature flags, A/B test configs, and auth tokens stored in KV returned wrong results for 35 minutes. The lesson: eventually consistent systems have a defined propagation bound. Any change to the replication topology must be validated against that bound. Breaking propagation doesn't produce errors; it produces silent staleness that can persist indefinitely.
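
One way to turn silent staleness into an alert is a canary key: a writer rewrites it with the current timestamp on a fixed interval, and a checker reads it from every edge. The sketch below is hypothetical and is not Cloudflare's tooling; the 60-second bound comes from the description above, while the 10-second write interval, the `CanaryRead` type, and the edge names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

PROPAGATION_BOUND_S = 60.0  # from the text: writes reach all edges within ~60 s
WRITE_INTERVAL_S = 10.0     # assumption: how often the canary key is rewritten


@dataclass
class CanaryRead:
    edge: str                # hypothetical edge identifier
    value_written_at: float  # timestamp embedded in the value the edge returned
    read_at: float           # when the checker read it


def stale_edges(
    reads: List[CanaryRead],
    bound_s: float = PROPAGATION_BOUND_S,
    write_interval_s: float = WRITE_INTERVAL_S,
) -> List[str]:
    """Return edges whose canary value is older than the propagation bound allows.

    With a write every `write_interval_s` and propagation within `bound_s`,
    a healthy edge should never return a value older than their sum.
    """
    limit = bound_s + write_interval_s
    return [r.edge for r in reads if r.read_at - r.value_written_at > limit]


reads = [
    CanaryRead(edge="nrt", value_written_at=9_970.0, read_at=10_000.0),  # 30 s old
    CanaryRead(edge="fra", value_written_at=2_800.0, read_at=10_000.0),  # 2 h old
]
print(stale_edges(reads))
```

The reads themselves succeed in both cases, which is the point: only comparing the age of the data against the bound reveals the problem.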

Interview angle

CDNs are expected in any system serving media or global users, so the signal isn't proposing one but explaining what it actually buys you. Say it reduces latency by moving bytes physically closer to the user and offloads origin traffic. The nuance to show is around Cache-Control headers and invalidation: an overly long TTL leaves stale content on edges worldwide until it expires or you purge every POP. Candidates lose points by treating the CDN as a magic fix for all latency without addressing what happens when the edge cache is stale or cold.
