Load Balancing

Spread incoming requests across a pool of identical servers so no single one melts.

Plain English: instead of sending every request to one server, put a traffic cop in front of N identical servers and have it spread the load. The traffic cop is the load balancer.

Used in: API Rate Limiter, Uber

What it is

A component that sits in front of a pool of backend servers and distributes incoming requests across them. The pool looks like one big server to clients; behind the scenes, the load balancer picks which actual instance handles each request.

The problem it solves

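Any service with more traffic than a single box can handle needs to spread that traffic somehow. Without a load balancer, you'd hard-code multiple endpoints into clients (terrible for ops) or rely on DNS round-robin, which handles failures poorly: clients and resolvers cache DNS records, so a dead server keeps receiving traffic until the cached entry expires, and there is no way to adjust routing in real time.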

How it works

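Common strategies: round-robin (each request goes to the next backend in line), least-connections (each new request goes to the backend with the fewest open connections), weighted (bigger boxes get a proportionally larger share), and consistent hashing (the same key, such as a session ID or cache key, always routes to the same backend, which gives sticky routing for sessions or caches). The LB also health-checks each backend and routes around the ones that fail.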
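
The strategies above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements them: the `Backend`, `LoadBalancer`, and `HashRing` names are hypothetical, the `healthy` flag stands in for a real periodic health check, and the weighted strategy uses the simplest possible approach (repeating a backend in the rotation once per unit of weight).

```python
import bisect
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1            # relative capacity ("bigger boxes get more")
    healthy: bool = True       # a real LB would update this from health checks
    open_connections: int = 0  # a real LB would track this per connection


class LoadBalancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self._rr = itertools.count()   # position for plain round-robin
        self._wrr = itertools.count()  # position for weighted round-robin

    def _healthy(self):
        # Route around failures: only healthy backends are candidates.
        pool = [b for b in self.backends if b.healthy]
        if not pool:
            raise RuntimeError("no healthy backends")
        return pool

    def round_robin(self):
        pool = self._healthy()
        return pool[next(self._rr) % len(pool)]

    def weighted(self):
        # Naive weighting: repeat each backend `weight` times in the rotation.
        slots = [b for b in self._healthy() for _ in range(b.weight)]
        return slots[next(self._wrr) % len(slots)]

    def least_connections(self):
        return min(self._healthy(), key=lambda b: b.open_connections)


class HashRing:
    """Consistent hashing: the same key maps to the same backend name."""

    def __init__(self, names, vnodes=100):
        # Each backend gets `vnodes` points on the ring to even out the spread.
        self._ring = sorted(
            (self._hash(f"{name}#{i}"), name)
            for name in names
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[i][1]


if __name__ == "__main__":
    lb = LoadBalancer([Backend("a"), Backend("b"), Backend("c")])
    print([lb.round_robin().name for _ in range(4)])  # ['a', 'b', 'c', 'a']
    lb.backends[1].healthy = False                    # simulate a failed check
    print([lb.round_robin().name for _ in range(4)])  # "b" no longer appears
```

The ring is the piece that makes consistent hashing "consistent": when a backend is added or removed, only the keys that land on its ring points move, whereas a plain `hash(key) % N` scheme reshuffles most keys whenever N changes.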

Why use it
What it costs you
Where it shows up in our architectures
Gotchas
When this went wrong in production

Cloudflare regex CPU-bomb · 2019

Postmortem ↗

A single bad regex took down nearly all Cloudflare-fronted sites globally for 27 minutes.

Cloudflare's WAF (web application firewall) deployed a new rule containing a regex that exhibited catastrophic backtracking. The rule was pushed globally and evaluated on HTTP traffic everywhere, and on matching input the regex could burn a CPU core at 100% for seconds at a time. Within seconds, Cloudflare's edge fleet was CPU-saturated and unable to serve traffic, and nearly all Cloudflare-fronted sites went down. Rollback took 27 minutes because the deploy mechanism itself was struggling against the saturation.

Lessons: never deploy untrusted regex globally without timeouts; use a staged rollout for any rule that runs on every request; a safety mechanism is only as good as your ability to actually deploy a rollback.
