Buy a bigger machine (simple, capped, SPOF) or add more machines (unlimited, but now you're a distributed system).
Pick a scenario below and see which option fits, and why.
There is no universal winner here, only the right fit for a given situation. Each scenario above pushes the decision a different way, which is exactly how this tradeoff shows up in real design questions.
Plain English: vertical scaling means making one server bigger (more CPU/RAM). Horizontal scaling means adding more servers. Vertical is dead simple but hits a ceiling and is a single point of failure; horizontal scales forever but forces you to handle distribution.
Two ways to add capacity. Vertical scaling (scaling up) means giving a single node more resources: more CPU cores, RAM, faster disks. Horizontal scaling (scaling out) means adding more nodes and spreading load across them via a load balancer or partitioning scheme.
Your service is out of capacity. Vertical scaling is the no-code answer (resize the box, done), and it's the right first move because it requires zero architectural change. But every machine has a ceiling (and the biggest boxes cost wildly more per unit). Horizontal scaling removes the ceiling entirely, at the price of making your app stateless and distribution-aware.
Vertical: stop the instance, pick a bigger instance type, restart, or hot-add resources where supported. Limited by the largest available hardware. Horizontal: run N identical stateless instances behind a load balancer; for stateful systems, shard the data across nodes (see sharding) and replicate for durability. Auto-scaling automates horizontal scaling in response to load.
Scales vertically by design: the per-symbol matching engine is single-threaded and in-memory, so a bigger, faster box is the only lever for that hot path
Stateless read/write services scale horizontally behind a load balancer; Postgres scales vertically first, then with read replicas
Feed-serving tier scales out horizontally; the Redis timeline cache scales out via sharding
Vertical vs horizontal scaling is a framing question. When an interviewer asks how you'd handle 10x growth, the right answer starts with 'scale vertically first because it's zero code change, then move to horizontal when you hit the hardware ceiling or need redundancy.' The thing they're checking is whether you know horizontal scaling only works when the service is stateless. Candidates lose points by jumping straight to 'add more servers' without addressing session state or data partitioning.