Latency is how long one request takes; throughput is how many you handle per second. Optimizing one often hurts the other.
There is no universal winner here, only the right fit for a given situation. Different scenarios push the decision in different directions, which is exactly how this tradeoff shows up in real design questions.
Plain English: latency is the wait for a single request (how fast). Throughput is how many requests you handle per second (how much). They're different: a system can be high-throughput but high-latency (a batch pipeline, for example) or low-latency but low-throughput (a server that handles one request at a time). You usually have to trade one for the other.
Latency and throughput are two distinct performance dimensions. Latency is the time to complete a single operation, end to end, and is best reported as percentiles (p50, p99, p999) rather than averages, which hide the slow tail. Throughput is the rate of operations the system sustains: requests/sec, MB/sec, messages/sec. They are related but can vary independently: a highway's speed limit sets latency (how long one trip takes); its number of lanes sets throughput (how many cars pass per hour).
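To make the point about percentiles versus averages concrete, here is a minimal sketch. The latency samples are made up for illustration, and the `percentile` helper is a hypothetical nearest-rank implementation, not the method of any particular monitoring tool.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Assumed data: 98 fast requests (10 ms) and 2 slow ones (1000 ms).
latencies_ms = [10] * 98 + [1000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

With these samples the mean is 29.8 ms, a value no request actually experienced, while p50 is 10 ms and p99 is 1000 ms. The percentiles show both the typical case and the tail; the average blurs them together.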
Separating the two forces you to be precise about what 'fast' means. A user staring at a spinner cares about latency. A batch pipeline processing a billion rows overnight cares about throughput. Conflating them leads to the wrong optimization: batching improves throughput but adds latency; handling each request individually minimizes latency but pays the fixed overhead on every request, which wastes throughput.
Latency is reduced by caching, doing less work per request, moving computation closer to the user (CDN), and avoiding serial round-trips. Throughput is increased by parallelism, batching, pipelining, and adding nodes (horizontal scaling). The tension: batching N requests amortizes fixed costs (great for throughput) but each request now waits for the batch to fill (worse latency). Little's Law ties them together: concurrency = throughput × latency.
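The batching tension and Little's Law can be sketched with a toy cost model. Everything here is an assumption chosen for illustration: the `BatchModel` class, its cost figures, and the steady arrival rate are hypothetical, and the model ignores queueing when requests arrive faster than the worker can drain them.

```python
from dataclasses import dataclass


@dataclass
class BatchModel:
    """Toy cost model; every number fed into it is an illustrative assumption."""
    fixed_cost_s: float      # overhead paid once per batch (e.g. a round-trip)
    per_item_cost_s: float   # marginal cost of each request inside a batch
    arrival_rate_rps: float  # steady request arrival rate

    def processing_time_s(self, batch_size: int) -> float:
        return self.fixed_cost_s + batch_size * self.per_item_cost_s

    def throughput_rps(self, batch_size: int) -> float:
        """Requests/sec one worker sustains when batches run back to back."""
        return batch_size / self.processing_time_s(batch_size)

    def worst_case_latency_s(self, batch_size: int) -> float:
        """The first request in a batch waits for the rest to arrive, then for processing."""
        fill_wait_s = (batch_size - 1) / self.arrival_rate_rps
        return fill_wait_s + self.processing_time_s(batch_size)


def littles_law_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average requests in flight = throughput x average latency."""
    return throughput_rps * latency_s


model = BatchModel(fixed_cost_s=0.010, per_item_cost_s=0.001, arrival_rate_rps=500.0)
for n in (1, 10, 100):
    print(f"batch={n:>3}  throughput={model.throughput_rps(n):7.1f} req/s  "
          f"worst-case latency={model.worst_case_latency_s(n) * 1000:6.1f} ms")

print("in flight at 1000 req/s and 50 ms:", littles_law_concurrency(1000.0, 0.050))
```

Under these assumed costs, larger batches raise sustainable throughput because the fixed cost is spread over more requests, while worst-case latency grows because the first request waits for the batch to fill. The last line applies Little's Law: a system serving 1000 requests/sec at 50 ms average latency holds about 50 requests in flight on average.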
Microsecond latency is the entire product; they sacrifice almost everything else to shave tail latency off order matching
Built for throughput: batches and pipelines records, accepting some per-message latency to push millions of messages/sec
CDN edges cut video start latency; the encoding pipeline is throughput-optimized batch work where latency doesn't matter
Latency vs throughput questions test whether you know what you're actually optimizing for. The first thing to ask is 'what does the user feel?' If it's an interactive request, latency matters; if it's a batch pipeline, throughput matters. Show that you measure latency at p99, not the average, because the tail is where real users feel slowness. Candidates lose points by treating throughput as a synonym for performance and proposing batching for a real-time, user-facing request, where it would make response time worse.