Latency is how long one request takes; throughput is how many you handle per second. Optimizing one often hurts the other.
Plain English: latency is the wait for a single request (how fast). Throughput is how many requests you handle per second (how much). They're different: a system can be high-throughput but high-latency (like batching) or low-latency but low-throughput. You usually have to trade one for the other.
Two distinct performance dimensions. Latency is the time to complete a single operation, end to end, reported as percentiles (p50, p99, p99.9) rather than averages, because an average hides the slow tail. Throughput is the rate of operations the system sustains: requests/sec, MB/sec, messages/sec. They are related but distinct: on a highway, the time one car takes to drive the road is latency; the number of cars that pass per hour is throughput. Adding lanes raises throughput without making any single trip faster.
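To see why percentiles are preferred over averages, here is a minimal sketch using a made-up sample of 100 latencies. The data, the nearest-rank percentile helper, and all names are assumptions for illustration, not measurements from any real system.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 fast requests, 2 slow outliers.
latencies_ms = [10] * 98 + [1000] * 2

mean_ms = statistics.mean(latencies_ms)  # pulled upward by the two outliers
p50_ms = percentile(latencies_ms, 50)    # what a typical request sees
p99_ms = percentile(latencies_ms, 99)    # what the slowest 1% see

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

The mean here is about 30 ms, a latency that no request in the sample actually experienced; p50 and p99 describe the typical request and the slow tail separately.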
The distinction forces you to be precise about what 'fast' means. A user staring at a spinner cares about latency. A batch pipeline processing a billion rows overnight cares about throughput. Conflating them leads to the wrong optimization: batching improves throughput but adds latency; handling every request on its own the moment it arrives minimizes latency but pays the fixed overhead each time, which wastes throughput.
Latency is reduced by caching, doing less work per request, moving computation closer to the user (CDN), and avoiding serial round-trips. Throughput is increased by parallelism, batching, pipelining, and adding nodes (horizontal scaling). The tension: batching N requests amortizes fixed costs (great for throughput) but each request now waits for the batch to fill (worse latency). Little's Law ties them together: in steady state, average concurrency (requests in flight) = throughput × average latency.
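The batching tension and Little's Law can be sketched with a toy model. Everything here is an assumption for illustration: a single worker, a fixed cost paid once per batch, a constant per-item cost, evenly spaced arrivals, and hypothetical function names and numbers.

```python
def batch_tradeoff(batch_size, arrival_rate_per_s, fixed_cost_s, per_item_cost_s):
    """Toy model of one worker that processes requests in batches.

    Returns (max_throughput_per_s, fill_wait_s):
    - max_throughput_per_s: items per second the worker can sustain
    - fill_wait_s: how long the first request waits for the batch to fill
    """
    # The fixed cost is paid once per batch, so it is spread over batch_size items.
    batch_time_s = fixed_cost_s + batch_size * per_item_cost_s
    max_throughput_per_s = batch_size / batch_time_s
    # The first request waits for the remaining batch_size - 1 arrivals.
    fill_wait_s = (batch_size - 1) / arrival_rate_per_s
    return max_throughput_per_s, fill_wait_s

def littles_law_concurrency(throughput_per_s, latency_s):
    """Little's Law: average requests in flight = throughput x average latency."""
    return throughput_per_s * latency_s

# Hypothetical costs: 10 ms fixed per batch, 1 ms per item, 500 arrivals/sec.
for n in (1, 10, 100):
    tput, wait = batch_tradeoff(n, 500, 0.010, 0.001)
    print(f"batch={n:>3}  capacity={tput:7.1f}/s  fill wait={wait * 1000:6.1f} ms")

# 1000 req/s at 50 ms average latency implies about 50 requests in flight.
in_flight = littles_law_concurrency(1000, 0.050)
print(f"in flight: {in_flight:.0f}")
```

In this model, growing the batch from 1 to 100 raises capacity roughly tenfold while the first request in each batch waits about 200 ms for the batch to fill. Real systems usually cap that wait with a timeout, which this sketch leaves out.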
Microsecond latency is the entire product; almost everything else is sacrificed to shave tail latency off order matching
Built for throughput: batches and pipelines records, accepting some per-message latency to push millions of messages/sec
CDN edges cut video start latency; the encoding pipeline is throughput-optimized batch work where latency doesn't matter