Automatically add and remove instances based on load, so you pay only for what you need and survive spikes without paging a human.
Plain English: instead of manually adding servers when traffic rises and removing them when it falls, you set rules ('keep CPU around 60%') and the platform spins instances up and down for you. Saves money off-peak and absorbs spikes, as long as you've sized the limits and warm-up right.
A control loop that adjusts the number of running instances (or container replicas) in response to observed load. It watches metrics (CPU, request rate, queue depth, custom signals), compares them to a target, and scales out (adds instances) or in (removes instances) within configured min/max bounds.
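One iteration of that loop can be sketched as a single proportional calculation. This is a minimal illustration, not any platform's actual controller: the function name and parameters are hypothetical, and the formula is the proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler. Real controllers typically add tolerance bands and stabilization windows on top.

```python
import math


def desired_instances(current: int, metric: float, target: float,
                      min_size: int, max_size: int) -> int:
    """Return the instance count that should bring the per-instance
    metric back near `target`, clamped to [min_size, max_size].

    `metric` is the observed average per instance (e.g. CPU % or RPS).
    All names here are illustrative assumptions.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Proportional rule: if each instance is 1.5x over target,
    # we need roughly 1.5x as many instances.
    raw = math.ceil(current * metric / target)
    # Never go outside the configured bounds.
    return max(min_size, min(max_size, raw))


# 4 instances at 90% CPU with a 60% target: scale out.
scaled_out = desired_instances(4, 90, 60, min_size=2, max_size=20)
# 10 instances at 6% CPU: the raw answer is 1, but the floor is 2.
scaled_in = desired_instances(10, 6, 60, min_size=2, max_size=20)
```

The clamp is what makes the min/max bounds a hard safety rail: a runaway metric cannot push the fleet past `max_size`, and a quiet period cannot shrink it below `min_size`.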
Traffic is rarely flat; it has daily peaks, weekly cycles, and unpredictable spikes. Provisioning for peak wastes money off-peak; provisioning for average means falling over during spikes. Auto-scaling matches capacity to demand automatically, controlling cost during quiet periods while absorbing surges without a human in the loop.
Target-tracking: set a target (e.g. 60% CPU or 1000 RPS/instance); the controller adds/removes instances to hold the metric near target. Step/threshold: define rules ('+2 instances if CPU > 80% for 5 min'). Scheduled: scale ahead of known events (Black Friday, market open). Predictive: ML forecasts demand and pre-warms capacity. New instances must pass health checks before the load balancer routes to them; scale-in respects cooldowns and connection draining to avoid killing in-flight work.
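The step/threshold rule and the scale-in cooldown described above can be sketched as a small state machine. This is an illustrative sketch under assumptions: the class, its field names, the 30% scale-in threshold, and the 10-sample cooldown are all hypothetical, and one sample is assumed to arrive per minute so that `sustain=5` approximates "for 5 min".

```python
from dataclasses import dataclass, field


@dataclass
class StepScaler:
    high: float = 80.0   # scale out when CPU % stays above this
    low: float = 30.0    # scale in when CPU % stays below this (assumed value)
    sustain: int = 5     # consecutive breaching samples required
    step_out: int = 2    # instances added per scale-out
    step_in: int = 1     # instances removed per scale-in
    cooldown: int = 10   # samples to wait after any action before scaling in
    min_size: int = 2
    max_size: int = 20
    _above: int = field(default=0, init=False)
    _below: int = field(default=0, init=False)
    _since_action: int = field(default=10**9, init=False)

    def observe(self, cpu: float, current: int) -> int:
        """Feed one metric sample; return the desired instance count."""
        self._since_action += 1
        # Count consecutive breaches; any non-breaching sample resets the run.
        self._above = self._above + 1 if cpu > self.high else 0
        self._below = self._below + 1 if cpu < self.low else 0

        if self._above >= self.sustain:
            # Scale out promptly: under-capacity is the expensive failure.
            self._above = 0
            self._since_action = 0
            return min(self.max_size, current + self.step_out)
        if self._below >= self.sustain and self._since_action >= self.cooldown:
            # Scale in only after the cooldown, to avoid flapping.
            self._below = 0
            self._since_action = 0
            return max(self.min_size, current - self.step_in)
        return current


scaler = StepScaler()
size = 4
for _ in range(5):           # five hot samples in a row
    size = scaler.observe(85.0, size)
after_spike = size
for _ in range(9):           # load drops, but cooldown is still running
    size = scaler.observe(10.0, size)
during_cooldown = size
size = scaler.observe(10.0, size)  # tenth quiet sample: cooldown elapsed
after_cooldown = size
```

The asymmetry is deliberate in this sketch: scale-out is not gated by the cooldown, while scale-in is. Health checks and connection draining are outside its scope; they would gate when a new instance receives traffic and when a removed one is actually terminated.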
The stateless feed-serving tier auto-scales on request rate to ride out daily traffic peaks
Worker pools auto-scale on queue depth: a burst of send requests spins up more workers, then scales back down
Scheduled and predictive scaling add capacity ahead of a hot on-sale; a virtual queue smooths the initial spike that reactive auto-scaling can't respond to fast enough
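Scheduled scaling ahead of a known event, such as the on-sale above, can be modeled as a temporary raise of the minimum instance count. The sketch below is an assumption-laden illustration: the `ScheduledFloor` type, the `min_size_at` function, the date, and the capacity numbers are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ScheduledFloor(NamedTuple):
    start: datetime   # when the raised floor takes effect
    end: datetime     # when it is released (exclusive)
    min_size: int     # minimum instances during the window


def min_size_at(now: datetime, baseline_min: int,
                schedule: list[ScheduledFloor]) -> int:
    """Return the effective minimum instance count at `now`.

    The highest floor among active windows wins; outside every
    window the baseline minimum applies.
    """
    active = [w.min_size for w in schedule if w.start <= now < w.end]
    return max([baseline_min, *active])


# Hypothetical on-sale: raise the floor 30 minutes early so instances
# are warmed up and passing health checks before traffic arrives.
on_sale = datetime(2030, 1, 15, 10, 0)
schedule = [
    ScheduledFloor(on_sale - timedelta(minutes=30),
                   on_sale + timedelta(hours=2),
                   min_size=40),
]

before = min_size_at(on_sale - timedelta(hours=1), 4, schedule)
warming = min_size_at(on_sale - timedelta(minutes=15), 4, schedule)
after = min_size_at(on_sale + timedelta(hours=3), 4, schedule)
```

Raising only the floor leaves reactive scaling free to add instances above it during the event, then lets the fleet shrink back once the window closes.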