
Auto-Scaling

Automatically add and remove instances based on load so you pay only for what you need and survive spikes without anyone being paged.

Plain English: instead of manually adding servers when traffic rises and removing them when it falls, you set rules ('keep CPU around 60%') and the platform spins instances up and down for you. Saves money off-peak and absorbs spikes, as long as you've sized the limits and warm-up right.

Used in: Instagram Feed, Notification System, Ticketmaster (Seat Booking)

What it is

A control loop that adjusts the number of running instances (or container replicas) in response to observed load. It watches metrics (CPU, request rate, queue depth, custom signals), compares them to a target, and scales out (adds instances) or in (removes instances) within configured min/max bounds.
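
One iteration of such a loop can be sketched as below. This is a minimal illustration, not any particular platform's implementation: it assumes a proportional rule (the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler), and the function and parameter names are hypothetical.

```python
import math


def desired_instances(current: int, observed: float, target: float,
                      min_instances: int, max_instances: int) -> int:
    """Return the instance count one loop iteration would ask for.

    Assumption: a proportional rule. If the metric is 1.5x the target,
    ask for 1.5x the instances, rounded up.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Compare the observed metric to the target and scale proportionally.
    raw = math.ceil(current * observed / target)
    # Never leave the configured min/max bounds.
    return max(min_instances, min(max_instances, raw))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_instances(4, 90.0, 60.0, min_instances=2, max_instances=10))
# 4 instances at 30% CPU with a 60% target: scale in.
print(desired_instances(4, 30.0, 60.0, min_instances=2, max_instances=10))
```

The clamp at the end is what keeps a runaway metric from scaling past the max bound, and a quiet period from scaling below the min.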

The problem it solves

Traffic is rarely flat; it has daily peaks, weekly cycles, and unpredictable spikes. Provisioning for peak wastes money off-peak; provisioning for average means falling over during spikes. Auto-scaling matches capacity to demand automatically, controlling cost during quiet periods while absorbing surges without a human in the loop.

How it works

Target-tracking: set a target (e.g. 60% CPU or 1000 RPS per instance); the controller adds or removes instances to hold the metric near the target.

Step/threshold: define rules such as '+2 instances if CPU > 80% for 5 min'.

Scheduled: scale ahead of known events (Black Friday, market open).

Predictive: ML forecasts demand and pre-warms capacity.

New instances must pass health checks before the load balancer routes to them; scale-in respects cooldowns and connection draining to avoid killing in-flight work.
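
The step/threshold rule and the scale-in cooldown can be sketched together as follows. This is a rough illustration under stated assumptions: the class name, the 40% scale-in threshold, the one-instance scale-in step, and the 5-minute cooldown are all hypothetical, and health checks and connection draining are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepScaler:
    """Hypothetical step policy: '+2 instances if CPU > 80% for 5 min'."""
    threshold: float = 80.0      # CPU % that counts as a breach
    sustain_s: float = 300.0     # breach must last this long (5 min)
    step: int = 2                # instances added per sustained breach
    low_threshold: float = 40.0  # assumed scale-in threshold
    cooldown_s: float = 300.0    # assumed wait after any change before scale-in
    min_instances: int = 2
    max_instances: int = 10
    instances: int = 2
    _breach_start: Optional[float] = field(default=None, init=False, repr=False)
    _last_change: float = field(default=float("-inf"), init=False, repr=False)

    def observe(self, now: float, cpu: float) -> int:
        """Feed one metric sample (time in seconds); return the instance count."""
        if cpu > self.threshold:
            if self._breach_start is None:
                self._breach_start = now
            if now - self._breach_start >= self.sustain_s:
                self.instances = min(self.max_instances, self.instances + self.step)
                self._breach_start = None  # require a fresh sustained breach
                self._last_change = now
        else:
            self._breach_start = None  # the breach was not sustained
            cooled = now - self._last_change >= self.cooldown_s
            if cpu < self.low_threshold and cooled:
                self.instances = max(self.min_instances, self.instances - 1)
                self._last_change = now
        return self.instances


scaler = StepScaler()
for t, cpu in [(0, 90), (150, 90), (300, 90), (360, 20), (600, 20)]:
    print(t, scaler.observe(t, cpu))
```

In this sketch scale-out is gated only by the sustained-breach window, while scale-in also waits out the cooldown, so capacity added for a spike is not removed the moment the metric dips.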

Why use it
What it costs you
Where it shows up in our architectures
Gotchas
