The distributed phone book that turns hostnames into IP addresses.
Resolve example.com. The first lookup walks the hierarchy; the second is cached.
DNS turns a name into an IP address by walking a hierarchy: the root servers point to the .com servers, which point to example.com's nameserver, which holds the actual address. Resolvers cache the answer, so the next lookup skips the whole walk. That cache is also why DNS changes take time to propagate: old answers stay in use until they expire.
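The walk described above can be sketched as a toy model. This is only an illustration under stated assumptions: the server names (`tld-com-server`, `ns1-example`) are hypothetical, the address comes from the 192.0.2.0/24 documentation range, and dictionary lookups stand in for what would really be network queries.

```python
# Toy model of the DNS hierarchy. All server names and the address are
# made up for illustration; real resolution sends queries over the network.
ROOT = {"com.": "tld-com-server"}  # root knows which server handles each TLD
TLD_SERVERS = {"tld-com-server": {"example.com.": "ns1-example"}}  # TLD knows each zone's nameserver
AUTHORITATIVE = {"ns1-example": {"example.com.": "192.0.2.10"}}  # nameserver holds the address


def walk_hierarchy(name):
    """Resolve `name` by following referrals; return (ip, servers_asked)."""
    fqdn = name if name.endswith(".") else name + "."
    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1] + "."               # e.g. "com."
    zone = ".".join(labels[-2:]) + "."   # e.g. "example.com."

    asked = ["root"]
    tld_server = ROOT[tld]                       # referral 1: root -> TLD server
    asked.append(tld_server)
    nameserver = TLD_SERVERS[tld_server][zone]   # referral 2: TLD -> authoritative server
    asked.append(nameserver)
    ip = AUTHORITATIVE[nameserver][fqdn]         # final answer from the authoritative server
    return ip, asked
```

Calling `walk_hierarchy("example.com")` consults three servers in order before it gets an address, which is the cost a cached answer avoids.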
Plain English: computers route by numbers (IP addresses), but humans type names (google.com). DNS is the global lookup system that translates between them. Every site visit starts with a DNS query.
A hierarchical, distributed key-value store mapping human-readable names (api.example.com) to IP addresses. Lookups go through a chain: local cache → resolver → root → TLD (.com) → authoritative server.
Users type names; computers route to IPs. DNS bridges the gap at internet scale, and caching keeps a typical lookup down to milliseconds.
Your OS asks a resolver (often your ISP's, or a public one such as 8.8.8.8). The resolver checks its cache; on a miss, it walks the hierarchy. The result is cached for the TTL specified in the DNS record. Record types: A (IPv4 address), AAAA (IPv6 address), CNAME (alias to another name), MX (mail server), TXT (arbitrary text, often used for domain verification), NS (delegates a zone to its nameservers).
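A minimal sketch of the cache-then-walk behaviour, assuming a resolver that keys its cache on (name, record type) and honours the TTL exactly. `TtlCache`, `resolve`, and the `upstream` callback are hypothetical names for illustration; `upstream` stands in for the hierarchy walk and is assumed to return a `(value, ttl_seconds)` pair.

```python
import time


class TtlCache:
    """Caches (name, record_type) -> value until the record's TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so expiry can be exercised without sleeping
        self._entries = {}     # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl_seconds):
        self._entries[(name, rtype)] = (value, self._clock() + ttl_seconds)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL elapsed: drop the entry so the caller falls back to upstream.
            del self._entries[(name, rtype)]
            return None
        return value


def resolve(cache, name, rtype, upstream):
    """Return (value, source), where source is "cache" or "upstream"."""
    cached = cache.get(name, rtype)
    if cached is not None:
        return cached, "cache"
    value, ttl_seconds = upstream(name, rtype)   # cache miss: do the full walk
    cache.put(name, rtype, value, ttl_seconds)
    return value, "upstream"
```

Keying on the record type matters because an A answer for a name says nothing about its AAAA or MX records; each is looked up and cached separately.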
A BGP misconfiguration took Facebook off the internet for about six hours and took its internal tools, including badge access, down with it.
A routine command intended to assess global backbone capacity was issued, and a bug in Facebook's audit tool failed to stop it. The command took down the backbone connections between Facebook's datacenters; its DNS servers, unable to reach those datacenters, then withdrew their BGP route advertisements, which made Facebook's names unresolvable from the rest of the internet. Worse: the same network and DNS infrastructure also gated internal tools, including the badge-access system at the datacenters. Engineers couldn't VPN in, couldn't open the doors, and couldn't reach the management plane to roll back. Recovery required sending engineers to the datacenters in person. The lesson: never let your control plane depend on your data plane. Out-of-band access has to actually be out-of-band.
A config push to the backbone control plane caused packet loss across three GCP regions for four hours.
Google pushed a config update to the network control plane managing inter-region backbone routing. The config included software that consumed far more memory than expected under production conditions, causing the control plane to crash on a large fraction of routers. Each restarting router needed to re-establish BGP peering, which consumed network capacity. Restarting routers and network traffic competing for bandwidth created a feedback loop: routers trying to recover caused more congestion, which slowed recovery further. Three GCP regions (us-east1, us-central1, europe-west1) experienced 30-87% packet loss for services using the Google backbone. The lesson: stage control plane changes and validate memory/resource usage before the push. A control plane change should never be able to create a data plane feedback loop.
DNS comes up in global system design and in 'how does a request reach your server?' questions. Interviewers want to hear that DNS answers are cached for their TTL, which means changes are slow to propagate and you cannot rely on DNS alone for fast failover. Show that you know the difference between DNS-based routing (GeoDNS for global load balancing) and load-balancer routing (for fast failover within a region). Candidates lose points by treating DNS as a simple lookup whose changes take effect instantly.
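The failover point can be made concrete with rough arithmetic. The sketch below is a simplified model, not a measurement: it assumes every resolver honours the TTL exactly, and it approximates load-balancer detection as the health-check interval times the number of consecutive failures needed to eject a backend. Both function names and all the numbers are illustrative assumptions.

```python
def dns_failover_window(ttl_seconds, detection_seconds):
    """Worst-case seconds a client keeps using a dead IP with DNS-based failover.

    The record is only changed once the failure is detected, and a resolver
    that cached the old record just before the change keeps it for a full TTL.
    """
    return detection_seconds + ttl_seconds


def lb_failover_window(check_interval_seconds, failures_to_eject):
    """Approximate seconds until a load balancer stops routing to a dead backend."""
    return check_interval_seconds * failures_to_eject


# Illustrative numbers only: a 300 s TTL versus health checks every 5 s
# that eject a backend after 2 consecutive failures.
dns_window = dns_failover_window(ttl_seconds=300, detection_seconds=10)
lb_window = lb_failover_window(check_interval_seconds=5, failures_to_eject=2)
```

Under these assumed numbers, some clients could keep hitting the dead address for about five minutes with DNS alone, against roughly ten seconds behind a load balancer. Lowering the TTL narrows the gap at the cost of more queries reaching the authoritative servers.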