Keep spare capacity standing by and switch to it automatically when something dies, so a component failure isn't an outage.
Plain English: redundancy means having backups (extra servers, replicas, data centers). Failover is automatically switching to a backup when the primary dies. Together they're how you turn 'a server crashed' into a non-event instead of a 3am outage.
Redundancy is having more capacity or copies than the minimum needed, so the loss of one doesn't cause failure: redundant servers, replicated data, multiple availability zones. Failover is the act of detecting a failure and shifting traffic or responsibility to a healthy standby. Together they're the backbone of high availability.
Everything fails eventually: disks, machines, racks, whole data centers. Without redundancy, every component is a single point of failure and every failure is an outage. Redundancy removes the single points of failure; failover makes recovery automatic and fast instead of a manual scramble. The combination is what lets a service advertise four or five nines of availability (99.99% or 99.999% uptime).
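To make the "nines" concrete, here is a minimal arithmetic sketch. It assumes replica failures are independent and that any single surviving replica can carry the load; real failures are often correlated (shared power, shared bugs, shared deploys), so treat the combined figure as an upper bound. The function names are illustrative, not from any library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` copies when any one copy is enough.

    ASSUMPTION: failures are independent, so the service is down only
    when every replica is down at the same time.
    """
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for label, a in [("three nines", 0.999),
                     ("four nines", 0.9999),
                     ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")

    # Two independent 99% components in parallel reach about 99.99%.
    print(f"2 x 99%: {combined_availability(0.99, 2):.4%}")
```

Under these assumptions, each added replica multiplies the probability of total failure by the single-replica failure rate, which is why a small amount of redundancy moves availability by whole nines.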
Redundancy comes in two main modes: active-active (all replicas serve traffic; losing one just reduces capacity) and active-passive (a standby waits and is promoted when the primary fails). Failover detects a failure via health checks or heartbeats, then redirects: a load balancer drops the dead instance from rotation, a follower DB is promoted to primary, or DNS/anycast shifts traffic to another region. Geographic redundancy spreads replicas across availability zones and regions so a whole-datacenter loss is survivable. Quorum and leader election keep failover from causing split-brain, where two nodes both believe they are the primary.
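The active-passive flow described above (heartbeat, timeout, promotion) can be sketched as follows. This is a toy model under stated assumptions, not how any particular load balancer or database implements it: `Node`, `FailoverMonitor`, and the 3-second timeout are all hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat


@dataclass
class FailoverMonitor:
    """Hypothetical monitor: promotes a standby when the primary goes quiet."""
    primary: Node
    standbys: List[Node]
    timeout: float = 3.0  # assumed: seconds of silence before a node is "dead"

    def heartbeat(self, node: Node, now: float) -> None:
        node.last_heartbeat = now

    def is_healthy(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> Node:
        """Return the current primary, promoting a healthy standby if needed."""
        if self.is_healthy(self.primary, now):
            return self.primary
        for i, candidate in enumerate(self.standbys):
            if self.is_healthy(candidate, now):
                self.standbys.pop(i)      # no longer a standby
                self.primary = candidate  # promotion
                return candidate
        raise RuntimeError("no healthy standby available")


if __name__ == "__main__":
    db1, db2 = Node("db-1"), Node("db-2")
    monitor = FailoverMonitor(primary=db1, standbys=[db2])
    monitor.heartbeat(db1, now=0.0)
    monitor.heartbeat(db2, now=0.0)
    print(monitor.check(now=2.0).name)  # primary still within the timeout

    monitor.heartbeat(db2, now=4.0)     # db-1 has gone silent
    print(monitor.check(now=5.0).name)  # standby is promoted
```

Note what the sketch leaves out: a single monitor is itself a single point of failure, and a primary that is merely slow or partitioned may still be accepting writes when the standby is promoted. That is the split-brain risk, and it is why production systems typically require agreement from a majority of observers and fence the old primary before promoting.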
Cassandra with replication factor 3 across nodes: losing one replica causes no downtime, because reads and writes at QUORUM consistency or weaker continue against the two survivors
The Postgres ledger runs primary + standby with automatic promotion; money writes must survive a primary failure without loss
Erasure coding across multiple availability zones means object durability survives whole-AZ failure, not just disk failure
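The replication-factor-3 example rests on simple majority arithmetic, sketched below. The quorum formula (half the replicas, rounded down, plus one) is the standard majority rule; the function names are illustrative only, and the sketch ignores details such as per-datacenter quorums.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def reads_overlap_writes(replication_factor: int, write_acks: int,
                         read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    for rf in (3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerated_failures(rf)} lost replica(s)")
```

With a replication factor of 3, a quorum is 2, so one replica can be down while quorum reads and writes still succeed and still overlap on at least one up-to-date copy.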