Periodic 'I'm alive' signals between nodes so failures are detected within seconds, not minutes.
Plain English: every few seconds, each server pings its peers to say 'I'm still here.' If the pings stop, the others assume that server died and stop sending it work. Without heartbeats, a failure can go unnoticed for minutes.
A small periodic message sent from one node to another (or to a coordinator) signalling that the sender is alive and healthy. Absence of heartbeats for some threshold triggers failover, removal from a load-balancer pool, or alerts.
Distributed systems need to know when a peer has failed. A TCP connection can appear 'open' long after the other side has crashed (for example, a host that loses power never sends the FIN or RST that would close the connection), so without explicit heartbeats you'd discover the failure only when the next request times out, potentially minutes later.
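Each node sends a heartbeat every T seconds (typically 1-5 s). The receiver records the timestamp of the last heartbeat seen from each sender. If no heartbeat arrives within K × T seconds (typically K = 3), the receiver declares the sender dead and reacts: a leader election kicks off if the dead node was the leader, the dead node is removed from rotation, and alerts fire. Waiting for K missed intervals rather than one tolerates an occasional delayed or dropped heartbeat without a false alarm; with T = 2 s and K = 3, a crashed node is declared dead about 6 s after its last heartbeat.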
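The receiver-side logic above can be sketched as a small failure detector. This is a minimal illustration, not any particular system's implementation: the class name `HeartbeatMonitor`, its methods, and the choice to treat a node that heartbeats again as alive are all assumptions. The clock is injectable so the timeout can be demonstrated deterministically; the reactions (leader election, removal from rotation, alerts) are left to whoever calls `check()`.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Hypothetical detector: a peer is dead after more than K x T seconds of silence."""

    def __init__(self, interval_s: float = 2.0, k: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s          # T: expected heartbeat period
        self.k = k                            # K: missed intervals tolerated
        self.clock = clock                    # monotonic, so wall-clock jumps don't matter
        self.last_seen: Dict[str, float] = {}
        self.dead: Set[str] = set()

    @property
    def timeout_s(self) -> float:
        return self.k * self.interval_s

    def record_heartbeat(self, node_id: str) -> None:
        self.last_seen[node_id] = self.clock()
        self.dead.discard(node_id)            # assumption: a node that speaks again is alive

    def check(self) -> List[str]:
        """Return nodes newly declared dead since the previous check."""
        now = self.clock()
        newly_dead = []
        for node_id, seen in self.last_seen.items():
            if node_id not in self.dead and now - seen > self.timeout_s:
                self.dead.add(node_id)
                newly_dead.append(node_id)    # caller triggers failover/alerts here
        return newly_dead


class FakeClock:
    """Manually advanced clock for the demonstration below."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
mon = HeartbeatMonitor(interval_s=2.0, k=3, clock=clock)
mon.record_heartbeat("a")
mon.record_heartbeat("b")
clock.t = 5.0
mon.record_heartbeat("b")                     # only b keeps heartbeating
clock.t = 6.5
newly_dead = mon.check()
print(newly_dead)  # ['a']: silent for 6.5 s > 6 s; b was last seen 1.5 s ago
```

In a real deployment `check()` would run on a timer (for example, once per T), and the heartbeat messages would arrive over the network rather than through direct method calls.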
Implicit in the WebSocket layer, where ping/pong control frames detect dropped connections.
ZooKeeper monitors cache-node liveness via session heartbeats.
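For the WebSocket case, the heartbeat is part of the wire protocol itself. The sketch below encodes and decodes the two control frames as RFC 6455 defines them: opcode 0x9 is Ping, 0xA is Pong, a Pong echoes the Ping's payload, control-frame payloads are at most 125 bytes, and frames sent from client to server must be masked with a 4-byte key. The function names are hypothetical, and the decoder assumes a complete, well-formed control frame; real WebSocket libraries do this framing for you.

```python
from typing import Optional, Tuple

OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def encode_control_frame(opcode: int, payload: bytes = b"",
                         mask_key: Optional[bytes] = None) -> bytes:
    """Build a single-frame control message (FIN=1, no extended length)."""
    if len(payload) > 125:
        raise ValueError("control frame payload must be <= 125 bytes")
    first = 0x80 | opcode                      # FIN bit set, RSV bits zero
    if mask_key is None:                       # server-to-client: unmasked
        return bytes([first, len(payload)]) + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes([first, 0x80 | len(payload)]) + mask_key + masked


def decode_control_frame(frame: bytes) -> Tuple[int, bytes]:
    """Return (opcode, unmasked payload) for a complete control frame."""
    opcode = frame[0] & 0x0F
    is_masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    if is_masked:
        key = frame[2:6]
        data = frame[6:6 + length]
        return opcode, bytes(b ^ key[i % 4] for i, b in enumerate(data))
    return opcode, frame[2:2 + length]


def pong_for(ping_frame: bytes) -> bytes:
    """Answer a Ping with a Pong carrying the same payload (server side)."""
    opcode, payload = decode_control_frame(ping_frame)
    if opcode != OPCODE_PING:
        raise ValueError("not a ping frame")
    return encode_control_frame(OPCODE_PONG, payload)


ping = encode_control_frame(OPCODE_PING, b"hb-1")
print(ping)  # b'\x89\x04hb-1'
```

A peer that sends a Ping and sees no matching Pong within its own deadline treats the connection as dropped, which is the same K × T idea applied to a single connection.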