Stitch one request's journey across many services into a single timeline so you can see exactly where the time went.
Plain English: when one user request bounces through ten microservices, distributed tracing tags it with a shared ID and records how long each hop took. The result is a waterfall chart that shows exactly which service made the request slow, so you no longer have to guess.
A technique for tracking a single request as it propagates through a distributed system. A trace is the whole request's journey; it's made of spans, one per unit of work (a service call, a DB query), each with a start/end time, parent link, and metadata. A trace ID is propagated across every hop so the pieces can be reassembled into one end-to-end timeline.
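To make the trace/span relationship concrete, here is a minimal sketch of the data model and of reassembling spans into one timeline. The `Span` fields, the `assemble` and `waterfall` helpers, and the service names are illustrative assumptions, not the schema of any particular tracing backend.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work. Field names are hypothetical, chosen for illustration."""
    trace_id: str              # shared by every span in the same request
    span_id: str               # unique per span
    parent_id: Optional[str]   # None marks the root span
    name: str
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)  # free-form metadata


def assemble(spans):
    """Group spans by trace ID, then index each trace's spans by parent span ID."""
    traces = defaultdict(lambda: defaultdict(list))
    for span in spans:
        traces[span.trace_id][span.parent_id].append(span)
    return traces


def waterfall(children, parent_id=None, depth=0):
    """Render one trace as indented lines, children ordered by start time."""
    lines = []
    for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
        lines.append(f"{'  ' * depth}{span.name} [{span.start_ms}-{span.end_ms} ms]")
        lines.extend(waterfall(children, span.span_id, depth + 1))
    return lines


# Spans arrive out of order and interleaved with other traces.
spans = [
    Span("t1", "b", "a", "auth-service", 5, 45),
    Span("t2", "x", None, "healthcheck", 0, 3),
    Span("t1", "a", None, "api-gateway", 0, 800),
    Span("t1", "c", "a", "orders-db query", 50, 750),
]

lines = waterfall(assemble(spans)["t1"])
print("\n".join(lines))
```

The parent link is what turns a flat pile of spans into a tree; the trace ID is what keeps spans from unrelated requests (here `t2`) out of it.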
In microservices, one user action triggers a cascade of internal calls. When it's slow or failing, per-service metrics tell you each service's health but not how they compose for this request. Distributed tracing reconstructs the exact path and timing across all services, so you can see that the 800ms latency was 700ms waiting on one downstream call, turning 'somewhere it's slow' into 'here it's slow.'
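The 800ms example can be worked through as a rough sketch: subtract the time spent in child spans from each span's duration to get its "self time", then look for the largest. The span tuples, service names, and the `self_times` helper are made up for illustration, and the sketch assumes sibling spans do not overlap in time.

```python
# (span_id, parent_id, name, start_ms, end_ms); illustrative data only
spans = [
    ("a", None, "api-gateway", 0, 800),
    ("b", "a", "auth-service", 10, 60),
    ("c", "a", "inventory-service", 70, 770),
]


def self_times(spans):
    """Return {name: duration minus time spent in direct children}.

    Assumes children of the same parent run sequentially (no overlap).
    """
    duration = {sid: end - start for sid, _, _, start, end in spans}
    child_total = {sid: 0 for sid in duration}
    for _, parent, _, start, end in spans:
        if parent in child_total:  # skip the root and any orphaned spans
            child_total[parent] += end - start
    return {name: duration[sid] - child_total[sid] for sid, _, name, _, _ in spans}


times = self_times(spans)
slowest = max(times, key=times.get)
print(times)    # {'api-gateway': 50, 'auth-service': 50, 'inventory-service': 700}
print(slowest)  # inventory-service
```

The gateway's own span lasts the full 800ms, but only 50ms of that is its own work; per-service dashboards would not show that the other 700ms belongs to a single downstream call.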
When a request enters the system, a trace ID and root span are created. Each service that handles it creates a child span (with its own span ID and its parent's span ID) and propagates the trace context (typically via HTTP headers such as the W3C traceparent header, or via message metadata) to every downstream call. Each service reports its spans to a tracing backend (Jaeger, Tempo, Zipkin), which joins them by trace ID into a waterfall. Because storing every trace is costly at scale, sampling decides which traces to keep; OpenTelemetry is the de facto standard instrumentation layer.
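Context propagation can be sketched with the W3C traceparent header, whose version 00 layout is `00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex flags>`. The `handle_request` function below is a hypothetical, hand-rolled illustration of the idea; a real service would normally rely on OpenTelemetry's propagators instead, and this sketch skips some validation the specification requires (for example, rejecting all-zero IDs).

```python
import re
import secrets

# Version 00 traceparent: version, trace ID, parent span ID, trace flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def handle_request(headers):
    """Continue the caller's trace, or start a new one if no valid header exists.

    `headers` is a plain dict with lowercase keys (an assumption of this sketch).
    Returns (trace_id, span_id, parent_id, outgoing_headers).
    """
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match:
        trace_id, parent_id, flags = match.groups()
    else:
        # No usable context: this service creates the trace ID and the root span.
        trace_id, parent_id, flags = secrets.token_hex(16), None, "01"

    span_id = secrets.token_hex(8)  # this service's own span
    # Downstream calls see this span as their parent, under the same trace ID.
    outgoing = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    return trace_id, span_id, parent_id, outgoing


# An edge service receives a request without context, then calls a downstream one.
edge_trace, edge_span, edge_parent, edge_headers = handle_request({})
down_trace, down_span, down_parent, _ = handle_request(edge_headers)

print(edge_headers["traceparent"])
print(down_trace == edge_trace, down_parent == edge_span)
```

The trace ID never changes along the path, while the span ID slot in the header is rewritten at every hop; that pair is enough for the backend to rebuild the parent/child tree.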
Tracing across hundreds of services is the only practical way to localize a slow API call to the responsible downstream service
A trace follows a ride request through dispatch, geo-index, and trip services to show which tier added latency during a surge
A charge's trace spans the gateway, fraud check, ledger write, and async webhook so a stuck payment can be located precisely