Saga Pattern (Distributed Transactions)

Replace a cross-service ACID transaction with a sequence of local transactions, plus compensating actions that undo the completed steps on failure.

You can't hold one database transaction across separate services. A saga runs each step as its own local transaction and, if a later step fails, fires compensating actions to undo the earlier ones, so the system ends up all-booked or fully-cancelled, never half-done.

First time reading this? Start here

Plain English: you can't run one database transaction across multiple microservices. A saga breaks a multi-step operation (reserve seat, charge card, send ticket) into separate local steps, and if a later step fails, it runs 'undo' actions (refund the card, release the seat) for the earlier ones. It's how you fake a transaction across services.

What it is

A pattern for managing data consistency across multiple services without a distributed ACID transaction. A saga is a sequence of local transactions, each in a single service; each step has a corresponding compensating transaction that semantically undoes it. If any step fails, the saga executes the compensations for the already-completed steps in reverse, leaving the system in a consistent (rolled-back) state.
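
The definition above can be sketched in a few lines. This is a minimal illustration, not a production saga engine: `Step`, `run_saga`, and the in-memory `log` are hypothetical names invented for the example, and real steps would be calls to separate services rather than local functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]       # the local transaction
    compensate: Callable[[], None]   # semantically undoes the action


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step is assumed to have committed nothing,
            # so only the earlier, completed steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


# Hypothetical in-memory stand-ins for three booking services.
log: list[str] = []


def record(event: str) -> Callable[[], None]:
    return lambda: log.append(event)


def fail() -> None:
    raise RuntimeError("no cars available")


booking = [
    Step("flight", record("book flight"), record("cancel flight")),
    Step("hotel", record("book hotel"), record("cancel hotel")),
    Step("car", fail, record("cancel car")),
]

ok = run_saga(booking)
print(ok, log)
# → False ['book flight', 'book hotel', 'cancel hotel', 'cancel flight']
```

The car step fails, so the hotel and then the flight are cancelled, in the reverse of the order they were booked.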

The problem it solves

Microservices each own their database, so a single business operation spanning several of them can't use one BEGIN/COMMIT, since there's no shared transaction. Two-phase commit across services exists but is slow, locks resources, and couples availability to every participant. Sagas provide consistency without distributed locks: each service commits locally and independently, and failure is handled by compensating actions rather than a global rollback.

How it works

There are two coordination styles.

Choreography: each service listens for events and emits the next one ('SeatReserved' → Payment service charges → 'PaymentFailed' triggers seat release). There is no central coordinator; this is simple for short flows but tangled for long ones.

Orchestration: a central orchestrator explicitly tells each service what to do and what to compensate, tracking saga state.

On failure at step N, the orchestrator (or the event chain) invokes compensating transactions for steps N-1 down to 1, in that order. Compensations must be idempotent, since a retried or redelivered compensation may run more than once, and semantic (a 'refund' rather than a literal rollback, since the original commit is permanent).
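
To make the choreography style concrete, here is a rough sketch of the seat and payment example as an event chain. Everything in it is an assumption made for illustration: `EventBus` is a hypothetical synchronous in-memory bus standing in for a real message broker, and the `OrderPlaced`, `PaymentSucceeded`, and `SeatReleased` event names are invented alongside the `SeatReserved` and `PaymentFailed` events mentioned in the text.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical synchronous in-memory bus; a real system would use a broker."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.queue: deque = deque()
        self.history: list[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.queue.append((event, payload))

    def run(self) -> None:
        # Deliver queued events in order until no handler emits a new one.
        while self.queue:
            event, payload = self.queue.popleft()
            self.history.append(event)
            for handler in self.handlers[event]:
                handler(payload)


bus = EventBus()
seats = {"12A": "free"}


def reserve_seat(order: dict) -> None:      # seat service: local transaction
    seats[order["seat"]] = "held"
    bus.publish("SeatReserved", order)


def charge_card(order: dict) -> None:       # payment service: local transaction
    if order["card_ok"]:
        bus.publish("PaymentSucceeded", order)
    else:
        bus.publish("PaymentFailed", order)


def release_seat(order: dict) -> None:      # seat service: compensation
    seats[order["seat"]] = "free"
    bus.publish("SeatReleased", order)


# No coordinator: each service only knows which events it reacts to.
bus.subscribe("OrderPlaced", reserve_seat)
bus.subscribe("SeatReserved", charge_card)
bus.subscribe("PaymentFailed", release_seat)

bus.publish("OrderPlaced", {"seat": "12A", "card_ok": False})
bus.run()
print(bus.history)
# → ['OrderPlaced', 'SeatReserved', 'PaymentFailed', 'SeatReleased']
```

Note that the overall flow is not written down anywhere; it only emerges from the three subscriptions. An orchestrated version would instead keep the step order and the saga state in one coordinator, which is why it tends to stay readable as flows grow.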

Why use it

  • Achieves cross-service consistency without distributed locks or two-phase commit
  • Each service commits locally and stays available independently, with no global lock holding everyone hostage
  • Orchestration centralizes the flow logic, making complex multi-step processes explicit and observable

What it costs you

  • No isolation: intermediate states are visible to others mid-saga (a seat looks booked before payment clears)
  • Compensating transactions are extra code for every step and are themselves error-prone (what if the refund fails?)
  • Only eventual consistency, and reasoning about all partial-failure orderings is genuinely hard
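
The "what if the refund fails?" cost is usually handled by retrying the compensation and escalating when retries run out. The sketch below shows one way that might look; `compensate_with_retry`, `CompensationExhausted`, and the `refund_charge` stand-in are hypothetical names, and the retry count and backoff are arbitrary choices for the example.

```python
import time
from typing import Callable


class CompensationExhausted(Exception):
    """Raised when a compensation still fails after all retries."""


def compensate_with_retry(
    compensate: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.0,
) -> int:
    """Retry a compensation; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            compensate()
            return attempt
        except Exception as exc:
            if attempt == attempts:
                # Out of retries: escalate (alerting, manual intervention).
                raise CompensationExhausted(str(exc)) from exc
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("attempts must be at least 1")


# Hypothetical payment service stand-in. Refunds are keyed by charge id,
# so repeating a refund that already succeeded has no further effect.
refunded: set[str] = set()
calls = {"count": 0}


def refund_charge(charge_id: str) -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("payment service unreachable")
    refunded.add(charge_id)  # idempotent: adding the same id twice is a no-op


used = compensate_with_retry(lambda: refund_charge("ch_123"))
print(used, refunded)
# → 3 {'ch_123'}
```

Retrying is only safe because the refund is idempotent; a compensation that is not safe to repeat would need a deduplication key before it could be retried like this.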

Where it shows up in our architectures

  • Ticketmaster (Seat Booking)

    Hold seat → charge card → issue ticket as a saga; a payment timeout triggers compensation that releases the held seat

  • Airbnb (Marketplace)

    Hold dates → authorize payment → confirm booking; if any step fails, compensations release the hold and void the authorization

  • Payment Gateway

    Multi-step charge flows (authorize, capture, notify) use compensating actions (void/refund) when a downstream step fails

Gotchas

  • Sagas give you no isolation: other transactions can see intermediate state mid-flow (a seat shows held before payment confirms). You must design for these visible in-between states, not pretend they don't exist.
  • Compensating transactions can themselves fail. Make them idempotent and retriable, and have a plan (alerting, manual intervention) for compensations that exhaust retries.
  • Compensation is semantic, not a literal rollback, because the original local commit is permanent. 'Refund' undoes a charge in effect, but the charge event still happened and is in the history.
  • Prefer orchestration for anything beyond a couple of steps. Choreography's implicit event chains become impossible to follow and debug as the flow grows; a central orchestrator keeps saga state observable.
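
One way to design for the visible in-between states described above is to model them explicitly, so other sagas see "held" rather than mistaking it for "booked" or "available" (this is sometimes called a semantic lock). The following is a sketch under that assumption; `SeatStatus` and `SeatInventory` are hypothetical names, and a real seat service would persist this state and likely expire stale holds.

```python
from enum import Enum


class SeatStatus(Enum):
    AVAILABLE = "available"
    HELD = "held"        # saga in flight: visible to others, not yet final
    BOOKED = "booked"


class SeatInventory:
    """Hypothetical seat service that exposes the in-between state explicitly."""

    def __init__(self, seat_ids: list[str]) -> None:
        self.status = {seat: SeatStatus.AVAILABLE for seat in seat_ids}

    def hold(self, seat: str) -> bool:
        # A second saga cannot take a seat that is held or booked.
        if self.status[seat] is not SeatStatus.AVAILABLE:
            return False
        self.status[seat] = SeatStatus.HELD
        return True

    def confirm(self, seat: str) -> None:
        # Only a held seat can be confirmed (payment cleared).
        if self.status[seat] is SeatStatus.HELD:
            self.status[seat] = SeatStatus.BOOKED

    def release(self, seat: str) -> None:
        # Compensation: idempotent, and never un-books a confirmed seat.
        if self.status[seat] is SeatStatus.HELD:
            self.status[seat] = SeatStatus.AVAILABLE


inventory = SeatInventory(["12A"])
first = inventory.hold("12A")    # saga 1 holds the seat
second = inventory.hold("12A")   # saga 2 sees the held state and backs off
inventory.release("12A")         # saga 1's payment failed
inventory.release("12A")         # a duplicate compensation is harmless
third = inventory.hold("12A")    # the seat is available again
print(first, second, third)
# → True False True
```

The release method doubles as an example of a semantic, idempotent compensation: it reverses the hold in effect and can safely run more than once.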

Interview angle

Saga pattern questions come up in any multi-service transactional scenario, like booking a seat and charging a card. The key thing to say is that a single ACID transaction can't span services that each own their database, and that two-phase commit is usually too slow and too tightly coupled, so you need compensating actions that semantically undo each step on failure. Explain the choice between choreography (event-driven, simple for short flows) and orchestration (central coordinator, better for complex flows where you need visibility). Candidates lose points by saying 'just use a transaction' across services, which signals they don't understand that microservices don't share a database.
