Reqflow
Learning path
Advanced · ~93 min · 4 systems

Reliability & Resilience

Systems where failure is not an option: rate limiters that protect your service, schedulers that never drop a job, and payment flows that cannot charge twice.

After this path you will be able to

Design for failure as a first-class requirement: apply rate limiting, idempotency keys, CAS-based dedup, circuit breakers, and dead-letter queues to eliminate silent data loss and double-execution.
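To make the idempotency-key idea concrete, here is a minimal single-process sketch in Python. The names `IdempotentExecutor`, `window_seconds`, and `execute` are hypothetical, and the in-memory dict is an assumption for illustration only; a real service would more likely keep keys in a shared store with an atomic insert so that concurrent retries cannot both run.

```python
import time
from typing import Any, Callable, Dict, Tuple


class IdempotentExecutor:
    """Runs an operation at most once per key within a dedup window.

    Hypothetical helper: single process, not thread-safe, in-memory only.
    """

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        # key -> (time the result was stored, stored result)
        self._results: Dict[str, Tuple[float, Any]] = {}

    def execute(self, key: str, operation: Callable[[], Any]) -> Any:
        now = self.clock()
        cached = self._results.get(key)
        # A retry inside the window gets the stored result; the
        # operation itself is not run a second time.
        if cached is not None and now - cached[0] < self.window_seconds:
            return cached[1]
        result = operation()
        self._results[key] = (now, result)
        return result
```

A client that retries a timed-out request with the same key inside the window receives the first result instead of triggering a second charge. Once the window expires, the key is treated as new, which is why the window length matters.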

Interview approach for this path
  1. Open by identifying the failure modes. For each component, ask "what breaks if this dies?" before asking "how do we scale it?"
  2. Apply rate limiting at the API layer first and explain the algorithm: token bucket for burst tolerance, sliding window log for precision.
  3. For any operation that crosses a service boundary or the network, apply idempotency. Say "idempotency key" and explain the dedup window.
  4. Wrap every downstream call that can be slow or flaky in a circuit breaker. Name the state machine: closed, open, half-open.
  5. For multi-step operations (book and charge), explain the saga pattern and the compensating transaction you'd run on failure.
  6. Add a dead-letter queue to every consumer so poison messages don't block the whole queue indefinitely.
  7. Describe how you'd validate all of this: chaos experiments that kill dependencies and verify that the circuit breakers actually trip.
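The circuit breaker state machine named in the steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold` and `reset_timeout` are all hypothetical names, and the policy shown (count consecutive failures, allow one trial call after a timeout) is just one common variant.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal closed / open / half-open state machine (illustrative)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = 0.0   # when the circuit last opened

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a sick dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let one trial call through.
            self.state = "half-open"
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens it.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
```

Injecting `clock` makes the open-to-half-open transition testable without sleeping, which is also roughly what a chaos experiment would check: kill the dependency, confirm calls start failing fast, restore it, and confirm the breaker closes again.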

Systems in this path

4 total
  1. API Rate Limiter
     Intermediate · 18 min

     Distributed counters, consistent hashing, fail-open vs fail-closed.

  2. Distributed Job Scheduler
     Advanced · 25 min

     Cron at scale: leader election, CAS dispatch, at-least-once delivery, exactly-once via dedup, catch-up execution.

  3. Payment Gateway
     Advanced · 25 min

     Idempotency, fraud, async webhooks, ledger.

  4. Ticketmaster (Seat Booking)
     Advanced · 25 min

     Seat holds, payment timeouts, virtual queue for hot events.
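As a rough illustration of the CAS dispatch mentioned for the job scheduler, the sketch below lets several workers race to claim one job through a compare-and-set, so that only one of them runs it. `JobStore`, `compare_and_set`, and `try_claim` are hypothetical names, and the lock-protected dict is only a stand-in for a database conditional update. A similar conditional write could plausibly back a seat hold (available to held), though the individual systems may do it differently.

```python
import threading
from typing import Dict


class JobStore:
    """In-memory stand-in for a store that supports conditional updates."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: Dict[str, str] = {}

    def add(self, job_id: str) -> None:
        with self._lock:
            self._status[job_id] = "pending"

    def compare_and_set(self, job_id: str, expected: str, new: str) -> bool:
        # Atomically: change the status only if it still equals `expected`.
        with self._lock:
            if self._status.get(job_id) != expected:
                return False
            self._status[job_id] = new
            return True


def try_claim(store: JobStore, job_id: str) -> bool:
    """Return True only for the single worker that wins the claim."""
    return store.compare_and_set(job_id, "pending", "running")
```

Workers that lose the race see `False` and skip the job, which is how a compare-and-set can turn at-least-once triggering into a single execution as long as the claim itself is durable.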

Up next

Large-Scale Infrastructure

The systems underneath the systems: unique ID generation, distributed object storage, event streaming, and a federated social protocol — the plumbing the internet runs on.