Blog

System design, explained in depth

DDIA-style deep dives and summaries of real engineering decisions from Netflix, Uber, Discord, and others. Every post connects back to a concept or system you can explore interactively.

Deep Dive·May 23, 2026·9 min read

How Rate Limiting Actually Works at Scale

Token buckets, sliding windows, Redis counters, and the distributed rate limiting problem nobody talks about in interviews.

rate limiting · algorithms · redis · distributed systems · api design


Deep Dive

May 23, 2026·9 min read

The Hidden Cost of Microservices

Microservices give you independent deployability and fault isolation. They also give you distributed transactions, network latency, and a debugging problem that gets harder with every service you add.

microservices · architecture · distributed systems

May 16, 2026·9 min read

Building for Idempotency: Practical Patterns for Every Engineer

Idempotency is not a theoretical property. It's the difference between a payment that charges once and one that charges three times when the client retries. Here's how to build it correctly.

idempotency · api design · payments

May 16, 2026·8 min read

Consistent Hashing: How Distributed Caches Scale Without Losing Everything

Adding one server to a cache cluster shouldn't invalidate 90% of your cache. Consistent hashing is why it doesn't, and the math is simpler than you think.

consistent hashing · distributed systems · caching

May 9, 2026·10 min read

MVCC: How Databases Let Thousands of Transactions Run Simultaneously

Most modern databases handle concurrent reads and writes without locking readers out. The mechanism is Multi-Version Concurrency Control, and it's one of the most elegant ideas in database engineering.

databases · concurrency · mvcc

May 2, 2026·8 min read

The CAP Theorem Is Not What Your Interviewer Thinks

CAP gets cited in every system design interview and misunderstood in most. Here's what it actually says, what it doesn't say, and what it means for your design decisions.

distributed systems · cap theorem · consistency

Apr 18, 2026·9 min read

Why Distributed Locks Are Harder Than They Look

A distributed lock seems simple: one process holds it, others wait. In practice, clocks drift, processes pause, and networks lie, which makes every simple lock scheme subtly broken.

distributed systems · locks · concurrency

Apr 4, 2026·10 min read

How Database Indexes Actually Work

Most engineers know indexes make queries faster. Few know why, or when they make things slower. Here's what's happening inside the storage engine.

databases · indexes · b-tree

Mar 21, 2026·9 min read

Two-Phase Commit: Why Distributed Transactions Are Hard and What People Use Instead

2PC is the textbook solution for distributed transactions. It's also why most distributed systems avoid distributed transactions entirely.

distributed systems · transactions · consistency

Mar 7, 2026·11 min read

LSM Trees vs B-Trees: Why Your Database's Storage Engine Is a Design Decision

Every database makes a fundamental choice between write-optimized and read-optimized storage. Here's what that means for your workload.

databases · storage engines · lsm-tree

Feb 21, 2026·10 min read

Replication Lag: What It Is, Why It Bites You, and How to Tame It

Every asynchronously replicated database has replication lag. Most engineers don't fully understand what happens when reads hit a stale replica until production teaches them.

replication · databases · consistency

Big Tech

Feb 7, 2026·9 min read·Discord Blog

How Discord Stores Trillions of Messages

Discord migrated from MongoDB to Cassandra to ScyllaDB as their message store grew from millions to trillions. Here's what they learned.

databases · cassandra · scylladb

Jan 24, 2026·7 min read·Uber Engineering Blog

How Uber Computes Surge Pricing in Real Time Across Every City

Uber's dispatch and pricing systems need sub-second latency while reading live supply/demand across millions of driver and rider events per minute.

real-time · geospatial · pricing

Jan 10, 2026·8 min read·Netflix Tech Blog

How Netflix Serves 700M Hours of Video Without a Single Region Taking It Down

A deep look at Netflix's Active-Active multi-region architecture and what it actually took to get there.

availability · multi-region · replication