Reqflow
← All articles
Big TechFebruary 7, 2026·9 min read·Original: Discord Blog

How Discord Stores Trillions of Messages

Discord migrated from MongoDB to Cassandra to ScyllaDB as their message store grew from millions to trillions. Here's what they learned.

databasescassandrascylladbscalemigration

In 2023, Discord published a detailed write-up of how they migrated their message storage from Cassandra to ScyllaDB, handling trillions of messages, billions of reads per day, and latency requirements in the low double-digit milliseconds. The post is a rare honest account of what database scale looks like when you actually hit it.

Why they originally chose Cassandra

Discord's message storage pattern is almost perfectly suited to Cassandra: writes are always appends (new messages), reads are by channel and time range (give me the last 50 messages in channel X), and the data is naturally partitioned by channel_id. Cassandra's wide-row model maps directly to this: each partition key is a (channel_id, bucket) pair (bucket = a time window to keep partitions manageable), and each row within the partition is a message ordered by message_id. At launch, Cassandra gave Discord horizontal scalability and high write throughput at low ops cost.

What broke at scale

By 2022 Discord had 177 nodes in their Cassandra cluster and were hitting JVM garbage collection pauses that caused latency spikes during peak hours. Hot partitions (Discord's largest channels can receive millions of messages) caused GC pressure on specific nodes. Routine maintenance operations like repairs (Cassandra's anti-entropy process) took hours and put the cluster under strain. And compaction, the background process that merges SSTables, was causing periodic latency spikes that showed up for users as message load delays. These aren't Cassandra bugs; they're the predictable behavior of a JVM-based storage engine at scale.

The migration to ScyllaDB

ScyllaDB is a C++ rewrite of Cassandra's wire protocol and storage model. It's API-compatible (Discord's queries didn't change) but eliminates JVM GC pauses entirely. ScyllaDB uses a shard-per-core architecture: each CPU core owns a subset of the keyspace and processes its own I/O without locking, which maps better to modern NVMe SSDs. Discord migrated their 177-node Cassandra cluster to a 72-node ScyllaDB cluster while serving live traffic, using a dual-write migration pattern: new writes go to both, reads move to ScyllaDB once backfill catches up, then Cassandra is decommissioned.

The data service layer: shielding the database

Discord built a Rust data service layer between their application code and ScyllaDB. This service handles request coalescing (10 concurrent requests for the same channel's recent messages collapse into one database read), in-process caching of hot channels, and backpressure. The service layer is the key architectural lesson: by the time you reach trillions of rows, a raw database driver in every application service is too fragile. You need a tier that can enforce access patterns, absorb thundering herds, and give you a single place to tune behavior.

Takeaways

Discord's story illustrates that the right database choice at 1M messages is rarely still the right choice at 1T. Cassandra was genuinely correct for them early on. The migration to ScyllaDB wasn't fixing a mistake. It was the expected next step. Design for data models and access patterns, not vendor loyalty. And when you can, put a service layer in front of your database before you need it, not after.

Explore the concepts

← Back to all articles