Deep DiveApril 4, 2026·10 min read

How Database Indexes Actually Work

Most engineers know indexes make queries faster. Few know why, or when they make things slower. Here's what's happening inside the storage engine.

databasesindexesb-treeperformanceinternals

An index is one of the most powerful tools in a database and one of the most misunderstood. Engineers add indexes to slow queries and watch them speed up, which works. But without understanding the mechanism, they also add indexes that slow down writes, bloat storage, and occasionally make reads slower. Here's what actually happens.

What an index is

An index is a separate data structure that the database maintains alongside your table. It stores a copy of one or more columns, sorted by value, with a pointer back to the original row. When you query WHERE email = 'alice@example.com', without an index the database reads every row in the table (a full table scan) and checks each one. With an index on email, it jumps directly to the matching entry in the sorted index structure and follows the pointer. That's O(log n) instead of O(n).

The B-tree: what's inside

Most relational database indexes (PostgreSQL, MySQL, SQLite) use a B-tree. A B-tree is a balanced, sorted tree of pages. Each internal node holds ordered keys and pointers to child nodes. Leaf nodes hold the actual index entries (key + row pointer). To find a value, the database traverses from the root to a leaf: ~3–4 page reads for a billion-row table. Because the tree is always balanced, every search takes the same number of hops regardless of which value you're searching for. This is why B-tree indexes give consistent, predictable performance for point lookups and range queries.

Why indexes slow down writes

Every INSERT, UPDATE, or DELETE that touches an indexed column requires updating the index too. For a table with five indexes, a single row insert writes to six places: the table and each index. This is write amplification. For read-heavy tables (user profiles, product catalog) this cost is negligible. For write-heavy tables (event logs, audit tables, time-series data) it can be severe. High-frequency inserts into a heavily-indexed table can saturate I/O as the database spends more time maintaining indexes than storing data. This is why write-optimized storage engines (like LSM trees in Cassandra and RocksDB) never update data in-place, and why columnar stores deliberately avoid row-level indexes.

Column order in composite indexes

A composite index on (last_name, first_name) can serve queries filtering by last_name, or by last_name AND first_name, but not by first_name alone. The index is sorted by last_name first; within each last_name, entries are sorted by first_name. A query for first_name = 'Alice' with no last_name filter can't use this index. It would need to scan every entry across all last names. The rule is: put the most selective column first (the one that narrows the result set most), and put columns used in range comparisons last. A query like WHERE status = 'active' AND created_at > '2026-01-01' benefits from (status, created_at), not (created_at, status). Status eliminates most rows up front.

Index-only scans and covering indexes

The fastest queries are ones where the database never touches the main table at all, because every column needed is already in the index. A covering index includes all columns referenced in a query. If you frequently run SELECT email, name FROM users WHERE company_id = 5, an index on (company_id, email, name) means the database answers the query entirely from the index without reading any table pages. This matters most for queries on large tables where the row data is on disk. Avoiding the table read eliminates the most expensive I/O.

When indexes make things worse

Adding an index to a low-selectivity column (one with few distinct values, like a boolean is_active or a status enum) often makes things worse. If 95% of rows have is_active = true, the index for that value points to 95% of the table. The database may decide it's cheaper to do a full table scan (sequential reads) than follow 950,000 individual row pointers (random reads). Random I/O is expensive; sequential I/O is cheap. The query planner should detect this and ignore the index, but under certain statistics conditions it won't. If you see a query with an index running slower than without one, check the explain plan for an unexpected index scan.

Explore the concepts

Database Indexes →Caching →

See it in action

Search Engine →Distributed Cache →

← Back to all articles