Data·3 min read

Database Indexes

Pre-computed lookup structures that turn O(n) table scans into O(log n) or O(1) lookups.

Try it

Find 33. Toggle the index and watch how many rows get checked.

Rows checked: 0

Use an index on this column

Without an index the database scans every row until it finds a match. An index is a sorted lookup structure (usually a B-tree) that jumps almost straight to the row. The cost: indexes take space and slow down writes, since every insert must update them too.

First time reading this? Start here

Plain English: a database index is like the index at the back of a book. Without it, finding a topic means flipping through every page. With it, you jump straight to the right page. Same idea, applied to rows.

Used in:URL Shortener Search Engine Yelp

What it is

An auxiliary data structure that maps column values to row locations. The most common type is a B-tree (or B+-tree): sorted, balanced, range-query-friendly. LSM-trees are the alternative used by write-heavy stores (Cassandra, RocksDB).

The problem it solves

Without an index, finding a row by some column means scanning every row, an O(n) operation. On a table with billions of rows, that's intolerable. An index turns that into a fast lookup.

How it works

B-tree: sorted by indexed column; lookups walk the tree (log n). Updates are in-place but require rebalancing. LSM-tree: writes go to a memory buffer, periodically flushed to immutable sorted files; reads check multiple files and a Bloom filter to skip ones that don't have the key. Writes are much faster (append-only) but reads are amortized.

Why use it

Reads on indexed columns are dramatically faster
Range scans (ORDER BY, BETWEEN) work efficiently on B-trees
LSM-trees give very high write throughput for time-series workloads

What it costs you

Every write must update every index, causing write amplification
Indexes take storage (often as much as the table itself for heavily-indexed tables)
Indexes only help if the query planner uses them, so composite index order matters

Where it shows up in our architectures

URL Shortener →
Postgres index on short_key turns redirect lookups into O(log n)
Search Engine →
Inverted index: the entire data structure that makes 'find documents containing X' fast
Yelp →
Geo index (geohash), a specialized spatial index

Gotchas

Don't index every column. Each index slows writes and takes space. Add indexes for actual query patterns.
Composite index column order matters. INDEX (a, b) supports queries on a or (a, b), not on b alone.
B-trees for read-heavy, balanced workloads. LSM-trees for write-heavy time-series. The choice is at the storage engine level.

Interview angle

Index questions come up whenever you propose a database table. The interviewer wants you to proactively say which columns you'd index and why, not wait to be asked. The key signal is knowing that indexes have a cost: every index slows writes and adds storage, so you only add them for columns that appear in WHERE or JOIN conditions of frequent queries. Candidates lose points by either missing indexes entirely or proposing indexes on every column.

Your notes

Private to you