← Concepts
Performance·3 min read

Bloom Filter

A probabilistic 'have I seen this?' check that uses tiny memory at the cost of occasional false positives.

First time reading this? Start here

Plain English: a tiny memory structure that answers 'have I seen this before?' very fast. It works like a bouncer with a guest list who can say 'definitely not on the list' for sure, but only 'maybe' when he thinks you are. So it says either 'definitely no' or 'probably yes,' and you use it when 'probably yes' is okay because you'll double-check after.

Used in:Search Engine
What it is

A bit array plus several hash functions. To add an item, hash it K times and set those K bits. To check membership, hash it the same way: if any of those K bits is 0, the item is definitely NOT in the set. If all K are 1, it's probably in the set (with a tunable false-positive rate).

The problem it solves

Some applications need to check 'have I seen this?' billions of times against a huge set. Storing the full set in a hash table costs gigabytes; a database lookup is too slow. A Bloom filter answers the question in constant time and a few bits per item, which is perfect when false positives are tolerable but you can't afford the storage.

How it works

Allocate an array of M bits. For each item, compute K hash values modulo M and set those bits to 1. On lookup, recompute the same K hash values and check those bits. If any are 0 → definitely not in the set. If all are 1 → probably in the set (false positive rate is configurable via M and K).

Why use it
What it costs you
Where it shows up in our architectures
Gotchas

Your notes

Private to you