A probabilistic 'have I seen this?' check that uses tiny memory at the cost of occasional false positives.
Plain English: a tiny memory structure that answers 'have I seen this before?' very fast. It works like a bouncer with a guest list who can say 'definitely not on the list' for sure, but only 'maybe' when he thinks you are. So it says either 'definitely no' or 'probably yes,' and you use it when 'probably yes' is okay because you'll double-check after.
A bit array plus several hash functions. To add an item, hash it K times and set those K bits. To check membership, hash it the same way: if any of those K bits is 0, the item is definitely NOT in the set. If all K are 1, it's probably in the set (with a tunable false-positive rate).
Some applications need to check 'have I seen this?' billions of times against a huge set. Storing the full set in a hash table costs gigabytes; a database lookup is too slow. A Bloom filter answers the question in constant time and a few bits per item, which is perfect when false positives are tolerable but you can't afford the storage.
Allocate an array of M bits. For each item, compute K hash values modulo M and set those bits to 1. On lookup, recompute the same K hash values and check those bits. If any are 0 → definitely not in the set. If all are 1 → probably in the set (false positive rate is configurable via M and K).
The crawler uses a Bloom filter to check 'have I already crawled this URL?' A false positive means we occasionally skip a URL (acceptable), while a false negative would mean re-crawling everything (unacceptable)