A probabilistic set that answers membership queries with either 'maybe' or 'definitely no'.
A probabilistic 'have I seen this?' check that uses tiny memory at the cost of occasional false positives.
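The crawler uses a Bloom filter to check 'have I already crawled this URL?' A false positive means we occasionally skip a URL we have not actually crawled (acceptable). A false negative would mean re-crawling URLs we have already visited (unacceptable), but a Bloom filter never produces false negatives: once a URL has been added, the filter always answers 'maybe' for it.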
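
To make the crawler example concrete, here is a minimal sketch of a Bloom filter in Python. It is not the crawler's actual implementation: the class name BloomFilter, the helper should_crawl, the default sizes (65,536 bits, 4 hash positions), and the use of SHA-256 with double hashing are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        # Sizes are arbitrary defaults for this sketch, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # All bits set: 'maybe'. Any bit clear: 'definitely no'.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def should_crawl(url: str, seen: BloomFilter) -> bool:
    """Hypothetical crawler hook: return True only for URLs not yet recorded."""
    if url in seen:
        # 'Maybe seen': skip. Occasionally this skips a genuinely new URL.
        return False
    seen.add(url)
    return True
```

Calling should_crawl twice with the same URL returns True the first time and False the second, because an added item always has all of its bits set. A different URL can also return False if its positions happen to collide with bits already set, which is the false positive the crawler tolerates.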