Bloom filters are a great tool for membership checks when disk is the enemy and a probabilistic answer is acceptable. But production reality has a sharp edge: a standard Bloom filter is append-only. As soon as you need deletions, many engineers reach for a counting Bloom filter. On paper it fixes deletions. In practice, it often pushes the structure out of CPU cache and turns each lookup into a string of cache misses.
This log explains when Bloom filters work, why counting variants are often a trap, and what to do instead.
## Bloom filter refresher: what it guarantees (and what it does not)
A Bloom filter answers one question: is x definitely not in the set, or maybe in the set?
- No false negatives: if the filter says “not present”, it is correct.
- False positives exist: if the filter says “present”, it means “maybe”.
The structure:
- A bit array of size `m`.
- `k` hash functions.
- Insert sets `k` bits.
- Lookup checks those same `k` bits.
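A minimal sketch of that structure (the double-hashing scheme and the `BloomFilter` class here are illustrative choices, not the only way to build one):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, packed into bytes

    def _positions(self, item: str):
        # Derive k positions via double hashing from one SHA-256 digest
        # (an illustrative choice; production code often uses faster hashes).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False is definitive ("not present"); True only means "maybe".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

Note there is no `remove` method: the structure has no way to know whether a bit is safe to clear.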
## The operational problem: standard Bloom filters do not support deletion
Standard Bloom filters are append-only because bits are shared across items:
- Item `A` sets bits at positions `p1`, `p2`, `p3`.
- Item `B` also sets some overlapping bit, say `p2`.
If you delete A and try to unset p2, you might break B and introduce a false negative. That is unacceptable for the Bloom filter contract.
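A tiny illustration of that failure mode (the bucket positions are hand-picked to force the overlap, not real hash output):

```python
# Illustrative only: hand-picked bucket positions, not real hash output.
bits = [0] * 8

A = [1, 2, 3]  # buckets item A hashes to
B = [2, 5, 6]  # buckets item B hashes to (overlaps A at bucket 2)

for p in A + B:
    bits[p] = 1

# Naive delete of A: unset all of A's bits, including the shared bucket 2.
for p in A:
    bits[p] = 0

# B was never deleted, but its lookup now fails: a false negative.
b_present = all(bits[p] for p in B)  # False
```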
## Counting Bloom filters: how the deletion trick works
Counting Bloom filters replace each bit with a small counter.
| Structure | Bucket type | Insert | Delete |
|---|---|---|---|
| Bloom filter | 1-bit | set bit to 1 | not supported |
| Counting Bloom filter | 4-bit or 8-bit counter | increment counter | decrement counter |
Conceptually:
- Insert: `counter[i] += 1` for each hashed bucket.
- Delete: `counter[i] -= 1` for each hashed bucket.
- Membership: treat `counter[i] > 0` as "bit is set".
It is a clean solution at the algorithm level.
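The same idea with 8-bit counters might look like this (a sketch: counter width, the saturation rule, and the hash scheme are simplifications):

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter sketch: one 8-bit counter per bucket."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = bytearray(m)  # 8x the footprint of a 1-bit array

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            if self.counters[p] < 255:  # saturate instead of overflowing (a simplification)
                self.counters[p] += 1

    def remove(self, item: str) -> None:
        # Caller must only delete items that were previously added.
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))
```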
## Mechanical sympathy: why counting Bloom filters often lose in production
The performance story changes when you look at cache residency and random memory access.
### Cache residency is the whole game
A well-tuned Bloom filter for millions of items can be only a few megabytes.
That matters because:
- A small filter can sit in L3 cache.
- L3 access is on the order of 10 to 15 ns on modern CPUs.
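The standard sizing formulas make "a few megabytes" concrete (the numbers below are one illustrative configuration):

```python
import math


def bloom_sizing(n: int, p: float):
    """Optimal bit count m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = round((m / n) * math.log(2))
    return m, k


# Example: 1 million items at a 1% false-positive rate.
m, k = bloom_sizing(1_000_000, 0.01)
size_mb = m / 8 / 1_000_000  # ~1.2 MB: small enough for L3 on many CPUs
```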
When you move from 1-bit buckets to 4-bit or 8-bit counters, the footprint multiplies:
| Variant | Bucket size | Relative memory |
|---|---|---|
| Bloom | 1 bit | 1x |
| Counting Bloom (4-bit) | 4 bits | 4x |
| Counting Bloom (8-bit) | 8 bits | 8x |
Cross the cache size boundary and the filter falls into main memory.
### Random access becomes expensive
Bloom filters depend on hash functions that behave like uniform random mappings.
So a single lookup touches k non-contiguous buckets:
- Typical `k`: 3 to 5
- Typical pattern: `k` random reads and comparisons
If the structure is in RAM, not L3, each random read is likely a cache miss.
Rough order-of-magnitude latencies:
| Location | Typical access time |
|---|---|
| L3 cache | 10 to 15 ns |
| RAM | ~100 ns |
Example: k = 4 hashes and the filter lives in RAM.
4 random loads * 100 ns ≈ 400 ns
That is 400 ns to return a probabilistic “maybe.”
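As a back-of-envelope model (using the rough latency figures above, not measurements):

```python
def lookup_cost_ns(k: int, access_ns: float) -> float:
    """Approximate memory time per lookup: k random reads at a given latency."""
    return k * access_ns


cache_resident = lookup_cost_ns(4, 12.5)   # ~50 ns when the filter sits in L3
ram_resident = lookup_cost_ns(4, 100.0)    # ~400 ns once it spills to RAM
```

Same algorithm, same `k`, roughly an 8x difference in memory time per lookup.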
At scale, this becomes a CPU stall factory:
- You burn cycles waiting on memory.
- The benefit of avoiding disk I/O is real, but the cache miss rate can dominate CPU time.
This is where counting Bloom filters quietly become a throughput bottleneck.
## When not to use a counting Bloom filter
Avoid counting Bloom filters when any of these are true:
- You need high QPS membership checks and the filter no longer fits in cache.
- Your hot path budget is tight (e.g., low millisecond or sub-millisecond tail latency targets).
- Deletions are frequent and you are constantly touching random counters.
- You only used the Bloom filter to avoid a read, but now the filter itself is your CPU bottleneck.
## What to do instead
### Option 1: use an exact hash table if memory is not the real constraint
If you need:
- exact membership, and
- true deletion,
then a real hash set is often the simplest answer.
It costs more memory, but it can win in practice because:
- you avoid `k` random accesses across a large array,
- and modern hash tables can be tuned for locality and load factor.
If RAM is available, do not over-optimize with a probabilistic structure that becomes slower.
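In Python, for instance, the "real hash set" is just the built-in `set`: exact membership, true deletion, no counters to maintain (keys here are illustrative):

```python
seen = set()

seen.add("user:42")
seen.add("user:99")

present = "user:42" in seen  # exact answer, no false positives
seen.discard("user:42")      # true deletion, safe even if the key is absent
```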
### Option 2: use a Cuckoo filter for deletions with better locality
If you truly need a probabilistic structure with deletions and good performance, consider Cuckoo filters:
- They store small fingerprints in buckets.
- They support insertion, lookup, and deletion.
- Their memory access pattern can be more cache-friendly than counting Bloom filters.
They are not a universal replacement, but they often behave better when cache pressure is the dominant cost.
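A simplified sketch of the idea (the class, table sizes, and hash choices are illustrative; real implementations pack fingerprints into fixed-width slots rather than Python lists):

```python
import hashlib
import random


class CuckooFilter:
    """Cuckoo filter sketch: each item stores a small fingerprint in one
    of two candidate buckets. num_buckets must be a power of two so the
    partial-key trick (alt_index = index XOR hash(fingerprint)) is its
    own inverse."""

    def __init__(self, num_buckets: int = 1 << 10, bucket_size: int = 4, max_kicks: int = 500):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        digest = hashlib.sha256(item.encode()).digest()
        return digest[0] | 1  # 8-bit fingerprint, never zero

    def _hash_fp(self, f: int) -> int:
        return int.from_bytes(hashlib.sha256(bytes([f])).digest()[:8], "big")

    def _index_pair(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        f = self._fingerprint(item)
        i1 = int.from_bytes(digest[1:9], "big") % self.num_buckets
        i2 = (i1 ^ self._hash_fp(f)) % self.num_buckets
        return f, i1, i2

    def add(self, item: str) -> bool:
        f, i1, i2 = self._index_pair(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(f)
                return True
        # Both candidate buckets full: evict fingerprints until one lands.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            f, self.buckets[i][slot] = self.buckets[i][slot], f
            i = (i ^ self._hash_fp(f)) % self.num_buckets  # evictee's other bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(f)
                return True
        return False  # table too full

    def might_contain(self, item: str) -> bool:
        # At most two bucket reads per lookup, versus k scattered reads.
        f, i1, i2 = self._index_pair(item)
        return f in self.buckets[i1] or f in self.buckets[i2]

    def remove(self, item: str) -> bool:
        # Caller must only delete items that were previously added.
        f, i1, i2 = self._index_pair(item)
        for i in (i1, i2):
            if f in self.buckets[i]:
                self.buckets[i].remove(f)
                return True
        return False
```

The locality win is visible in `might_contain`: a lookup touches at most two buckets, and each bucket is a small contiguous run of fingerprints, instead of `k` scattered single-bit (or single-counter) reads.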
## Architecture view: where the time goes
```mermaid
graph TD
    Q["Membership<br/>query"] --> H["Compute<br/>k hashes"]
    H --> BF{"Bloom filter<br/>fits in cache?"}
    BF -->|"Yes"| L3["L3 hits:<br/>fast random reads"]
    BF -->|"No"| RAM["RAM misses:<br/>slow random reads"]
    RAM --> STALL["CPU stall and<br/>throughput loss"]
    L3 --> FAST["Low overhead<br/>probabilistic check"]
```
The same algorithm can be fast or slow depending on one question: does the structure fit in cache?
## Key takeaways
- Bloom filters are great when they stay small and cache-resident. The primary win is fast probabilistic membership without touching disk.
- Standard Bloom filters do not support deletion because unsetting bits breaks the no-false-negative guarantee.
- Counting Bloom filters make deletion possible but can multiply the memory footprint by 4x to 8x, pushing the filter out of cache.
- Once the filter lives in RAM, `k` random accesses can cost roughly `k * 100 ns`, which adds up quickly at high QPS.
- If you need deletions, consider:
  - an exact hash table (when memory is acceptable), or
  - a Cuckoo filter (when you need probabilistic membership with deletion and better locality).