Bloom Filters vs Counting Bloom Filters: When Deletions Kill Performance

> $ stat metadata
Date: 2026.03.28
Time: 5 min read
Tags: [bloom-filter, cache, performance, data-structures, distributed-systems, systems]

Bloom filters are a great tool for membership checks when disk is the enemy and a probabilistic answer is acceptable. But production reality has a sharp edge: a standard Bloom filter is append-only. As soon as you need deletions, many engineers reach for a counting Bloom filter. On paper it fixes deletions. In practice, it often pushes the structure out of CPU cache and turns each lookup into a string of cache misses.

This log explains when Bloom filters work, why counting variants are often a trap, and what to do instead.


Bloom filter refresher: what it guarantees (and what it does not)

A Bloom filter answers one question: is x definitely not in the set, or maybe in the set?

  • No false negatives: if the filter says “not present”, it is correct.
  • False positives exist: if the filter says “present”, it means “maybe”.

The structure:

  • A bit array of size m.
  • k hash functions.
  • Insert sets k bits.
  • Lookup checks those same k bits.
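The structure above can be sketched in a few lines of Python. This is a toy, not a production implementation; deriving the k hashes by salting `blake2b` is one common trick, assumed here for simplicity:

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m                          # number of bits
        self.k = k                          # number of hash functions
        self.bits = bytearray((m + 7) // 8) # bit array, packed into bytes

    def _positions(self, item: bytes):
        # Derive k "independent" hashes by salting one hash function.
        for i in range(self.k):
            h = hashlib.blake2b(item, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not present"; True means "maybe present".
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))
```

Note that `add` only ever sets bits; there is deliberately no `remove`, which is exactly the limitation the next section covers.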

The operational problem: standard Bloom filters do not support deletion

Standard Bloom filters are append-only because bits are shared across items:

  • Item A sets bits at positions p1, p2, p3.
  • Item B also sets some overlapping bit, say p2.

If you delete A and try to unset p2, you might break B and introduce a false negative. That is unacceptable for the Bloom filter contract.
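A tiny simulation makes the hazard concrete. The positions here are hypothetical, standing in for hash outputs:

```python
# Simulate a Bloom filter's bit array as a Python set of set-bit positions.
bits = set()

item_a = {3, 17, 42}   # hypothetical hash positions for item A
item_b = {8, 17, 29}   # item B overlaps A at position 17

bits |= item_a         # insert A
bits |= item_b         # insert B

# Naive delete of A: clear all of A's bits, including the shared bit 17.
bits -= item_a

# B's lookup now fails at position 17: a false negative.
assert not item_b <= bits
```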


Counting Bloom filters: how the deletion trick works

Counting Bloom filters replace each bit with a small counter.

| Structure | Bucket type | Insert | Delete |
|---|---|---|---|
| Bloom filter | 1-bit | set bit to 1 | not supported |
| Counting Bloom filter | 4-bit or 8-bit counter | increment counter | decrement counter |

Conceptually:

  • Insert: counter[i] += 1 for each hashed bucket.
  • Delete: counter[i] -= 1 for each hashed bucket.
  • Membership: treat counter[i] > 0 as “bit is set”.

It is a clean solution at the algorithm level.
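A toy sketch of that idea in Python, using one byte per bucket with saturating 8-bit counters (the bucket width is an assumption; 4-bit packed counters are also common, and the salted-`blake2b` hashing is just an illustrative choice):

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = bytearray(m)  # 8-bit counters: 8x the memory of 1-bit buckets

    def _positions(self, item: bytes):
        # Derive k "independent" hashes by salting one hash function.
        for i in range(self.k):
            h = hashlib.blake2b(item, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            if self.counters[p] < 255:   # saturate instead of overflowing
                self.counters[p] += 1

    def remove(self, item: bytes) -> None:
        for p in self._positions(item):
            if self.counters[p] > 0:     # never decrement below zero
                self.counters[p] -= 1

    def might_contain(self, item: bytes) -> bool:
        # counter > 0 plays the role of "bit is set".
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Because shared buckets are counted rather than overwritten, deleting one item no longer erases another item's membership evidence.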


Mechanical sympathy: why counting Bloom filters often lose in production

The performance story changes when you look at cache residency and random memory access.

Cache residency is the whole game

A well-tuned Bloom filter for millions of items can be only a few megabytes.

That matters because:

  • A small filter can sit in L3 cache.
  • L3 access is on the order of 10 to 15 ns on modern CPUs.

When you move from 1-bit buckets to 4-bit or 8-bit counters, the footprint multiplies:

| Variant | Bucket size | Relative memory |
|---|---|---|
| Bloom | 1 bit | 1x |
| Counting Bloom (4-bit) | 4 bits | 4x |
| Counting Bloom (8-bit) | 8 bits | 8x |

Cross the cache size boundary and the filter falls into main memory.
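You can estimate where that boundary falls with the textbook sizing formula m = -n * ln(p) / (ln 2)^2, which gives the optimal bit count for n items at false-positive rate p (the 10M-item / 1% numbers below are just an illustration):

```python
import math

def bloom_bits(n_items: int, fp_rate: float) -> int:
    # Optimal bit count for a standard Bloom filter: m = -n * ln(p) / (ln 2)^2
    return math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)

n, p = 10_000_000, 0.01
m = bloom_bits(n, p)
print(f"plain Bloom:      {m / 8 / 2**20:.1f} MiB")  # ~11.4 MiB: can fit in a big L3
print(f"counting (8-bit): {m / 2**20:.1f} MiB")      # ~91.4 MiB: firmly in RAM
```

Same item count, same false-positive target, and only the plain filter has a chance of staying cache-resident.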

Random access becomes expensive

Bloom filters depend on hash functions that behave like uniform random mappings.

So a single lookup touches k non-contiguous buckets:

  • Typical k: 3 to 5
  • Typical pattern: k random reads and comparisons

If the structure is in RAM, not L3, each random read is likely a cache miss.

Rough order-of-magnitude latencies:

| Location | Typical access time |
|---|---|
| L3 cache | 10 to 15 ns |
| RAM | ~100 ns |

Example: k = 4 hashes and the filter lives in RAM.

4 random loads * 100 ns ≈ 400 ns

That is 400 ns to return a probabilistic “maybe.”

At scale, this becomes a CPU stall factory:

  • You burn cycles waiting on memory.
  • The benefit of avoiding disk I/O is real, but the cache miss rate can dominate CPU time.

This is where counting Bloom filters quietly become a throughput bottleneck.


When not to use a counting Bloom filter

Avoid counting Bloom filters when any of these are true:

  • You need high QPS membership checks and the filter no longer fits in cache.
  • Your hot path budget is tight (e.g., low millisecond or sub-millisecond tail latency targets).
  • Deletions are frequent and you are constantly touching random counters.
  • You only used the Bloom filter to avoid a read, but now the filter itself is your CPU bottleneck.

What to do instead

Option 1: use an exact hash table if memory is not the real constraint

If you need:

  • exact membership, and
  • true deletion,

then a real hash set is often the simplest answer.

It costs more memory, but it can win in practice because:

  • you avoid k random accesses across a large array,
  • and modern hash tables can be tuned for locality and load factor.

If RAM is available, do not over-optimize with a probabilistic structure that becomes slower.
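In Python terms, the exact option is simply the built-in set (in a systems language, a tuned open-addressing hash table plays the same role). The keys below are made up for illustration:

```python
# Exact membership with true deletion: a built-in hash set.
seen = set()

seen.add("user:42")
seen.add("user:7")

assert "user:42" in seen        # exact answer, never "maybe"
seen.discard("user:42")         # true deletion, no shared state to corrupt
assert "user:42" not in seen    # no false positives after delete
assert "user:7" in seen         # other members are unaffected
```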

Option 2: use a Cuckoo filter for deletions with better locality

If you truly need a probabilistic structure with deletions and good performance, consider Cuckoo filters:

  • They store small fingerprints in buckets.
  • They support insertion, lookup, and deletion.
  • Their memory access pattern can be more cache-friendly than counting Bloom filters.

They are not a universal replacement, but they often behave better when cache pressure is the dominant cost.
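To make the mechanics concrete, here is a toy cuckoo filter sketch. The parameters (8-bit fingerprints, 4 slots per bucket, 500 max evictions) follow common defaults but are assumptions, and this is a sketch of the technique rather than a production implementation:

```python
import hashlib
import random

class CuckooFilter:
    """Toy cuckoo filter: 8-bit fingerprints, 4 slots per bucket."""

    def __init__(self, num_buckets: int = 1 << 10, max_kicks: int = 500):
        assert num_buckets & (num_buckets - 1) == 0, "power of two required"
        self.buckets = [[] for _ in range(num_buckets)]
        self.mask = num_buckets - 1
        self.slots = 4
        self.max_kicks = max_kicks

    def _fingerprint(self, item: bytes) -> int:
        # 8-bit fingerprint; 0 is reserved as "empty", so map it to 1.
        return hashlib.blake2b(item).digest()[0] or 1

    def _index(self, item: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(item).digest()[1:9], "little") & self.mask

    def _alt_index(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the alternate bucket is derivable from
        # the fingerprint alone, which is what makes eviction possible.
        h = int.from_bytes(hashlib.blake2b(bytes([fp])).digest()[:8], "little")
        return (i ^ h) & self.mask

    def add(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random resident and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            victim = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        return False  # filter is too full; caller should resize or rebuild

    def might_contain(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def remove(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

The locality argument is visible in `might_contain`: a lookup touches at most two buckets, each a small contiguous run of fingerprints, rather than k scattered counters.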


Architecture view: where the time goes

```mermaid
graph TD
    Q["Membership<br/>query"] --> H["Compute<br/>k hashes"]
    H --> BF{"Bloom filter<br/>fits in cache?"}
    BF -->|"Yes"| L3["L3 hits:<br/>fast random reads"]
    BF -->|"No"| RAM["RAM misses:<br/>slow random reads"]
    RAM --> STALL["CPU stall and<br/>throughput loss"]
    L3 --> FAST["Low overhead<br/>probabilistic check"]
```

The same algorithm can be fast or slow depending on one question: does the structure fit in cache?


Key takeaways

  • Bloom filters are great when they stay small and cache-resident. The primary win is fast probabilistic membership without touching disk.
  • Standard Bloom filters do not support deletion because unsetting bits breaks the no-false-negative guarantee.
  • Counting Bloom filters make deletion possible but can multiply memory footprint by 4x to 8x, pushing the filter out of cache.
  • Once the filter lives in RAM, k random accesses can cost ~k * 100 ns, which adds up quickly at high QPS.
  • If you need deletions, consider:
    • an exact hash table (when memory is acceptable), or
    • a Cuckoo filter (when you need probabilistic membership with deletion and better locality).

[ RELATED_LOGS ]

TTFB: -- ms LOAD: -- s PAYLOAD: -- kb