Cuckoo filters are a modern alternative to counting Bloom filters when you need deletions and you care about CPU cache behavior. Counting Bloom filters fix deletion on paper, but in production they often become a cache-miss factory once the structure grows. Cuckoo filters trade that for a tighter access pattern: lookups touch two specific buckets, not k random locations across a huge array.
This log explains how Cuckoo filters work, why they tend to be faster on real CPUs, and where they break.
The fingerprint strategy
Instead of flipping bits in a global array, a Cuckoo filter uses a hash table of buckets.
You do not store the full key. You store a small fingerprint, typically 1 byte to 2 bytes, derived from hashing the item.
Lookup: two buckets, that is it
For an item x:
- Compute fingerprint
f = fp(x). - Compute primary bucket index
i1 = h(x). - Compute alternate bucket index
i2 = i1 XOR h(f). - Check buckets
i1andi2for fingerprintf.
Membership result:
- If
fis found in either bucket, return “maybe present”. - If not found, return “definitely not present”.
Mechanical sympathy:
- Bloom filter lookup touches
krandom bits across a large array. - Cuckoo filter lookup touches two nearby buckets in a hash table.
That spatial locality is the whole win on modern CPUs.
Architecture: lookup path
graph TD
A[Lookup item x] --> B[Compute fingerprint f]
B --> C[Compute index i1]
C --> D[Compute index i2 using XOR]
D --> E[Read Buckets i1 and i2]
E --> F{Is f in i1 or i2?}
F -- Yes --> G[Maybe Present]
F -- No --> H[Definitely Not Present]
The kick-out insertion: handling collisions
The name comes from the cuckoo bird pushing eggs out of the nest. Insertions work like that.
Insertion for item x:
- Compute
f = fp(x),i1,i2. - If bucket
i1has an empty slot, insertfthere. - Else if bucket
i2has an empty slot, insertfthere. - Else, pick one of the two buckets, evict an existing fingerprint, insert
f, and move the evicted fingerprint to its alternate bucket. - Repeat evictions until an empty slot appears or a kick limit is hit.
This shuffling is why Cuckoo filters can reach high occupancy. You can push load factors up to ~95% without the false-positive rate blowing up the same way it does for an overfilled Bloom filter.
The critical tradeoff: insertion failure
Bloom filters do not fail inserts. Accuracy just degrades as the bit array gets saturated.
Cuckoo filters have a hard edge:
- If the kick-out chain loops or exceeds a threshold (often
~500kicks), the filter reports: insert failed.
At a systems level, this means you must design for capacity:
- Resize the filter (rehash into a bigger table), or
- Scale out (multiple filters, sharded by keyspace), or
- Fall back to a slower exact structure for overflow.
If you ignore this, the first time the filter fills up you will discover it via a production incident.
When to choose a Cuckoo filter
Use a Cuckoo filter when:
- You need deletions.
- You need high-QPS membership checks.
- You care about cache-friendly lookups more than a “never fail insert” guarantee.
Examples of environments where that tradeoff makes sense:
- Authorization and policy systems (Zanzibar-style access checks).
- Low-latency pipelines where the filter is on the hot path.
Prefer a classic Bloom filter when:
- Data is strictly append-only.
- You cannot tolerate insert failure at runtime.
- A steady, predictable false-positive rate is acceptable.
Key takeaways
- Cuckoo filters store compact fingerprints in a bucketed hash table, not bits in a global array.
- A lookup touches only two buckets (
i1andi2), which is why it tends to stay cache-friendly. - Insertions use kick-out shuffling to maintain high occupancy (
~95%load factor). - The non-negotiable tradeoff is insertion failure once the structure is saturated or the kick chain cannot resolve.
- If you choose Cuckoo filters, design the system around capacity management: resize, shard, or overflow handling.
// SPONSORSHIP
If this research saved you time or improved your architecture, consider sponsoring my work on GitHub. All sponsorships go directly toward infrastructure and further technical research.
[ Become a Sponsor ]