CPU Caches and Spatial Locality: Why an Array is 3x Faster Than a Linked List for the Exact Same Big-O Complexity

> $ stat metadata
Date: 2026.04.10
Time: 3 min read
Tags: [cpu-caches, spatial-locality, data-structures, performance, systems, memory-hierarchy]

Big-O says traversing an array and traversing a linked list are both O(N). Real hardware disagrees: on modern CPUs, an array can be ~3x faster for the exact same traversal because of spatial locality and cache line behavior.


Big-O is the beginning, not the answer

Big-O assumes:

  • every memory access costs the same time
  • the CPU never stalls waiting for data

Both assumptions break at the memory wall.

Latency: cache vs RAM

Location  | Approx. access time | What it means
L1 cache  | ~1 ns               | fast enough to keep the pipeline fed
Main RAM  | ~100 ns             | if you miss, the CPU stalls

If your data is not in cache, your CPU waits. This is why “same Big-O” does not imply “same runtime.”


The secret unit of transfer: the cache line

A CPU does not fetch a single integer from RAM. It fetches cache lines:

  • common cache line size: 64 bytes
  • memory is fetched as contiguous chunks

So when you read array[0]:

  • the CPU fetches the cache line containing array[0]
  • that same line likely includes adjacent values (e.g. array[0] through array[15] if integers are 4 bytes)

That turns later iterations into cache hits.
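The arithmetic behind that claim can be sketched in a few lines of C. `CACHE_LINE_BYTES` here is the common 64-byte value assumed by the article, not something read from the hardware:

```c
#include <stddef.h>

/* Common cache line size assumed in this article; on Linux you can confirm
 * yours with: getconf LEVEL1_DCACHE_LINESIZE */
#define CACHE_LINE_BYTES 64

/* How many elements of a given size one cache line holds. */
static inline size_t elems_per_line(size_t elem_size) {
    return CACHE_LINE_BYTES / elem_size;
}

/* elems_per_line(sizeof(int))    -> 16 on a platform with 4-byte ints
 * elems_per_line(sizeof(double)) -> 8                                  */
```

So one miss on `array[0]` pulls in `array[0]` through `array[15]` for free, assuming the array starts on a line boundary.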

Why arrays are fast: spatial locality

Given sequential access:

  • the CPU sees the access pattern as predictable
  • the hardware prefetcher starts loading the next cache line early

Result:

  • first iteration pays the cache miss cost
  • the next ~15 iterations execute at near-L1 speed (~1 ns each)

Arrays win because they turn random “wait for RAM” into predictable cache hits.
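In code, the cache-friendly case is just a plain sequential loop. A minimal sketch (the function name is mine, not from the article):

```c
#include <stddef.h>

/* Sequential sum: the address advances by sizeof(int) each step, so almost
 * every read lands in a line fetched for an earlier element, and the
 * hardware prefetcher streams the next line in before the loop needs it. */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];   /* ~15 of every 16 reads are cache hits (4-byte ints) */
    return total;
}
```

Nothing about this loop is clever; its speed comes entirely from the layout of the data it reads.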


The linked list penalty: pointer chasing

Linked lists break spatial locality.

What happens in the linked list loop:

  1. You follow a pointer from node to node.
  2. Each node may live anywhere on the heap.
  3. The next node address is not predictable.

So the CPU:

  • fetches one cache line for node A
  • then follows a pointer to node B
  • often finds node B is not in the fetched cache line
  • goes back to RAM (~100 ns) for the next node

This is pointer chasing, and it defeats the hardware prefetcher.

Every iteration can turn into a cache miss.
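The same sum over a linked list looks almost identical in C, but the access pattern is entirely different. `make_list` is a hypothetical helper for the example, not the article's code:

```c
#include <stdlib.h>

struct node { int value; struct node *next; };

/* Each step must load n->next before the next address is even known, so the
 * prefetcher has nothing predictable to work with. */
long sum_list(const struct node *head) {
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Hypothetical helper: builds a list holding 0 .. n-1. Each malloc is a
 * separate heap allocation, so nodes can land on different cache lines. */
struct node *make_list(int n) {
    struct node *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        struct node *nd = malloc(sizeof *nd);
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    return head;
}
```

The loop body is one addition; the cost is in the `n = n->next` load that gates every iteration.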


Cache behavior in the real world (rule-of-thumb)

The point is not exact percentages. The point is the direction:

Data structure        | Cache hit tendency (typical)  | Hardware effect
Array traversal       | cache-friendly access pattern | high hit rate, prefetch works
Linked list traversal | random access pattern         | low hit rate, prefetch fails

In practice you often see:

  • arrays close to full cache residency (e.g. ~94% hits)
  • linked lists far lower (e.g. ~30% hits, i.e. ~70% misses)

That matches the benchmark gap you can observe.


Benchmark intuition: same algorithm, different silicon outcome

For summing 100,000 integers:

  • array traversal: ~68,000 ns
  • linked list traversal: ~181,000 ns

The algorithm and Big-O are equal. The cost model is not.


Engineering takeaway: choose data layout for the hardware

When you pick a data structure, you are choosing:

  • how memory is laid out (contiguous vs scattered)
  • whether accesses are sequential or random
  • whether prefetching and cache line utilization work for you

Practical rules

  • Prefer arrays or tightly packed structures for hot loops.
  • Process data in the order it sits in memory (sequential access first).
  • Avoid pointer-heavy designs on latency-critical paths.
  • Remember: pointers are not free. Memory indirection is a hardware tax.
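"Process data in the order it sits in memory" is easiest to see with a 2D array stored row-major. Both loops below do the same work on the same data; only the first walks addresses sequentially (function names are mine, for illustration):

```c
#include <stddef.h>

/* Row-major walk: the address increases by sizeof(int) every step,
 * so each fetched cache line is fully used before the next one. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Column-major walk over the same row-major data: each step jumps
 * cols * sizeof(int) bytes, so when cols is large, most of every
 * fetched cache line is wasted. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Same elements, same Big-O, same result; for matrices larger than the cache, the row-major version is typically several times faster.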

Mermaid: cache line utilization vs pointer chasing

graph TD
  Q[CPU asks for next element] --> A{Is the data contiguous?}
  A -- yes, array --> M1[Fetch one 64-byte cache line]
  M1 --> H1[L1 hits for the next ~15 ints]
  H1 --> F1[Fast loop execution]

  A -- no, linked list --> M2[Fetch cache line for node A]
  M2 --> P[Follow pointer to node B]
  P --> R2{Is node B already cached?}
  R2 -- no --> RAM[RAM fetch, ~100 ns]
  RAM --> S[CPU stalls]
  R2 -- yes --> H2[L1 hit]

Key takeaways

  • Big-O hides the memory wall. Real runtime is dominated by cache residency and stalls.
  • Arrays benefit from spatial locality: sequential reads use cache lines and hardware prefetchers.
  • Linked lists suffer from pointer chasing, which makes the next access unpredictable and cache-unfriendly.
  • In performance-critical code, choose data structures that match the hardware’s strengths: contiguous memory + sequential iteration.
