CPU Caches and Spatial Locality: Why an Array is 3x Faster Than a Linked List for the Exact Same Big-O Complexity

> $ stat metadata
Date: 2026.04.10
Time: 3 min read
Tags: [cpu-caches, spatial-locality, data-structures, performance, systems, memory-hierarchy]

Big-O says traversing an array and traversing a linked list are both O(N). Real hardware disagrees: on modern CPUs, an array can be ~3x faster for the exact same traversal because of spatial locality and cache line behavior.


Big-O is the beginning, not the answer

Big-O assumes:

  • every memory access costs the same time
  • the CPU never stalls waiting for data

Both assumptions break at the memory wall.

Latency: cache vs RAM

Location  | Approx. access time | What it means
L1 cache  | ~1 ns               | fast enough to keep the pipeline fed
Main RAM  | ~100 ns             | if you miss, the CPU stalls

If your data is not in cache, your CPU waits. This is why “same Big-O” does not imply “same runtime.”


The secret unit of transfer: the cache line

A CPU does not fetch a single integer from RAM. It fetches cache lines:

  • common cache line size: 64 bytes
  • memory is fetched as contiguous chunks

So when you read array[0]:

  • the CPU fetches the cache line containing array[0]
  • that same line likely includes adjacent values (e.g. array[0] through array[15] if integers are 4 bytes)

That turns later iterations into cache hits.
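The arithmetic behind that claim can be sketched in a few lines of C. `CACHE_LINE_BYTES` here is the common 64-byte value assumed by the article, not something read from the hardware:

```c
#include <stddef.h>

/* Common cache line size assumed in this article; on Linux you can confirm
 * yours with: getconf LEVEL1_DCACHE_LINESIZE */
#define CACHE_LINE_BYTES 64

/* How many elements of a given size one cache line holds. */
static inline size_t elems_per_line(size_t elem_size) {
    return CACHE_LINE_BYTES / elem_size;
}

/* elems_per_line(sizeof(int))    -> 16 on a platform with 4-byte ints
 * elems_per_line(sizeof(double)) -> 8                                  */
```

So one miss on `array[0]` pulls in `array[0]` through `array[15]` for free, assuming the array starts on a line boundary.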

Why arrays are fast: spatial locality

Given sequential access:

  • the CPU sees the access pattern as predictable
  • the hardware prefetcher starts loading the next cache line early

Result:

  • first iteration pays the cache miss cost
  • the next ~15 iterations execute at near-L1 speed (~1 ns each)

Arrays win because they turn random “wait for RAM” into predictable cache hits.
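In code, the cache-friendly case is just a plain sequential loop. A minimal sketch (the function name is mine, not from the article):

```c
#include <stddef.h>

/* Sequential sum: the address advances by sizeof(int) each step, so almost
 * every read lands in a line fetched for an earlier element, and the
 * hardware prefetcher streams the next line in before the loop needs it. */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];   /* ~15 of every 16 reads are cache hits (4-byte ints) */
    return total;
}
```

Nothing about this loop is clever; its speed comes entirely from the layout of the data it reads.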


The linked list penalty: pointer chasing

Linked lists break spatial locality.

What happens in the linked list loop:

  1. You follow a pointer from node to node.
  2. Each node may live anywhere on the heap.
  3. The next node address is not predictable.

So the CPU:

  • fetches one cache line for node A
  • then follows a pointer to node B
  • often finds node B is not in the fetched cache line
  • goes back to RAM (~100 ns) for the next node

This is pointer chasing, and it defeats the hardware prefetcher.

Every iteration can turn into a cache miss.
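The same sum over a linked list looks almost identical in C, but the access pattern is entirely different. `make_list` is a hypothetical helper for the example, not the article's code:

```c
#include <stdlib.h>

struct node { int value; struct node *next; };

/* Each step must load n->next before the next address is even known, so the
 * prefetcher has nothing predictable to work with. */
long sum_list(const struct node *head) {
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Hypothetical helper: builds a list holding 0 .. n-1. Each malloc is a
 * separate heap allocation, so nodes can land on different cache lines. */
struct node *make_list(int n) {
    struct node *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        struct node *nd = malloc(sizeof *nd);
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    return head;
}
```

The loop body is one addition; the cost is in the `n = n->next` load that gates every iteration.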


Cache behavior in the real world (rule-of-thumb)

The point is not exact percentages. The point is the direction:

Data structure        | Cache hit tendency (typical)  | Hardware effect
Array traversal       | cache-friendly access pattern | high hit rate, prefetch works
Linked list traversal | random access pattern         | low hit rate, prefetch fails

In practice you often see:

  • arrays close to full cache residency (e.g. ~94% hits)
  • linked lists far lower (e.g. ~30% hits, i.e. ~70% misses)

That matches the benchmark gap you can observe.


Benchmark intuition: same algorithm, different silicon outcome

For summing 100,000 integers:

  • array traversal: ~68,000 ns
  • linked list traversal: ~181,000 ns

The algorithm and Big-O are equal. The cost model is not.


Engineering takeaway: choose data layout for the hardware

When you pick a data structure, you are choosing:

  • how memory is laid out (contiguous vs scattered)
  • whether accesses are sequential or random
  • whether prefetching and cache line utilization work for you

Practical rules

  • Prefer arrays or tightly packed structures for hot loops.
  • Process data in the order it sits in memory (sequential access first).
  • Avoid pointer-heavy designs on latency-critical paths.
  • Remember: pointers are not free. Memory indirection is a hardware tax.
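"Process data in the order it sits in memory" is easiest to see with a 2D array stored row-major. Both loops below do the same work on the same data; only the first walks addresses sequentially (function names are mine, for illustration):

```c
#include <stddef.h>

/* Row-major walk: the address increases by sizeof(int) every step,
 * so each fetched cache line is fully used before the next one. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Column-major walk over the same row-major data: each step jumps
 * cols * sizeof(int) bytes, so when cols is large, most of every
 * fetched cache line is wasted. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Same elements, same Big-O, same result; for matrices larger than the cache, the row-major version is typically several times faster.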

Mermaid: cache line utilization vs pointer chasing

graph TD
  Q[CPU asks for next element] --> A{Is the data contiguous?}
  A -- yes, array --> M1[Fetch one 64-byte cache line]
  M1 --> H1[L1 hits for the next ~15 ints]
  H1 --> F1[Fast loop execution]

  A -- no, linked list --> M2[Fetch cache line for node A]
  M2 --> P[Follow pointer to node B]
  P --> R2{Is node B already cached?}
  R2 -- no --> RAM[RAM fetch, ~100 ns]
  RAM --> S[CPU stalls]
  R2 -- yes --> H2[L1 hit]

Key takeaways

  • Big-O hides the memory wall. Real runtime is dominated by cache residency and stalls.
  • Arrays benefit from spatial locality: sequential reads use cache lines and hardware prefetchers.
  • Linked lists suffer from pointer chasing, which makes the next access unpredictable and cache-unfriendly.
  • In performance-critical code, choose data structures that match the hardware’s strengths: contiguous memory + sequential iteration.
