As engineers, we all want the impossible database: infinite write throughput, constant-time reads, and negligible memory usage. The RUM Conjecture formalizes why that system cannot exist on real hardware.
When you design an index or access method, you are trading between three forces:
- Read Overhead (`R`) – extra work per lookup.
- Update Overhead (`U`) – extra work per insert, update, or delete.
- Memory Overhead (`M`) – extra space for indexes, pointers, and metadata.
The conjecture is blunt: you can strictly optimize at most two of R, U, and M; the third will be expensive.
## 1. The RUM Conjecture in Practice
The RUM Conjecture comes from work at the Harvard Data Systems Laboratory (DASlab) on indexing and storage engines (Athanassoulis et al., “Designing Access Methods: The RUM Conjecture,” EDBT 2016). It captures a constraint imposed by hardware realities:
- Disks and SSDs have finite bandwidth and non-zero seek costs.
- CPUs and caches have limited capacity and latency per access.
- DRAM is expensive relative to disk and must be used carefully.
Any index that appears to beat these constraints is simply hiding a cost in one corner of the R/U/M triangle.
At a high level:
You can minimize:
- Read Overhead (`R`)
- Update Overhead (`U`)
- Memory Overhead (`M`)
but not all three simultaneously.
## 2. RUM Tradeoffs: B-Trees, LSM-Trees, and Hash Indexes
To make the conjecture concrete, map it to the engines you already run in production.
### 2.1 R + M optimized: B-Trees (writes pay the price)
Relational databases such as MySQL and PostgreSQL default to B-Tree indexes for primary and secondary keys.
- **Reads (`R`)**
  - B-Trees minimize the height of the tree.
  - Lookups require `O(log n)` page reads, which is small when pages are large and highly packed.
- **Memory (`M`)**
  - Pages store keys in sorted order with tight packing.
  - Metadata overhead per record is low; cache lines are used efficiently.
- **Updates (`U`)**
  - Random inserts require:
    - Traversing the tree from root to leaf.
    - Potential page splits when a node is full.
    - Cascading updates to parent nodes.
  - This makes writes expensive and fragmented, especially with random keys.
RUM corner: B-Trees are effectively R + M optimized, while U is expensive.
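The tradeoff is easy to see in a toy model. The sketch below is illustrative only (leaf pages as tiny sorted lists, no inner nodes): a lookup is a single binary search, while random inserts repeatedly trigger page splits, which is exactly the `U` cost B-Trees pay.

```python
import bisect
import random

PAGE_SIZE = 4  # tiny pages so splits happen quickly

class ToyBTreeLeaves:
    """Leaf level of a B-Tree only: a list of sorted pages.
    An illustrative sketch, not a real B-Tree."""

    def __init__(self):
        self.pages = [[]]
        self.splits = 0

    def _page_for(self, key):
        # First page whose largest key >= key; otherwise the last page.
        for i, p in enumerate(self.pages):
            if p and p[-1] >= key:
                return i
        return len(self.pages) - 1

    def insert(self, key):
        i = self._page_for(key)
        bisect.insort(self.pages[i], key)       # keep the page sorted (U cost)
        if len(self.pages[i]) > PAGE_SIZE:      # full page -> split (more U cost)
            page = self.pages[i]
            mid = len(page) // 2
            self.pages[i:i + 1] = [page[:mid], page[mid:]]
            self.splits += 1

    def lookup(self, key):
        # Binary search inside one page: the cheap R path.
        page = self.pages[self._page_for(key)]
        j = bisect.bisect_left(page, key)
        return j < len(page) and page[j] == key

random.seed(0)
tree = ToyBTreeLeaves()
for k in random.sample(range(1_000), 200):
    tree.insert(k)
print("pages:", len(tree.pages), "splits:", tree.splits)
```

Every split here stands in for the page allocation, copying, and parent updates a real engine performs on disk.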
### 2.2 U + M optimized: LSM-Trees (reads pay the price)
Write-heavy systems such as Cassandra and RocksDB use Log-Structured Merge Trees (LSM-Trees).
- **Updates (`U`)**
  - Writes append to an in-memory memtable, then flush to disk as immutable SSTables.
  - Disk I/O is sequential, which is close to optimal for SSDs and HDDs.
- **Memory (`M`)**
  - Data is stored in sorted runs with strong compression.
  - Bloom filters and sparse indexes reduce the in-memory footprint per record.
- **Reads (`R`)**
  - A single key lookup may require:
    - Checking the memtable.
    - Consulting Bloom filters for multiple SSTables.
    - Searching several on-disk files to find the latest version.
  - Compaction can reduce this cost, but at the expense of background write I/O.
RUM corner: LSM-Trees are U + M optimized, while R is expensive.
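A few lines of Python make the read penalty concrete. This is a deliberately stripped-down sketch (no Bloom filters, no compaction, plain dicts standing in for SSTables): writes are cheap in-memory inserts plus sequential flushes, but a read may have to probe the memtable and then every run, newest to oldest.

```python
MEMTABLE_LIMIT = 4  # tiny so flushes happen quickly

class ToyLSM:
    """Memtable + immutable runs. No Bloom filters, no compaction."""

    def __init__(self):
        self.memtable = {}
        self.runs = []  # each flush appends one immutable sorted run

    def put(self, key, value):
        self.memtable[key] = value                # cheap in-memory write (U)
        if len(self.memtable) >= MEMTABLE_LIMIT:  # flush sequentially to "disk"
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Return (value, structures_probed): the R cost grows with run count."""
        probes = 1
        if key in self.memtable:                  # newest data first
            return self.memtable[key], probes
        for run in reversed(self.runs):           # then runs, newest to oldest
            probes += 1
            if key in run:
                return run[key], probes
        return None, probes

lsm = ToyLSM()
for i in range(10):
    lsm.put(f"k{i}", i)
print(lsm.get("k9"))  # still in the memtable: 1 probe
print(lsm.get("k0"))  # in the oldest run: 3 probes
```

Ten writes produce two flushed runs; reading the oldest key probes three structures, while a key still in the memtable costs one. Compaction exists precisely to cap that probe count.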
### 2.3 R + U optimized: Hash indexes (memory pays the price)
If you need instant reads and writes for point lookups, you can use a hash index or a direct-addressing table.
- **Reads (`R`)**
  - Lookup is `O(1)` on average: compute the hash, jump to the bucket, compare a few entries.
- **Updates (`U`)**
  - Inserts and updates are also `O(1)` on average, with small constant factors.
- **Memory (`M`)**
  - To keep collision rates low, you must over-allocate buckets.
  - The structures are not naturally compressible and often require contiguous RAM.
  - Hash tables are useless for range scans, so you may need additional indexes, further increasing `M`.
RUM corner: Hash-based structures are R + U optimized, while M is expensive.
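A rough way to see the `M` cost on your own machine: compare the container overhead of Python's built-in hash table against a packed array over the same keys. `sys.getsizeof` measures only the container, not the key objects, which is exactly the index-overhead part of `M`.

```python
import sys

n = 100_000
keys = list(range(n))                 # packed, contiguous array of references
hash_index = {k: None for k in keys}  # hash table over the same keys

# The dict over-allocates buckets to keep collisions rare;
# the list stores references contiguously with no slack.
print(f"dict: {sys.getsizeof(hash_index) / n:.1f} bytes/entry")
print(f"list: {sys.getsizeof(keys) / n:.1f} bytes/entry")
```

The exact numbers vary by interpreter version, but the hash table's per-entry overhead is reliably a multiple of the packed array's: that slack is the price of `O(1)` lookups.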
### 2.4 RUM comparison table
| Structure | Read Overhead (R) | Update Overhead (U) | Memory Overhead (M) | Typical systems |
|---|---|---|---|---|
| B-Tree | Low | High | Low | MySQL, PostgreSQL |
| LSM-Tree | High | Low | Low | Cassandra, RocksDB |
| Hash index | Low | Low | High | In-memory caches, key–value stores |
You are not choosing a good or bad engine; you are choosing where to pay.
## 3. Visualizing the RUM Triangle
The RUM Conjecture is easiest to reason about as a triangle: each access method sits near one edge, emphasizing two dimensions and compromising on the third.
```mermaid
graph TD
    R["Low Read Overhead (R)"]
    U["Low Update Overhead (U)"]
    M["Low Memory Overhead (M)"]
    R --- U
    U --- M
    M --- R
    BT["B-Tree<br/>(MySQL, Postgres)"] --> R
    BT --> M
    LSM["LSM-Tree<br/>(Cassandra, RocksDB)"] --> U
    LSM --> M
    HASH["Hash Index"] --> R
    HASH --> U
    %% Styling to match the Brutalist dark theme
    style R fill:#111,stroke:#4ADE80,stroke-width:2px,color:#fff
    style U fill:#111,stroke:#4ADE80,stroke-width:2px,color:#fff
    style M fill:#111,stroke:#4ADE80,stroke-width:2px,color:#fff
    style BT fill:#111,stroke:#60A5FA,stroke-width:1px,color:#fff
    style LSM fill:#111,stroke:#60A5FA,stroke-width:1px,color:#fff
    style HASH fill:#111,stroke:#60A5FA,stroke-width:1px,color:#fff
```
If a vendor claims they are “optimal” at all three corners, the diagram tells you what to ask next: Where is the hidden cost?
## 4. Engineering Takeaways
### 4.1 Stop asking “Is this database good?”
The RUM Conjecture reframes the question. Instead of:
> “Is database `X` good or bad?”

Ask:

> “Which dimension is `X` optimizing, and what am I willing to pay for the other two?”
Examples:
- **Analytics / OLAP**
  - Prioritize reads and compression (`R` + `M`).
  - Accept slower, batched writes (`U`).
- **Write-heavy telemetry / logging**
  - Prioritize write throughput and compression (`U` + `M`).
  - Accept slower, more complex reads (`R`).
- **Low-latency key–value cache**
  - Prioritize reads and writes (`R` + `U`).
  - Accept high memory bills (`M`).
### 4.2 Design with explicit RUM budgets
When designing a storage subsystem or selecting a database:
- **Quantify your workload**
  - Target reads/sec, writes/sec, and acceptable p99 latencies.
  - Estimate data volume and growth rate in GB or TB.
- **Allocate a RUM budget**
  - Decide which dimension you are willing to make expensive:
    - Higher read latency (`R`)
    - Higher write amplification (`U`)
    - Higher memory / storage cost (`M`)
- **Choose structures accordingly**
  - B-Trees for mixed workloads with strong point and range query needs.
  - LSM-Trees when ingest rate dominates and you can tolerate heavier read paths.
  - Hash indexes when single-key latency is everything and memory is abundant.
### 4.3 Use RUM as a vendor litmus test
The next time you see claims like:
> “Sub-millisecond reads and writes at petabyte scale, with minimal resource usage.”
Run the RUM checklist:
- If `R` is low and `U` is low, `M` must be high (or hidden in a separate tier).
- If `R` is low and `M` is low, `U` is paying via compaction, page splits, or background jobs.
- If `U` is low and `M` is low, `R` is paying with multi-level lookups and higher tail latency.
Physics guarantees that somewhere, the cost shows up.
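The checklist reads naturally as a lookup table. Below is a hypothetical helper, not a real tool, just the conjecture encoded as data: given the two dimensions a pitch claims to have minimized, it names where the cost must surface.

```python
# The RUM checklist as data: claimed-low pair -> where the cost hides.
HIDDEN_COST = {
    frozenset({"R", "U"}): "M: over-allocated buckets or a RAM-heavy tier",
    frozenset({"R", "M"}): "U: compaction, page splits, or background jobs",
    frozenset({"U", "M"}): "R: multi-level lookups and higher tail latency",
}

def hidden_cost(claimed_low):
    """Given the two dimensions claimed cheap, return the third's bill."""
    return HIDDEN_COST[frozenset(claimed_low)]

print(hidden_cost({"R", "U"}))  # the hash-index corner pays in memory
```

Using `frozenset` keys makes the order of the claimed dimensions irrelevant, which matches the symmetry of the triangle.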
## 5. Why the RUM Conjecture Matters
The RUM Conjecture is not just an academic curiosity. It is a design tool:
- It explains why your existing systems behave the way they do under load.
- It guides which storage engine to use per workload instead of defaulting to one database for everything.
- It arms you with the right questions when reviewing proposals, whitepapers, or vendor pitches.
You cannot beat RUM, but you can choose your corner deliberately. The systems that age well are the ones whose engineers knew exactly which trade-offs they were making—and why.
If this research saved you time or improved your architecture, consider sponsoring my work on GitHub. All sponsorships go directly toward infrastructure and further technical research.