HTTP/2 System Design: How It Fixes HTTP/1.1

HTTP/2 is the first serious architectural upgrade to the web’s application protocol in decades. HTTP/1.1 carried the web for more than 15 years, but modern, asset-heavy pages exposed hard limits in connection management, head-of-line blocking, and protocol overhead. This log walks through how HTTP/2 changes the wire behavior and why it matters for performance, reliability, and security.

HTTP/2 also shows up outside the browser: gRPC uses HTTP/2 streams as its transport layer—see Transitioning from REST to gRPC: System Design and Tradeoffs.

From HTTP/0.9 to HTTP/2: why the protocol had to evolve

Brief evolution

HTTP/0.9 (1991): single command (GET), simple HTML responses.
HTTP/1.0 (1996): headers, status codes, more methods, still connection-per-request behavior.
HTTP/1.1 (1997): persistent connections, chunked transfer, caching improvements.
HTTP/2 (2015): standardized after Google SPDY, focuses on latency, multiplexing, and security.

The core semantics (GET, POST, headers, status codes) stay the same. What changes is how those semantics are encoded and transported over a single TCP connection.

What a protocol actually is

At a basic level, a protocol defines:

Header: metadata such as source, destination, content-type, content-length.
Payload: application data such as HTML, JSON, images, CSS, JS.
Footer: integrity or control information (checksums, terminators, etc., depending on layer).

In HTTP’s case:

The header is the HTTP request or response header block.
The payload is the body.
Framing and validation live partly in HTTP, partly in underlying layers (TCP, TLS).

HTTP/2 keeps the same semantics but completely redesigns how headers and payloads are framed and scheduled on the wire.

Why HTTP/1.1 stopped scaling

Under HTTP/1.1:

Each TCP connection effectively handles one outstanding request at a time.
Browsers open multiple parallel connections (typically 6 per origin) to increase concurrency.

This leads to:

Connection thrash and congestion from many short-lived or underutilized connections.
Head-of-line blocking at the HTTP layer when one slow response stalls others.
Inefficient hacks to squeeze performance:
- Domain sharding
- Asset concatenation
- Image spriting
- Data inlining

These hacks fight the protocol rather than working with it.

HTTP/1.1 vs HTTP/2 at a glance

graph TD
    subgraph HTTP_2["HTTP/2"]
      C2[Client] -->|1 TCP connection with multiplexed streams| S2[Server]
    end
    subgraph HTTP_1_1["HTTP/1.1"]
      C1[Client] -->|6 TCP connections| S1[Server]
    
    end

Aspect	HTTP/1.1	HTTP/2
Connections per origin	Many parallel `TCP` connections	One persistent `TCP` connection with many logical streams
Request concurrency	Limited by connections, head-of-line blocking	Multiplexed streams over a single connection
Framing	Text based	Binary framing layer
Header size	Repeated verbose headers	HPACK header compression
Server-initiated responses	Not part of core spec	Server Push for anticipated resources
Performance tuning	Relies on hacks (sharding, spriting, concatenation)	Built-in primitives (multiplexing, prioritization, push, compression)

HTTP/2 is about fixing structural inefficiencies, not changing what GET /index.html means.

Core HTTP/2 design goals

HTTP/2 aims to deliver three properties within the protocol itself:

Simplicity at the semantic level (existing HTTP concepts preserved).
High performance by reducing application layer latency, minimizing connection count, and compressing headers.
Robustness under real-world traffic: lossy networks, mobile devices, and heavy pages.

Mechanisms used to hit these goals:

Multiplexed streams
Binary framing
Header compression (HPACK)
Server Push
Stream prioritization
Better integration with TLS

Multiplexed streams: many requests, one TCP connection

In HTTP/1.1, one request per connection at a time means:

Slow responses block fast ones on the same connection.
Browsers overcompensate by opening multiple connections.

HTTP/2 introduces a binary framing layer that:

Splits each HTTP message into small frames.
Tags frames with a stream ID (streamId).
Interleaves frames from multiple streams on a single TCP connection.
Reassembles them per stream on the other side.

How multiplexing behaves

Client opens one TCP connection.
Client sends multiple requests:
- Stream 1: GET /index.html
- Stream 3: GET /style.css
- Stream 5: GET /app.js
Server responds on all streams concurrently; frames interleave on the wire.

sequenceDiagram
    participant Client
    participant Conn as HTTP/2 Connection
    participant Server

    Client->>Conn: Open TCP + HTTP/2
    Client->>Conn: HEADERS(streamId=1, GET /index.html)
    Client->>Conn: HEADERS(streamId=3, GET /style.css)
    Client->>Conn: HEADERS(streamId=5, GET /app.js)

    Conn->>Server: Interleaved frames (1,3,5)
    Server-->>Conn: DATA(streamId=3, CSS...)
    Server-->>Conn: DATA(streamId=1, HTML...)
    Server-->>Conn: DATA(streamId=5, JS...)
    Conn-->>Client: Interleaved frames, reassembled per stream

Benefits:

Parallel requests and responses do not block each other.
A single TCP connection carries all streams, improving congestion behavior.
No need for domain sharding, spriting, or asset concatenation just to work around protocol limits.

Server Push: sending what the browser will need

Server Push lets the server send additional cacheable resources that the client has not explicitly requested yet but is very likely to need.

Example:

Client requests GET /index.html.
Server knows /index.html references /style.css and /app.js.
Server responds with:
- index.html on the original stream.
- Pushes /style.css and /app.js on server-initiated streams.

High-level behavior:

Server uses pseudo-headers such as :path to describe pushed resources.
Client caches pushed resources and can reuse them across pages.
Client can cancel specific pushes or disable push entirely.

Why it matters:

Saves request–response round trips.
Reduces page load time when the server has accurate knowledge of dependencies.
Works well with multiplexing because all pushes and responses share the same connection.

Binary framing instead of text

HTTP/1.x uses text-based framing:

Lines like GET / HTTP/1.1\r\nHost: example.com\r\n....
Parsing requires careful delimiter handling and is error-prone.

HTTP/2 switches to a binary protocol:

Fixed, well-defined frame headers.
Clear separation of control versus data.
Easier for machines to generate, parse, and validate.

Why binary helps:

More compact on the wire.
Less ambiguous parsing.
Simpler to extend (SETTINGS frames, new frame types).
Safer against text-based attacks that rely on header injection.

Semantics stay the same: browsers still operate with familiar request/response concepts; only the wire format changes.

Stream prioritization: telling the server what to care about

HTTP/2 lets clients express relative importance of streams:

Each stream has a weight in the range 1 to 256.
Streams can depend on other streams.

Practical usage:

HTML document and critical CSS or JS get higher priority.
Images, analytics, or non-critical resources get lower priority.

The server is not strictly bound to follow priorities, but good implementations use them to schedule frames and improve perceived load time.

Security and authorization aspects

HTTP/2 aligns strongly with TLS:

Browsers typically only enable HTTP/2 over TLS using ALPN (h2).
This unlocks newer TLS features and better cipher suites.

Additional pieces:

HPACK header compression:
- Compresses repeated header patterns.
- Reduces overhead for cookies and long header sets.
Authorization requirements around Server Push:
- The server must only push resources it is authorized to serve.
- Pushed resources must be cacheable and tied to a valid origin.

Binary framing and HPACK do not eliminate all security issues, but they reduce some classes of HTTP/1.x text-based attacks.

Visual HTTP/1.1 vs HTTP/2 behavior with images

Diagram comparing HTTP/1.1 multi-connection behavior with HTTP/2 single multiplexed connection between client and server. — HTTP/1.1 uses several TCP connections per client to the same server; HTTP/2 uses a single multiplexed connection with many logical streams.

Animated graphic showing improved throughput with HTTP/2 compared to HTTP/1.1. — Conceptual throughput improvement when many assets are fetched over a single optimized HTTP/2 connection.

These figures complement the request-flow and architecture diagrams by giving an at-a-glance sense of why HTTP/2 matters for performance.

Mermaid: HTTP/1.1 vs HTTP/2 request flow

graph TD
    subgraph H2["HTTP/2 page load"]
      C2[Client] -->|single TCP conn| S2[Server]
      S2 -->|multiplexed HTML/CSS/JS/images| C2
    end
      subgraph H1["HTTP/1.1 page load"]
      C1[Client] -->|TCP conn #1| S1[Server]
      C1 -->|TCP conn #2| S1
      C1 -->|TCP conn #3| S1
      S1 -->|HTML| C1
      S1 -->|CSS| C1
      S1 -->|JS + images| C1
    end

This captures the key structural difference: multiple short-lived connections with serialized responses versus one long-lived multiplexed connection with interleaved responses.

Key takeaways

HTTP/1.1 hit hard limits around concurrency and latency; hacks like domain sharding and spriting were symptoms of protocol constraints.
HTTP/2 keeps HTTP semantics but changes the wire format with binary framing, multiplexed streams, and header compression.
Multiplexing over a single TCP connection eliminates most head-of-line blocking at the HTTP layer and removes the need for many parallel connections.
Server Push and stream prioritization let the server proactively send likely-needed resources and schedule what matters most for perceived performance.
Binary framing and HPACK reduce ambiguity, overhead, and some classes of HTTP/1.x text-based attacks, pairing well with modern TLS.
For real systems, adopting HTTP/2 is about letting the protocol handle concurrency and prioritization so you can delete performance hacks and focus on clean, cacheable asset design.