System Design : Consistent Hashing

Consistent hashing is an important concept one should understand in order work with scalable system. Consistent hashing is used for distributing traffic evenly across servers. Lets understand in this article what consistent hashing is and how to design it.

Rehashing Problem

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers in a distributed hash table. A hash function is a function that maps one piece of data—typically describing some kind of object, often of arbitrary size—to another piece of data, typically an integer, known as hash code, or simply hash.

Common hash function used to balance load in n cache servers is:
cacheServerIndex = hash(key)%N, N is the size of server pool

For Example:
If there are 3 servers : A, B, C then as per the above hash function hash table would look like:

KEY	HASH	HASH mod 3
“ankit”	1633428562	2
“james”	7594634739	0
“akhil”	5000799124	1
“tanya”	9787173343	0
“zoya”	3421657995	2

Now if user wants to retrieve server for key “ankit” then HASH%3 would be 2 i.e. server C. Similarly for others also these values can be retrieved.

This hash function is simple and distribution scheme also works fine until servers work properly and there’s no addition or deletion of servers. But in real world what if anyone of the servers crashes or goes down? Keys need to be redistributed in this case and account for the missing server. Same will be the case if new servers are added to the pool in this scenario keys need to be redistributed and account for newly added servers. This case will mostly be there for any distribution scheme but the problem here is with hash function hash modulo N which is dependent upon N. So, even if a single server changes, all keys will likely be rehashed into a different server.

In the previous above example if server C goes down then hash table would look like:

KEY	HASH	HASH mod 2
“ankit”	1633428562	0
“james”	7594634739	1
“akhil”	5000799124	0
“tanya”	9787173343	1
“zoya”	3421657995	1

Now if you retrieve the server again for key “ankit” then HASH%2 will be 0 i.e. Server A, which no longer the same. Not just server “C” but all key locations have now changed. usually hashtable is stored inside cache, and as soon as the server count changes then on cache misses will occur as keys won’t be present in their old location and cache would need to retrieve data again from source to be rehashed, this would create load on the server and ultimately degrade the performance.

Solution : Consistent Hashing

How can we solve above problem?

In order to solve the above problem we need distribution scheme that is not dependent on the number of servers. So that when server count changes then number of keys to be relocated is minimum.
This definition will now make more sense: “Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on a hash ring.”

If we map the hash output range of above example onto the edge of the circle then this means minimum possible hash value, zero would correspond to angle of zero and maximum possible value(INT_MAX) would correspond to maximum angle of 2𝝅 radians i.e. 360 degrees and all the other hash values would fit somewhere in between. Assuming INT_MAX of 1010 then keys of our above example would look like:

KEY	HASH	ANGLE (DEG)
“ankit“	1633428562	58.8
“james“	7594634739	273.4
“akhil“	5000799124	180
“tanya“	9787173343	352.3
“zoya“	3421657995	123.2

Now if we also placed servers on the edges of the circles by pseudo-randomly assigning them angles too. Done in a repeatable way or in a way that all clients agree on the server’s angles. One of the ways to do it would be using server name. With this change we have the following:

KEY	HASH	ANGLE (DEG)
“ankit”	1633428562	58.8
“james”	7594634739	273.4
“akhil”	5000799124	180
“tanya”	9787173343	352.3
“zoya”	3421657995	123.2
“A”	5572014558	200.6
“B”	8077113362	290.8
“C”	2269549488	81.7

As we have both keys for objects and servers in the same circle we can define a simple rule which will associate objects with servers. Each object key will belong in the server which is closest, in clockwise/counter-clockwise depending upon the convention used. Now, if you need to find out which which server to ask for a given key, you need to locate the key on the circle and move in ascending angle (counter-clockwise in this case) direction until a server is found.

KEY	HASH	ANGLE (DEG)
“ankit”	1633428562	58.7
“C”	2269549488	81.7
“zoya”	3421657995	123.1
“akhil”	5000799124	180
“A”	5572014557	200.5
“james”	7594634739	273.4
“B”	8077113361	290.7
“tanya”	787173343	352.3

KEY	HASH	ANGLE (DEG)	LABEL	SERVER
“ankit”	1632929716	58.7	“C”	C
“zoya”	3421831276	123.1	“A”	A
“akhil”	5000648311	180	“A”	A
“james”	7594873884	273.4	“B”	B
“tanya”	9786437450	352.3	“C”	C

How can you implement it? From implementation point of view what can be done is keeping a sorted list of server values (angles/numbers) and then search in this list the first server value greater than equal to the value of desired key. If no such element is found take the first value from the server list.

How can you ensure keys are evenly distributed among the servers?

Instead of giving one angle to each server you can provide series of angle to a server this would ensure that keys are evenly distributed. Now, you will not just have A,B,C servers on the circle instead will have something like A0….A9 , B0…..B9, C0…..C9 all of them interspersed along the circle. Weight is the factor by which you increase the labels/server keys. Weight depends upon multiple factors and different servers can have different weights depending upon the number of requests that can be handled by the server.

In the following example let’s assume all the servers are have equal weight -> 10.

KEY	HASH	ANGLE (DEG)
“C6”	408965526	14.7
“A1”	473914830	17
“A2”	548798874	19.7
“A3”	1466730567	52.8
“C4”	1493080938	53.7
“ankit”	1633428562	58.7
“B2”	1808009038	65
“C0”	1982701318	71.3
“B3”	2058758486	74.1
“A7”	2162578920	77.8
“B4”	2660265921	95.7
“C9”	3359725419	120.9
“zoya”	3421657995	123.1
“A5”	3434972143	123.6
“C1”	3672205973	132.1
“C8”	3750588567	135
“B0”	4049028775	145.7
“B8”	4755525684	171.1
“A9”	4769549830	171.7
“akhil”	5000799124	180
“C7”	5014097839	180.5
“B1”	5444659173	196
“A6”	6210502707	223.5
“A0”	6511384141	234.4
“B9”	7292819872	262.5
“C3”	7330467663	263.8
“C5”	7502566333	270
“james”	7594634739	273.4
“A4”	8047401090	289.7
“C2”	8605012288	309.7
“A8”	8997397092	323.9
“B7”	9038880553	325.3
“B5”	9368225254	337.2
“B6”	9379713761	337.6
“tanya”	9787173343	352.3

KEY	HASH	ANGLE (DEG)	LABEL	SERVER
“ankit”	1632929716	58.7	“B2”	B
“zoya”	3421831276	123.1	“A5”	A
“akhil”	5000648311	180	“C7”	C
“james”	7594873884	273.4	“A4”	A
“tanya”	9786437450	352.3	“C6”	C

How is this approach going to help when the number of servers change?

Suppose server “c” goes down, so all the labels C0…C8 will be removed from table and re-labelled & assigned to servers A and B randomly (Ax, Bx). All other keys of A and B servers will remain as it is and only the removed server keys will be reassigned to existing servers, thus existing keys are not disturbed in this approach.

Similar thing will happen if instead of removing a server you added a new server. Now roughly 1/number of existing server keys will be redistributed to the new server.

This is how consistent hashing works and how we solved rehashing problem with it.

Applications of Consistent Hashing:

Maglev Network Load Balancer
Discord Chat Application
Akamai Content Delivery Network
Data Partitioning across the cluster.

Rehashing Problem

Solution : Consistent Hashing

How can you ensure keys are evenly distributed among the servers?

Related Posts

Leave a Comment Cancel Reply