System Design : Consistent Hashing

Consistent Hashing

Consistent hashing is an important concept one should understand in order work with scalable system. Consistent hashing is used for distributing traffic evenly across servers. Lets understand in this article what consistent hashing is and how to design it.

Rehashing Problem

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers in a distributed hash table. A hash function is a function that maps one piece of data—typically describing some kind of object, often of arbitrary size—to another piece of data, typically an integer, known as hash code, or simply hash.

Common hash function used to balance load in n cache servers is:
cacheServerIndex = hash(key)%N,  N is the size of server pool
For Example:
If there are 3 servers : A, B, C then as per the above hash function hash table would look like:
KEYHASHHASH mod 3
“ankit”16334285622
“james”75946347390
“akhil”50007991241
“tanya”97871733430
“zoya”34216579952

Now if user wants to retrieve server for key “ankit” then HASH%3 would be 2 i.e. server C. Similarly for others also these values can be retrieved.

This hash function is simple and distribution scheme also works fine until servers work properly and there’s no addition or deletion of servers. But in real world what if anyone of the servers crashes or goes down? Keys need to be redistributed in this case and account for the missing server. Same will be the case if new servers are added to the pool in this scenario keys need to be redistributed and account for newly added servers. This case will mostly be there for any distribution scheme but the problem here is with hash function hash modulo N   which is dependent upon N. So, even if a single server changes, all keys will likely be rehashed into a different server.

In the previous above example if server C goes down then hash table would look like:

KEYHASHHASH mod 2
“ankit”16334285620
“james”75946347391
“akhil”50007991240
“tanya”97871733431
“zoya”34216579951

Now if you retrieve the server again for key “ankit” then HASH%2 will be 0 i.e. Server A, which no  longer the same. Not just server “C” but all key locations have now changed. usually hashtable is stored inside cache, and as soon as the server count changes then on cache misses will occur as keys won’t be present in their old location and cache would need to retrieve data again from source to be rehashed, this would create load on the server and ultimately degrade the performance.

Solution : Consistent Hashing

How can we solve above problem?
In order to solve the above problem we need distribution scheme that is not dependent on the number of servers. So that when server count changes then number of keys to be relocated is minimum. 
This definition will now make more sense: “Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on a hash ring.”

If we map the hash output range of above example onto the edge of the circle then this means minimum possible hash value, zero would correspond to angle of zero and maximum possible value(INT_MAX) would correspond to maximum angle of 2𝝅 radians i.e. 360 degrees and all the other hash values would fit somewhere in between. Assuming INT_MAX of 1010 then keys of our above example would look like:

KEYHASHANGLE (DEG)
ankit163342856258.8
james7594634739273.4
akhil5000799124180
tanya9787173343352.3
zoya3421657995123.2

Now if we also placed servers on the edges of the circles by pseudo-randomly assigning them angles too. Done in a repeatable way or in a way that all clients agree on the server’s angles. One of the ways to do it would be using server name. With this change we have the following:

KEYHASHANGLE (DEG)
“ankit”163342856258.8
“james”7594634739273.4
“akhil”5000799124180
“tanya”9787173343352.3
“zoya”3421657995123.2
“A”5572014558200.6
“B”8077113362290.8
“C”226954948881.7

 

As we have both keys for objects and servers in the same circle we can define a simple rule which will associate objects with servers. Each object key will belong in the server which is closest, in clockwise/counter-clockwise depending upon the convention used. Now, if you need to find out which which server to ask for a given key, you need to locate the key on the circle and move in ascending angle (counter-clockwise in this case) direction until a server is found.

KEYHASHANGLE (DEG)
“ankit”163342856258.7
“C”226954948881.7
“zoya”3421657995123.1
“akhil”5000799124180
“A”5572014557200.5
“james”7594634739273.4
“B”8077113361290.7
“tanya”787173343352.3

KEYHASHANGLE (DEG)LABELSERVER
“ankit”163292971658.7“C”C
“zoya”3421831276123.1“A”A
“akhil”5000648311180“A”A
“james”7594873884273.4“B”B
“tanya”9786437450352.3“C”C

How can you implement it? From implementation point of view what can be done is keeping a sorted list of server values (angles/numbers) and then search in this list the first server value greater than equal to the value of desired key. If no such element is found take the first value from the server list.

How can you ensure keys are evenly distributed among the servers?
 Instead of giving one angle to each server you can provide series of angle to a server this would ensure that keys are evenly distributed. Now, you will not just have A,B,C servers on the circle instead will have something like A0….A9 , B0…..B9, C0…..C9 all of them interspersed along the circle. Weight is the factor by which you increase the labels/server keys. Weight depends upon multiple factors and different servers can have different weights depending upon the number of requests that can be handled by the server. 
In the following example let’s assume all the servers are have equal weight -> 10.

 

KEYHASHANGLE (DEG)
“C6”40896552614.7
“A1”47391483017
“A2”54879887419.7
“A3”146673056752.8
“C4”149308093853.7
“ankit”163342856258.7
“B2”180800903865
“C0”198270131871.3
“B3”205875848674.1
“A7”216257892077.8
“B4”266026592195.7
“C9”3359725419120.9
“zoya”3421657995123.1
“A5”3434972143123.6
“C1”3672205973132.1
“C8”3750588567135
“B0”4049028775145.7
“B8”4755525684171.1
“A9”4769549830171.7
“akhil”5000799124180
“C7”5014097839180.5
“B1”5444659173196
“A6”6210502707223.5
“A0”6511384141234.4
“B9”7292819872262.5
“C3”7330467663263.8
“C5”7502566333270
“james”7594634739273.4
“A4”8047401090289.7
“C2”8605012288309.7
“A8”8997397092323.9
“B7”9038880553325.3
“B5”9368225254337.2
“B6”9379713761337.6
“tanya”9787173343352.3

KEYHASHANGLE (DEG)LABELSERVER
“ankit”163292971658.7“B2”B
“zoya”3421831276123.1“A5”A
“akhil”5000648311180“C7”C
“james”7594873884273.4“A4”A
“tanya”9786437450352.3“C6”C


How is this approach going to help when the number of servers change?

Suppose server “c” goes down, so all the labels C0…C8 will be removed from table and re-labelled & assigned to servers A and B randomly (Ax, Bx). All other keys of A and B servers will remain as it is and only the removed  server keys will be reassigned to existing servers, thus existing keys are not disturbed in this approach.

Similar thing will happen if instead of removing a server you added a new server. Now roughly 1/number of existing server keys will be redistributed to the new server.

This is how consistent hashing works and how we solved rehashing problem with it.

Applications of Consistent Hashing:

  •  Maglev Network Load Balancer
  • Discord Chat Application
  • Akamai Content Delivery Network
  • Data Partitioning across the cluster.

Leave a Comment

Your email address will not be published. Required fields are marked *