System Design: Unique ID Generator

In this article we are going to talk about designing a unique ID generator for distributed systems. 

Unique ID Generator we are going to design will have following characteristis:

  •  IDs will only consist numerical values
  • IDs must fit in 64-bit
  • IDs should be ordered by date
  • System should be able to generate 10k IDs/second
Various options to generate unique IDs inside distributed system:
  •  Multi-master replication
  • UUID (Universally Unique Identifier)
  • Ticket Server
  • Twitter snowflake approach
 

Multi-master replication method

Multi-master method uses auto_increment feature and id will be increased by n i.e. number of database servers. So every id generated will have difference of n. This method can not be scaled with multiple data centers.  IDs are not dependent upon time as per our requirement. It can not scale well when number of servers change.

 

UUID (Universally Unique Identifier)

UUID is a 128-bit number used to identify information inside computer systems. The probability of getting collisions inside UUID is very low. UUID example:  123e4567-e89b-12d3-a456-426614174000 UUIDs can be independently generated by multiple servers without having dependence on each other. But this doesn’t work for our use case we need to generate Unique ID in 64-bit and UUID is 128-bit. UUID can be non-numeric but our use case it should be numeric. UUID doesn’t go up with time which is one of our requirements.
 

Ticket Server

Ticket server was developed by Flicker to generate distributed primary keys. It consist of a centralized auto_increment feature inside a single database server.  Though ticket servers provide numeric IDs and is also easy to implement but it has a single point of failure as all the dependency is on ticket server and if it goes down then all other dependent servers will face issue. You can have multiple ticket servers but that will require additional complexity of data synchronization.

Twitter’s Snowflake approach

Twitter’s unique ID generation system is known as snowflake and it meets our unique ID generator’s requirements. Istead of generating ID directly, ID is divided into different sections. 

Timestamp: Milliseconds since the epoch or custom epoch.

Worker ID: we can use the unique worker ID assigned to each computer for distinction. The number of bits is determined by the maximum number of workers in the cluster.

Thread ID: To maximize throughput, multi-threading is used to leverage the parallelism offered by modern multi-core CPU machines. Why not process ID? Inter-process communication is more expensive, we want to put all generated ID into a shared in-memory buffer that can be accessed by the RPC thread.

Local counter: even on a 20-year-old computer, a single thread can generate two timestamps in one millisecond. Hence, we need a local counter that further distinguishes two UIDs. When the counter is full, the thread goes to sleep throughout the current millisecond.

For our use case we can also use the above mentioned design which consist of following:

  • Sign bit: Will store either 0 or 1 1-bit
  • Timestamp: Milliseconds 41-bits
  • Data center Id: ID or the data center. 5-bits i.e. 2^5 = 32 datacenters
  • MachineId : It of the machine 5-bits i.e 2^5 = 32 machines per datacenter
  • Sequence Number: For every ID generated on machine/process, the sequence number is incremented by 1. The number reset to 0 every millisecond.
DatacenterIds and machineIds are chosen at the startup time and any changes inside these Ids need to be reviewed carefully as it can lead to id conflict if not reviewed properly.  This how our unique ID generator looks and it meets all our requirements.

Leave a Comment

Your email address will not be published. Required fields are marked *