In this article we are going to talk about designing a unique ID generator for distributed systems.
Unique ID Generator we are going to design will have following characteristis:
- IDs will only consist numerical values
- IDs must fit in 64-bit
- IDs should be ordered by date
- System should be able to generate 10k IDs/second
- Multi-master replication
- UUID (Universally Unique Identifier)
- Ticket Server
- Twitter snowflake approach
Multi-master replication method
UUID (Universally Unique Identifier)
Ticket Server
Twitter’s Snowflake approach
Twitter’s unique ID generation system is known as snowflake and it meets our unique ID generator’s requirements. Istead of generating ID directly, ID is divided into different sections.
Timestamp: Milliseconds since the epoch or custom epoch.
Worker ID: we can use the unique worker ID assigned to each computer for distinction. The number of bits is determined by the maximum number of workers in the cluster.
Thread ID: To maximize throughput, multi-threading is used to leverage the parallelism offered by modern multi-core CPU machines. Why not process ID? Inter-process communication is more expensive, we want to put all generated ID into a shared in-memory buffer that can be accessed by the RPC thread.
Local counter: even on a 20-year-old computer, a single thread can generate two timestamps in one millisecond. Hence, we need a local counter that further distinguishes two UIDs. When the counter is full, the thread goes to sleep throughout the current millisecond.
For our use case we can also use the above mentioned design which consist of following:
- Sign bit: Will store either 0 or 1 1-bit
- Timestamp: Milliseconds 41-bits
- Data center Id: ID or the data center. 5-bits i.e. 2^5 = 32 datacenters
- MachineId : It of the machine 5-bits i.e 2^5 = 32 machines per datacenter
- Sequence Number: For every ID generated on machine/process, the sequence number is incremented by 1. The number reset to 0 every millisecond.