Multi-Data Center Setup
User requests are geo-routed to the data center closest to the user's location. For example, with two data centers, US-SCUS and US-WEST, x% of requests go to US-SCUS and the remaining (100 - x)% go to US-WEST. Geo-routing is handled by a geoDNS service, which resolves a domain name to an IP address based on the user's geographic location.
If one of the data centers goes down, all requests are served by the other data center.
Things to consider when setting up multiple data centers:
- To direct traffic to the correct data center, tools like GeoDNS are required. They send each user's traffic to the data center nearest to that user.
- Data must be replicated across data centers so that users in different regions can use the databases in different data centers without running into synchronization issues.
- Automated tools should be installed for smooth deployment of applications, and applications should be tested in different regions under various conditions, for example by recreating a failover situation under controlled conditions.
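The routing and failover behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a real geoDNS service is implemented: the coordinates, the `DATA_CENTERS` table, and the `route` function are hypothetical names invented for this example, and a real service works at the DNS layer using health checks and IP-based location lookups.

```python
import math

# Hypothetical data centers with illustrative (latitude, longitude) coordinates.
DATA_CENTERS = {
    "US-SCUS": (29.42, -98.49),
    "US-WEST": (37.78, -122.42),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def route(user_location, healthy):
    """Return the nearest healthy data center, or None if all are down."""
    candidates = [name for name in DATA_CENTERS if name in healthy]
    if not candidates:
        return None
    return min(candidates,
               key=lambda name: distance_km(user_location, DATA_CENTERS[name]))


if __name__ == "__main__":
    seattle = (47.61, -122.33)
    # Both data centers healthy: the user is sent to the nearest one.
    print(route(seattle, {"US-SCUS", "US-WEST"}))
    # US-WEST is down: all traffic fails over to the remaining data center.
    print(route(seattle, {"US-SCUS"}))
```

Passing the set of healthy data centers as an argument keeps the failover rule explicit: when one data center drops out of the set, every request is served by the other.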
Async Communication Using Messaging Queue
A message queue is typically held in memory and supports asynchronous communication. It acts as a buffer for async requests. A message queue architecture has two components: a producer and a consumer.
Services that insert messages are called producers or publishers: they create messages and insert them into the message queue. Services that listen for these messages are called listeners, consumers, or subscribers.
One way to understand the publisher/subscriber model in message queues is to work with Kafka. Decoupling through a message queue makes a system more scalable and reliable. The publisher no longer depends on the consumer being available at a particular time: it can publish a message at any time, and the consumer can consume it whenever it becomes available. Messages stay in the queue for the retention time set by the admin. Producers and consumers can also be scaled independently.
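To make the decoupling concrete, here is a minimal in-process sketch using Python's standard `queue` and `threading` modules. It is a stand-in for a real broker such as Kafka, not an example of Kafka's API: there is no retention time or persistence here, and the function names (`producer`, `consumer`, `run_demo`) and the `None` sentinel are assumptions made for this illustration.

```python
import queue
import threading


def producer(q, messages):
    """Publish every message, then a None sentinel to signal the end."""
    for message in messages:
        q.put(message)
    q.put(None)


def consumer(q, results):
    """Consume messages until the sentinel arrives; 'process' by upper-casing."""
    while True:
        message = q.get()
        if message is None:
            break
        results.append(message.upper())


def run_demo(messages):
    q = queue.Queue()  # the buffer between producer and consumer
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    # The producer publishes before the consumer has even started:
    # the queue holds the messages until the consumer becomes available.
    producer(q, messages)
    worker.start()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo(["order created", "payment received"]))
```

The producer finishes its work without waiting for the consumer, which is the property the text describes. Scaling consumers independently would mean starting more worker threads (or, with a real broker, more consumer processes) reading from the same queue.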
Monitoring & Alerting
As your business grows and the number of users increases significantly, it becomes important to invest in monitoring and alerting tools.
Proper logging, alerting, and monitoring are important because they help identify errors and problems in the system. As the number of services increases, it becomes difficult to stay on top of each component and of the current status of the system as a whole. Good tools tell you not only about system health but also provide user-related data that you can analyze further to understand user behavior and make the right decisions. Commonly used tools include Grafana, Prometheus, Splunk, and ELK.
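At its simplest, alerting means comparing collected metrics against thresholds. The sketch below shows that idea in plain Python; it does not use the API of any of the tools named above, and the metric names, threshold values, and the `check_alerts` function are hypothetical.

```python
def check_alerts(metrics, thresholds):
    """Return one alert message for every metric above its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped rather than treated as zero.
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    # Illustrative numbers only.
    current = {"error_rate": 0.07, "p99_latency_ms": 420}
    limits = {"error_rate": 0.05, "p99_latency_ms": 500}
    for alert in check_alerts(current, limits):
        print(alert)
```

In practice a monitoring tool evaluates rules like this continuously and routes the resulting alerts to the on-call team.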