“Back-of-the-envelope calculations are the estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet the requirements”
– Jeff Dean, Google Senior Fellow
So, what do you need in order to carry out “back-of-the-envelope calculations” effectively?
- Understanding latency numbers
- Understanding availability numbers
- Understanding growing data size
Latency Numbers
Notes & Info:
ns = nanosecond, µs = microsecond, ms = millisecond
1 ns = 10^-9 seconds
1 µs = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 µs = 1,000,000 ns
What do we understand from the above numbers?
- Compression algorithms are fast.
- Data should always be compressed before sending over the internet if possible.
- Disk is slower than memory.
- Disk seeks have very high latency and should be avoided as much as possible.
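To make the memory-versus-disk gap concrete, here is a minimal Python sketch. The three latency figures in it are rough, commonly quoted values and are assumptions for illustration only; they are not measurements, and real hardware varies widely. The helper names `format_latency` and `slowdown` are hypothetical.

```python
# Unit helpers, matching the conversions in the notes above.
NS_PER_US = 1_000        # 1 µs = 1,000 ns
NS_PER_MS = 1_000_000    # 1 ms = 1,000,000 ns

# ASSUMED, approximate latencies in nanoseconds (illustrative only).
LATENCY_NS = {
    "read 1 MB sequentially from memory": 250 * NS_PER_US,
    "disk seek": 10 * NS_PER_MS,
    "read 1 MB sequentially from disk": 30 * NS_PER_MS,
}


def format_latency(ns: int) -> str:
    """Render a nanosecond count in the largest unit that keeps it >= 1."""
    if ns >= NS_PER_MS:
        return f"{ns / NS_PER_MS:g} ms"
    if ns >= NS_PER_US:
        return f"{ns / NS_PER_US:g} µs"
    return f"{ns} ns"


def slowdown(slow: str, fast: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name}: {format_latency(ns)}")
    ratio = slowdown(
        "read 1 MB sequentially from disk",
        "read 1 MB sequentially from memory",
    )
    print(f"Disk read is roughly {ratio:g}x slower than memory read")
```

Under these assumed figures, a single disk seek costs as much as dozens of 1 MB memory reads, which is why the list above recommends avoiding seeks where possible.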
Availability Numbers
The service provider and the user usually have an agreement known as an SLA (Service Level Agreement), which defines the level of uptime the service will deliver. In practice, availability targets usually fall between 99% and 100%. For example, cloud providers such as Microsoft, Amazon, and Google have set their SLAs at 99.9% or above.
Let’s see what high availability percentages mean in terms of maximum expected downtime:
Availability % | Downtime/day | Downtime/month | Downtime/year |
99% | 14.40 mins | 7.31 hours | 3.65 days |
99.99% | 8.64 secs | 4.38 mins | 52.60 mins |
99.999% | 864 ms | 26.30 secs | 5.26 mins |
99.9999% | 86.40 ms | 2.63 secs | 31.56 secs |
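The downtime figures follow directly from the availability percentage: the allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal Python sketch, assuming an average year of 365.25 days and a month of one twelfth of that (an assumption that appears consistent with the table's rounding); the function names are hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25              # assumption: average year length
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumption: average month length


def max_downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Maximum downtime, in seconds, allowed over a period of the given length."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


def downtime_row(availability_percent: float) -> dict:
    """Downtime per day, month, and year, all expressed in seconds."""
    return {
        "day": max_downtime_seconds(availability_percent, 1),
        "month": max_downtime_seconds(availability_percent, DAYS_PER_MONTH),
        "year": max_downtime_seconds(availability_percent, DAYS_PER_YEAR),
    }


if __name__ == "__main__":
    for percent in (99, 99.9, 99.99, 99.999, 99.9999):
        row = downtime_row(percent)
        print(
            f"{percent}%: {row['day']:.2f} s/day, "
            f"{row['month']:.2f} s/month, {row['year']:.2f} s/year"
        )
```

For the 99.9% level mentioned earlier, the same formula gives about 86.4 seconds of downtime per day.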
Understanding Growing Data Size
Data Unit Table:
Power of 2 | Approx. Value (Byte) | Unit |
10 | 1 Thousand | 1 Kilobyte / KB |
20 | 1 Million | 1 Megabyte / MB |
30 | 1 Billion | 1 Gigabyte / GB |
40 | 1 Trillion | 1 Terabyte / TB |
50 | 1 Quadrillion | 1 Petabyte / PB |
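The "approximate" in the table matters more as the units grow: 2^10 is 1,024 rather than 1,000, and the gap compounds with each step. The following Python sketch converts a raw byte count into the units above and shows how far each power of 2 drifts from its round decimal value. The names `humanize_bytes` and `approximation_ratio` are hypothetical helpers, not part of any library.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def humanize_bytes(num_bytes: int) -> str:
    """Express a byte count in the largest unit from the table (powers of 2)."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when we run out of units.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def approximation_ratio(power_of_two: int) -> float:
    """Ratio of 2**power to the round decimal value used as its approximation."""
    decimal_exponent = 3 * (power_of_two // 10)  # 10 -> 10^3, 20 -> 10^6, ...
    return 2 ** power_of_two / 10 ** decimal_exponent


if __name__ == "__main__":
    print(humanize_bytes(2 ** 30))
    print(humanize_bytes(1536))
    for power in (10, 20, 30, 40, 50):
        print(f"2^{power} is {approximation_ratio(power):.4f}x its decimal approximation")
```

For an interview estimate, an error of a few percent at the kilobyte level, growing to roughly 13% at the petabyte level, is usually acceptable.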
Tips and Tricks
So far we have covered what you need to know to do these calculations effectively. This section covers the tips you should follow in your interview:
- Instead of working with exact values, it’s a good idea to round off and approximate. This saves valuable time, and the interviewer doesn’t expect precision.
- Always write down your assumptions so that the interviewer is on the same page as you.
- Always label your units. This avoids confusion for you and the interviewer, and it shows your attention to detail.
- Practice commonly asked back-of-the-envelope estimations, for example storage, cache, QPS, and server count.
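The tips above can be sketched as a small worked estimate of QPS and storage. Every input below (user count, requests per user, payload size, peak factor, retention period) is a made-up assumption chosen for round arithmetic, and `estimate_load` is a hypothetical helper. Each assumption is written down and each result key carries its unit.

```python
SECONDS_PER_DAY = 86_400  # often rounded to 100,000 for mental arithmetic
DAYS_PER_YEAR = 365


def estimate_load(
    daily_active_users: int,
    requests_per_user_per_day: int,
    bytes_per_request: int,
    peak_factor: float = 2.0,   # assumption: peak traffic is 2x the average
    retention_years: int = 5,   # assumption: data is kept for 5 years
) -> dict:
    """Return rough QPS and storage figures; every key names its unit."""
    requests_per_day = daily_active_users * requests_per_user_per_day
    average_qps = requests_per_day / SECONDS_PER_DAY
    storage_bytes_per_day = requests_per_day * bytes_per_request
    return {
        "requests_per_day": requests_per_day,
        "average_qps": average_qps,
        "peak_qps": average_qps * peak_factor,
        "storage_bytes_per_day": storage_bytes_per_day,
        "storage_bytes_total": storage_bytes_per_day * DAYS_PER_YEAR * retention_years,
    }


if __name__ == "__main__":
    # Assumptions: 100 million daily users, 2 writes each, 1,000 bytes per write.
    result = estimate_load(100_000_000, 2, 1_000)
    for key, value in result.items():
        print(f"{key}: {value:,.0f}")
```

With these assumed inputs, the average comes out at roughly 2,300 requests per second and about 200 GB of new data per day (decimal units); in an interview, rounding those to "about 2,000 QPS" and "a few hundred GB per day" is normally enough.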
References
J. Dean. Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design:
http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html
Latency Numbers Every Programmer Should Know:
https://colin-scott.github.io/personal_website/research/interactive_latency.html