Let's say you have a web service. On a typical day, it receives about 10 requests per second. Your application, and the hardware it's deployed on, can serve 50 requests per second easily. How many machines do you need?
Now, let's change it up a bit. You expect to get monthly bursts of traffic where, for a six hour period, you'll receive 400 requests per second. How many machines do you need?
If your answer is one machine, or eight machines, you should read about auto scaling.
High availability
Let's take the first case. There are no bursts in requests. You know with 100% certainty that you will never receive more than 10 requests per second, and also with 100% certainty that your app/hardware combo can serve 50 requests per second. Is one machine sufficient? If you want high availability, the answer is no!
Machine failure is not a statistical anomaly. It is a fact of life that must be planned for: hardware can break, a cloud provider can suffer a configuration glitch, or any number of other things can go wrong.
If you have a service where downtime is acceptable, one machine is sufficient. If you have a good monitoring and alerting system, you'll get a 2am SMS, roll out of bed, and spin up a new machine.
But if that 2am SMS doesn't sound pleasant, or you can't tolerate that downtime, you need at least two machines. On a typical cloud provider, you'll want to ensure that the two machines live in two different availability zones, so that if one zone experiences an outage, you still have a server running in the other.
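As a concrete sketch, here is roughly what that two-machine, two-zone setup could look like as an AWS Auto Scaling group in Terraform. The subnet and launch template references are placeholders for resources you would define elsewhere in your configuration:

```hcl
# A minimal sketch, assuming an AWS deployment. Subnet and launch
# template names are hypothetical placeholders.
resource "aws_autoscaling_group" "web" {
  min_size         = 2
  max_size         = 2
  desired_capacity = 2

  # Subnets in two different availability zones, so a single-zone
  # outage still leaves one machine running.
  vpc_zone_identifier = [aws_subnet.zone_a.id, aws_subnet.zone_b.id]

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  # Replace any instance that fails its load balancer health checks.
  health_check_type         = "ELB"
  health_check_grace_period = 300
}
```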
Does this eliminate the possibility of downtime? No. But you've drastically reduced its likelihood, since with two machines, two statistically unlikely events now have to occur simultaneously. And if you bump this up to three, you really are talking about a statistical anomaly.
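To put illustrative numbers on it: if each machine is independently unavailable 0.1% of the time (an assumption for the sake of example), the chance of both machines being down at once is 0.001 × 0.001 = 0.000001, or one in a million. With three machines, it drops to one in a billion.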
Bursty workloads
The big innovation of cloud computing was on-demand resources. In the example above, where you have a 40-fold increase in traffic for only six hours a month, the on-demand solution would be "get extra machines for just those six hours."
The cost difference for this is significant. If one of these machines costs $20 per month, renting eight for the entire month costs $160. By contrast, renting eight for six hours, and two for the rest of the month, comes out to $41.
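To spell out that arithmetic (assuming linear hourly billing and a 730-hour month): the two always-on machines cost 2 × $20 = $40, and the six burst machines cost 6 machines × 6 hours × ($20 / 730 hours) ≈ $0.99, for a total of roughly $41.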
Will an extra $119 break the bank for a project? Certainly not. But imagine a burstier workload. Or maintaining a large number of projects for a company. These costs can quickly add up.
Auto scaling to the rescue
Auto scaling is the technique of automatically adding and removing machines from your cluster based on some conditions. Some typical conditions for increasing capacity would be:
- Average CPU load on all machines in the cluster has been above 80% for over 5 minutes
- HTTP response times have exceeded a certain threshold for more than 5% of requests
By contrast, machines can be removed when the reverse holds, or when a machine in the cluster fails a liveness check. This solves the bursty workload problem and provides automatic recovery from machine failure.
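To make the first condition above concrete, here is a hedged Terraform sketch: a CloudWatch alarm that fires when average CPU across the group stays above 80% for five minutes, wired to a policy that adds machines. It assumes the group from the earlier sketch, with its max_size raised (say, to 8) to leave room for the burst; a matching scale-down alarm and policy would mirror this with the comparison reversed.

```hcl
# Illustrative names and thresholds; tune to your workload.
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "scale-up"
  autoscaling_group_name = aws_autoscaling_group.web.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 2   # add two machines each time the alarm fires
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "web-high-cpu"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80
  period              = 300  # a five-minute window
  evaluation_periods  = 1

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web.name
  }

  # Trigger the scale-up policy when the alarm fires.
  alarm_actions = [aws_autoscaling_policy.scale_up.arn]
}
```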
Our standard deployment workflows, based on tools like Terraform and Kubernetes, create a cluster of auto-recovering, auto-scaling machines that can host multiple applications.
Learn more
Learn more about FP Complete's DevOps stack on our syllabus page.