Adrian Cantrill’s SAA-C02 study course, 45 minutes: HA & Scaling section: ASG Lifecycle Hooks, ASG HealthCheck Comparison: EC2 vs ELB
ASG Lifecycle Hooks
Lifecycle hooks let you run custom actions during Auto Scaling group activity. You can define hooks on instance launch transitions or on instance terminate transitions. When an Auto Scaling group scales out or scales in, it launches or terminates instances, and normally this process is completely under the control of the Auto Scaling group: as soon as it decides to provision or terminate an instance, that happens with no ability for you to influence the outcome. With a lifecycle hook in place, instances are paused within the launch or terminate flow, and they wait in this state until one of two things happens. Either a configurable timeout expires (3,600 seconds by default), at which point the instance will either continue or abandon the Auto Scaling group action, or you explicitly resume the process by calling CompleteLifecycleAction once you have performed whatever activity you need. In addition, lifecycle hooks can be integrated with EventBridge or SNS notifications, which allows your systems to perform event-driven processing based on the launch or termination of EC2 instances within an Auto Scaling group.
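The two ways out of the wait state can be sketched as a small pure-Python model. This is only an illustration of the rule described above, not AWS code: the function name `resolve_wait` and its parameters are hypothetical, and the choice of "ABANDON" as the result applied on timeout is an assumption for the sketch (the hook's own configuration decides whether a timeout continues or abandons).

```python
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 3600  # default hook timeout mentioned in the notes


def resolve_wait(
    elapsed_seconds: int,
    completed_at: Optional[int] = None,
    completion_result: str = "CONTINUE",
    timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS,
    default_result: str = "ABANDON",  # assumption: result applied on timeout
) -> Optional[str]:
    """Return the outcome of a lifecycle hook wait, or None if still waiting.

    completed_at is the second at which CompleteLifecycleAction was called,
    or None if it has not been called.
    """
    # An explicit completion only counts if it arrived before the timeout.
    if (
        completed_at is not None
        and completed_at <= elapsed_seconds
        and completed_at < timeout_seconds
    ):
        return completion_result
    # Otherwise the timeout decides once it has expired.
    if elapsed_seconds >= timeout_seconds:
        return default_result
    return None  # still paused in the wait state
```

For example, `resolve_wait(200, completed_at=120)` reports that the action continued because the explicit call arrived first, while `resolve_wait(3600)` falls back to the timeout result.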
Normally, when an Auto Scaling group scales out, an instance is launched and starts off in the Pending state. When launch completes it moves into the InService state, which gives us no opportunity to perform any custom activities. Instead, we can define a lifecycle hook on the instance launch transition. With that hook in place, the instance moves from Pending to Pending:Wait and waits in this state, which allows us to perform a set of custom actions. An example might be loading or indexing some data, which might take some time, and during this time the instance stays in Pending:Wait. Once done, it moves from Pending:Wait to Pending:Proceed, and from there into InService. It is these extra steps, the wait and the proceed, which provide the opportunity to run custom actions.
The same happens in reverse if we define an instance terminate hook. Normally, when a scale-in event happens, the instance moves from the Terminating state to the Terminated state, and we have no ability to perform any custom actions. With a lifecycle hook on that transition, the instance instead moves from Terminating to Terminating:Wait, where it waits until the timeout expires (3,600 seconds by default) or until we run the CompleteLifecycleAction operation. We could use this time to back up data or logs, or otherwise tidy up the instance prior to its termination. Once the timeout expires, or when we explicitly call CompleteLifecycleAction, the instance moves from Terminating:Wait to Terminating:Proceed, and then finally through to the Terminated state.
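The state flows described above can be written down as a short lookup, which may help when memorising them. This is a minimal sketch: the function `transition_path` is a hypothetical helper, and only the state names come from the notes.

```python
from typing import List


def transition_path(transition: str, has_hook: bool) -> List[str]:
    """Return the ordered states an instance passes through.

    transition is "launch" or "terminate"; has_hook says whether a
    lifecycle hook is defined on that transition.
    """
    if transition == "launch":
        start, end = "Pending", "InService"
    elif transition == "terminate":
        start, end = "Terminating", "Terminated"
    else:
        raise ValueError(f"unknown transition: {transition!r}")

    if not has_hook:
        # Without a hook the instance goes straight through.
        return [start, end]
    # With a hook, the Wait and Proceed states are inserted in between.
    return [start, f"{start}:Wait", f"{start}:Proceed", end]


if __name__ == "__main__":
    for name in ("launch", "terminate"):
        print(name, "->", " -> ".join(transition_path(name, has_hook=True)))
```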
Lifecycle hooks can integrate with SNS for transition notifications, and EventBridge can be used to initiate other processes based on the hooks in an event-driven way.
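As a rough sketch of the event-driven pattern, the handler below takes an EventBridge-style lifecycle event, runs a custom action, and then calls CompleteLifecycleAction. Several things here are assumptions rather than anything stated in the notes: the event field names under `detail` and the request parameter names are written as I understand the boto3 Auto Scaling client and should be checked against the AWS documentation, and `handle_lifecycle_event`, `build_completion_request`, and `custom_action` are hypothetical names. The client is passed in, so the sketch itself needs no AWS access.

```python
from typing import Any, Callable, Dict

VALID_RESULTS = ("CONTINUE", "ABANDON")


def build_completion_request(event: Dict[str, Any], result: str = "CONTINUE") -> Dict[str, str]:
    """Map a lifecycle event onto CompleteLifecycleAction parameters."""
    if result not in VALID_RESULTS:
        raise ValueError(f"result must be one of {VALID_RESULTS}")
    detail = event["detail"]  # assumed event shape, verify against AWS docs
    return {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "InstanceId": detail["EC2InstanceId"],
        "LifecycleActionResult": result,
    }


def handle_lifecycle_event(
    event: Dict[str, Any],
    autoscaling_client: Any,
    custom_action: Callable[[str], None],
) -> Dict[str, str]:
    """Run the custom action, then resume the paused instance."""
    instance_id = event["detail"]["EC2InstanceId"]
    try:
        custom_action(instance_id)  # e.g. index data, back up logs
        result = "CONTINUE"
    except Exception:
        result = "ABANDON"  # give up on this lifecycle action
    request = build_completion_request(event, result)
    autoscaling_client.complete_lifecycle_action(**request)
    return request
```

In a real deployment the client would typically be something like `boto3.client("autoscaling")` inside a Lambda function targeted by an EventBridge rule. That wiring is left out here because it depends on your account setup.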
ASG HealthCheck Comparison – EC2 vs ELB
Auto Scaling groups assess the health of their instances using health checks. If an instance fails a health check, it is replaced within the Auto Scaling group, so this is a method of automatically healing the instances within the group. There are three different types of health checks which can be used with Auto Scaling groups: EC2 checks, which are the default; ELB checks, which can be enabled on an Auto Scaling group; and custom health checks.
With EC2 checks, which are the default, an instance is viewed as unhealthy if it is Stopping, Stopped, Shutting Down, Terminated, or Impaired (that is, not passing 2/2 status checks). Essentially, anything other than a running instance is viewed as unhealthy.
We also have the option of using load balancer health checks. When this option is used, an instance is viewed as healthy only if it is both running and passing the load balancer health check. This is important because, if you're using an application load balancer, these checks can be application aware. You can define a specific page of the application to be used as a health check, you can do text pattern matching, and this is checked by the application load balancer. When you integrate this with an Auto Scaling group, the checks that the group is capable of performing become much more application aware.
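The difference between the EC2 and ELB check types can be summarised in a few lines. This is a simplified model of the rules described above, not how AWS implements them: the function `is_healthy` and its parameters are hypothetical, and the lowercase state strings are illustrative.

```python
def is_healthy(
    check_type: str,
    instance_state: str,
    status_checks_passed: int,
    elb_check_passing: bool = True,
) -> bool:
    """Decide instance health under the "EC2" or "ELB" check type.

    status_checks_passed is the number of EC2 status checks passing (0-2).
    """
    # EC2 view: the instance must be running and pass 2/2 status checks.
    ec2_ok = instance_state == "running" and status_checks_passed == 2

    if check_type == "EC2":
        return ec2_ok
    if check_type == "ELB":
        # ELB view: running AND passing the load balancer health check.
        return ec2_ok and elb_check_passing
    raise ValueError(f"unknown check type: {check_type!r}")
```

The case worth noticing is a running instance whose application has crashed: `is_healthy("EC2", "running", 2, elb_check_passing=False)` is true, while the same call with `"ELB"` is false, which is why ELB checks are described as more application aware.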
Finally, there are custom health checks, where an external system is integrated and marks instances as healthy or unhealthy. This allows you to extend the functionality of Auto Scaling group health checks by implementing a process specific to your business or by using an external tool.
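A custom health check could look something like the sketch below, where an external probe decides the status and reports it to the Auto Scaling group. The notes do not name the API involved. The sketch assumes the Auto Scaling SetInstanceHealth operation as exposed by a boto3-style client (`set_instance_health` with `InstanceId`, `HealthStatus`, and `ShouldRespectGracePeriod`), which should be verified against the AWS documentation. `report_custom_health` and `probe` are hypothetical names.

```python
from typing import Any, Callable, Dict


def report_custom_health(
    autoscaling_client: Any,
    instance_id: str,
    probe: Callable[[str], bool],
    respect_grace_period: bool = True,
) -> Dict[str, Any]:
    """Run an external probe and mark the instance healthy or unhealthy."""
    # The probe is whatever business-specific check you need; it returns
    # True when the instance should be considered healthy.
    status = "Healthy" if probe(instance_id) else "Unhealthy"
    request = {
        "InstanceId": instance_id,
        "HealthStatus": status,
        "ShouldRespectGracePeriod": respect_grace_period,
    }
    autoscaling_client.set_instance_health(**request)
    return request
```

Passing the client in keeps the function testable with a stub. In practice it would likely be `boto3.client("autoscaling")`.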
There is also the concept of a health check grace period, which by default is 300 seconds, or five minutes. This is a configurable value which needs to expire before health checks take effect on a specific instance. If you select 300 seconds, the instance has five minutes to launch, perform any bootstrapping, and complete any application startup or configuration before it can fail a health check. This is really useful if you're bootstrapping the EC2 instances which are launched by the Auto Scaling group. It is an important one because it does come up on the exam, and it's often the cause of an Auto Scaling group continuously provisioning and then terminating instances. If the health check grace period is not long enough, health checks can start taking effect before the application has finished configuring. At that point the instance is viewed as unhealthy and terminated, a new instance is provisioned, and the process repeats over and over again. You need to know how long your application instances take to launch, bootstrap, and perform any configuration, and set your health check grace period to at least that long.
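The provision-and-terminate loop can be shown with a toy simulation. This is a deliberately simplified model, not AWS behaviour: it assumes every instance takes the same time to become ready, that the first health check lands as soon as the grace period expires (or after one check interval if the grace period is shorter), and that a failed check causes immediate replacement. The function name and parameters are hypothetical.

```python
from typing import Tuple


def simulate_grace_period(
    startup_seconds: int,
    grace_period_seconds: int,
    horizon_seconds: int = 3600,
    check_interval_seconds: int = 30,
) -> Tuple[int, bool]:
    """Return (instances launched, whether the group became stable)."""
    if check_interval_seconds <= 0:
        raise ValueError("check_interval_seconds must be positive")

    # First moment at which a health check can count against an instance.
    first_check = max(grace_period_seconds, check_interval_seconds)

    launches = 0
    clock = 0
    while clock < horizon_seconds:
        launches += 1
        if startup_seconds <= first_check:
            # The application is ready before checks take effect.
            return launches, True
        # Not ready in time: marked unhealthy, terminated, and replaced.
        clock += first_check
    return launches, False


if __name__ == "__main__":
    print(simulate_grace_period(startup_seconds=420, grace_period_seconds=300))
    print(simulate_grace_period(startup_seconds=420, grace_period_seconds=480))
```

With a seven-minute startup and the five-minute default, the model keeps replacing instances for the whole hour and never stabilises. Raising the grace period above the startup time lets the first instance survive.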