Adrian Cantrill’s SAA-C02 study course, 75 minutes: HA and Scaling section: AWS Launch Configurations and Templates & Auto Scaling Groups
Launch Configuration and Templates
Launch Configuration and Launch Templates are two features of EC2. They both perform a similar function, but launch templates came after launch configurations and include extra features and capabilities.
Launch configurations and launch templates, at a high level, perform the same task. They allow the configuration of EC2 instances to be defined in advance. They are documents which let you configure things like which AMI to use, the instance type and size, the configuration of the storage which instances use, and the key pair which is used to connect to that instance.
They also let you define the networking configuration and security groups that an instance uses. They let you configure the user data which is provided to the instance and the IAM role, which is attached to the instance, used to provide the instance with permissions.
Everything which you usually define at the point of launching an instance can be defined in launch configurations and launch templates. Neither is editable: you define them once, and the configuration is locked. Launch templates, as the newer of the two, allow you to have versions; versions aren’t available for launch configurations.
Launch templates also allow you to control features of the newer types of instances, things like T2 or T3 Unlimited CPU options, placement groups, capacity reservations, and Elastic Graphics. AWS recommends using launch templates at this point in time because they’re a superset of launch configurations: they provide all of the configuration that launch configurations provide, and more.
Architecturally, launch templates also offer more utility. Launch configurations have one use. They’re used as part of Auto Scaling groups. Auto Scaling groups offer Auto Scaling for EC2 instances, and launch configurations provide the configuration of those EC2 instances, which will be launched by Auto Scaling groups. As a reminder, they’re not editable nor do they have any versioning capability.
If you need to adjust the configuration inside a launch configuration, you need to create a new one and use that new launch configuration.
Launch templates can also be used for the same thing, so providing EC2 configuration, which is used within Auto Scaling groups. In addition they can also be used to launch EC2 instances from the console or the CLI.
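As a sketch of what a launch template captures, here is the kind of parameter structure you would pass when creating one, in the shape boto3’s EC2 `create_launch_template` call expects. Every ID and name below is a placeholder, not a real resource:

```python
# Illustrative launch template request. The AMI ID, key pair name,
# security group ID, and role name are all hypothetical placeholders.
launch_template_request = {
    "LaunchTemplateName": "web-app-template",
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",            # which AMI to use
        "InstanceType": "t3.micro",                    # instance type and size
        "KeyName": "web-app-keypair",                  # key pair for connecting
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # networking/security groups
        "IamInstanceProfile": {"Name": "web-app-instance-role"},  # permissions
        # User data must be base64-encoded; this decodes to "#!/bin/bash"
        "UserData": "IyEvYmluL2Jhc2g=",
    },
}
```

Because templates are versioned, changing any of these values means creating a new template version rather than editing this one in place.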
Auto Scaling Groups
EC2 Auto Scaling Groups are how we can configure EC2 to scale automatically based on demand placed on the system. Auto Scaling Groups are generally used together with Elastic Load Balancers and launch templates to deliver elastic architectures.
Auto Scaling Groups do one thing: they provide Auto Scaling for EC2. They can also be used to implement a self-healing architecture as part of that scaling, or in isolation. Auto Scaling Groups make use of configuration defined within launch templates or launch configurations; that’s how they know what to provision. An Auto Scaling Group uses one launch configuration or one specific version of a launch template which is linked to it. You can change which of those is associated, but it’s one of them at a time, so all instances launched by the Auto Scaling Group are based on a single configuration definition, either a specific version of a launch template or a single launch configuration.
An Auto Scaling Group has three important values associated with it: Minimum Size, Desired Capacity, and Maximum Size, which are often referred to as min, desired, and max, and often expressed as min:desired:max. For example, 1:2:4 means a minimum of 1, a desired capacity of 2, and a maximum of 4.
An Auto Scaling Group has one foundational job which it performs: it keeps the number of running EC2 instances the same as the desired capacity, and it does this by provisioning or terminating instances. The desired capacity always has to be at least the minimum size and at most the maximum size. If you have a desired capacity of two but only one running EC2 instance, the Auto Scaling Group provisions a new instance. If you have a desired capacity of two but three running EC2 instances, the Auto Scaling Group terminates an instance to make those two values match.
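That reconciliation rule is simple enough to sketch directly (this is not AWS code, just the logic described above):

```python
def clamp_desired(requested: int, minimum: int, maximum: int) -> int:
    """The desired capacity must stay within [minimum, maximum];
    values outside that range are clamped to the nearest bound."""
    return max(minimum, min(requested, maximum))

def scaling_action(running: int, desired: int) -> int:
    """Positive result -> launch that many instances;
    negative -> terminate that many; zero -> nothing to do."""
    return desired - running
```

With a 1:2:4 group, `clamp_desired(6, 1, 4)` gives 4, and `scaling_action(1, 2)` gives +1 (launch one instance to reach the desired capacity).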
You can keep an Auto Scaling Group entirely manual, so there’s no automation and no intelligence: you just update the values and the Auto Scaling Group performs the necessary scaling actions. Normally though, Scaling Policies are used together with Auto Scaling Groups. Scaling policies can update the desired capacity based on certain criteria, for example CPU load, and if the desired capacity is updated, the group will provision or terminate instances.
Auto Scaling Groups run within a VPC across one or more subnets. The configuration for EC2 instances is provided using either launch templates or launch configurations. On the Auto Scaling Group we specify a minimum value, which means there will always be at least that many running EC2 instances. We can also set the desired capacity, which will add more instances if it’s higher than the minimum. Finally, we can set the maximum size, which means up to that many instances could be provisioned, though they won’t be immediately, because the desired capacity is lower. We could manually adjust the desired capacity up or down to add or remove instances, which would automatically be built based on the launch template or launch configuration. Alternatively, we could use scaling policies to automate that process and scale in or out based on sets of criteria.
Architecturally, Auto Scaling Groups define where instances are launched; they’re linked to a VPC, and subnets within that VPC are configured on the Auto Scaling Group. Whatever subnets are configured will be used to provision instances into. When instances are provisioned, there’s an attempt to keep the number of instances within each Availability Zone even. So if the Auto Scaling Group was configured with three subnets and the desired capacity was also set to three, it’s probable each subnet would have one EC2 instance running within it, but this isn’t always the case; the Auto Scaling Group will try to keep capacity level where it can.
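That even placement can be modelled as simple round-robin distribution. This is a simplification of the real AZRebalance behaviour, just to illustrate the even-spread goal:

```python
def spread_across_subnets(desired: int, subnets: list[str]) -> dict[str, int]:
    """Round-robin placement: a simplified model of how an Auto Scaling
    Group tries to keep instance counts even across its subnets (AZs)."""
    counts = {subnet: 0 for subnet in subnets}
    for i in range(desired):
        counts[subnets[i % len(subnets)]] += 1
    return counts
```

With three subnets and a desired capacity of three, each subnet gets one instance; with a desired capacity of four, one subnet ends up with two, which is as level as the numbers allow.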
Scaling policies are essentially rules, rules which you define which can adjust the values of an Auto Scaling Group, and there are three ways that you can scale Auto Scaling Groups:
1. The first is not really a policy at all, it’s just to use Manual Scaling. This is where you manually adjust the values at any time and the Auto Scaling Group handles any provisioning or termination that’s required.
2. Scheduled Scaling: great for sales periods, where you can scale out the group when you know there’s going to be additional demand, or scale in outside of business hours when you know a system won’t be used. Scheduled Scaling adjusts the desired capacity based on schedules, and is useful for any known periods of high or low usage.
3. Dynamic scaling: three sub-types: (all are rules which react to something and change the values on an Auto Scaling Group):
A. Simple Scaling: Most commonly a pair of rules: one to provision instances and one to terminate instances. You define a rule based on a metric, something like CPU utilization. Some metrics need the CloudWatch agent to be installed, and you can also use metrics which don’t come from the EC2 instances themselves, for instance the length of an SQS queue, or a custom performance metric from your application such as response time.
B. Stepped Scaling: Similar to Simple Scaling, but you define more detailed rules, which lets you act depending on how far from normal the metric value is. Stepped scaling allows you to act more quickly the more extreme the changing conditions are. Stepped Scaling is almost always preferable to simple scaling, except when your only priority is simplicity.
C. Target Tracking: This takes a slightly different approach. It lets you define an ideal amount of something, and the group scales as required to stay at that level, provisioning or terminating instances to maintain that target amount. Not all metrics work for target tracking, but some that do are average CPU utilization, average network in, average network out, and request count per target, which is relevant to Application Load Balancers.
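The difference between simple and stepped scaling is easiest to see in code. This is an illustrative stepped-scaling rule with made-up thresholds, not a real policy definition; the idea is that the further the metric is from normal, the larger the adjustment:

```python
def step_adjustment(cpu_percent: float) -> int:
    """Stepped-scaling sketch: the size of the breach determines the
    size of the adjustment. All thresholds here are hypothetical."""
    if cpu_percent >= 90:
        return 3   # severe breach: add three instances at once
    if cpu_percent >= 70:
        return 1   # mild breach: add one instance (what simple scaling does)
    if cpu_percent <= 30:
        return -1  # under-utilised: remove one instance
    return 0       # within the normal range: no change
```

A simple scaling policy would only ever return +1 or -1; the stepped version reacts faster to extreme conditions.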
There’s also a configuration on an Auto Scaling Group called the cooldown period, a value in seconds. It controls how long to wait at the end of a scaling action before performing another. This lets the Auto Scaling Group wait out chaotic changes to a metric, and avoids the cost of constantly adding and removing instances: there’s a minimum billable period, so you’re billed for at least that minimum every time an instance is provisioned, regardless of how long you actually use it.
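The cooldown check itself is just a time comparison. A minimal sketch of the rule (times as plain epoch seconds, not AWS code):

```python
def cooldown_elapsed(now: float, last_action: float, cooldown_seconds: float) -> bool:
    """A new scaling action is only allowed once the cooldown period
    since the previous scaling action has fully elapsed."""
    return (now - last_action) >= cooldown_seconds
```

With a 300-second cooldown, an action at t=100 blocks further actions until t=400, no matter how noisy the metric is in between.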
Auto Scaling Groups also monitor the health of the instances they provision; by default this uses the EC2 status checks. So if an EC2 instance fails, EC2 detects this and passes it on to the Auto Scaling Group, which terminates the EC2 instance and provisions a new one in its place. This is known as self-healing, and it will fix most problems isolated to a single instance. The same happens if we terminate an instance manually: the Auto Scaling Group simply replaces it.
There’s a trick with EC2 and Auto Scaling Groups: create a launch template which can automatically build an instance, then create an Auto Scaling Group using that template, set the Auto Scaling Group to use multiple subnets in different Availability Zones, and set a minimum of one, a maximum of one, and a desired capacity of one. The result is simple instance recovery: the instance will be replaced if it’s terminated or if it fails, and because Auto Scaling Groups work across Availability Zones, the instance can be re-provisioned in another Availability Zone if the original one fails. It’s cheap, simple and effective high availability.
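The key values for that 1:1:1 pattern look like this, in the shape boto3’s Auto Scaling `create_auto_scaling_group` call expects. The group name, template name, and subnet IDs are placeholders; the three subnets would each sit in a different Availability Zone:

```python
# Illustrative self-healing single-instance group: min = max = desired = 1,
# spread across three AZ-distinct subnets (all IDs are hypothetical).
self_healing_asg = {
    "AutoScalingGroupName": "self-healing-single-instance",
    "LaunchTemplate": {
        "LaunchTemplateName": "web-app-template",
        "Version": "$Latest",
    },
    "MinSize": 1,
    "MaxSize": 1,
    "DesiredCapacity": 1,
    # Comma-separated subnet IDs, one per Availability Zone
    "VPCZoneIdentifier": "subnet-aaa111,subnet-bbb222,subnet-ccc333",
}
```

Because min, max and desired are all 1, the group never scales; its only job is to keep exactly one instance alive somewhere across those AZs.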
Auto Scaling Groups are great on their own, but their real power comes from their ability to integrate with Load Balancers. Instead of statically adding instances or other resources to a target group, you can configure an Auto Scaling Group to integrate with the target group: as instances are provisioned within the Auto Scaling Group, they’re automatically added to the target group of that load balancer, and as instances are terminated by the Auto Scaling Group, they’re removed from that target group. This is an example of elasticity: metrics which measure load on a system are used to adjust the number of instances, and those instances are effectively added as load balancer targets. Users of the application, because they access it via the Load Balancer, are abstracted away from the individual instances and can use the added capacity in a very fluid way.
What’s even better is that the Auto Scaling Group can be configured to use Load Balancer health checks rather than EC2 status checks. Application Load Balancer checks can be much richer: they can monitor the state of HTTP or HTTPS requests, which makes them application aware in a way that the simple status checks EC2 provides are not. You do need to use an appropriate Load Balancer health check, though. If your application has complex logic within it and you’re only testing a static HTML page, the health check could respond as OK even though the application might be in a failed state. Inversely, if your application uses a database and your health check tests a page with database access requirements, then if the database fails, all of your health checks could fail, meaning all of your EC2 instances will be terminated and re-provisioned when the problem is with the database, not the instances. So you have to be really careful when setting up health checks.
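Both failure modes from that paragraph can be shown in a tiny sketch. This is not real health-check code, just the decision logic: a shallow check ignores dependencies, while a deep check that touches a shared database marks every instance unhealthy when the database fails:

```python
def instance_health(static_page_ok: bool, db_ok: bool, deep_check: bool) -> str:
    """Hypothetical health-check logic illustrating the trade-off:
    shallow checks miss application failures; deep checks that hit a
    shared database fail on EVERY instance when the database is down."""
    if not static_page_ok:
        return "unhealthy"          # the instance itself is broken
    if deep_check and not db_ok:
        return "unhealthy"          # instance is fine, but the shared DB is not
    return "healthy"
```

With the database down, a shallow check still reports healthy (masking the failure), while a deep check reports every instance unhealthy, triggering pointless terminations.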
Next we looked at the scaling processes within an Auto Scaling Group. The group performs a number of different processes or functions, and each can be suspended or resumed:
Launch and Terminate: if Launch is suspended, the Auto Scaling Group won’t scale out when alarms or scheduled actions occur; if Terminate is suspended, it won’t terminate any instances.
AddToLoadBalancer: controls whether newly provisioned instances are added to the Load Balancer.
AlarmNotification: controls whether the Auto Scaling Group reacts to CloudWatch alarms.
AZRebalance: controls whether the group attempts to redistribute instances across Availability Zones.
HealthCheck: controls whether instance health checks across the entire group are on or off.
ReplaceUnhealthy: controls whether the group replaces instances marked as unhealthy.
ScheduledActions: controls whether the group performs scheduled actions.
In addition, a specific instance can be set to Standby or InService, which suspends any Auto Scaling Group activity on that specific instance. This is really useful if you need to perform maintenance on one or more EC2 instances: set them to Standby and they won’t be affected by anything the Auto Scaling Group does.
Some specific points of consideration:
Autoscaling Groups are free
Only the resources created are billed
Use cool downs to avoid rapid scaling
Think about more, smaller instances – granularity
Use with ALBs (Application Load Balancers) for elasticity – abstraction
ASG defines when and where, LT defines what