Monday 11/29/21 AWS/Cloud update:

I took a break from my studies to deal with some needed life changes. Time to get everything back on track.

Adrian Cantrill’s SAA-C02 course, 60 minutes: RDS section: ‘Aurora Architecture’

Aurora Architecture:

Aurora is a managed database product in AWS. It is officially part of RDS, but it can easily be viewed as its own distinct product: the features it offers and the architecture it uses to deliver them are radically different from standard RDS.

To start, the Aurora architecture is very different from normal RDS. At its foundation is a ‘cluster’, a base entity that the other engines within RDS don’t have. A cluster is made up of a number of important things. From a compute perspective it consists of a single primary instance and zero or more replicas. This is very different from the primary and standby replicas found within regular RDS, because the replicas within Aurora can be used for reads during normal operations. Aurora replicas therefore provide the benefits of both RDS multi-AZ and RDS read replicas: they sit inside the cluster and improve availability, but they can also serve read operations while the cluster is running normally. Because of this you don’t have to choose between read scaling and availability; replicas inside Aurora provide both.

The second major difference in the Aurora architecture is its storage. Aurora doesn’t use local storage on the compute instances; instead, an Aurora cluster has a shared cluster volume. This is storage which is shared and available to all compute instances within the cluster, and it provides benefits such as faster provisioning, improved availability, and better performance.

The storage is SSD based and can store up to 128 TiB of data, with six replicas across multiple availability zones. When data is written to the primary DB instance, Aurora synchronously replicates it across the six storage nodes, which are spread across the availability zones associated with your cluster. All the instances inside your cluster, primary and replicas, have access to all of these storage nodes. The important thing to understand from the storage perspective is that this replication happens at the storage level, so no extra resources are consumed on the instances or the replicas during the replication process. By default the primary instance is the only instance that can write to storage, while both the primary and the replicas can perform read operations.

Because Aurora maintains multiple copies of your data across three AZs, the chances of losing data as a result of a disk-related failure are greatly minimized. Aurora automatically detects failures in the disk volumes that make up the cluster’s shared storage. When a segment of a disk volume fails, Aurora immediately repairs that area of disk using the data in the other storage nodes that make up the cluster volume, automatically recreating the data and bringing it back into an operational state with no corruption. As a result Aurora avoids data loss and reduces the need for time-consuming restores or snapshot recoveries after disk failures. The storage subsystem inside Aurora is therefore much more resilient than that used by the normal RDS database engines.
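As a rough mental model, the replication and self-repair described above can be sketched in a few lines of Python. This is a hypothetical toy model, not the AWS API; the class and method names are invented for illustration:

```python
# Toy model of Aurora's shared cluster volume: six storage nodes spread
# across three AZs, every write replicated to all six synchronously, and
# a failed segment repaired from the surviving copies.

class ClusterVolume:
    def __init__(self):
        # two storage nodes in each of three availability zones
        self.nodes = {f"az{az}-node{n}": {} for az in (1, 2, 3) for n in (1, 2)}

    def write(self, key, value):
        # replication happens at the storage layer: all six nodes receive the
        # write, consuming no resources on the DB instances themselves
        for node in self.nodes.values():
            node[key] = value

    def fail_node(self, node_id):
        self.nodes[node_id].clear()  # simulate losing a storage segment

    def repair(self, node_id):
        # rebuild the failed node from any healthy peer's copy of the data
        donor = next(n for nid, n in self.nodes.items() if nid != node_id and n)
        self.nodes[node_id] = dict(donor)

vol = ClusterVolume()
vol.write("row:1", "hello")
vol.fail_node("az2-node1")
vol.repair("az2-node1")
print(vol.nodes["az2-node1"]["row:1"])  # hello: no data lost, no restore needed
```

The point of the sketch is that recovery needs only the other storage nodes, never a backup or a snapshot restore.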

Another powerful difference between Aurora and the normal RDS database engines is that with Aurora you can have up to 15 replicas, and any of them can be the target of a failover operation. Rather than the one primary instance and one standby replica of the non-Aurora engines, with Aurora you have up to 15 different replicas to choose from as a failover target, and that failover will be much quicker because it doesn’t have to make any storage modifications.
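Why that failover is fast can be modelled in a few lines: the replicas already attach to the same cluster volume, so promotion is just a role change. This is a hypothetical sketch with invented names, not how AWS implements it:

```python
# Toy model: failover promotes an existing replica; the shared cluster
# volume is untouched, which is why no storage work is needed.

class AuroraCluster:
    def __init__(self, replica_count):
        self.volume = {"data": "shared"}  # one cluster volume for all instances
        self.primary = "instance-0"
        # up to 15 replicas, all reading the same shared volume
        self.replicas = [f"instance-{i}" for i in range(1, replica_count + 1)]

    def failover(self):
        # any replica is a valid failover target; promote the first available
        new_primary = self.replicas.pop(0)
        self.primary = new_primary
        return new_primary

cluster = AuroraCluster(replica_count=15)
print(cluster.failover())  # instance-1 becomes primary; the volume is untouched
```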

As well as the resiliency that a cluster volume provides, there are a few other key elements. The cluster shared volume is based on SSD storage by default (high IOPS, low latency), so it’s high-performance storage by default; you don’t get the option of using magnetic storage.

The billing for this storage is very different than for the normal RDS engines. With Aurora you don’t have to allocate storage: when you create an Aurora cluster you don’t specify the amount of storage needed, and billing is simply based on what you consume, up to the 128 TiB limit. Consumption is based on a ‘high water mark’: if you consume 50 GiB of storage, you’re billed for 50 GiB; if you then free up 10 GiB, you’re still billed for the 50 GiB high water mark, but you can reuse any storage that you free up. In short, you’re billed for the maximum amount of storage you’ve ever consumed in the cluster. If you significantly reduce your storage and need to reduce storage costs, you have to create a brand new cluster and migrate data from the old cluster to the new one. (AWS is changing this high water mark architecture, and it no longer applies to the more recent versions of Aurora; for now you can assume it is being used.)

Because the storage belongs to the cluster and not to the instances, replicas can be added and removed without requiring storage provisioning or removal, which massively improves the speed and efficiency of any replica changes within the cluster.

Having this cluster architecture also changes the access method versus RDS. Aurora clusters, like RDS clusters, use endpoints: DNS addresses used to connect to the cluster. Unlike RDS, though, an Aurora cluster has multiple endpoints available to an application.
At a minimum you have the cluster endpoint and the reader endpoint. The cluster endpoint always points at the primary instance and can be used for read and write operations. The reader endpoint will also point at the primary instance if that’s all there is, but if there are replicas it will load balance across all of the available replicas and can be used for read operations. This makes read scaling much easier to manage with Aurora than with RDS, because as you add additional replicas the reader endpoint is automatically updated to load balance across them.
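A small sketch of how the two default endpoints behave. This is a hypothetical Python model with invented names; real Aurora endpoints are DNS names that AWS resolves for you:

```python
# Toy model of the two default Aurora endpoints: the cluster endpoint always
# resolves to the primary (reads and writes), while the reader endpoint
# round-robins across replicas, falling back to the primary if none exist.

from itertools import cycle

class Endpoints:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._reader = cycle(self.replicas or [primary])

    def cluster_endpoint(self):
        return self.primary        # read/write: always the primary

    def reader_endpoint(self):
        return next(self._reader)  # reads: load balanced over the replicas

ep = Endpoints("primary-1", ["replica-1", "replica-2"])
print(ep.cluster_endpoint())   # primary-1
print(ep.reader_endpoint())    # replica-1
print(ep.reader_endpoint())    # replica-2
```

With no replicas, `Endpoints("primary-1", [])` would return the primary from both endpoints, matching the fallback behavior described above.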

You can also create custom endpoints, and in addition each instance, the primary and every replica, has its own unique endpoint. Aurora therefore allows for a much more custom and complex architecture versus RDS.


One of the biggest downsides with Aurora is that there isn’t actually a free-tier option, because Aurora doesn’t support the micro instances available in the free tier. But for anything beyond an RDS single-AZ (micro) deployment, Aurora offers much better value. For compute there is an hourly charge, billed per second with a 10-minute minimum. For storage you’re billed on a GiB-month consumed metric, again based on the high water mark: the maximum amount of storage you’ve consumed during the lifetime of the cluster. There is also an I/O cost per request made to the cluster shared storage. In terms of backups, you’re given a free allocation equal to 100% of the cluster’s storage consumption, so if your database cluster is 100 GiB you’re given 100 GiB of backup storage as part of what you pay for the cluster. For most low-to-medium-usage situations, unless you have high data turnover or long retention periods, you’ll find the backup costs are included in the charge for the database cluster itself.
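The pricing rules above can be turned into a back-of-the-envelope calculation. The rates below are invented placeholders, not real AWS prices, and the function names are my own:

```python
# Rough Aurora billing model based on the rules above: compute is per-second
# with a 10-minute minimum, storage is billed on the high water mark, and
# backups are free up to 100% of cluster storage. Rates are made up.

def compute_charge(seconds_used, rate_per_second):
    billable = max(seconds_used, 600)         # 10-minute (600 s) minimum
    return billable * rate_per_second

def storage_charge(usage_history_gib, rate_per_gib_month):
    high_water_mark = max(usage_history_gib)  # billed on the peak, not current use
    return high_water_mark * rate_per_gib_month

def backup_charge(backup_gib, cluster_gib, rate_per_gib_month):
    overage = max(backup_gib - cluster_gib, 0)  # first 100% of storage is free
    return overage * rate_per_gib_month

print(compute_charge(120, 0.0001))        # 2 minutes of use, billed as 600 s
print(storage_charge([10, 50, 40], 0.10)) # the 50 GiB peak is billed, not 40
print(backup_charge(80, 100, 0.021))      # 0.0 (within the free allocation)
```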

Other features:

Backups in Aurora work in much the same way as in RDS: automatic backups and manual snapshots all work the same way as with any other RDS engine. Also, restores create a brand new cluster.

There are advanced features as well, including backtrack, which needs to be enabled on a per-cluster basis and has an adjustable window. It allows you to roll your database back in place to a previous point in time.
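Conceptually, backtrack keeps change records for a configured window and rewinds the database in place, rather than restoring a new cluster. A toy model, with invented names and nothing like the real implementation, might look like:

```python
# Toy model of backtrack: keep timestamped change records within a window
# and roll back in place to the latest state at or before a target time.

import bisect

class BacktrackLog:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.snapshots = []  # (timestamp, state) change records

    def record(self, ts, state):
        self.snapshots.append((ts, dict(state)))

    def backtrack(self, target_ts, now):
        if now - target_ts > self.window:
            raise ValueError("target is outside the backtrack window")
        # find the latest recorded state at or before the target time
        idx = bisect.bisect_right([t for t, _ in self.snapshots], target_ts) - 1
        return self.snapshots[idx][1]

log = BacktrackLog(window_seconds=3600)
log.record(100, {"balance": 50})
log.record(200, {"balance": 0})               # a bad write we want to undo
print(log.backtrack(target_ts=150, now=300))  # {'balance': 50}
```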

Also available is fast clone, which allows you to create a brand new database from an existing one. Importantly, it doesn’t make a one-for-one copy of the storage for that database: the clone references the original storage and only stores the differences between the two.
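The copy-on-write idea behind fast clone can be sketched like this (a hypothetical model; the class and page names are invented for illustration):

```python
# Toy copy-on-write model of fast clone: the clone references the parent's
# storage pages and only materialises the pages that diverge.

class Database:
    def __init__(self, pages=None, parent=None):
        self.pages = pages if pages is not None else {}
        self.parent = parent          # the clone keeps a reference, not a copy

    def fast_clone(self):
        return Database(parent=self)  # no data is copied at clone time

    def read(self, key):
        if key in self.pages:
            return self.pages[key]    # a locally diverged page
        return self.parent.read(key) if self.parent else None

    def write(self, key, value):
        self.pages[key] = value       # only differences are stored locally

src = Database({"page1": "A", "page2": "B"})
clone = src.fast_clone()
clone.write("page2", "B-modified")
print(clone.read("page1"))  # A: read through to the original storage
print(clone.read("page2"))  # B-modified: only this difference is stored
print(len(clone.pages))     # 1: the clone holds just the changed page
```

This is why a fast clone is near-instant and initially consumes almost no extra storage: cost grows only as the clone and the original diverge.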

Published by pauldparadis

Working towards cloud networking security as a profession.
