Adrian Cantrill’s SAA-C02 study course, 60 minutes: Aurora Global Database & Multi-Master Writes
Aurora Global Database:
Allows you to create global replication using Aurora from a master (primary) region to up to five secondary AWS regions. Global databases introduce the idea of secondary regions, each of which can have up to 16 read-only replicas. Replication from the primary region to the secondary regions occurs at the storage layer and typically completes within one second from the primary to all of the secondaries. Applications use the primary instance in the primary region for write operations, and the replicas in either the primary or the secondary regions for read operations.
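A minimal sketch (boto3) of how a global database could be set up: a global cluster wrapping an existing primary cluster, plus a secondary read-only cluster joined in another region. All identifiers, regions, and the account number are illustrative assumptions, not values from the course.

```python
import boto3

primary_rds = boto3.client("rds", region_name="us-east-1")
secondary_rds = boto3.client("rds", region_name="eu-west-2")

# Create the global database container and attach an existing primary cluster.
primary_rds.create_global_cluster(
    GlobalClusterIdentifier="a4l-global",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:111122223333:cluster:a4l-primary",
)

# Create a secondary (read-only) cluster in another region and join it to the
# global database; replication to it happens at the storage layer.
secondary_rds.create_db_cluster(
    DBClusterIdentifier="a4l-secondary",
    Engine="aurora-mysql",
    GlobalClusterIdentifier="a4l-global",
)
# Read replicas (DB instances) would then be added to the secondary cluster separately.
```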
They are great for cross-region disaster recovery and business continuity; because of the roughly one-second replication time, RPO and RTO values are greatly improved. They are also good for global read scaling (low-latency performance improvements, with ~1s or less replication between regions), and replication has no impact on database performance because it occurs at the storage layer. Secondary regions can have up to 16 replicas, any of which can be promoted to read/write in the event of a catastrophe. Currently, the maximum number of secondary regions is five.
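A minimal DR sketch, assuming boto3 and illustrative identifiers: detaching a secondary cluster from the global database promotes it to a standalone read/write Aurora cluster in its own region.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-2")

# Detach the secondary cluster; once removed it becomes a standalone
# read/write cluster that applications can fail over to.
rds.remove_from_global_cluster(
    GlobalClusterIdentifier="a4l-global",
    DbClusterIdentifier="arn:aws:rds:eu-west-2:111122223333:cluster:a4l-secondary",
)
```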
Multi-Master Writes
Allows an Aurora cluster to have multiple instances that are capable of performing both reads and writes. The default mode for Aurora is one writer and many readers. To review, the default Aurora mode is known as single-master, which equates to one read/write instance plus zero or more read-only replicas. An Aurora cluster running in single-master mode has a number of endpoints used to interact with the database: the cluster endpoint, which can be used for read or write operations, and a reader endpoint, which is responsible for load balancing reads across any of the read-only replicas inside the cluster. In single-master mode, failover takes time, because a replica needs to be promoted from read-only to read/write mode. In multi-master mode, by contrast, all of the instances are capable of read and write operations.
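A short sketch of how an application uses the two single-master endpoints: the cluster endpoint for writes and the reader endpoint for load-balanced reads. Hostnames, credentials, and the schema are illustrative assumptions, and pymysql is just one possible MySQL-compatible client.

```python
import pymysql

WRITER = "a4l-cluster.cluster-abc123.us-east-1.rds.amazonaws.com"     # cluster endpoint
READER = "a4l-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"  # reader endpoint

def write_order(order_id, total):
    # Writes always go through the cluster endpoint (the single writer).
    conn = pymysql.connect(host=WRITER, user="app", password="secret", database="shop")
    with conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)", (order_id, total))
    conn.commit()
    conn.close()

def read_order(order_id):
    # Reads use the reader endpoint, which load balances across read replicas.
    conn = pymysql.connect(host=READER, user="app", password="secret", database="shop")
    with conn.cursor() as cur:
        cur.execute("SELECT id, total FROM orders WHERE id = %s", (order_id,))
        row = cur.fetchone()
    conn.close()
    return row
```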
A multi-master Aurora cluster has the same cluster structure as a single-master cluster and the same shared storage, with multiple provisioned Aurora instances in the cluster.
The differences start with the fact that there is no cluster endpoint to use; the application is responsible for connecting to instances within the cluster. There is no load balancing across instances in a multi-master cluster and no concept of a load-balanced endpoint: the application connects to one or more of the instances directly and initiates operations against them.
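A minimal sketch of application-side connection handling for a multi-master cluster: with no load-balanced cluster endpoint, the application holds the individual instance endpoints itself and falls back to another instance if a connection fails. Endpoints and credentials are illustrative assumptions.

```python
import pymysql

INSTANCE_ENDPOINTS = [
    "a4l-mm-instance-1.abc123.us-east-1.rds.amazonaws.com",
    "a4l-mm-instance-2.abc123.us-east-1.rds.amazonaws.com",
]

def get_connection():
    last_error = None
    for host in INSTANCE_ENDPOINTS:
        try:
            # Any instance can accept both reads and writes in multi-master mode.
            return pymysql.connect(host=host, user="app", password="secret",
                                   database="shop", connect_timeout=2)
        except pymysql.err.OperationalError as err:
            last_error = err  # this instance is unreachable; try the next one
    raise last_error
```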
The way this works is that when one of the read/write instances inside a multi-master cluster receives a write request from the application, it immediately proposes that the data be committed to all of the storage nodes in the cluster. Each storage node then either affirms or rejects the proposed change; it rejects it if it conflicts with something already in flight, such as another change from an application writing through a different read/write instance in the cluster. The writing instance is looking for a quorum of storage nodes to agree, at which point it can commit the change to the shared storage. If a quorum rejects the proposal, the change is cancelled and an error is returned to the application.
Assuming it can get a quorum to agree to the write, the write is committed to storage and replicated across every storage node in the cluster, just as with a single-master cluster. But with a multi-master cluster, the change is then also replicated to the other instances in the cluster. Because instances cache data, committing the change to disk is not enough; the in-memory caches of the other instances must also be updated so that reads from any instance in the cluster remain consistent with the data on shared storage. That is what this replication does.
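A conceptual sketch only: a toy model of the commit flow described above (propose to storage nodes, require a quorum, then update the other instances' in-memory caches). This illustrates the idea; it is not how Aurora is actually implemented, and nothing here is code an application would write.

```python
QUORUM = 4  # e.g. 4 of 6 storage nodes must agree (illustrative assumption)

def commit_write(change, storage_nodes, other_instances):
    # Each storage node affirms the proposal unless it conflicts with an
    # in-flight change from another writer.
    votes = sum(1 for node in storage_nodes if node.affirm(change))
    if votes < QUORUM:
        # Surfaced to the application as an error; the change is cancelled.
        raise RuntimeError("write rejected: quorum not reached")

    # Quorum reached: commit to shared storage, replicated to every node.
    for node in storage_nodes:
        node.apply(change)

    # Replicate the change to the other writers so their in-memory caches
    # stay consistent with what is on shared storage.
    for instance in other_instances:
        instance.update_cache(change)
```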