Adrian Cantrill’s SAA-C02 study course, 50 minutes: RDS section: ‘DMS’
Database Migration Service:
DMS is used extensively in the database arena. Database migrations are complex to perform. If you exclude the vendor tooling that is available, it’s a manual process end to end. It usually involves setting up replication, which is pretty complex, or taking a point-in-time backup and restoring it to the destination database. That raises two questions. How do you handle changes that occur between taking the backup and the new database going live? And how do you handle migrations between different database engines?
These are all situations where DMS comes in handy. It’s essentially a managed database migration service. It starts with a replication instance, which runs on EC2. This instance runs one or more replication tasks. You also need to define source and destination endpoints, which point at the source and target databases. One of the few real restrictions with the service is that at least one of the endpoints must be running within AWS. You can’t use the product for migrations between two on-premises databases.
Architecturally, you start with a source and a target database, and one of those needs to be within AWS. The databases themselves can use a range of compatible engines such as MySQL, Aurora, MSSQL, MariaDB, MongoDB, PostgreSQL, Oracle, Azure SQL, and many more. In between these, conceptually, is the Database Migration Service (DMS), which uses a replication instance: essentially an EC2 instance with migration software and the ability to communicate with the DMS service. On this instance you define replication tasks, and each replication instance can run multiple tasks. Tasks define all of the operations relating to the migration, but architecturally, two of the most important things are the source and destination endpoints, which store the connection information that the replication instance and task need to access the source and target databases. So a task essentially moves data from the source database, using the details in the source endpoint, to the target database, using the details stored in the destination endpoint. The value in DMS comes from how it handles those migrations.
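To make the moving parts concrete, here is a minimal Python sketch of the architecture described above. It does not call AWS at all; the class and field names (Endpoint, ReplicationTask, in_aws) are made up for illustration and are not part of the DMS API. It only models the one rule stated in these notes, that at least one endpoint must be within AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Connection details for one database (hypothetical model, not the DMS API)."""
    name: str
    engine: str   # e.g. "mysql", "oracle", "postgresql"
    in_aws: bool  # True if the database runs within AWS


@dataclass(frozen=True)
class ReplicationTask:
    """A task moves data from a source endpoint to a target endpoint."""
    source: Endpoint
    target: Endpoint

    def is_supported(self) -> bool:
        # DMS requires at least one of the two endpoints to be within AWS.
        return self.source.in_aws or self.target.in_aws


# Illustrative endpoints; the names are invented for this example.
on_prem_mysql = Endpoint("onprem-mysql", "mysql", in_aws=False)
rds_mysql = Endpoint("rds-mysql", "mysql", in_aws=True)
on_prem_oracle = Endpoint("onprem-oracle", "oracle", in_aws=False)

print(ReplicationTask(on_prem_mysql, rds_mysql).is_supported())       # True
print(ReplicationTask(on_prem_mysql, on_prem_oracle).is_supported())  # False
```

The second task is rejected because both databases are on-premises, which is the scenario the notes say DMS cannot handle.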
Replication tasks can be one of three types.
Full load: these migrations are used to migrate existing data. If you can afford an outage long enough to copy your existing data, then this is a good one to choose. This option simply migrates the data from your source database to your target database, and it creates the tables as required.
Full load plus CDC: CDC stands for change data capture. This type migrates existing data and replicates any ongoing changes. It performs a full load migration and at the same time captures any changes occurring on the source. After the full load is complete, the captured changes are applied to the target. Eventually the application of changes reaches a steady state, and at this point you can shut down your applications, let the remaining changes flow through to your target, and then restart your applications and point them at the new target database.
CDC only: this is designed to replicate only data changes. In some situations it might be more efficient to copy existing data using a method other than AWS DMS. Certain databases, such as Oracle, have their own export and import tools. In these cases it might be more efficient to use those tools to migrate the initial data and then use DMS simply to replicate the changes, starting at the point when you do that initial bulk load. So CDC only migrations are really effective if you need to bulk transfer the data in some way outside of DMS.
Lastly, DMS doesn’t natively support any form of schema conversion, but there is a dedicated tool in AWS known as the Schema Conversion Tool, or SCT. The sole purpose of this tool is to perform schema modifications or schema conversions between different database versions or different database engines. It is a really powerful tool that often goes hand in hand with migrations performed by DMS.
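The choice between the three types can be sketched as a small decision function. This is only an illustration of the reasoning in these notes: the function and parameter names are hypothetical, while the returned strings are the migration type values that DMS uses for replication tasks.

```python
def choose_migration_type(outage_acceptable: bool, bulk_load_outside_dms: bool) -> str:
    """Pick a DMS migration type for a scenario (hypothetical helper).

    outage_acceptable: the application can be offline for the whole copy.
    bulk_load_outside_dms: existing data is moved with another tool,
        for example a vendor export/import utility.
    """
    if bulk_load_outside_dms:
        # DMS only replicates changes made from the bulk load point onward.
        return "cdc"
    if outage_acceptable:
        # A one-off copy of the existing data is enough.
        return "full-load"
    # Copy existing data and keep replicating changes until cutover.
    return "full-load-and-cdc"


print(choose_migration_type(outage_acceptable=True, bulk_load_outside_dms=False))
print(choose_migration_type(outage_acceptable=False, bulk_load_outside_dms=False))
print(choose_migration_type(outage_acceptable=False, bulk_load_outside_dms=True))
```

The order of the checks is a design choice for this sketch: a bulk load outside DMS is checked first, because in that case DMS is never asked to copy the existing data.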
Now, DMS is a great tool for migrating databases from on-premises to AWS. It’s a tool that you will get to use for most larger database migrations. So, as a solutions architect it’s another tool which you need to understand end to end. In the exam, if you see any form of database migration scenario, as long as one of the databases is within AWS, and as long as there are no weird databases involved, which aren’t supported by the product, then you can default to using DMS. It’s always a safe, default option for any database migration questions. If the question talks about a no downtime migration, then you absolutely should default to DMS.
Let’s look at a few aspects of DMS which are important. First, let’s talk about the Schema Conversion Tool (SCT) in more detail. This is a standalone application, which is only used when converting from one database engine to another. It can be used as part of migrations where the source and target engines aren’t compatible. Another use case is larger migrations, when you need an alternative way of moving data between on-premises and AWS rather than using a data link. SCT is not used for movements of data between compatible database engines. For example, if you’re performing a migration from an on-premises MySQL server to an AWS-based RDS MySQL server, then the engines are the same, even though the products are different, and so SCT would not be used. SCT works with OLTP database types (MySQL, MSSQL, Oracle) and OLAP databases (Teradata, Oracle, Vertica, Greenplum, etc.). Examples of situations where SCT would be used include an on-premises MSSQL to AWS RDS MySQL migration, because the engine changes from MSSQL to MySQL. We could also use SCT for an on-premises Oracle to AWS-based Aurora migration, again because the engines are changing.
There is another type of situation where DMS can be used in combination with SCT, and that’s for larger migrations. DMS can often be involved with large-scale database migrations, things which are multiple terabytes in size, and for those types of projects it’s often not optimal to transfer the data over the network. It takes time and it consumes network capacity that might be used heavily for normal business operations. So DMS is able to utilize the Snowball range of products, which are available for bulk transfer of data into and out of AWS. You can use DMS in combination with Snowball, and this actually uses the Schema Conversion Tool. Here is a rundown of the process:
Step 1: Use SCT to extract the data locally and move it to a Snowball device.
Step 2: Ship the device back to AWS, who load the data into an S3 bucket.
Step 3: DMS migrates the data from S3 into the target store.
Step 4: Change Data Capture (CDC) can capture ongoing changes, which are also written to the target database via the S3 intermediary.
DMS will transfer data over the network. It can transfer data over Direct Connect, a VPN, or even a VPC peering connection. But if the data volumes that you are migrating are bigger than you can practically transfer over your network link, then you can order a Snowball and use DMS together with SCT to make that transfer much quicker and more effective.
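A rough back-of-the-envelope calculation helps judge whether a network transfer is practical. The sketch below is only an estimate under stated assumptions: decimal units (1 TB = 10**12 bytes), and a hypothetical utilisation factor for the share of the link the migration can actually use. The function name and the example figures are illustrative, not taken from the course.

```python
def transfer_days(size_tb: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Estimate the days needed to push size_tb terabytes over a network link.

    Assumes 1 TB = 10**12 bytes and that only `utilisation` of the link
    bandwidth is available to the migration (both are assumptions).
    """
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * utilisation)
    return seconds / 86_400  # seconds in a day


print(round(transfer_days(50, 100), 1))   # 50 TB over a 100 Mbps link -> 57.9
print(round(transfer_days(50, 1000), 1))  # 50 TB over a 1 Gbps link -> 5.8
```

At roughly two months for 50 TB on a 100 Mbps link, shipping a Snowball device starts to look like the more practical route.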
The rule to remember for the exam is that SCT is only used for migrations when the engine is changing. The reason SCT is used with Snowball is that you’re actually migrating the database into a generic file format, which can be moved using Snowball devices. This doesn’t break the rule, because you are essentially changing the engine: you’re converting from whatever engine the source uses into a generic file format for transfer through to AWS on a Snowball device.
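The exam rule of thumb can be written as a tiny check. This is a sketch of the rule as stated in these notes, not of any AWS API: the function name and the via_snowball flag are hypothetical, and engine names are assumed to be already normalised (for example, a MySQL-compatible target recorded as "mysql").

```python
def needs_sct(source_engine: str, target_engine: str, via_snowball: bool = False) -> bool:
    """Apply the rule of thumb from these notes (hypothetical helper).

    SCT is used when the engine changes. A Snowball transfer counts as an
    engine change, because the data is extracted into a generic file format.
    """
    engine_changes = source_engine.lower() != target_engine.lower()
    return engine_changes or via_snowball


print(needs_sct("mysql", "mysql"))                     # False: same engine
print(needs_sct("mssql", "mysql"))                     # True: engine changes
print(needs_sct("mysql", "mysql", via_snowball=True))  # True: Snowball transfer
```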