Adrian Cantrill’s SAA-C02 study course, learn.cantrill.io, 90 minutes:
Relational Database Service (‘RDS’) section: ‘Database Refresher [Part 1]’, ‘Database Refresher [Part 2]’, ‘ACID vs. BASE’, ‘Databases on EC2’
CSA CCSK Exam, 20 minutes: CCSK Security Guidance, domain 2, Governance and Enterprise Risk Management
Database Refresher Part 1: Database: A system which stores and manages data. There are a number of different types of databases, and there are crucial differences among them in how data is physically stored on disk, how it is managed on disk and in memory, and how systems retrieve data and present it to the user. Databases are broadly split into relational and non-relational. Relational databases are generally SQL databases. SQL (Structured Query Language) is a language used to store, retrieve, and update data, and is a feature of most relational database platforms. SQL is not the same thing as a ‘relational database management system’ (RDBMS), the general name for these types of databases, though the two terms are often used interchangeably. Relational databases employ a rigid schema to structure data. The schema is defined in advance, before any data is put into the system, which makes relational databases a poor fit for data with rapidly changing relationships. The schema defines the names of things, the valid values of things, and which types of data are stored where, along with fixed relationships between tables. A table is the key component of any SQL-based database model. A table is made up of columns, which are called attributes: the column name is the attribute name, and the value stored in each row of that column is an attribute value. Each row holds one attribute value per column, so data that relates together is stored together. Each table also has a primary key: every row in the table has to have a unique value for the primary key attribute. Because SQL systems are relational, you can define relationships between tables. A relationship between two tables can be stored in a join table, which has a composite key. The composite key is formed by combining the keys of the tables being joined, and each composite key value must be unique.
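A minimal sketch of the ideas above (schema defined up front, primary keys, and a join table with a composite key), using Python's built-in sqlite3 module. The table names, column names, and sample rows (owner, pet, owner_pet) are hypothetical and chosen only for illustration; they are not taken from the course.

```python
import sqlite3

# In-memory database; every table, column, and row here is a made-up example.
conn = sqlite3.connect(":memory:")

# The schema is defined in advance, before any data is stored.
conn.executescript("""
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,   -- primary key: unique for every row
    name     TEXT NOT NULL
);
CREATE TABLE pet (
    pet_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
-- Join table: its composite key combines the keys of the two tables.
CREATE TABLE owner_pet (
    owner_id INTEGER NOT NULL REFERENCES owner(owner_id),
    pet_id   INTEGER NOT NULL REFERENCES pet(pet_id),
    PRIMARY KEY (owner_id, pet_id)
);
""")

conn.execute("INSERT INTO owner VALUES (1, 'Dana')")
conn.executemany("INSERT INTO pet VALUES (?, ?)", [(1, "Biscuit"), (2, "Pepper")])
conn.executemany("INSERT INTO owner_pet VALUES (?, ?)", [(1, 1), (1, 2)])


def duplicate_link_rejected(connection):
    """Return True if re-inserting an existing composite key is refused."""
    try:
        connection.execute("INSERT INTO owner_pet VALUES (1, 1)")
    except sqlite3.IntegrityError:
        return True
    return False


# Follow the relationship from an owner to their pets through the join table.
pets = [row[0] for row in conn.execute(
    "SELECT pet.name FROM pet "
    "JOIN owner_pet ON pet.pet_id = owner_pet.pet_id "
    "WHERE owner_pet.owner_id = 1 ORDER BY pet.name"
)]
```

The composite primary key on owner_pet is what stops the same owner/pet pairing from being stored twice, while still allowing one owner to have many pets and one pet to have many owners.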
Database Refresher Part 2: NoSQL: A database model covering everything which doesn’t fit into the SQL mold. This is a general category, not one single thing, with a generally much more relaxed concept of a schema: NoSQL schemas are usually described as weak schemas or no schema at all. Relationships between tables are also handled differently. Both differences affect which situations each model is right for. By contrast, in an RDBMS each table can have different attributes, but within a particular table every row needs to have a value stored for every attribute in that table. Key-value database: It consists of sets of keys and values. There is no concept of structure; as long as every single key is unique, the value doesn’t matter. There is no real schema or structure. With this model you can create tables, but there are no relationships between them. This makes key-value databases really fast and easy to scale, and great for in-memory caching. Wide-column store: Each row or item has one or more keys. One is called the partition key. The second, where present, is called the sort or range key. Every item in the table has to have the same key layout. Items are grouped into tables, but these are not the same type of table as in relational database products; they are mostly just groupings of data. Every item in a table can have attributes, but the attributes don’t have to be the same between items. Every item can have any attribute, as there is no schema on the attribute side; the only requirements are that every item in the table uses the same key structure and that every item’s key is unique. Whether the table uses a single key or a composite key, the key values have to be unique. Document database: This is another type of NoSQL database. It is designed to store and query data as documents, and the documents are generally formatted using a structure like JSON or XML.
Like a key-value store, each document is accessed via an ID that is unique to that document, but the value (the document contents) is exposed to the database, so the database can query inside it. Document databases are best for orders, collections, or contact-style databases where you generally interact with the data as a whole document. They are great for interacting with deep attributes (nested items within the document structure), which makes them a good fit for catalogs, user profiles, and many content management systems, where each document is unique but changes over time. Documents may be arranged in hierarchical structures or linked to each other, for example when linking different pieces of content in a content management system. Each document has a unique ID, and the database has access to the structure inside the document. Document databases provide flexible indexing, which allows really powerful queries to be run against data that could be nested. Column databases: To understand column databases, we must first understand the limitations of row databases, which are what most SQL-based databases use. Row-based databases interact with data based on rows: if you need to find information, you must read the entire row, sometimes going through rows and rows of data. Row databases are ideal for online transaction processing (OLTP), that is, systems performing transactions: order databases, contact databases, stock databases, things which deal in rows and items. Column databases store data on disk in columns. Column databases are very inefficient for transaction-style processing, but great for reporting, where a query needs only a few columns across all rows. Graph databases: With graph-style databases, relationships between things are formally defined and stored in the database along with the data, so the relationships are not calculated each and every time a query is run. Graph databases are great for relationship-driven data (social media or HR systems).
Graph databases are composed of nodes and edges: nodes are the things in the graph, and edges are the relationships between them. Relationships can have attached data, which is itself stored as key-value pairs. Graph databases can store a massive number of complex relationships between nodes inside a database, and are ideal for social media or other systems with complex relationships.
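As a rough illustration of the node/edge model (not how any particular graph database product stores data), the sketch below keeps nodes and edges in plain Python structures. The node names, relationship names, and attributes are all hypothetical.

```python
# Nodes are the "things"; each one carries its own key-value attributes.
nodes = {
    "alice": {"type": "person", "role": "engineer"},
    "bob": {"type": "person", "role": "manager"},
    "acme": {"type": "company"},
}

# Edges are stored relationships: (source, relationship, target, attached data).
# The attached data is itself a set of key-value pairs.
edges = [
    ("alice", "WORKS_FOR", "acme", {"since": 2019}),
    ("bob", "WORKS_FOR", "acme", {"since": 2015}),
    ("alice", "REPORTS_TO", "bob", {}),
]


def related(source, relationship):
    """Return (target, edge data) pairs for one node and one relationship name."""
    return [(dst, data) for src, rel, dst, data in edges
            if src == source and rel == relationship]


def sources_of(target, relationship):
    """Return the nodes that point at `target` through `relationship`."""
    return [src for src, rel, dst, _ in edges
            if dst == target and rel == relationship]
```

Because the relationships are stored as data, a question such as "who works for acme?" is answered by reading edges directly rather than by computing a join at query time.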
ACID vs BASE: These are two database transaction models, and they define properties of transactions to and from the database. This governs how the database system itself is architected. At the foundational level, these models are underpinned by the CAP theorem: Consistency, Availability, Partition Tolerance. The theorem states that it is impossible for a database system to deliver all three, and that any database system can deliver a maximum of two of these factors. Each item is defined as follows: Consistency: Every read from the database will get the most recent write or will get an error. Availability: Every request will receive a non-error response, but without a guarantee that it contains the most recent write. Partition Tolerance: The system can be made of multiple network partitions, and it continues to operate even if a number of messages are dropped or errors occur. If a database system has multiple nodes and a network is involved, the choice is between providing consistency and providing availability. This brings us to the ACID and BASE database models. Broadly speaking, ACID = consistency, while BASE = availability. ACID: Atomic, Consistent, Isolated, Durable: This generally refers to RDS databases, and it limits the ability of the database to scale. Atomic: all parts of a transaction are successful or none are. Consistent: transactions move the database from one valid state to another; nothing in between is allowed. Isolated: If multiple transactions occur at once, they don’t interfere with each other; each executes as if it’s the only one. Durable: Once committed, transactions are durable, stored on non-volatile memory, and resilient to power outages or crashes. BASE: Basically Available, Soft state, Eventually consistent. Basically Available: read and write operations are available ‘as much as possible’, but without any consistency guarantees.
Soft state: the database doesn’t enforce consistency; this is offloaded onto the application/user. Eventually consistent: if we wait long enough, reads from the system will be consistent. BASE systems are highly scalable and can deliver high performance, because the database doesn’t have to worry about things like consistency, which is offloaded to applications. BASE usually means a NoSQL-style database; ACID usually means an RDS-style database. Exam tip: if NoSQL or DynamoDB is mentioned with ACID, this might be referring to DynamoDB transactions.
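The ‘Atomic’ and ‘Consistent’ parts of ACID can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names (the account table, the transfer helper, and the balances are hypothetical): a transfer is two updates, and if the second would leave the database in an invalid state, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint defines what counts as a valid state: no negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises an exception.
        with connection:
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return the current balances as a {name: balance} dict."""
    return dict(connection.execute("SELECT name, balance FROM account"))


transfer(conn, "alice", "bob", 30)   # both updates succeed and are committed
transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, so the credit is undone too
```

In the failed transfer the credit to bob is applied first and then discarded by the rollback, so no reader ever sees money that was created without a matching debit.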
Databases on EC2: Running databases directly on EC2 is generally considered bad practice, because there are lots of other AWS products which provide database services. Databases on EC2 usually take one of two forms: a single instance running the database together with an application of some kind and maybe a web server, or two instances, with the web server and application on one instance and the database on the other. The single-instance architecture always runs in one AZ. Dual-instance architecture: either both instances run in one AZ, or each instance runs in its own AZ. The dual-instance architecture introduces a dependency: reliable communication between the EC2 instances. Also, for instances in different AZs there is a cost for data transfer. Why you might run a database on an EC2 instance: access to the DB instance OS; advanced DB option tuning (DB root); the vendor demands it; AWS doesn’t provide the DB or DB version; AWS doesn’t provide a specific architecture (replication/resilience); decision makers who just ‘want it’. Why you shouldn’t run a database on EC2: admin overhead: managing the EC2 instance and the DB host; backup/DR management; EC2 is single AZ; features: some of the AWS DB products are amazing; EC2 is on or off: no serverless, no easy scaling; replication: skills, setup time, monitoring & effectiveness; performance: AWS invests time into optimisation and features.
CSA CCSK exam: CCSK security guidance, Domain 2, ‘Governance and Enterprise Risk Management’: Cloud computing affects governance: it introduces a third party into the process (public or hosted private cloud) or potentially alters internal governance structures (self-hosted private cloud). It is important to remember that ‘an organization can NEVER outsource responsibility for governance, even when using external providers’. A CSP generally leverages economies of scale (to manage costs and enable capabilities), creating extremely standardized services (including contracts and SLAs, consistent across all customers). Cloud computing changes the RESPONSIBILITIES and MECHANISMS for implementing and managing governance (defined in the contract). If an area is not defined in the contract, a governance gap is created, and adjustments are needed to close the gap.
Tools of cloud governance: Contracts: the primary tool for extending governance to business partners and providers. Supplier assessments: performed by the potential cloud customer using available information and allowed processes/techniques; very similar to any other supplier assessment. Compliance reporting: includes all documentation on the provider’s internal and external compliance assessments. Audit of controls: third-party audits are preferred since they provide independent validation; reports are sometimes only available under NDA or to contracted customers. Assessments and audits: based on existing standards; it is critical to understand the scope (what is being assessed and which controls are assessed).
Enterprise Risk Management: The overall management of risk for an organization, which in the cloud includes defining roles and responsibilities for risk management between cloud provider and cloud customer. You can never outsource overall responsibility and accountability for risk management to a third party. This is based on the shared responsibility model: the cloud provider accepts some level of responsibility for risk, while the rest is left to the customer to manage. The split depends heavily on the service model (the provider manages more risk in SaaS, the consumer more risk in IaaS). The cloud user is ultimately responsible for ownership of risks, and only passes some RISK MANAGEMENT to the cloud provider. This is true even with a self-hosted private cloud, where some risk management is passed to the internal cloud provider. This can be more clearly defined via internal SLAs/procedures. ERM requires good contracts/documentation to define the division of responsibilities and the potential for untreated risk. Governance is nearly always focused on contracts, while risk management delves deeper into the technology and processing capabilities of the provider. This is generally based on documentation. Risk tolerance is defined as the amount of risk the leadership and stakeholders of an organization are willing to accept, and it varies based on the asset. The assessment should align with the value and requirements of the assets. Over time, it can be important to build out a matrix of cloud services along with which types of assets are allowed in those services. Ultimately, moving to the cloud changes how risk is managed.