Adrian Cantrill’s SAA-C02 Study course, 90 minutes: Serverless and Application Services section: SQS, the Kinesis product family, and Cognito
Simple Queue Service
– Public, fully managed, highly available queues: Standard or FIFO
– Accessible from anywhere with connectivity to the public AWS endpoints, including from VPCs that have that connectivity
– FIFO queues guarantee an order
– Standard queues are ‘best efforts’: messages could be received out of order
– Messages up to 256KB in size – for larger sizes, store the data somewhere and link to it inside the message
– The basic architecture is: some clients send to the queue, other clients poll the queue
– polling is the process of looking for messages
– After messages are polled and received, they aren’t deleted from the queue; they’re hidden for a period of time; this is known as a ‘visibility timeout’
– the visibility timeout is the amount of time that a client can take to process a message in some way
– this helps ensure fault tolerance: if the client does not explicitly delete the message before the visibility timeout expires, the message automatically reappears in the queue so it can be received again, possibly by a different client (for example, after a client failure)
– Dead-letter queues can be used for problem messages
– ASGs can scale and Lambda functions can be invoked based on queue length
– Standard: at-least once, FIFO: exactly-once
– FIFO performance: up to 3,000 messages per second with batching, or up to 300 messages per second without; scaling is limited
– standard queues can scale almost infinitely
– billed based on ‘requests’
– 1 request = 1-10 messages up to 256KB total
– two types of polling: short (immediate) vs long (WaitTimeSeconds: up to 20 seconds)
– Long polling is the preferred method: because the billing is based on requests, short polling can become very expensive very quickly
– Encryption at rest (KMS) & in-transit
– Access to a queue is controlled via identity policies or queue policies
– both can control access to a queue from within the same account
– only queue policies can allow access to a queue from external accounts
– a queue policy is a resource policy, similar to those used on S3 buckets or SNS topics
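The visibility timeout and dead-letter behaviour described above can be sketched with a small in-memory model. This is a toy, not the SQS API: `ToyQueue`, `Message`, and `max_receive_count` are hypothetical names chosen for illustration, and time is passed in explicitly as `now` so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity so delete() removes exactly this message
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time from which the message can be received again


@dataclass
class ToyQueue:
    """In-memory stand-in for a queue; an illustration, not the SQS API."""
    visibility_timeout: float = 30.0
    max_receive_count: int = 3  # hypothetical threshold before dead-lettering
    messages: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        """Return the first visible message and hide it, or None if none is visible."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still hidden by an earlier receive
            if msg.receive_count >= self.max_receive_count:
                # Received too many times without being deleted: treat as a problem message.
                self.messages.remove(msg)
                self.dead_letter.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Explicit delete: the consumer confirms it finished processing."""
        self.messages.remove(msg)
```

In this model a message received at `now=0` with a 30 second timeout cannot be received at `now=10`, becomes receivable again at `now=30` if nobody deleted it, and moves to `dead_letter` once it has been received `max_receive_count` times without a delete.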
Kinesis Data Streams
– Kinesis is a scalable streaming service
– Kinesis ingests lots of data from lots of applications
– producers send data into a kinesis stream
– Streams can scale from low to near infinite data rates
– public service & highly available by design: no need to worry about replication or availability; everything is presented as a service
– streams store a rolling 24-hour window of data; storage is included
– Kinesis supports multiple producers pushing data into the stream
– Kinesis supports multiple consumers reading data from the stream
– Consumers can access the data from anywhere in the 24-hour window
– Consumers can access the data at different levels of granularity
– The Kinesis stream scales via a ‘shard architecture’
– The stream starts with a single shard
– As the data flow increases, additional shards are added to the stream
– Each shard allows for 1 MB ingestion and 2 MB consumption per second
– More shards equals higher cost and better performance
– The data window length also increases cost
– The default 24-hour window can be increased up to 7 days (the limit when these notes were taken; AWS has since raised the maximum retention to 365 days)
– Data is stored via Kinesis data records
– Records are stored across shards; scaling is linear
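Because scaling is linear, a rough shard count follows directly from the per-shard figures above. A minimal sketch, assuming only the 1 MB/s ingestion and 2 MB/s consumption limits from these notes matter; `shards_needed` is a hypothetical helper, and any other per-shard limits are ignored.

```python
import math

INGEST_MB_PER_SHARD = 1.0   # per-shard ingestion capacity in MB/s (from the notes)
CONSUME_MB_PER_SHARD = 2.0  # per-shard consumption capacity in MB/s (from the notes)


def shards_needed(ingest_mb_per_s: float, consume_mb_per_s: float) -> int:
    """Smallest shard count that covers both the ingestion and the consumption rate."""
    for_ingest = math.ceil(ingest_mb_per_s / INGEST_MB_PER_SHARD)
    for_consume = math.ceil(consume_mb_per_s / CONSUME_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, for_ingest, for_consume)
```

For example, 5 MB/s in and 8 MB/s out needs 5 shards because ingestion is the bottleneck, while 2 MB/s in and 10 MB/s out also needs 5 because consumption is.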
SQS vs KINESIS
– Ingestion of data at scale or large throughput = KINESIS
– Worker pool decoupling or asynchronous communication = SQS
– SQS = 1 production group, 1 consumption group
– SQS = Decoupling and Asynchronous communications
– SQS = No persistence of messages, no window
– Kinesis = huge scale ingestion
– Kinesis = multiple consumers, rolling window
– Kinesis = Data ingestion, Analytics, Monitoring, App Clicks
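The difference between the two models can be sketched in a few lines. This is a toy comparison under simplifying assumptions, not either service's API: `ToyStream`, `read_since`, and `queue_demo` are hypothetical names, and timestamps are plain numbers passed in by the caller.

```python
from collections import deque


class ToyStream:
    """Append-only log with a rolling window; every consumer tracks its own position."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.records = []  # (timestamp, data) pairs, oldest first

    def put(self, data, now: float) -> None:
        self.records.append((now, data))

    def read_since(self, since: float, now: float) -> list:
        """Return data newer than `since` that is still inside the rolling window."""
        oldest_allowed = now - self.window_seconds
        return [d for (t, d) in self.records if t > since and t >= oldest_allowed]


def queue_demo() -> tuple:
    """Queue model: each message goes to one consumer and is gone once taken."""
    q = deque(["click-1", "click-2"])
    worker_a = q.popleft()
    worker_b = q.popleft()
    return worker_a, worker_b, len(q)
```

Two consumers calling `read_since` on the same `ToyStream` both see the same records until those records age out of the window, whereas in `queue_demo` each message is handed to exactly one worker and the queue ends up empty.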
Amazon Kinesis Data Firehose
– Designed to cope with large amounts of streaming data ingestion, consumption and management within AWS
– Kinesis Data Streams does not offer any way to persist data beyond its rolling window; it’s only designed for ingestion and consumption
– Once records in Kinesis age past the end of the rolling window, they’re gone forever
– Data Firehose = fully managed service to load data for data lakes, data stores, and analytics services
– Data Firehose = lets data be persisted beyond the rolling window of Kinesis data streams
– Automatic scaling; Fully serverless; resilient
– Near real time delivery (~60 seconds)
– Supports transformation of data on the fly using Lambda; this can add latency depending on the complexity of data
– Firehose = pay as you go, billing based on data volume
– Firehose delivers data to pre-defined valid endpoints:
– HTTP
– Splunk
– Redshift
– Elasticsearch
– S3
– For Redshift, Firehose delivers data to an intermediate S3 bucket and then issues a Redshift COPY command to load it from the bucket into Redshift
– For the rest, data is transmitted directly from Firehose
– Firehose can directly accept data from producers or a Kinesis data stream
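The on-the-fly Lambda transformation mentioned above might look like the sketch below. It assumes the record shape I understand Firehose to use for data transformation (base64 `data`, a `recordId` echoed back, and a per-record `result` of `Ok` or `ProcessingFailed`); check the AWS documentation before relying on it. The `processed` field is a hypothetical enrichment, and records are assumed to be JSON objects.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each incoming record and report a per-record result."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment step
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Not valid base64/JSON, or not a JSON object: return the data
            # untouched and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

The more work done per record here, the more latency the transformation adds before delivery.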
Kinesis Data Analytics
– This is a real-time data processing product
– Kinesis Data Streams: allows large scale ingestion of data into AWS and the consumption of that data by other compute resources known as consumers
– Kinesis Data Firehose provides delivery services; it accepts data in, and then delivers it to supported destinations in near real time; can also use Lambda to perform data transformations as the data passes through
– Kinesis Data Analytics provides real time processing of data as the data flows through using SQL
– Data enters on one side, queries are run on the data, and the results are output to destinations on the other side
– Kinesis Data Analytics ingests from Kinesis Data Streams or Firehose or static reference data from S3
– Supported Destinations:
– Firehose (S3, Redshift, Elasticsearch, Splunk); near real time
– AWS Lambda; real time
– Kinesis Data Streams; real time
– Sources and destinations are external; they exist outside Kinesis Data Analytics, and sources are not modified in any way
Scenarios for using Kinesis Data Analytics
– Streaming data needing real-time SQL processing
– time-series analytics; real-time dashboards; real-time metrics
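Kinesis Data Analytics expresses this kind of processing in SQL; the Python below only illustrates the idea behind a typical real-time metric, a count per fixed time window. It is a sketch under assumptions: `tumbling_window_counts` is a hypothetical helper, events are `(timestamp_seconds, key)` pairs, and the whole input is available at once rather than arriving as a stream.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds: int) -> dict:
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

With a 60 second window, events at 0, 30, and 59 seconds fall into the window starting at 0, and an event at 60 seconds starts the next one.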
Amazon Cognito
– One of the core identity products available in AWS
– Amazon Cognito provides Authentication, Authorization, and user management for web/mobile apps
– Amazon Cognito is comprised of user pools and identity pools
User Pools
– User Pools: Sign in and get a JSON Web Token (JWT)
– The JWT can be used to authenticate with applications and with certain AWS products, such as API Gateway, that accept JWTs directly
– Most AWS services cannot use JWTs; actual AWS credentials are needed
– User Pools do not grant access to AWS services; they control sign-in and deliver a JWT (user directory management and profiles, sign-up & sign-in (customisable web UI), MFA and other security features)
– User Pools also allow social sign-in using identities provided by Facebook, Google, Amazon, Apple, and sign-in services using identity types such as SAML identity providers
– User Pools provide a joined-up user management experience
– User Pools cannot be directly used to access most AWS resources
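A JWT is three base64url segments joined by dots (header, payload, signature), so the claims a User Pool hands back can be inspected with the standard library. A minimal sketch for inspection only: it does not verify the signature, which a real application must do before trusting any claim. `read_jwt_claims` and `b64url_decode` are hypothetical helper names.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip off."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_jwt_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature (inspection only)."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))
```

The returned dictionary holds the claims, such as the subject (`sub`), that an application or API Gateway would check after validating the token.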
Identity Pools
– Identity Pools: Allow you to offer access to temporary AWS credentials
– Unauthenticated identities: guest users
– Federated identities: swap an identity from Google, Facebook, Twitter, SAML 2.0, or a User Pool for short-term AWS credentials to access AWS resources
– User Pools and Identity Pools can work together (User Pool Identity obtaining temporary AWS credentials)
– Identity Pools assume an IAM role on behalf of the identity
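The User Pool to Identity Pool hand-off can be sketched with boto3's `cognito-identity` client. This is a sketch under assumptions, not a definitive implementation: the function names and all IDs passed in are hypothetical, boto3 must be installed, and the Identity Pool must already trust the User Pool. The calls used (`get_id`, `get_credentials_for_identity`) are the ones I understand to perform the swap; confirm the details against the AWS documentation.

```python
def login_provider_key(region: str, user_pool_id: str) -> str:
    """Key that names a Cognito User Pool inside the Logins map."""
    return f"cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def temporary_credentials(identity_pool_id: str, region: str,
                          user_pool_id: str, id_token: str) -> dict:
    """Swap a User Pool ID token (JWT) for short-term AWS credentials."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # The caller has no AWS credentials yet, so the requests are sent unsigned;
    # the JWT in the Logins map is what identifies the user.
    client = boto3.client("cognito-identity", region_name=region,
                          config=Config(signature_version=UNSIGNED))
    logins = {login_provider_key(region, user_pool_id): id_token}
    identity_id = client.get_id(IdentityPoolId=identity_pool_id,
                                Logins=logins)["IdentityId"]
    response = client.get_credentials_for_identity(IdentityId=identity_id,
                                                   Logins=logins)
    # Credentials for the IAM role the Identity Pool assumed on the user's behalf:
    # AccessKeyId, SecretKey, SessionToken, Expiration.
    return response["Credentials"]
```

The returned credentials expire, so a long-running client would repeat the swap with a fresh token when they do.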