Summary of my notes on S3
S3 Security
- Controlled via a combination of identity policies, bucket policies (resource policies),
and legacy bucket and object ACLs (Access Control Lists)
- Avoid ACLs whenever possible: they are legacy and discouraged by AWS
- S3 is private by default
- By default, the root user of the account that owns the bucket is the only identity that has access to a given S3 bucket
- S3 Bucket Policy is a resource policy; similar to an identity policy but attached to a
bucket; this provides a resource perspective on permissions
- Identity policy: Controls what that identity can access (can only attach to identities
in your own account)
- Resource policy: Controls who can access that resource (allows access from the same account
or a different account and can allow or deny ANONYMOUS principals)
- There can only be one bucket policy per bucket, but it can contain multiple statements
- ACL (Access Control List): Applicable to both objects and buckets
- ACLs are inflexible and have limited permissions (5 total)
- Block Public Access is a newer addition to S3 security
- Summary: Identity policies (controlling different resources; preference for IAM; same account)
Bucket policies (just controlling S3; anonymous or cross-account access)
ACLs (never, unless you absolutely have to)
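To make the resource-policy idea above concrete, here is a minimal sketch that builds a bucket policy allowing anonymous read access to objects. The bucket name `example-bucket` and the statement ID are placeholders invented for illustration, not values from these notes.

```python
import json


def public_read_policy(bucket_name: str) -> dict:
    """Build a bucket policy document granting anonymous GetObject access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",  # anonymous principal: only expressible in a resource policy
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
            }
        ],
    }


policy = public_read_policy("example-bucket")  # placeholder bucket name
print(json.dumps(policy, indent=2))
```

A policy like this only takes effect if Block Public Access settings on the bucket and account permit public policies.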
S3 Static Hosting
- Normal access is via AWS APIs
- Static website hosting enables access through plain HTTP (the S3 website endpoint itself does not serve HTTPS)
- Index and Error documents are needed
- A website endpoint is created
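- If a custom domain name is needed, this is done via Route 53, and the bucket name must match the domain name
- Static website hosting is great for offloading (e.g. serving media from S3 instead of from compute) and for out-of-band pages (e.g. a maintenance page hosted separately from the main site)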
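The index and error documents mentioned above are set through a small website configuration. The sketch below assumes a boto3 S3 client is passed in as `s3_client`; the document names `index.html` and `error.html` are common defaults chosen for illustration.

```python
def website_configuration(index_doc: str = "index.html", error_doc: str = "error.html") -> dict:
    """Return the configuration shape expected by put_bucket_website."""
    return {
        "IndexDocument": {"Suffix": index_doc},  # served for directory-style requests
        "ErrorDocument": {"Key": error_doc},  # served when a request fails
    }


def enable_static_hosting(s3_client, bucket: str) -> None:
    """Turn on static website hosting; s3_client is assumed to be boto3.client("s3")."""
    s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(),
    )
```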
Object Versioning and MFA Delete
- Object versioning can be enabled on S3 buckets
- allows the bucket to store multiple versions of the same object
- without versioning, a new version of an object would completely replace the old version
- with versioning, modifications of an object create a new version and leave the old version in place
- this is done by assigning a new version ID to the new version of the object
- objects can be referenced by version ID to interact with a specific version directly
- objects are not deleted by a normal delete; a delete marker is added to hide the object
- the delete marker can be removed to restore access to the object and all of its versions
- specific versions of an object can be fully deleted by specifying their version ID
- Once versioning is enabled it can be suspended but not disabled
- this is because of the use of delete markers to hide but not remove objects
- MFA Delete relates to object versioning
- when it is enabled, MFA is required to change the bucket versioning state (enabled or suspended)
- MFA is also required to permanently delete object versions
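The version ID and delete marker behaviour described above can be modelled with a toy in-memory class. This is only an illustration of the semantics, not the S3 API; the class name, the `v1`, `v2`, ... IDs and the sentinel are all invented for the sketch.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._counter = itertools.count(1)

    def _new_id(self) -> str:
        return f"v{next(self._counter)}"

    def put(self, key, body):
        """Every write adds a new version and leaves older ones in place."""
        version_id = self._new_id()
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        """Without a version ID, add a delete marker; with one, remove that version for good."""
        if version_id is None:
            marker_id = self._new_id()
            self._versions.setdefault(key, []).append((marker_id, DELETE_MARKER))
            return marker_id
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one when version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            versions = [v for v in versions if v[0] == version_id]
        if not versions or versions[-1][1] is DELETE_MARKER:
            raise KeyError(key)  # hidden by a delete marker, or missing
        return versions[-1][1]


bucket = VersionedBucket()
first = bucket.put("cat.jpg", b"one")
second = bucket.put("cat.jpg", b"two")
marker = bucket.delete("cat.jpg")  # object now appears deleted
bucket.delete("cat.jpg", marker)  # removing the marker restores it
print(bucket.get("cat.jpg"))  # latest version is visible again
```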
S3 performance optimization
- S3 uploads usually work via a single AWS API PutObject call per object
- There is normally a single data stream to S3
- If the stream fails, the upload fails; this requires a full restart
- this also limits the transfer to one stream at a time
- Multipart upload increases speed and reliability
- recommended for objects of roughly 100 MB or larger
- data is broken up into a maximum of 10,000 parts, each 5 MB -> 5 GB
- the last part of the data can be smaller than 5 MB
- individual parts can fail and be restarted in isolation
- the transfer rate = the combined speed of all parts being uploaded in parallel
- Transfer Acceleration is another optimization feature
- Uses Edge locations located globally
- Transfer Acceleration must be enabled on the S3 bucket before it can be used
- Edge locations transfer data to the bucket over the AWS global network
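The multipart limits noted above (at most 10,000 parts, each between 5 MB and 5 GB, last part allowed to be smaller) can be turned into a small planning function. This is a sketch: the 8 MiB preferred part size is an arbitrary assumption, and binary units (MiB/GiB) are used for the limits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB  # minimum size of every part except the last
MAX_PART = 5 * GIB  # maximum size of any part
MAX_PARTS = 10_000  # maximum number of parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB) -> list:
    """Return the list of part sizes (in bytes) for a multipart upload."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when needed so the upload never exceeds 10,000 parts.
    smallest_allowed = -(-object_size // MAX_PARTS)  # integer ceiling division
    part_size = max(preferred_part_size, MIN_PART, smallest_allowed)
    if part_size > MAX_PART:
        raise ValueError("object is too large for a multipart upload")
    full_parts, remainder = divmod(object_size, part_size)
    sizes = [part_size] * full_parts
    if remainder:
        sizes.append(remainder)  # the last part may be smaller than 5 MiB
    return sizes


sizes = plan_parts(100 * MIB)
print(len(sizes), sizes[-1] // MIB)  # number of parts, size of last part in MiB
```

Each part can then be uploaded and retried independently, which is where the speed and reliability gain comes from.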
Encryption
- two main approaches
- encryption at rest; designed to protect against physical theft, like data stored on a
hard drive
- encryption in transit; for protecting data when it’s travelling between two places
- key: something like a password; used by algorithms to encrypt plaintext
- Symmetric encryption: uses the same key for encryption and decryption
- Asymmetric encryption: uses different keys for encryption and decryption (a public/private key pair)
- signing: method of using a private key to sign data so that anyone with the matching public key can verify its origin
- steganography: method of hiding data inside other data to conceal that a message exists at all
KMS
- Key Management Service
- not a part of S3 specifically, but used for encryption in AWS
- regional & public service
- used to create, store and manage keys
- creates symmetric and asymmetric keys
- performs cryptographic operations (encrypt, decrypt and others)
- keys never leave KMS
- provides FIPS 140-2(L2) Encryption
- manages customer master keys (CMKs; newer AWS documentation calls them KMS keys)
- CMK is logical: contains an ID, creation date, policy, description and state
- CMK is backed by physical key material which can be generated or imported
- CMKs can be used directly on up to 4 KB of data
- CMKs never persist in an unencrypted state; KMS encrypts the key before storing it
- KMS allows for role separation; enables different levels of permission for different IAM users
- Data Encryption Key (DEK): generated by KMS using a CMK (GenerateDataKey)
- used for encrypting data larger than 4 KB
- KMS does not store the DEK; it returns a plaintext copy and a copy encrypted under the CMK
- Encrypt the data using the plaintext key
- the plaintext key is then discarded
- the encrypted key is stored on disk with the data
- to decrypt, the encrypted key is passed to KMS, which decrypts it with the CMK
- the decrypted key is used to decrypt the data
- CMKs are isolated to a region and never leave it
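- AWS managed or customer managed CMKs: two types of keys
- Customer managed keys are more configurable (you can edit the key policy)
- CMKs support rotation: automatic for AWS managed keys (yearly since 2022, previously every 3 years); optional for customer managed keys (yearly by default)
- a CMK keeps the current backing key and previous backing keys, so older data can still be decrypted after rotation
- Alias: shortcut to a particular CMK; regional like the CMK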
- Key policy: starting point for security
- Every CMK has a key policy
- trust chain: IAM policies are trusted by the account, and the account must in turn be trusted by the key policy
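The DEK flow described above (generate a data key, encrypt with the plaintext copy, discard it, store the encrypted copy next to the data) is usually called envelope encryption. The sketch below imitates it with an in-memory stand-in for KMS. Everything here is hypothetical: `ToyKMS` is not the AWS API, and `_toy_cipher` is a deliberately simple XOR keystream used only so the example runs with the standard library. It is NOT secure and must not be used for real data.

```python
import hashlib
import os


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))  # same call encrypts and decrypts


class ToyKMS:
    """Stand-in for KMS: the master key never leaves this object."""

    def __init__(self):
        self._cmk = os.urandom(32)

    def generate_data_key(self):
        """Return (plaintext DEK, DEK encrypted under the master key); nothing is stored."""
        plaintext_key = os.urandom(32)
        return plaintext_key, _toy_cipher(self._cmk, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        return _toy_cipher(self._cmk, encrypted_key)


def encrypt_object(kms: ToyKMS, data: bytes) -> dict:
    plaintext_key, encrypted_key = kms.generate_data_key()
    ciphertext = _toy_cipher(plaintext_key, data)
    del plaintext_key  # the plaintext key is discarded after use
    return {"encrypted_key": encrypted_key, "ciphertext": ciphertext}  # stored together


def decrypt_object(kms: ToyKMS, stored: dict) -> bytes:
    plaintext_key = kms.decrypt(stored["encrypted_key"])  # KMS unwraps the DEK
    return _toy_cipher(plaintext_key, stored["ciphertext"])


kms = ToyKMS()
stored = encrypt_object(kms, b"more than 4 KB of data would go here")
print(decrypt_object(kms, stored))
```

With the real service, the corresponding operations are GenerateDataKey and Decrypt, and the data itself would be encrypted with a proper cipher such as AES.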
Object Encryption
- Buckets are not encrypted; objects in buckets are
- two main methods: server-side and client-side (both at rest)
- client-side: data encrypted by the client before upload (you own keys, process, tooling)
- server-side: data arrives unencrypted from the client (apart from HTTPS in transit) and is encrypted by the S3 endpoint (keys, process, tooling handled by S3)
- Two components to server-side encryption: the encryption process, and the generation and management of keys
- server-side encryption: three types: SSE-C, SSE-S3, SSE-KMS
- SSE-C: Server-Side Encryption with Customer-Provided Keys
(customer provides the object and the key; S3 handles encryption and decryption)
- SSE-S3: Server-Side Encryption with Amazon S3-Managed Keys
(AWS handles encryption/decryption and key management)
- 3 problems: hard to use in regulated environments where you must control the keys, no control over key rotation, and no role separation
- SSE-KMS: Server-Side Encryption with Customer Master Keys stored in AWS KMS
(S3 handles the encryption process; key creation is handled by KMS)
- Allows S3 to handle the process while the customer manages the keys; overcomes the three problems
- SSE-S3 uses AES-256 encryption
- the x-amz-server-side-encryption header is sent with a PUT to request server-side encryption (AES256 for SSE-S3, aws:kms for SSE-KMS)
- default bucket encryption sets the method used when a request does not specify the header
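As a sketch of how the encryption header is chosen per request, the helper below builds keyword arguments in the shape used by boto3's `put_object`, where `ServerSideEncryption` maps to the `x-amz-server-side-encryption` header. The helper name, the `mode` strings and the key alias are assumptions for illustration.

```python
def put_object_kwargs(bucket: str, key: str, body: bytes,
                      mode: str = "SSE-S3", kms_key_id: str = None) -> dict:
    """Build put_object arguments for SSE-S3 or SSE-KMS."""
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if mode == "SSE-S3":
        kwargs["ServerSideEncryption"] = "AES256"  # S3-managed keys
    elif mode == "SSE-KMS":
        kwargs["ServerSideEncryption"] = "aws:kms"  # keys managed in KMS
        if kms_key_id is not None:
            kwargs["SSEKMSKeyId"] = kms_key_id  # omit to use the AWS managed key
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return kwargs


# Usage, assuming s3_client = boto3.client("s3") and a hypothetical key alias:
# s3_client.put_object(**put_object_kwargs("example-bucket", "a.txt", b"hi",
#                                          mode="SSE-KMS", kms_key_id="alias/example"))
```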
S3 Object Storage Classes
- S3 Standard: default; replicated across 3 availability zones; 11 9’s of durability
- MD5 checksums and CRCs are used to detect data corruption
- per-GB monthly fee for data stored; $ per GB transferred out; price per 1,000 requests
- no retrieval fee, no minimum duration, no minimum size
- no penalties, no frills
- millisecond first-byte latency
- should be the default
S3 Standard-IA
- most of the characteristics of S3 Standard
- much cheaper per GB stored than S3 Standard
- per-GB retrieval fee
- minimum duration charge of 30 days
- minimum billable object size of 128 KB
- for long-lived data that is important but infrequently accessed
S3 One Zone-IA
- same as Standard-IA but data is stored in only one availability zone
- non-critical, long-lived data that is replaceable and infrequently accessed
S3 Glacier
- same 3-zone replication as S3 Standard
- same durability
- about 1/5th the storage price of S3 Standard
- for cold objects that are not needed immediately
- objects must go through a retrieval process before they can be read
- retrieved copies are stored in IA temporarily and then removed
- expedited: 1 to 5 minutes
- standard: 3 to 5 hours
- bulk: 5 to 12 hours
- faster = more expensive
- first-byte latency = minutes or hours
- 40 KB minimum billable size
- 90-day minimum billable duration
S3 Glacier Deep Archive
- about 1/4 the price of S3 Glacier
- standard retrieval: within 12 hours
- bulk retrieval: up to 48 hours
- first-byte latency: hours or days
S3 Intelligent-Tiering
- frequent access
- infrequent access
- archive
- deep archive
- each tier is priced like the corresponding S3 storage class
- objects move between tiers automatically based on access patterns
- monitoring and automation fee per 1,000 objects
- for long-lived data (30-day)
- with dynamic/irregular access patterns for objects
S3 Lifecycle Configuration
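- a set of rules that consist of actions
- applied to a bucket or a group of objects
- transition actions: change the storage class
- expiration actions: delete objects after a period of time
- rules are based on object age, not on access patterns
- transitions only flow down (e.g. Standard -> Standard-IA -> Glacier), never back up
- smaller objects can incur higher costs when transitioned, because of minimum billable sizes
- objects must spend a minimum of 30 days in S3 Standard before a rule can transition them to infrequent access
- this applies when creating a rule to transition from Standard to infrequent access and then on to Glacier/Deep Archive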
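A lifecycle rule combining the transition and expiration actions above might look like the following sketch. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, the `logs/` prefix and the day counts are illustrative assumptions, and `check_transitions` is a simplified, hypothetical checker that covers only the constraints mentioned in these notes.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical group of objects
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 365},  # delete after a year
        }
    ]
}

# Simplified "downward" ordering of classes, an assumption for this sketch.
CLASS_ORDER = ["STANDARD_IA", "GLACIER", "DEEP_ARCHIVE"]


def check_transitions(rule: dict) -> list:
    """Return a list of problems found in a rule's transitions (empty if none)."""
    problems = []
    previous_days, previous_rank = 0, -1
    for transition in rule.get("Transitions", []):
        storage_class, days = transition["StorageClass"], transition["Days"]
        rank = CLASS_ORDER.index(storage_class)  # raises ValueError for unknown classes
        if storage_class == "STANDARD_IA" and days < 30:
            problems.append("objects need 30 days in Standard before Standard-IA")
        if rank <= previous_rank:
            problems.append(f"{storage_class}: transitions only flow down")
        if previous_rank >= 0 and days <= previous_days:
            problems.append(f"{storage_class}: must happen later than the previous transition")
        previous_days, previous_rank = days, rank
    return problems


print(check_transitions(lifecycle_configuration["Rules"][0]))
```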
S3 Replication
- configure replication of objects between a source and a destination S3 bucket
- cross-region replication (CRR)
- same-region replication (SRR)
- the architecture only differs depending on whether the buckets are in the same or different AWS accounts
- an IAM role is configured when the replication is set up; S3 assumes it to read from the source and write to the destination
- in one account, both buckets trust the role
- in different accounts, the role is not trusted by the destination account by default
- so a resource policy (bucket policy) needs to be placed on the destination bucket
- replicate all objects or a selection of them
- can pick the destination storage class; the default is to use the same class on source and destination
- objects are owned by the source bucket account by default
- Replication Time Control (RTC) adds a 15-minute SLA
- replication is not retroactive: only objects written after it is enabled are replicated
- source and destination must both have versioning enabled
- one-way replication (source to destination)
- handles unencrypted objects
- handles objects encrypted with SSE-S3 and SSE-KMS, but not SSE-C
- system events are not replicated, nor are objects in Glacier or Glacier Deep Archive
- deletes are not replicated
- SRR use cases: log aggregation, syncing prod and test accounts, resilience under data sovereignty requirements
- CRR use cases: global resilience, reduced latency
s3 PreSigned URLs
- give a person/app access to an object in a bucket in a safe and secure way
- an identity (e.g. IAMADMIN) sends AWS a request to generate a presigned URL, along with information
defining the parameters for the presigned URL
- the holder of the presigned URL interacts with AWS basically as the IAM user who created it,
so with the same level of access, permissions, etc.
- usable with GET and PUT APIs
- can keep the bucket private and have one IAM user with access generate URLs for objects in that bucket
- used for offloading media and also in serverless architectures
- can be used to access an object in a private S3 bucket with the access rights of the
identity which generated them
- they’re time-limited and they encode all of the authentication information needed
inside
- can be used to upload and download
- you can create a presigned URL for an object that you have no access to (the URL then has no access either)
- permissions match those of the identity which generated it as they are now, at the time of use, not as they were at creation
- don’t generate presigned URLs with temporary credentials (e.g. an assumed role): the URL stops working when those credentials expire
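Generating presigned URLs for download and upload can be sketched with boto3's `generate_presigned_url`. The functions below take an already-created client (assumed to be `boto3.client("s3")` built from long-lived credentials); the function names and the one-hour default expiry are illustrative choices.

```python
def presign_download(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL that allows GET on one object."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds
    )


def presign_upload(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL that allows PUT of one object."""
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Usage, with placeholder names:
# import boto3
# url = presign_download(boto3.client("s3"), "example-bucket", "media/video.mp4", 600)
```

The URL is created locally by signing the request, so generating one succeeds even when the signing identity lacks access to the object; the failure only shows up when the URL is used.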
S3 Select and Glacier Select
- ways to retrieve parts of objects rather than the whole object
- create a SQL-like statement to select a specific part of an object
- supports CSV, JSON and Parquet formats, and BZIP2 compression for CSV and JSON
- up to 400% faster
- up to 80% cheaper
S3 Events
- allows you to create event notifications on a bucket
- when enabled, a notification is generated when a specified event occurs
- can be delivered to SNS topics, SQS queues and Lambda functions
- Object Created
- Object Deleted
- Object Restore
- Replication
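An event notification setup for the destinations and event types listed above could be sketched as follows. The dictionary follows the shape accepted by boto3's `put_bucket_notification_configuration`; the helper name, the queue ARN and the account ID in it are placeholders.

```python
# Event type strings for the four categories in the list above.
EVENT_TYPES = {
    "created": "s3:ObjectCreated:*",
    "deleted": "s3:ObjectRemoved:*",
    "restore": "s3:ObjectRestore:*",
    "replication": "s3:Replication:*",
}


def queue_notification_config(queue_arn: str, categories: list) -> dict:
    """Build a notification configuration that sends the chosen events to one SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": [EVENT_TYPES[name] for name in categories],
            }
        ]
    }


config = queue_notification_config(
    "arn:aws:sqs:us-east-1:111122223333:example-queue",  # placeholder ARN
    ["created", "deleted"],
)
print(config)

# Usage, assuming s3_client = boto3.client("s3"):
# s3_client.put_bucket_notification_configuration(
#     Bucket="example-bucket", NotificationConfiguration=config)
```

SNS topics and Lambda functions use the sibling keys `TopicConfigurations` and `LambdaFunctionConfigurations` in the same structure.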
S3 Access Logs
- Enable logging via the console UI or via PutBucketLogging (CLI/API)
- the target bucket ACL must allow the ‘S3 Log Delivery Group’ to write to it
- log files consist of newline-delimited records; attributes within a record are space-delimited