AWS Solution Architect(Associate) - Topic 1: Identity Access Management and S3

  • Key Terminology For IAM
  • What is S3?
  • How does data consistency work for S3?
  • S3 - Guarantees / Features / Storage Classes / Charges / Pricing Tiers / Security & Encryption / Version Control / Object Lock / Glacier Vault Lock / Performance / Select & Glacier Select
  • Exam Tips & REMEMBER TO READ FAQ


IAM

Key Terminology For IAM

  • Users

    End users, such as people, employees, and organizations.

  • Groups

    A collection of users. Each user in the group inherits the permissions of the group.

  • Policies

    Policies are made up of documents called policy documents. These documents are written in JSON, and they define permissions, that is, what a user, group, or role is able to do.

  • Roles

    You create roles and then assign them to AWS resources. For example, you might give a virtual machine inside AWS the ability to write files to S3, which is a type of storage within AWS.
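To make the relationship between users, groups, and policies concrete, here is a minimal sketch, assuming a heavily simplified model of IAM evaluation: a user's effective permissions are the statements attached directly to them plus those inherited from their groups, anything not explicitly allowed is denied, and an explicit Deny always wins. The policy documents use the real JSON shape (`Version`, `Statement`, `Effect`, `Action`, `Resource`), but the user, group, and bucket names are hypothetical, and real IAM evaluation also weighs conditions, resource policies, SCPs, and more.

```python
import fnmatch
import json

# Policy documents in the JSON format IAM uses (all names and ARNs are hypothetical).
READ_ONLY_S3 = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"}
  ]
}
""")

DENY_DELETE = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"}],
}

WRITE_REPORTS = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:PutObject", "s3:DeleteObject"],
         "Resource": "arn:aws:s3:::example-reports/*"}
    ],
}

# Groups carry policies; users inherit them through membership.
GROUPS = {"developers": [READ_ONLY_S3, DENY_DELETE]}
USERS = {
    "alice": {"groups": ["developers"], "policies": [WRITE_REPORTS]},
    "bob": {"groups": [], "policies": []},  # new users start with no permissions
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def effective_statements(user):
    """Statements attached to the user directly plus those inherited from groups."""
    policies = list(USERS[user]["policies"])
    for group in USERS[user]["groups"]:
        policies.extend(GROUPS[group])
    return [stmt for policy in policies for stmt in policy["Statement"]]


def is_allowed(user, action, resource):
    """Simplified evaluation: explicit Deny > explicit Allow > implicit deny."""
    allowed = False
    for stmt in effective_statements(user):
        action_match = any(fnmatch.fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_match = any(fnmatch.fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed
```

With these definitions, alice can read any object through her group, can upload to `example-reports`, but cannot delete there: the group's explicit Deny overrides her own Allow. Bob, a brand-new user, can do nothing.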

Exam Tips

  • IAM is universal. It doesn’t apply to regions at this time.
  • Always set up Multi-factor Authentication on your root account.
  • You can create and customize your own password rotation policies.

S3

What is S3

  • It’s object-based storage (it allows you to upload files).
    • Key
    • Value
    • Version ID (important for versioning)
    • Metadata
    • Subresources
      • Access Control Lists
      • Torrent
  • Files can be from 0 Bytes to 5 TB.
  • Files are stored in Buckets.
  • S3 is a universal namespace. That is, names must be unique globally.
  • You receive an HTTP 200 code if the upload was successful.
  • Each bucket has a web address (URL), e.g. https://acloudguru.s3.amazonaws.com/
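The anatomy of an object listed above can be pictured as a small data structure. This is a minimal sketch, assuming a toy in-memory model: the `S3Object` class and its field names are hypothetical (not an AWS SDK type), and it only enforces the 0 byte to 5 TB size rule, treating 5 TB as 5 × 1024⁴ bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TB, treated as binary terabytes here


@dataclass
class S3Object:
    key: str                          # the object's name, e.g. "photos/cat.jpg"
    value: bytes                      # the data itself
    version_id: Optional[str] = None  # populated once versioning is enabled
    metadata: Dict[str, str] = field(default_factory=dict)  # data about the data
    acl: Dict[str, str] = field(default_factory=dict)       # sub-resource: access control list

    def __post_init__(self):
        # Objects may be anywhere from 0 bytes up to 5 TB.
        if not 0 <= len(self.value) <= MAX_OBJECT_SIZE:
            raise ValueError("S3 objects must be between 0 bytes and 5 TB")
```

Note that an empty (0-byte) object is perfectly valid, and the key is simply a string; any "folders" are just part of that string.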

How does data consistency work for S3

  • Read after Write consistency for PUTS of NEW Objects: if you write a new file and read it immediately afterwards, you will be able to view that data.
  • Eventual consistency for overwrite PUTS and DELETES (can take some time to propagate). If you update an EXISTING file or delete a file and read it immediately, you may or may not get the older version. Basically, changes to objects can take a little bit of time to propagate.
  • Update: since December 2020, S3 delivers strong read-after-write consistency for all PUTs (including overwrites) and DELETEs, so the eventual-consistency caveat above no longer applies to current S3.

Guarantees

  • 99.9% availability (S3 Standard is built for 99.99% availability; Amazon’s SLA guarantees 99.9%)
  • 99.999999999% (11 nines) durability
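A quick worked example helps put these numbers in perspective. It is a back-of-the-envelope sketch, assuming the durability figure applies independently to each object per year and availability is measured over a 365-day year; both are simplifications.

```python
HOURS_PER_YEAR = 365 * 24


def max_downtime_hours(availability: float) -> float:
    """Hours per year a service may be unavailable at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


def expected_objects_lost_per_year(num_objects: int, durability: float) -> float:
    """Expected number of objects lost per year at the given annual durability."""
    return num_objects * (1 - durability)


print(f"99.9% availability allows {max_downtime_hours(0.999):.2f} hours of downtime per year")
lost = expected_objects_lost_per_year(10_000_000, 0.99999999999)
print(f"10 million objects: about one object lost every {1 / lost:,.0f} years")
```

So 99.9% availability still allows roughly 8.76 hours of unavailability a year, while 11 nines of durability means that storing 10 million objects you would expect to lose a single object about once every 10,000 years.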

Features

  • Tiered Storage Available
  • Lifecycle Management
    • Tier A to Tier B to Glacier
  • Versioning
  • Encryption
  • MFA Delete
  • Secure your data using Access Control Lists and Bucket Policies
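Lifecycle management is configured as a set of rules on the bucket. Below is a sketch of a lifecycle configuration in the shape the S3 API expects (`Rules`, `Filter`, `Status`, `Transitions`, `NoncurrentVersionTransitions`, `Expiration`); the rule ID, the `logs/` prefix, and the day counts are hypothetical. With boto3 it would be passed as `LifecycleConfiguration=` to `put_bucket_lifecycle_configuration`; here we only build and sanity-check it with the standard library.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",          # hypothetical rule name
            "Filter": {"Prefix": "logs/"},     # only objects under logs/
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # tier A -> tier B
                {"Days": 90, "StorageClass": "GLACIER"},      # tier B -> Glacier
            ],
            # With versioning enabled, previous versions can be moved separately.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
            ],
            "Expiration": {"Days": 365},       # delete current versions after a year
        }
    ]
}


def transition_days_are_increasing(rule):
    """Each transition should happen strictly later than the previous one."""
    days = [t["Days"] for t in rule["Transitions"]]
    return days == sorted(days) and len(set(days)) == len(days)


print(json.dumps(lifecycle_configuration, indent=2))
```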

Storage Classes

  • S3 Standard
  • S3 - IA (Infrequently Accessed)
    • requires rapid access when needed
    • Charged a retrieval fee
  • S3 One Zone - IA
    • Phased-out predecessor: RRS (S3 Reduced Redundancy Storage), which still exists.
  • S3 - Intelligent Tiering
    • Designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.
  • S3 Glacier
    • Retrieval times configurable from minutes to hours.
  • S3 Glacier Deep Archive
    • Lowest-cost storage class, for data where a retrieval time of 12 hours is acceptable.
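When you upload through the API or SDKs, the storage class is chosen per object using an identifier that differs slightly from the names above. The mapping below uses the `StorageClass` values accepted by the S3 API; the `put_object_kwargs` helper and the bucket and key names are hypothetical, and the returned dict only mirrors the keyword arguments boto3's `put_object` accepts (nothing is uploaded).

```python
# Exam/marketing names -> StorageClass identifiers used by the S3 API.
STORAGE_CLASS_IDS = {
    "S3 Standard": "STANDARD",
    "S3 - IA": "STANDARD_IA",
    "S3 One Zone - IA": "ONEZONE_IA",
    "S3 - Intelligent Tiering": "INTELLIGENT_TIERING",
    "S3 Glacier": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
    "RRS (Reduced Redundancy Storage)": "REDUCED_REDUNDANCY",
}


def put_object_kwargs(bucket, key, body, storage_class="S3 Standard"):
    """Build keyword arguments for uploading an object in the given storage class."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASS_IDS[storage_class],
    }


kwargs = put_object_kwargs("example-bucket", "backups/2020-08.tar", b"...", "S3 Glacier Deep Archive")
```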

Charges

  • Storage
  • Requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
  • Cross Region Replication Pricing

Pricing Tiers

  • What makes up the cost of S3?
    • Storage (Understand how to get the best value out of S3)
    • Requests and Data Retrievals
    • Data Transfer
    • Management & Replication

Security & Encryption

  • Encryption In Transit is achieved by SSL/TLS.

  • Encryption At Rest (Server Side) is achieved by

    • S3 Managed Keys: SSE (Server Side Encryption) - S3
    • AWS Key Management Service, Managed Keys: SSE-KMS
      • This is where you and Amazon manage the keys together.
    • Server Side Encryption With Customer Provided Keys: SSE-C
      • You give Amazon your own keys, which you manage, and Amazon uses them to encrypt your S3 objects.

    Client-side encryption means you encrypt the data yourself before uploading it to S3.
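On the wire, the server-side options differ mainly in which request headers accompany a `PUT`. The sketch below assembles those headers with the standard library; the header names are the documented S3 ones, while the `sse_headers` helper itself is hypothetical. For SSE-C you send the 256-bit key itself (base64-encoded) plus a base64-encoded MD5 digest of it, so S3 can verify the key arrived intact; S3 uses the key to encrypt and then discards it rather than storing it.

```python
import base64
import hashlib
import os


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the S3 request headers for the chosen server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # optional: a specific KMS key instead of the default one
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")


my_key = os.urandom(32)  # a key you generate and manage yourself
print(sse_headers("SSE-C", customer_key=my_key))
```

With SSE-C the same headers must be sent again on every `GET`, since S3 needs your key to decrypt the object.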

Version Control

  • Using Versioning With S3
    • Stores all versions of an object
    • Great backup tool
    • Once enabled, versioning cannot be disabled, only suspended.
    • Integrates with lifecycle rules.
    • Versioning’s MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
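The versioning rules above (every write kept, deletes turned into delete markers, suspension instead of disabling) can be modeled in a few lines. This is a toy in-memory sketch, assuming sequential integer version IDs for readability; real S3 version IDs are opaque strings, and the `VersionedBucket` class is hypothetical.

```python
import itertools


class VersionedBucket:
    """Toy model of an S3 bucket's versioning behavior (hypothetical class)."""

    def __init__(self):
        self.state = "Unversioned"
        self.versions = {}             # key -> list of (version_id, data); None = delete marker
        self._ids = itertools.count(1)

    def set_versioning(self, status):
        # Only "Enabled" and "Suspended" are accepted: once versioning has been
        # enabled, a bucket can never go back to being unversioned.
        if status not in ("Enabled", "Suspended"):
            raise ValueError("versioning can only be Enabled or Suspended")
        self.state = status

    def put(self, key, data):
        history = self.versions.setdefault(key, [])
        if self.state == "Enabled":
            version_id = str(next(self._ids))
        else:
            # Unversioned or suspended: writes get the "null" version ID and
            # replace any existing "null" version; older versions are kept.
            version_id = "null"
            history[:] = [v for v in history if v[0] != "null"]
        history.append((version_id, data))
        return version_id

    def delete(self, key):
        if self.state == "Unversioned":
            self.versions.pop(key, None)  # no history to keep: really gone
            return None
        return self.put(key, None)        # otherwise add a delete marker

    def get(self, key, version_id=None):
        for vid, data in reversed(self.versions.get(key, [])):
            if version_id is None or vid == version_id:
                if data is None:
                    raise KeyError(f"{key}: delete marker")
                return data
        raise KeyError(key)
```

After enabling versioning, writing "draft" then "final" and deleting the key leaves three entries: a plain `get` now fails because the latest entry is a delete marker, yet both earlier versions can still be fetched by version ID, which is what makes versioning a good backup tool.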

Object Lock & Glacier Vault Lock

Object Lock

You can use S3 Object Lock to store objects using a write once, read many (WORM) model. It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely.

  • Governance Mode: users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
  • Compliance Mode: a protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account. Compliance mode ensures an object version can’t be overwritten or deleted for the duration of the retention period.

  • Retention Period & Legal Holds

    • Retention period: Protects an object version for a fixed amount of time.
      • After the retention period expires, the object version can be overwritten or deleted unless you also placed a legal hold on the object version.
    • Legal Holds: S3 Object Lock also enables you to place a legal hold on an object version.
      • Like a retention period, a legal hold prevents an object version from being overwritten or deleted.
      • However, a legal hold doesn’t have an associated retention period and remains in effect until removed.
      • Legal holds can be freely placed and removed by any user who has the s3:PutObjectLegalHold permission.
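The interplay of modes, retention periods, and legal holds boils down to one question: may this object version be deleted or overwritten right now? The function below is a minimal sketch of that decision under a simplified model. In real S3, bypassing governance mode requires the `s3:BypassGovernanceRetention` permission plus an explicit bypass flag on the request; the hypothetical `can_bypass_governance` argument stands in for both, and `LockedVersion` is likewise an illustrative name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LockedVersion:
    mode: Optional[str] = None               # "GOVERNANCE", "COMPLIANCE", or None
    retain_until: Optional[datetime] = None  # end of the retention period
    legal_hold: bool = False                 # independent of any retention period


def can_delete(version, now, can_bypass_governance=False):
    if version.legal_hold:
        return False  # stays until someone with s3:PutObjectLegalHold removes it
    if version.retain_until is None or now >= version.retain_until:
        return True   # no retention period, or it has expired
    if version.mode == "COMPLIANCE":
        return False  # nobody, not even the root user, can delete it
    if version.mode == "GOVERNANCE":
        return can_bypass_governance  # only users with special permissions
    return True


now = datetime.now(timezone.utc)
version = LockedVersion(mode="GOVERNANCE", retain_until=now + timedelta(days=30))
print(can_delete(version, now))                              # False
print(can_delete(version, now, can_bypass_governance=True))  # True
```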

Glacier Vault Lock

You can easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy. You can specify controls, such as WORM, in a Vault Lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.

Performance

You can achieve a high number of requests: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.

  • You can get better performance by spreading your reads across different prefixes.
  • The more prefixes you have, the better performance you can achieve.
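Because the limits apply per prefix, aggregate throughput scales with the number of prefixes you spread requests across. A back-of-the-envelope sketch, assuming requests are perfectly evenly distributed and using the per-prefix figures quoted above; the `prefix_of` helper is hypothetical and follows this post's definition of a prefix (the path between the bucket name and the file name).

```python
import posixpath

WRITES_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE per second
READS_PER_PREFIX = 5_500   # GET/HEAD per second


def prefix_of(path):
    """'mybucketname/folder1/subfolder1/myfile.jpg' -> '/folder1/subfolder1'"""
    _bucket, _, key = path.partition("/")
    return "/" + posixpath.dirname(key) if "/" in key else "/"


def max_requests_per_second(num_prefixes):
    """Upper bound on throughput when load is spread evenly over the prefixes."""
    return {
        "reads": READS_PER_PREFIX * num_prefixes,
        "writes": WRITES_PER_PREFIX * num_prefixes,
    }
```

Spreading reads across two prefixes doubles the ceiling to 11,000 GET requests per second; four prefixes would give 22,000.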

S3 LIMITATION WHEN USING KMS

  • If you use SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits.
  • When you upload a file, you will call GenerateDataKey in the KMS API.
  • When you download a file, you will call Decrypt in the KMS API.
  • Uploading/downloading will count toward the KMS quota.
  • The quota is Region-specific: it’s either 5,500, 10,000, or 30,000 requests per second, depending on the Region.
  • Currently, you cannot request a quota increase for KMS.
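Since every SSE-KMS upload triggers a `GenerateDataKey` call and every download a `Decrypt` call, you can sanity-check a workload against the KMS quota with simple arithmetic. A sketch, assuming exactly one KMS call per object operation and a default quota of 5,500 requests per second (one of the values listed above; check your own Region's figure). It also shows why adding prefixes stops helping once KMS becomes the bottleneck.

```python
S3_GETS_PER_PREFIX = 5_500


def kms_calls_per_second(uploads_per_s, downloads_per_s):
    # One GenerateDataKey per upload plus one Decrypt per download (simplified).
    return uploads_per_s + downloads_per_s


def fits_kms_quota(uploads_per_s, downloads_per_s, kms_quota=5_500):
    return kms_calls_per_second(uploads_per_s, downloads_per_s) <= kms_quota


def sustainable_reads_per_second(num_prefixes, kms_quota=5_500):
    """S3 scales with prefixes, but every encrypted GET also needs a KMS Decrypt."""
    return min(S3_GETS_PER_PREFIX * num_prefixes, kms_quota)
```

For example, four prefixes could serve 22,000 GETs per second from S3's side, but with a 10,000 requests-per-second KMS quota only 10,000 encrypted reads per second are achievable.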

Multipart Uploads

  • Recommended for files over 100 MB
  • Required for files over 5 GB
  • Parallelize uploads (increases efficiency)
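Multipart upload splits a file into numbered parts that are uploaded independently (and therefore in parallel) and then reassembled by S3. The planner below is a sketch, assuming the documented S3 limits of a 5 MiB minimum part size (except for the last part), a 5 GiB maximum part size, and at most 10,000 parts; the `plan_parts` function itself is hypothetical and does not call AWS.

```python
MIN_PART_SIZE = 5 * 1024**2  # 5 MiB (the last part may be smaller)
MAX_PART_SIZE = 5 * 1024**3  # 5 GiB
MAX_PARTS = 10_000


def plan_parts(file_size, part_size=100 * 1024**2):
    """Split file_size bytes into (part_number, offset, length) tuples."""
    # Grow the part size if the file would otherwise need more than 10,000 parts.
    part_size = max(part_size, -(-file_size // MAX_PARTS))  # ceiling division
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts, offset = [], 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    return parts
```

A 250 MiB file with 100 MiB parts becomes three parts (100, 100, and 50 MiB); each tuple is independent, so a failed part can be retried on its own.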

Downloads (S3 Byte-Range Fetches)

  • Parallelize downloads by specifying byte ranges.
  • If there’s a failure in the download, it’s only for a specific byte range.
  • Can be used to download only part of a file (e.g., header information).
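A byte-range fetch is an ordinary `GET` with an HTTP `Range` header such as `bytes=0-1023` (both ends inclusive). The sketch below computes the ranges for a parallel download and, to stay self-contained, "downloads" them from an in-memory bytes object with a thread pool; `fetch_range` is a hypothetical stand-in for a real ranged GET (with boto3 that would be `get_object` with a `Range` argument).

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values covering bytes 0..total_size-1."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"


def fetch_range(blob, range_header):
    # Hypothetical stand-in for a ranged GET request against S3.
    start, end = map(int, range_header[len("bytes="):].split("-"))
    return blob[start:end + 1]


def parallel_download(blob, chunk_size=4, workers=4):
    ranges = list(byte_ranges(len(blob), chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the chunks reassemble correctly;
        # a failed chunk could be retried on its own.
        return b"".join(pool.map(lambda r: fetch_range(blob, r), ranges))
```

Fetching just `bytes=0-4` returns the first five bytes, which is how you would read a file header without downloading the rest.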

S3 Select & Glacier Select

  • S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions.
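  • Can deliver a drastic performance improvement (up to 400% faster and up to 80% cheaper).
  • Get data by rows or columns using simple SQL expressions.
  • Glacier Select does the same for data archived in Glacier, running SQL queries directly against it.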
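An S3 Select request carries a SQL expression plus a description of the object's format. The dict below mirrors the parameters of boto3's `select_object_content` (`Expression`, `ExpressionType`, `InputSerialization`, `OutputSerialization`) for a CSV object with a header row; the bucket, key, and column names are hypothetical. The local filter after it only imitates the effect, to show why returning a subset of rows and columns cuts the bytes transferred; it is not how S3 evaluates SQL.

```python
import csv
import io

select_request = {
    "Bucket": "example-bucket",
    "Key": "sales/2020.csv",
    "ExpressionType": "SQL",
    "Expression": "SELECT s.region, s.amount FROM S3Object s WHERE s.region = 'EU'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # first row = column names
    "OutputSerialization": {"CSV": {}},
}

SAMPLE_CSV = "region,amount,customer\nEU,100,acme\nUS,250,globex\nEU,75,initech\n"


def local_select(csv_text, columns, where_column, equals):
    """Illustrative only: keep matching rows and only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row[where_column] == equals:
            writer.writerow([row[c] for c in columns])
    return out.getvalue()


result = local_select(SAMPLE_CSV, ["region", "amount"], "region", "EU")
print(len(SAMPLE_CSV), "bytes stored vs", len(result), "bytes returned")
```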

AWS Organizations & Consolidated Billing

  • AWS Organizations is an account management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage.

  • Consolidated Billing: the more you use S3 across the entire organization, the less you pay.

    • One bill for all the AWS accounts in the organization
    • Very easy to track charges and allocate costs
    • Volume pricing discount
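Tiered (volume) pricing is why consolidation saves money: usage from all member accounts is added together before the tiers are applied. The sketch uses S3 Standard-style tiers (first 50 TB, next 450 TB, everything above 500 TB), treating 1 TB as 1,024 GB, with placeholder per-GB prices; the prices are illustrative assumptions, not current AWS rates.

```python
# (tier size in GB, price per GB-month); None = unlimited. Prices are placeholders.
TIERS = [(50 * 1024, 0.023), (450 * 1024, 0.022), (None, 0.021)]


def monthly_storage_cost(gb):
    """Apply the tiers in order: cheaper rates only kick in after earlier tiers fill."""
    cost, remaining = 0.0, gb
    for size, price in TIERS:
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


accounts_gb = [40 * 1024, 40 * 1024, 40 * 1024]  # three accounts, 40 TB each
separate = sum(monthly_storage_cost(gb) for gb in accounts_gb)
consolidated = monthly_storage_cost(sum(accounts_gb))
print(f"billed separately: ${separate:,.2f}, consolidated: ${consolidated:,.2f}")
```

Separately, no account ever leaves the first tier; combined, 120 TB pushes 70 TB into the cheaper second tier.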

S3 Cross Account Access

3 ways to share S3 buckets across accounts

  • Using Bucket Policies & IAM (applies across the entire bucket). Programmatic access only.
  • Using Bucket ACLs & IAM (individual objects). Also programmatic access only.
  • Cross-account IAM Roles. Programmatic and console access.
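For the first option, the bucket owner attaches a bucket policy that names the other account as a principal, and that account's administrator then delegates access to its own IAM users through IAM policies. The policy below is a sketch in the standard bucket-policy JSON format; the account ID `111122223333`, the bucket name, and the helper function are placeholders.

```python
import json


def cross_account_read_policy(bucket, account_id):
    """Bucket policy letting another AWS account list the bucket and read its objects."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOtherAccountToList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",    # the bucket itself
            },
            {
                "Sid": "AllowOtherAccountToRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in it
            },
        ],
    }


policy = cross_account_read_policy("example-shared-bucket", "111122223333")
print(json.dumps(policy, indent=2))
```

Note that listing applies to the bucket ARN while reading applies to the objects (`/*`), which is a common source of "access denied" surprises.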

Cross Region Replication

  • Versioning must be enabled on both the source and destination.
  • Files in an existing bucket are not replicated automatically.
  • All subsequent updated files will be replicated automatically.
  • Delete markers are not replicated.
  • Deleting individual versions or delete markers will not be replicated.
  • Understand what Cross Region Replication is at a high level.
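Replication is configured on the source bucket with an IAM role that S3 assumes plus one or more rules. The sketch below builds a configuration in the shape the S3 API uses (`Role`, `Rules`, `Filter`, `DeleteMarkerReplication`, `Destination`) after checking the versioning prerequisite; the role ARN, bucket names, and the `versioning_status` lookup are hypothetical.

```python
def replication_configuration(source, destination, role_arn, versioning_status):
    """Build a replication config after checking the versioning prerequisite."""
    # Versioning must be enabled on both buckets before replication can be set up.
    for bucket in (source, destination):
        if versioning_status.get(bucket) != "Enabled":
            raise ValueError(f"versioning is not enabled on {bucket}")
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = every new object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination}"},
            }
        ],
    }
```

Keep in mind the rule only applies to objects written after it is in place; existing files are not copied automatically.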

S3 Transfer Acceleration

  • S3 Transfer Acceleration utilizes the CloudFront edge network to accelerate your uploads to S3.

  • Instead of uploading directly to your S3 bucket, you can use a distinct URL to upload directly to an edge location which will then transfer that file to S3.

  • You will get a distinct URL to upload to, for example:

    acloudguru.s3-accelerate.amazonaws.com
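Switching to Transfer Acceleration is mostly a matter of sending requests to the `s3-accelerate` endpoint instead of the regular one. A sketch of building both URLs, assuming virtual-hosted-style addressing; the `object_url` helper is hypothetical. Acceleration requires a DNS-compliant bucket name without periods, which the check reflects. In boto3 the same switch is made with `botocore.config.Config(s3={"use_accelerate_endpoint": True})`.

```python
from urllib.parse import quote


def object_url(bucket, key, accelerate=False):
    """Virtual-hosted-style URL, optionally via the Transfer Acceleration endpoint."""
    if accelerate and "." in bucket:
        raise ValueError("Transfer Acceleration requires a bucket name without periods")
    endpoint = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{endpoint}/{quote(key)}"


print(object_url("acloudguru", "videos/intro.mp4", accelerate=True))
# https://acloudguru.s3-accelerate.amazonaws.com/videos/intro.mp4
```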

AWS DataSync

  • DataSync automatically encrypts data and accelerates transfer over the WAN. DataSync performs automatic data integrity checks in-transit and at-rest.
  • The DataSync agent is deployed on a server and connects to your NAS or file system to copy data to and from AWS.
  • DataSync seamlessly and securely connects to Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server to copy data and meta-data to and from AWS.

CloudFront

  • A content delivery network (CDN) is a system of distributed servers (a network) that delivers web pages and other web content to a user based on the geographic location of the user, the origin of the web page, and a content delivery server.
  • Key Terminology
    • Edge Location
      • This is the location where content will be cached. This is separate from an AWS Region/AZ.
      • Edge locations are not just READ only — you can write to them too.
    • Origin: This is the origin of all the files that the CDN will distribute. This can be an S3 bucket, an EC2 Instance, an Elastic Load Balancer, or Route53.
    • Distribution: This is the name given to the CDN, which consists of a collection of Edge Locations.
    • Web Distribution: Typically used for Web sites.
    • RTMP: Used for Media Streaming.
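An edge location behaves like a cache in front of the origin: the first request for an object goes to the origin, and later requests are served locally until the object's TTL expires. The class below is a toy model of that behavior, assuming a single edge location, one fixed TTL (defaulting to 24 hours, CloudFront's default), and an injectable clock so it can be tested deterministically. `EdgeLocation` and `fetch_from_origin` are hypothetical names; `invalidate` corresponds to clearing cached objects before their TTL runs out.

```python
import time


class EdgeLocation:
    """Toy edge cache: serve from cache until the TTL expires, else ask the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=86_400, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}        # path -> (content, expires_at)
        self.origin_hits = 0

    def get(self, path):
        entry = self.cache.get(path)
        if entry and self.clock() < entry[1]:
            return entry[0]    # cache hit: served from the edge
        self.origin_hits += 1  # miss or expired: go back to the origin
        content = self.fetch_from_origin(path)
        self.cache[path] = (content, self.clock() + self.ttl)
        return content

    def invalidate(self, path):
        self.cache.pop(path, None)  # next request goes to the origin again
```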

CloudFront Signed URLs and Cookies

Use signed URLs/cookies when you want to secure content so that only the people you authorize are able to access it.

  • URLs vs. Cookies
    • 1 file = 1 URL: A signed URL is for individual files.
    • 1 cookie = multiple files: A signed cookie is for multiple files.
  • When we create a signed URL or signed cookie, we attach a policy. The policy can include:
    • URL expiration
    • IP ranges
    • Trusted signers (which AWS accounts can create signed URLs)
  • CloudFront Signed URL
    • Can have different origins. Does not have to be EC2.
    • Key-pair is account wide and managed by the root user
    • Can utilize caching features
    • Can filter by date, path, IP address, expiration, etc.
  • S3 Signed URL
    • Issues a request as the IAM user who creates the pre-signed URL
    • Limited lifetime
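The policy attached to a CloudFront signed URL or cookie is a small JSON document. The sketch below builds a custom policy with an expiry time and an IP range, strips its whitespace, and encodes it the way CloudFront expects in the `Policy` query parameter (base64 with `+`, `=`, `/` replaced by `-`, `_`, `~`). The distribution domain, expiry timestamp, and IP range are placeholders, and the actual RSA signing with your key pair (which produces the `Signature` parameter) is left out because it needs a crypto library.

```python
import base64
import json


def custom_policy(resource_url, expires_epoch, source_ip_cidr=None):
    """Custom policy: URL (wildcards allowed), expiry time, optional IP range."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip_cidr:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}
    return {"Statement": [{"Resource": resource_url, "Condition": condition}]}


def cloudfront_safe_b64(data: bytes) -> str:
    """Base64 with the three characters that are unsafe in URLs swapped out."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


policy = custom_policy("https://d111111abcdef8.cloudfront.net/videos/*",
                       1767225600, "192.0.2.0/24")       # placeholder values
policy_json = json.dumps(policy, separators=(",", ":"))  # no whitespace
encoded_policy = cloudfront_safe_b64(policy_json.encode("utf-8"))
```

The wildcard in `Resource` is what lets one signed cookie cover many files, whereas a signed URL is typically issued per file.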

Snowball

It’s a big big disk. Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.

  • Snowball comes in either a 50 TB or 80 TB size.
  • AWS Snowball Edge is a 100 TB data transfer device with on-board storage and compute capacities.
  • AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
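A quick calculation shows why shipping a disk can beat the network. This sketch assumes a dedicated link running at full speed with no protocol overhead, which is optimistic; the link speeds are examples.

```python
def transfer_days(terabytes, link_mbps):
    """Days needed to push the data through a link at its full nominal speed."""
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400


for mbps in (100, 1_000):
    print(f"80 TB over {mbps} Mbps: {transfer_days(80, mbps):.1f} days")
```

Filling one 80 TB Snowball's worth of data takes roughly 74 days at 100 Mbps and about a week even at 1 Gbps, which is the kind of gap Snowball is designed to close.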

Storage Gateway

  • AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS’s storage infrastructure.

  • AWS Storage Gateway’s software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacenter.

  • Three different types of Storage Gateway

    • File Gateway (NFS & SMB)

      Files are stored as objects in your S3 buckets, accessed through a Network File System (NFS) or SMB mount point.

    • Volume Gateway (iSCSI)

      The volume interface presents your applications with disk volumes using the iSCSI block protocol. It is basically a way of storing your virtual hard disk drives in S3, and the backups look like EBS snapshots.

      • Stored Volumes: let you store your primary data locally, while asynchronously backing up that data to AWS.
      • Cached Volumes: let you use Amazon Simple Storage Service (S3) as your primary data storage while retaining frequently accessed data locally in your storage gateway.
    • Tape Gateway (Gateway Virtual Tape Library)

      • Tape Gateway offers a durable, cost-effective solution to archive your data in the AWS Cloud.

      • Used for backup with popular backup applications like NetBackup, Backup Exec, and Veeam.
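The difference between stored and cached volumes is essentially where the primary copy lives. The toy model below shows a cached volume: S3 holds every block, and only the most recently used blocks stay on the gateway. It assumes block-level reads and least-recently-used eviction, both simplifications, and `CachedVolume` is a hypothetical name; a stored volume would simply flip the roles, keeping everything locally and copying it to S3 asynchronously.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy cached volume: the primary copy lives in S3, hot blocks stay local."""

    def __init__(self, s3_blocks, cache_capacity):
        self.s3 = s3_blocks          # primary copy of all data (in S3)
        self.cache = OrderedDict()   # local copy of recently accessed blocks
        self.capacity = cache_capacity
        self.s3_reads = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as recently used
            return self.cache[block_id]
        self.s3_reads += 1                    # cache miss: fetch from S3
        data = self.s3[block_id]
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data
```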

Athena vs Macie

Athena

Interactive query service which enables you to analyze and query data located in S3 using standard SQL.

  • Serverless, nothing to provision, pay per query / per TB scanned.
  • No need to set up complex Extract/Transform/Load (ETL) processes.
  • Works directly with data stored in S3.
  • Can be used to query log files stored in S3
  • Generate business reports on data stored in S3
  • Analyze AWS cost and usage reports
  • Run queries on click-stream data
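An Athena query is plain SQL over files in S3, submitted together with a location for the results. The dict below mirrors the parameters of boto3's `start_query_execution` (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`); the database, table, partition columns, and bucket are hypothetical, and the table is assumed to have been defined over log files in S3 already. Because you pay per data scanned, the helper estimates a query's cost from the bytes it reads, with the per-TB price passed in and a 10 MB per-query minimum assumed.

```python
MIN_BILLED_BYTES = 10 * 10**6  # assumed 10 MB minimum charge per query

query_request = {
    "QueryString": (
        "SELECT status, COUNT(*) AS hits "
        "FROM access_logs "
        "WHERE year = '2020' AND month = '08' "
        "GROUP BY status ORDER BY hits DESC"
    ),
    "QueryExecutionContext": {"Database": "weblogs"},
    "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
}


def estimated_query_cost(bytes_scanned, price_per_tb):
    """Cost of one query when billing is per TB scanned (price is an assumption)."""
    return max(bytes_scanned, MIN_BILLED_BYTES) / 10**12 * price_per_tb
```

Since cost follows bytes scanned, filtering on partition columns such as `year` and `month` (when the table is partitioned that way) is the main lever for making queries cheaper.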

Macie

Security service which uses Machine Learning and NLP (Natural Language Processing) to discover, classify and protect sensitive data stored in S3.

  • PII (Personally Identifiable Information)
    • Personal data used to establish an individual’s identity.
    • This data could be exploited by criminals, used in identity theft and financial fraud
    • Home address, email address, SSN
    • Passport number, driver’s license number
    • D.O.B, phone number, bank account, credit card number.
  • Macie
    • Uses AI to recognize whether your S3 objects contain sensitive data such as PII
    • Dashboards, reporting and alerts
    • Works directly with data stored in S3
    • Can also analyze CloudTrail logs
    • Great for PCI-DSS and preventing ID theft.

Exam Tips

  • All the tips in the “What is S3” section

  • Not suitable for installing an operating system or hosting a database (block-based storage is needed, not object-based).

  • You can turn on MFA Delete to protect the data.

  • Read after Write consistency for PUTS of NEW Objects.

  • Eventual consistency for overwrite PUTS and DELETES(can take some time to propagate).

  • Control access to buckets using either a bucket ACL or bucket policies.

    • ACLs (Access Control Lists) allow you to set fine-grained permissions all the way down to individual objects.
    • Bucket policies (use JSON-based language) are applied to the entire bucket.
  • Versioning stores all versions of an object (including all writes, and even if you delete an object)

  • Versioning cannot be disabled, only suspended once enabled

  • Life-cycle Policies:

    • Automates moving your objects between the different storage tiers.
    • Can be used in conjunction with versioning.
    • Can be applied to current versions and previous versions.
  • Object Lock & Glacier Vault Lock

    • Use S3 Object Lock to store objects using a write once, read many (WORM) model.
    • Object locks can be on individual objects or applied across the bucket as a whole.
    • Object locks come in two modes: governance mode and compliance mode.
    • S3 Glacier Vault Lock: you can specify controls such as WORM in a Vault Lock policy and lock the policy from future edits.
  • Performance

    • A prefix is simply the path between your bucket name and the file name:

      mybucketname/folder1/subfolder1/myfile.jpg -> /folder1/subfolder1

    • 3500 PUT/COPY/POST/DELETE and 5500 GET/HEAD requests per second per prefix.

    • You can get better performance by spreading your reads across different prefixes.

    • If you are using SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits.

    • Use multipart uploads & S3 byte-range fetches to improve performance.

  • Best practices with AWS Organizations

    • Always enable multi-factor authentication on root account.
    • Always use a strong and complex password on root account.
    • Paying account should be used for billing purposes only. Do not deploy resources into the paying account.
    • Enable/Disable AWS services using Service Control Policies (SCP) either on OU or on individual accounts.
  • 3 different ways to share S3 buckets across accounts

  • Cross Region Replication

  • AWS DataSync

    • Used to move large amounts of data from on-premises to AWS.
    • Used with NFS- and SMB-compatible file systems.
    • Replication can be done hourly, daily, or weekly.
    • Install the DataSync agent to start the replication.
    • Can be used to replicate EFS to EFS.
  • CloudFront

    • Edge locations are not just READ only, you can write to them too.(i.e. put an object on to them.)
    • Objects are cached for the life of the TTL (Time to Live.)
    • You can clear cached objects, but you will be charged. That you can invalidate cache contents is a really important exam topic.
  • CloudFront Signed URLs and Cookies

    • If your origin is EC2, use a CloudFront signed URL.
    • If your origin is going to be S3, and you’ve only got a single file in there, then you want to use an S3 signed URL instead of a CloudFront signed URL.
    • Think about whether or not your users can actually access S3 directly if they’re going through CloudFront with an OAI (Origin Access Identity).
      • If they can’t, you’d be using a CloudFront signed URL.
      • If they can access the S3 bucket directly and it’s just an individual object, then you probably want an S3 signed URL.
  • Storage Gateway

    • File Gateway - For flat files, stored directly on S3.
    • Volume Gateway
      • Stored Volumes - Entire Dataset is stored on site and is asynchronously backed up to S3.
      • Cached Volumes - Entire Dataset is stored on S3 and the most frequently accessed data is cached on site.
    • Gateway Virtual Tape Library
  • Athena

    Remember what Athena is and what it allows you to do:

    • Athena is an interactive query service
    • It allows you to query data located in S3 using standard SQL
    • Serverless
    • Commonly used to analyze log data stored in S3.
  • Macie

    Remember what Macie is and what it allows you to do:

    • Macie uses AI to analyze data in S3 and helps identify PII
    • Can also be used to analyze CloudTrail logs for suspicious API activity
    • Includes Dashboards, Reports and Alerting
    • Great for PCI-DSS compliance and preventing ID theft.
  • Summary - IAM

    • IAM is universal. It does not apply to regions at this time.
    • The ‘root account’ is simply the account created when you first set up your AWS account. It has complete admin access.
    • New Users have NO permissions when first created.
    • New users are assigned Access Key ID & Secret Access Keys when first created.
    • These are not the same as a password. You cannot use the Access Key ID & Secret Access Key to log in to the console. You can, however, use them to access AWS via the APIs and command line.
    • You only get to view these once. If you lose them, you have to regenerate them. So, save them in a secure location.
    • Always set up Multi-factor Authentication on your root account.
    • You can create and customize your own password rotation policies.
  • Summary - S3

    • Remember that S3 is object-based (it allows you to upload files).
    • Files can be from 0 to 5 TB.
    • There is unlimited storage.
    • Files are stored in Buckets.
    • S3 is a universal name-space.
    • By default, all newly created buckets are private. You can set up access control to your buckets using bucket policies and access control lists.
    • The Key Fundamentals of S3
      • Key
      • Value
      • Version ID
      • Meta-data
      • Sub-resources
        • Access Control Lists
    • Read after Write consistency for PUTS of new Objects
    • Eventual consistency for overwrite PUTS and DELETES (can take some time to propagate)
    • Understand how to get the best value out of S3
      • S3 Standard
      • S3 - IA
      • S3 One Zone - IA
      • S3 - Intelligent Tiering
      • S3 Glacier
      • S3 Glacier Deep Archive
    • Encryption in Transit is achieved by
      • SSL / TLS
    • Encryption At Rest (server side) is achieved by
      • S3 Managed Keys - SSE - S3
      • AWS Key Management Service, Managed Keys - SSE - KMS
      • Server Side Encryption With Customer Provided Keys - SSE - C
    • Client Side Encryption
    • Object Lock
      • Use S3 Object Lock to store objects using a write once, read many (WORM) model.
      • Object locks can be on individual objects or applied across the bucket as a whole.
      • Object locks come in two modes: governance mode and compliance mode.
        • With governance mode, users can’t overwrite or delete an object version or alter its lock setting unless they have special permissions.
        • With compliance mode, a protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account.
      • S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy. You can specify controls such as WORM in a Vault Lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.
    • You can get better performance by spreading your reads across different prefixes. For example, if you are using two prefixes, you can achieve 11,000 GET requests per second.
    • If you are using SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits.
      • Uploading/downloading will count toward the KMS quota.
    • Multipart Uploads
      • Use multipart uploads to increase performance when uploading files to S3.
      • Should be used for any file over 100 MB and must be used for any file over 5 GB.
      • Use S3 byte-range fetches to increase performance when downloading files from S3.
    • S3 Select
      • Remember that S3 Select is used to retrieve only a subset of data from an object by using simple SQL expressions.
      • Get data by rows or columns using simple SQL expressions.
      • Save money on data transfer and increase speed.

REMEMBER TO READ FAQ

https://aws.amazon.com/s3/faqs/

https://aws.amazon.com/iam/faqs/

AWS Solution Architect(Associate) - Topic 1: Identity Access Management and S3

http://vincentgaohj.github.io/Blog/2020/08/13/AWS-Solution-Architect-Associate-1-Identity-Access-Management-and-S3/

Author

Haojun(Vincent) Gao

Posted on

2020-08-13

Updated on

2022-02-22

Licensed under
