AWS (Amazon Web Services) Study Notes
Study notes taken during an AWS internship, covering AWS Global Infrastructure, Networking in AWS, Storage in AWS, Compute in AWS, and Database in AWS, among other topics.
AWS Global Infrastructure
Components
- Regions
- Availability Zones
- Fully isolated data centers
- Meaningful separation distance (80 km)
- Too long a distance leads to high synchronization delay.
- Too short a distance raises the risk that multiple Availability Zones are affected at the same time.
- Data Centers
- Points of Presence (PoP)
- A PoP is primarily the infrastructure that allows remote users to connect to the Internet.
- Network
- Custom Hardware
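The trade-off between distance and synchronization delay for Availability Zones can be made concrete with a little arithmetic. The sketch below is only a rough lower bound: it assumes light travels through optical fiber at about 200,000 km/s and ignores switching, queuing, and routing detours. The function name is hypothetical.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s
# (about two thirds of the speed of light in a vacuum).
FIBER_SPEED_KM_PER_S = 200_000


def round_trip_delay_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over fiber, in milliseconds."""
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    for km in (10, 80, 1000):
        print(f"{km:>5} km -> {round_trip_delay_ms(km):.2f} ms round trip")
```

At 80 km the propagation delay alone stays under a millisecond, which is why synchronous replication between zones remains practical, while a separation of 1000 km would add around 10 ms to every synchronous write.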
Machine Learning on AWS
- Application Developers
- SageMaker
- Rekognition
- Polly
- Lex
- etc.
- Data Scientists & Researchers
- AWS Deep Learning AMI
- Optimized for distributed machine learning
Networking in AWS
VPC (Virtual Private Cloud)
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
The relations between VPC and region, VPC and AZ are below:
- A VPC is deployed into a single AWS Region (one of 18 at the time of writing)
- A VPC can host resources in any Availability Zone within its Region
Subnets
When you create a subnet, you specify the CIDR block for the subnet, which is a subset of the VPC CIDR block.
When you create a VPC, you must specify a range of IPv4 addresses for the VPC in the form of a Classless Inter-Domain Routing (CIDR) block; for example, 10.0.0.0/16. This is the primary CIDR block for your VPC.
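The "subset" relationship between a subnet CIDR block and the VPC CIDR block can be checked locally with Python's standard `ipaddress` module. This is a minimal sketch, not an AWS API call; the subnet ranges and the helper name `fits_in_vpc` are made up for illustration.

```python
import ipaddress

# Primary VPC CIDR block from the example above.
VPC_NETWORK = ipaddress.ip_network("10.0.0.0/16")


def fits_in_vpc(subnet_cidr: str, vpc_network=VPC_NETWORK) -> bool:
    """Return True if subnet_cidr lies entirely inside the VPC CIDR block."""
    return ipaddress.ip_network(subnet_cidr).subnet_of(vpc_network)


if __name__ == "__main__":
    # Hypothetical candidate subnets.
    for cidr in ("10.0.1.0/24", "10.0.2.0/24", "192.168.0.0/24"):
        verdict = "inside" if fits_in_vpc(cidr) else "outside"
        print(f"{cidr} is {verdict} {VPC_NETWORK}")
```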
Gateway
An internet gateway is a horizontally scaled, redundant, and highly available VPC component that allows communication between instances in your VPC and the internet. It therefore imposes no availability risks or bandwidth constraints on your network traffic.
An internet gateway serves two purposes:
- to provide a target in your VPC route tables for internet-routable traffic
- to perform network address translation (NAT) for instances that have been assigned public IPv4 addresses.
You can use a NAT device to enable instances in a private subnet to connect to the internet (for example, for software updates) or other AWS services, but prevent the internet from initiating connections with the instances.
A NAT device forwards traffic from the instances in the private subnet to the internet or other AWS services, and then sends the response back to the instances. When traffic goes to the internet, the source IPv4 address is replaced with the NAT device’s address and similarly, when the response traffic goes to those instances, the NAT device translates the address back to those instances’ private IPv4 addresses.
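The address rewriting described above can be sketched as a small translation table. This is a toy model, not how AWS implements NAT: the class name, the starting port, and the choice of tracking connections by public port are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NatDevice:
    public_ip: str
    next_port: int = 40000  # hypothetical first port handed out
    # Maps public port -> (private IP, private port) of the originating instance.
    table: dict = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int, dst_ip: str):
        """Rewrite the source of an outgoing packet to the NAT device's address."""
        port = self.next_port
        self.next_port += 1
        self.table[port] = (src_ip, src_port)
        return (self.public_ip, port, dst_ip)

    def inbound(self, dst_port: int):
        """Translate a response back to the private address.

        Returns None for traffic that matches no outbound connection,
        which models the internet being unable to initiate connections.
        """
        return self.table.get(dst_port)
```

Calling `outbound("10.0.1.5", 51000, "198.51.100.7")` on `NatDevice("203.0.113.10")` yields a packet whose source is the NAT device's address, and `inbound` on the returned port recovers the instance's private address.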
Elastic Network Interfaces
You can create a network interface, attach it to an instance, detach it from an instance, and attach it to another instance. The attributes of a network interface follow it as it’s attached or detached from an instance and reattached to another instance. When you move a network interface from one instance to another, network traffic is redirected to the new instance.
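An Elastic IP is a fixed IP address. "Elastic" means that it can be flexibly associated with different instances; from the instance's point of view, an Elastic IP simply acts as a fixed IP address.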
Security
Security Groups
- A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. When you launch an instance in a VPC, you can assign up to five security groups to the instance. Security groups act at the instance level, not the subnet level. Therefore, each instance in a subnet in your VPC could be assigned to a different set of security groups. If you don’t specify a particular group at launch time, the instance is automatically assigned to the default security group for the VPC.
- For each security group, you add rules that control the inbound traffic to instances, and a separate set of rules that control the outbound traffic.
Network ACLs
- A network access control list (ACL) is an optional layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets. You might set up network ACLs with rules similar to your security groups in order to add an additional layer of security to your VPC.
Comparison of Security Groups and Network ACLs
- Traffic from an Internet gateway is routed to the appropriate subnet using the routes in the routing table.
- The rules of the network ACL associated with the subnet control which traffic is allowed to the subnet.
- The rules of the security group associated with an instance control which traffic is allowed to the instance.
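The two layers above can be sketched as two checks that inbound traffic must pass in sequence. The model is deliberately simplified and matches on destination port only. It relies on two behaviors not spelled out in these notes: network ACL rules are evaluated in ascending rule number and the first match wins, and security groups contain allow rules only. All names and rule values are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class AclRule(NamedTuple):
    number: int   # lower numbers are evaluated first
    port: int
    allow: bool


def nacl_allows(rules: Iterable[AclRule], port: int) -> bool:
    """Subnet level: first matching rule decides; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False


def security_group_allows(allowed_ports: Set[int], port: int) -> bool:
    """Instance level: traffic passes only if some allow rule matches."""
    return port in allowed_ports


def reaches_instance(rules: Iterable[AclRule], allowed_ports: Set[int], port: int) -> bool:
    """Traffic must pass the subnet's network ACL, then the instance's security group."""
    return nacl_allows(rules, port) and security_group_allows(allowed_ports, port)
```

With a network ACL that allows port 443 and denies port 22, SSH traffic is stopped at the subnet boundary even when the security group would have allowed it.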
AWS Direct Connect
AWS Direct Connect links your internal network to an AWS Direct Connect location over a standard Ethernet fiber-optic cable.
One end of the cable is connected to your router, the other to an AWS Direct Connect router. With this connection, you can create virtual interfaces directly to public AWS services (for example, to Amazon S3) or to Amazon VPC, bypassing internet service providers in your network path. An AWS Direct Connect location provides access to AWS in the Region with which it is associated. You can use a single connection in a public Region or AWS GovCloud (US) to access public AWS services in all other public Regions.
VPC Peering
Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you’ve defined.
- A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.
- Instances in either VPC can communicate with each other as if they are within the same network.
- You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account.
- The VPCs can be in different regions (also known as an inter-region VPC peering connection).
Transit Gateway
AWS Transit Gateway is a service that enables customers to connect their Amazon Virtual Private Clouds (VPCs) and their on-premises networks to a single gateway.
With AWS Transit Gateway, you only have to create and manage a single connection from the central gateway to each Amazon VPC, on-premises data center, or remote office across your network. Transit Gateway acts as a hub that controls how traffic is routed among all the connected networks, which act like spokes. This hub-and-spoke model significantly simplifies management and reduces operational costs because each network only has to connect to the Transit Gateway and not to every other network. Any new VPC is simply connected to the Transit Gateway and is then automatically available to every other network that is connected to the Transit Gateway. This ease of connectivity makes it easy to scale your network as you grow.
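The management benefit of the hub-and-spoke model comes down to counting connections. The sketch below compares it with connecting every network directly to every other network (a full mesh, for example with one VPC peering connection per pair). The function names are made up for illustration.

```python
def full_mesh_connections(networks: int) -> int:
    """Connections needed when every network links directly to every other one."""
    return networks * (networks - 1) // 2


def hub_and_spoke_connections(networks: int) -> int:
    """Connections needed when every network links only to a central hub."""
    return networks


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(f"{n:>3} networks: mesh={full_mesh_connections(n):>5}, "
              f"hub={hub_and_spoke_connections(n):>3}")
```

The mesh grows quadratically while the hub grows linearly, so the gap widens quickly as networks are added.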
VPC Endpoints
A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP addresses to communicate with resources in the service. Traffic between your VPC and the other service does not leave the Amazon network.
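VPC endpoints exist mainly for security and compliance reasons: traffic to AWS public services does not traverse the Internet. There are two main types:
- Gateway VPC Endpoint
The earlier implementation, mainly for S3 and DynamoDB. The public routes of these AWS services are injected into the route tables of the VPC and its subnets (identified by PL-xxxxxxxx and used as the Destination), with the VPC endpoint as the Target (identified by vpce-xxxxxxxx; it presumably provides NAT functionality). You can configure an IAM policy on the VPC endpoint to control which S3 buckets can be accessed, and you can configure a policy on the S3 bucket to control which VPCs or VPC endpoints may access it; policies based on source IP address cannot be used. In addition, security group rules can reference PL-xxxxxxxx, whereas network ACLs cannot.
- Interface VPC Endpoint
The newer implementation, based on AWS PrivateLink, for services such as EC2, ELB, and Kinesis. It adds one or more ENIs and IP addresses for these AWS services in the consumer VPC and provides Regional and zonal DNS names for these ENIs (publicly resolvable, returning private IP addresses). Inside the consumer VPC, the standard AWS service domain name (for example, ec2.us-east-2.amazonaws.com) can also be resolved to the private IP addresses of these ENIs.
With PrivateLink, you can also publish your own endpoint service: in the provider VPC, create a Network Load Balancer and back-end servers, then create an endpoint service based on that load balancer; in the consumer VPC, create an interface VPC endpoint that references the provider VPC's endpoint service.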
ELB (Elastic Load Balancing)
Elastic Load Balancing distributes incoming application or network traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses, in multiple Availability Zones. Elastic Load Balancing scales your load balancer as traffic to your application changes over time, and can scale to the vast majority of workloads automatically.
Elastic Load Balancing supports three types of load balancers:
- Application Load Balancers
- Network Load Balancers
- Classic Load Balancers
You can select the appropriate load balancer based on your application needs. If you need flexible application management, we recommend that you use an Application Load Balancer. If extreme performance and static IP is needed for your application, we recommend that you use a Network Load Balancer. If you have an existing application that was built within the EC2-Classic network, then you should use a Classic Load Balancer.
Tips
Please pay attention to the FAQs.
API Gateway
Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST and WebSocket APIs at any scale. API developers can create APIs that access AWS or other web services as well as data stored in the AWS Cloud.
Features of API Gateway
- Support for stateful (WebSocket) and stateless (REST) APIs.
- Powerful, flexible authentication mechanisms, such as AWS Identity and Access Management policies, Lambda authorizer functions, and Amazon Cognito user pools.
- Developer portal for publishing your APIs.
- Canary release deployments for safely rolling out changes.
- CloudTrail logging and monitoring of API usage and API changes.
- Integration with AWS WAF for protecting your APIs against common web exploits.
- Integration with AWS X-Ray for understanding and triaging performance latencies.
Storage in AWS
S3 (Simple Storage Service)
Amazon Simple Storage Service (Amazon S3) is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
Buckets
A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket.
Objects
Objects are the fundamental entities stored in Amazon S3.
Keys
A key is the unique identifier for an object within a bucket.
Regions
You can choose the geographical region where Amazon S3 will store the buckets you create.
S3 Glacier
Glacier is an extremely low-cost storage service that provides durable storage with security features for data archiving and backup. With Glacier, customers can store their data cost effectively for months, years, or even decades.
EBS (Elastic Block Store)
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances.
Amazon EBS is recommended when data must be quickly accessible and requires long-term persistence.
Comparison of EBS and S3
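- EBS is block storage; S3 is object storage. EBS can only be used together with EC2 instances. Think of EBS as the hard disk of an EC2 instance and S3 as a network drive.
- Storage billing: EBS volume storage is billed by the number of GB provisioned per month, whether or not it is used, whereas S3 is billed by the GB actually used.
- Requests: EBS charges for the I/O requests made to a volume; S3 charges per request for GET and all other requests. For small files, S3 request fees can even exceed the transfer fees.
- Data transfer out: the cost of transferring data out to the Internet is currently the same for both. S3 storage is comparatively more durable: S3 redundantly replicates data across multiple devices, while the durability of an EBS volume depends on the volume size and on how much data has changed since the last snapshot, so improving EBS durability requires regular snapshots to S3.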
Compute in AWS
EC2 (Elastic Compute Cloud)
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster.
An Amazon EC2 Windows instance is similar to the traditional Windows Server. After you launch an instance, it briefly goes into the pending state while registration takes place, then it goes into the running state. The instance remains active until you stop or terminate it. You can’t restart an instance after you terminate it. You can create a backup image of your instance while it’s running, and launch a new instance from that backup image.
Instance Purchasing Options
On-Demand
With On-Demand instances, you pay for compute capacity by the hour or by the second, depending on which instances you run.
Spot instances
Amazon EC2 Spot instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price.
Reserved Instances
Reserved Instances provide you with a significant discount (up to 75%) compared to On-Demand instance pricing.
Differences between Dedicated Hosts and Dedicated Instances
Dedicated Hosts and Dedicated Instances can both be used to launch Amazon EC2 instances onto physical servers that are dedicated for your use.
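With Dedicated Instances, you know the instance runs on hardware dedicated to you, but not on which physical server. With a Dedicated Host, you not only have a physical server to yourself but also know exactly which server your instances run on.
Feature | Dedicated Host | Dedicated Instance |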
---|---|---|
Visibility of sockets, cores, and host ID | Provides visibility of the number of sockets and physical cores | No visibility |
Host and instance affinity | Allows you to consistently deploy your instances to the same physical server over time | Not supported |
Targeted instance placement | Provides additional visibility and control over how instances are placed on a physical server | Not supported |
Automatic instance recovery | Supported. For more information, see Host Recovery. | Supported |
Bring Your Own License (BYOL) | Supported | Not supported |
Database in AWS
Overview
Relational Database
- RDS
- Aurora
- Redshift
Non-relational Database
- DocumentDB
- DynamoDB
- Neptune
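Type | Relational database | Non-relational database |
---|---|---|
Characteristics | 1. Organizes data using the relational model; 2. Its defining feature is transactional consistency; 3. Simply put, the relational model is a two-dimensional table model, and a relational database is a collection of two-dimensional tables and the relationships between them. | 1. Stores data as key-value pairs; 2. Distributed; 3. Generally does not support ACID properties; 4. Strictly speaking, "non-relational database" is not a single kind of database but a collection of approaches to structured data storage. |
Advantages | 1. Easy to understand: the two-dimensional table structure is close to how people reason about the world, and the relational model is easier to grasp than network or hierarchical models; 2. Easy to use: the general-purpose SQL language makes working with relational databases convenient; 3. Easy to maintain: rich integrity constraints (entity integrity, referential integrity, and user-defined integrity) greatly reduce the likelihood of data redundancy and inconsistency; 4. Supports SQL, which can express complex queries. | 1. No SQL parsing layer, so read and write performance is high; 2. Key-value based, so data is not coupled and scaling out is easy; 3. Storage formats: NoSQL stores can hold key-value, document, image, and other formats, whereas relational databases support only basic types. |
Disadvantages | 1. The price of maintaining consistency is relatively poor read and write performance; 2. Fixed table schema; 3. Struggles with highly concurrent reads and writes; 4. Struggles with efficient reads and writes over massive data sets. | 1. No SQL support, so learning and usage costs are higher; 2. No transaction processing, and add-on capabilities such as BI and reporting are poorly supported. |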
Management & Governance in AWS
Auto Scaling
AWS Auto Scaling enables you to configure automatic scaling for the AWS resources that are part of your application in a matter of minutes.
With AWS Auto Scaling, you configure and manage scaling for your resources through a scaling plan. The scaling plan uses dynamic scaling and predictive scaling to automatically scale your application’s resources.
Ways to Auto Scale
- Scheduled — Excellent for predictable workloads
- Dynamic — Most used
- Predictive — Could combine with machine learning
CloudWatch
Amazon CloudWatch monitors your Amazon Web Services (AWS) resources and the applications you run on AWS in real time. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
Amazon CloudWatch is basically a metrics repository. An AWS service—such as Amazon EC2—puts metrics into the repository, and you retrieve statistics based on those metrics. If you put your own custom metrics into the repository, you can retrieve statistics on these metrics as well.
What can I do with CloudWatch?
- Dashboards - Creates awesome dashboards to see what is happening with your AWS environment.
- Alarms - Allows you to set Alarms that notify you when particular thresholds are hit.
- Events - CloudWatch Events helps you to respond to state changes in your AWS resources.
- Logs - CloudWatch Logs helps you to aggregate, monitor, and store logs.
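The "metrics repository" idea, together with threshold-based alarms, can be sketched in a few lines. This is an in-memory toy, not the CloudWatch API: `MetricRepository` and `alarm_state` are hypothetical names, and the alarm here simply compares the average of all stored data points against a threshold.

```python
from collections import defaultdict
from statistics import mean


class MetricRepository:
    """Stores raw data points per metric name and derives statistics from them."""

    def __init__(self):
        self._points = defaultdict(list)

    def put(self, name: str, value: float) -> None:
        self._points[name].append(value)

    def statistics(self, name: str) -> dict:
        values = self._points[name]  # assumes at least one data point was put
        return {"Average": mean(values), "Minimum": min(values), "Maximum": max(values)}


def alarm_state(repo: MetricRepository, name: str, threshold: float) -> str:
    """Return "ALARM" when the average exceeds the threshold, otherwise "OK"."""
    return "ALARM" if repo.statistics(name)["Average"] > threshold else "OK"
```

A service (or your own application) calls `put`, and a consumer reads `statistics` or evaluates `alarm_state`; the producer and the consumer never talk to each other directly.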
CloudFormation
AWS CloudFormation is a service that helps you model and set up your Amazon Web Services resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS. You create a template that describes all the AWS resources that you want (like Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you.
The following scenarios demonstrate how AWS CloudFormation can help.
- Simplify Infrastructure Management
- Quickly Replicate Your Infrastructure
- Easily Control and Track Changes to Your Infrastructure
Application Integration in AWS
Overview
- SQS - Queue
- SNS - Notification
- SWF - Workflow
SQS (Simple Queue Service)
Amazon Simple Queue Service (Amazon SQS) offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components. Amazon SQS offers common constructs such as dead-letter queues and cost allocation tags. It provides a generic web services API and it can be accessed by any programming language that the AWS SDK supports.
Amazon SQS supports both standard and FIFO queues.
Standard Queue | FIFO Queue |
---|---|
Unlimited Throughput – Standard queues support a nearly unlimited number of transactions per second (TPS) per API action. | High Throughput – By default, FIFO queues support up to 3,000 messages per second, per API action, with batching. FIFO queues support up to 300 messages per second, per API action, without batching. |
At-Least-Once Delivery – A message is delivered at least once, but occasionally more than one copy of a message is delivered. | Exactly-Once Processing – A message is delivered once and remains available until a consumer processes and deletes it. Duplicates aren’t introduced into the queue. |
Best-Effort Ordering – Occasionally, messages might be delivered in an order different from which they were sent. | First-In-First-Out Delivery – The order in which messages are sent and received is strictly preserved. |
Send data between applications when the throughput is important, for example: Decouple live user requests from intensive background work: let users upload media while resizing or encoding it. Allocate tasks to multiple worker nodes: process a high number of credit card validation requests. Batch messages for future processing: schedule multiple entries to be added to a database. | Send data between applications when the order of events is important, for example: Ensure that user-entered commands are executed in the right order. Display the correct product price by sending price modifications in the right order. Prevent a student from enrolling in a course before registering for an account. |
SQS EXAM TIPS
- SQS is a distributed message queueing system
- Allows you to decouple the components of an application so that they are independent
- Pull-based, not push-based
- Standard Queues (default) - best-effort ordering; messages delivered at least once
- FIFO Queues (First In First Out) - ordering strictly preserved, messages delivered once, no duplicates, e.g. good for banking transactions which need to happen in strict order.
- Visibility Timeout
- Default is 30 seconds - increase it if your task takes > 30 seconds to complete
- Max 12 hours
- Short Polling - returns immediately, even if no messages are in the queue
- Long Polling - waits and returns a response only when a message arrives in the queue or the wait timeout is reached
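The visibility timeout is easier to see in a small simulation. The sketch below is an in-memory stand-in, not the SQS API: the class name is hypothetical, time is passed in explicitly so the behavior is deterministic, and messages are deleted by ID rather than by the receipt handle that the real service uses.

```python
class SketchQueue:
    def __init__(self, visibility_timeout: float = 30):
        self.visibility_timeout = visibility_timeout
        self._bodies = {}            # message ID -> body
        self._invisible_until = {}   # message ID -> time it becomes visible again
        self._next_id = 0

    def send(self, body: str) -> int:
        self._next_id += 1
        self._bodies[self._next_id] = body
        self._invisible_until[self._next_id] = 0.0
        return self._next_id

    def receive(self, now: float):
        """Return the first visible message and hide it for the timeout period."""
        for message_id, body in self._bodies.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Consumers delete a message once processing has succeeded."""
        self._bodies.pop(message_id, None)
        self._invisible_until.pop(message_id, None)
```

If a consumer receives a message at time 0 and has not deleted it by time 30, the next `receive` call hands the same message out again. That is why a task that takes longer than the timeout can be processed twice on a standard queue, and why the timeout should be raised for slow tasks.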
SNS (Simple Notification Service)
Amazon Simple Notification Service (Amazon SNS) is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients. In Amazon SNS, there are two types of clients—publishers and subscribers—also referred to as producers and consumers. Publishers communicate asynchronously with subscribers by producing and sending a message to a topic, which is a logical access point and communication channel.
Differences between SNS and SQS
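SNS is a distributed publish-subscribe system: when a publisher sends a message to SNS, the message is pushed to the subscribers. SQS is a distributed queueing system: messages are not pushed to receivers, and receivers must poll SQS to get them. A message cannot be received by multiple receivers at the same time. Any one receiver can receive a message, process it, and delete it, and other receivers will not receive the same message later. Polling inherently introduces some latency into message delivery in SQS, unlike SNS, where messages are pushed to subscribers immediately. SNS supports multiple endpoint types, such as email, SMS, HTTP endpoints, and SQS. If you want an unknown number and type of subscribers to receive messages, you need SNS.
SQS is mainly used to decouple or integrate applications. Messages can be kept in SQS for a short period (up to 14 days). SNS distributes copies of a message to multiple subscribers. For example, suppose you want to replicate the data generated by an application to several storage systems. You can use SNS to send this data to multiple subscribers, each of which copies the messages it receives to a different storage system (S3, a disk on a host, a database, and so on).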
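The push-versus-pull difference, and the fan-out pattern where SNS feeds several queues, can be sketched with plain Python objects. Nothing here is an AWS API: `Topic` is a hypothetical class, and `deque` objects stand in for the subscribed queues.

```python
from collections import deque


class Topic:
    """Push model: every published message is delivered to all subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, deliver) -> None:
        self._subscribers.append(deliver)

    def publish(self, message: str) -> int:
        for deliver in self._subscribers:
            deliver(message)
        return len(self._subscribers)


# Two queues subscribe to one topic; each gets its own copy of every message.
s3_copy_queue = deque()
database_copy_queue = deque()

topic = Topic()
topic.subscribe(s3_copy_queue.append)
topic.subscribe(database_copy_queue.append)
delivered_to = topic.publish("order-created")
```

Each queue's consumer would then pull with `popleft()` at its own pace, so a slow storage system does not hold up the others.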
SWF (Simple Workflow Service)
Using the Amazon Simple Workflow Service (Amazon SWF), you can implement distributed, asynchronous applications as workflows. Workflows coordinate and manage the execution of activities that can be run asynchronously across multiple computing devices and that can feature both sequential and parallel processing.
For example, a simple e-commerce order-processing workflow can involve both people and automated processes.
Workflow Execution
Here is an overview of the steps to develop and run a workflow in Amazon SWF:
1. Write activity workers that implement the processing steps in your workflow.
   An activity worker is a program that receives activity tasks, performs them, and provides results back. Note that the task itself might actually be performed by a person, in which case the person would use the activity worker software for the receipt and disposition of the task. An example might be a statistical analyst, who receives sets of data, analyzes them, and then sends back the analysis.
2. Write a decider to implement the coordination logic of your workflow.
   The coordination logic in a workflow is contained in a software program called a decider. The decider schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ultimately ends (or closes) the workflow when the objective has been completed.
3. Register your activities and workflow with Amazon SWF.
   You can do this step programmatically or by using the AWS Management Console.
4. Start your activity workers and decider.
   These actors can run on any computing device that can access an Amazon SWF endpoint. For example, you could use compute instances in the cloud, such as Amazon Elastic Compute Cloud (Amazon EC2); servers in your data center; or even a mobile device, to host a decider or activity worker. Once started, the decider and activity workers should start polling Amazon SWF for tasks.
5. Start one or more executions of your workflow.
   Executions can be initiated either programmatically or via the AWS Management Console.
   Each execution runs independently and you can provide each with its own set of input data. When an execution is started, Amazon SWF schedules the initial decision task. In response, your decider begins generating decisions which initiate activity tasks. Execution continues until your decider makes a decision to close the execution.
6. View workflow executions using the AWS Management Console.
   You can filter and view complete details of running as well as completed executions. For example, you can select an open execution to see which tasks have completed and what their results were.
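The division of labor between a decider and activity workers can be sketched in a single process. This is only an illustration of the roles: real workers and deciders poll Amazon SWF for tasks over the network, and none of the names below (`charge_card`, `ship_order`, `decider`, `run_execution`) come from the SWF API.

```python
# Activity workers: each function implements one processing step.
def charge_card(order: dict) -> dict:
    return {**order, "charged": True}


def ship_order(order: dict) -> dict:
    return {**order, "shipped": True}


# Stand-in for registering activities with the service.
ACTIVITIES = {"charge_card": charge_card, "ship_order": ship_order}


def decider(history: list):
    """Coordination logic: pick the next activity, or None to close the execution."""
    plan = ["charge_card", "ship_order"]
    completed = len(history)
    return plan[completed] if completed < len(plan) else None


def run_execution(order: dict):
    """One workflow execution: ask the decider, run the activity, record the event."""
    history = []
    while (activity := decider(history)) is not None:
        order = ACTIVITIES[activity](order)
        history.append(activity)
    return order, history
```

The decider never does the work itself; it only looks at the history of completed tasks and decides what happens next, which keeps the coordination logic separate from the processing steps.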
Differences between SWF and SQS
- Amazon SWF presents a task oriented API, whereas Amazon SQS offers a message oriented API.
- Amazon SWF ensures that a task is assigned only once and is never duplicated. With Amazon SQS, you need to handle duplicated messages and may also need to ensure that a message is processed only once.
- Amazon SWF keeps track of all the tasks and events in an application. With Amazon SQS, you need to implement your own application-level tracking, especially if your application uses multiple queues.
Media Services in AWS
Elastic Transcoder
Amazon Elastic Transcoder lets you convert media files that you have stored in Amazon Simple Storage Service (Amazon S3) into media files in the formats required by consumer playback devices. For example, you can convert large, high-quality digital media files into formats that users can play back on mobile devices, tablets, web browsers, and connected televisions.
Elastic Transcoder has four components:
Jobs do the work of transcoding.
Each job converts one file into up to 30 formats. For example, if you want to convert a media file into six different formats, you can create files in all six formats by creating a single job.
Pipelines are queues that manage your transcoding jobs.
When you create a job, you specify which pipeline you want to add the job to. Elastic Transcoder starts processing the jobs in a pipeline in the order in which you added them. If you configure a job to transcode into more than one format, Elastic Transcoder creates the files for each format in the order in which you specify the formats in the job.
Presets are templates that contain most of the settings for transcoding media files from one format to another.
Elastic Transcoder includes some default presets for common formats, for example, several iPod and iPhone versions. You can also create your own presets for formats that aren’t included among the default presets. You specify which preset you want to use when you create a job.
Notifications let you optionally configure Elastic Transcoder and Amazon Simple Notification Service to keep you apprised of the status of a job: when Elastic Transcoder starts processing the job, when Elastic Transcoder finishes the job, and whether Elastic Transcoder encounters warning or error conditions during processing.
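The relationship between jobs, pipelines, and presets can be sketched as a queue of jobs that is drained in order. This is a toy model rather than the Elastic Transcoder API; the class name, file names, and preset names are made up, and "transcoding" is reduced to producing an output label.

```python
from collections import deque


class Pipeline:
    MAX_FORMATS_PER_JOB = 30  # per-job limit stated in the notes above

    def __init__(self):
        self._jobs = deque()

    def add_job(self, input_key: str, presets: list) -> None:
        """Queue a job that converts one input file into one output per preset."""
        if len(presets) > self.MAX_FORMATS_PER_JOB:
            raise ValueError("a job can produce at most 30 formats")
        self._jobs.append((input_key, list(presets)))

    def process(self) -> list:
        """Process jobs in the order added, and formats in the order specified."""
        outputs = []
        while self._jobs:
            input_key, presets = self._jobs.popleft()
            for preset in presets:
                outputs.append(f"{input_key}:{preset}")
        return outputs
```

Adding a job for `a.mov` with two presets and then a job for `b.mov` with one preset yields three outputs, with both `a.mov` outputs ahead of the `b.mov` output.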
Analytics in AWS
Kinesis
Amazon Kinesis makes it easy to collect, process, and analyze video and data streams in real time.
Kinesis EXAM TIPS
Know the differences between Kinesis Video Streams and Kinesis Data Firehose, and choose the appropriate service for each scenario.
Kinesis Video Streams
Amazon Kinesis Video Streams is a fully managed AWS service that enables you to stream live video from devices to the AWS Cloud and durably store it. You can then build your own applications for real-time video processing or perform batch-oriented video analytics.
Kinesis Data Firehose
Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.
For Amazon S3 destinations, streaming data is delivered to your S3 bucket. If data transformation is enabled, you can optionally back up source data to another Amazon S3 bucket.
Reference
AWS (Amazon Web Services) Study Notes
http://vincentgaohj.github.io/Blog/2019/08/07/AWS-Amazon-Web-Services-Study-Notes/