AWS Certified Machine Learning (Specialty) - Topic 1: Data Engineering

AWS Certified Machine Learning - Specialty is an advanced certification that differs from the others: it is the only one that focuses on domain knowledge not strictly tied to AWS services.

In fact, to pass the exam and obtain the certification, you must be able to recognize, analyze, and optimize different machine learning problems starting from use case descriptions, which are not exclusively linked to specific AWS solutions.

AWS Machine Learning Specialty

Data Engineering

Amazon S3

My previous article on Amazon storage (written for the Solutions Architect exam) covers pretty much everything you need to know about AWS storage.

AWS S3 for Machine Learning

  • Backbone for many AWS ML services (e.g. SageMaker)
  • Create a “Data Lake”
    • Virtually unlimited size, no provisioning
    • 99.999999999% durability
    • Decoupling of storage (S3) from compute (EC2, Amazon Athena, Amazon Redshift Spectrum, Amazon Rekognition, and AWS Glue)
  • Centralized Architecture: All your data can be in one place
  • Object Storage => Supports any file format (common formats for ML: CSV, JSON)
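To make the durability figure above more concrete, here is a small arithmetic sketch. It assumes the 99.999999999% figure is read as an annual, per-object design target, and the count of 10 million objects is purely a hypothetical example.

```python
from fractions import Fraction


def expected_annual_object_loss(num_objects: int,
                                durability_percent: str = "99.999999999") -> Fraction:
    """Expected number of objects lost per year, using exact rational arithmetic.

    Assumption: the durability percentage is an annual, per-object figure.
    """
    # Probability that a single object is lost within one year.
    loss_probability = 1 - Fraction(durability_percent) / 100
    return num_objects * loss_probability


# Hypothetical bucket holding 10 million objects.
losses_per_year = expected_annual_object_loss(10_000_000)
years_per_lost_object = 1 / losses_per_year

print(losses_per_year)        # 1/10000
print(years_per_lost_object)  # 10000
```

Under these assumptions, storing 10 million objects corresponds to losing roughly one object every 10,000 years on average. Fractions are used instead of floats to avoid rounding error on such a small probability.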

AWS S3 Data Partitioning

  • Pattern for speeding up range queries (e.g. with Amazon Athena)
  • You can define whatever partitioning strategy you like
  • Data partitioning is usually handled by the tools we use (e.g. AWS Glue)
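As an illustration of one possible partitioning strategy, the sketch below builds date-based object keys in the `key=value` (Hive-style) layout that query tools can typically map to partitions. The dataset name, file names, and helper functions are hypothetical, and the code only builds strings; it does not call AWS.

```python
from datetime import date


def partitioned_key(dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key partitioned by year, month, and day."""
    return (
        f"{dataset}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


def month_prefix(dataset: str, year: int, month: int) -> str:
    """Prefix that selects every object stored for one month."""
    return f"{dataset}/year={year:04d}/month={month:02d}/"


# Hypothetical keys for a "clickstream" dataset.
keys = [
    partitioned_key("clickstream", date(2021, 4, 18), "part-0000.csv"),
    partitioned_key("clickstream", date(2021, 4, 19), "part-0000.csv"),
    partitioned_key("clickstream", date(2021, 5, 1), "part-0000.csv"),
]

# A range query for April only needs to look under one prefix
# instead of scanning the whole dataset.
april_keys = [k for k in keys if k.startswith(month_prefix("clickstream", 2021, 4))]
print(april_keys)
```

Putting the partition columns in the key path means a query restricted to a date range can skip every object outside the matching prefixes, which is where the speed-up comes from.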

S3 Encryption for Objects

There are 4 methods of encrypting objects in S3:

  • SSE-S3: encrypts S3 objects using keys handled & managed by AWS
  • SSE-KMS: use AWS Key Management Service to manage encryption keys
    • Additional security (user must have access to KMS key)
    • Audit trail for KMS key usage
  • SSE-C: when you want to manage your own encryption keys
  • Client Side Encryption

From an ML perspective, SSE-S3 and SSE-KMS are the methods most likely to be used.
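As a minimal sketch of how the two most common methods could be requested at upload time, the helper below builds the encryption-related keyword arguments for boto3's `put_object` call (`ServerSideEncryption` and `SSEKMSKeyId`). The helper name, bucket, object key, and KMS key ID are hypothetical, and the upload itself is left as a comment because it needs AWS credentials.

```python
from typing import Dict, Optional


def sse_put_object_args(method: str, kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Return the put_object keyword arguments for a server-side encryption method."""
    if method == "SSE-S3":
        # Keys are handled and managed by S3 itself.
        return {"ServerSideEncryption": "AES256"}
    if method == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            # Without an explicit key, the AWS managed key for S3 is used.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    raise ValueError(f"Unsupported method for this sketch: {method}")


print(sse_put_object_args("SSE-S3"))
print(sse_put_object_args("SSE-KMS", "1234abcd-hypothetical-key-id"))

# Usage with boto3 (not executed here; names are placeholders):
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(
#     Bucket="my-ml-bucket",
#     Key="train/data.csv",
#     Body=b"feature,label\n1,0\n",
#     **sse_put_object_args("SSE-KMS", "1234abcd-hypothetical-key-id"),
# )
```

With SSE-KMS, the caller also needs permission to use the KMS key, which is the additional security layer mentioned in the list above.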

AWS Kinesis


http://vincentgaohj.github.io/Blog/2021/04/18/AWS-Machine-Learning-Specialty-1-Data-Engineer/

Author

Haojun(Vincent) Gao

Posted on

2021-04-18

Updated on

2022-02-22
