AWS Data & ETL Training Master Deck (Editable)
10-Day instructor-led hands-on training — outline & slides
Day 1: AWS Basics & Account Setup
-
Slide 1: Title, Duration, Instructor
Course title slide showing Day 1, total duration for the session, and instructor name.
-
Slide 2: Agenda & Learning Objectives
List the day's agenda and measurable learning objectives (account setup, billing monitoring, MFA, AWS infra concepts).
-
Slide 3: What is Cloud & Why AWS?
High-level cloud concepts, benefits of cloud vs on-prem, reasons to choose AWS (services, scale, ecosystem).
-
Slide 4: AWS Global Infrastructure Diagram
Diagram illustrating Regions, Availability Zones, and Edge Locations with brief notes on use-cases (latency, fault-isolation).
-
Slide 5: AWS Account Setup Steps (screenshots)
Step-by-step account creation guidance with placeholders for screenshots: sign-up, billing info, support plan, root account safety.
-
Slide 6: Hands-on Demo: Billing alarm, MFA
Step-by-step technical tasks students must perform in the lab:
- Enable IAM Billing Access — Console: Account settings → activate IAM access to billing info.
- Create CloudWatch Billing Alarm — Console: CloudWatch → Alarms → Create Alarm → Metric: Billing → Total Estimated Charge; set threshold (e.g. $5) → create SNS topic for email notifications → subscribe student email.
- Enable MFA on Root/Users — Console: IAM → Users → select user (or root) → Security credentials → Manage MFA → choose Virtual MFA → scan QR with Authenticator app (Google Authenticator/Authy) → verify codes.
- Test Access — Demonstrate logging in with an IAM user and validate MFA prompts; verify billing alarm notification by temporarily lowering threshold or using simulated billing metric if available.
# Example AWS CLI (for reference - optional) aws cloudwatch put-metric-alarm \ --alarm-name "EstimatedChargesAlarm" \ --metric-name "EstimatedCharges" \ --namespace "AWS/Billing" \ --statistic Maximum \ --period 21600 \ --evaluation-periods 1 \ --threshold 5 \ --comparison-operator GreaterThanOrEqualToThreshold \ --dimensions Name=Currency,Value=USD \ --alarm-actions arn:aws:sns:us-east-1:123456789012:BillingAlerts -
Slide 7: Summary & Q&A
Recap key takeaways: cloud fundamentals, AWS infra, account safety practices (MFA, billing alarms). Open floor for questions.
Day 2: IAM & Security
-
Slide 1: Agenda & Objectives
Outline of day: IAM concepts, hands-on user & group creation, policies, best practices.
-
Slide 2: IAM Concepts (Users, Groups, Roles, Policies)
Explain IAM building blocks: Users, Groups, Roles, Policies, trust vs permissions.
-
Slide 3: IAM Architecture Diagram
Diagram showing relationship between identities, roles, STS, and resources.
-
Slide 4: Hands-on: Create IAM user/group, attach policy
Lab steps for students:
- Create an IAM group (e.g.,
etl-developers
). - Create an IAM user (e.g.,
student01
) and add to group. - Create and attach an inline or managed policy (least-privilege example: S3 read/write to a specific bucket).
- Test access using AWS CLI with generated access key (recommend temporary credentials or role-based cross-account testing).
- Create an IAM group (e.g.,
-
Slide 5: Best Practices: Least Privilege, MFA
Guidelines: use roles for services, avoid root, enable MFA, rotate keys, use IAM Access Analyzer, and log with CloudTrail.
-
Slide 6: Summary & Q&A
Recap and Q&A.
Day 3: Amazon S3 Basics
-
Slide 1: Agenda & Objectives
Intro to S3, storage classes, basic operations, versioning & lifecycle.
-
Slide 2: S3 Overview (Buckets, Objects, Storage Classes)
Explain buckets, objects, keys, metadata, and storage classes (Standard, Intelligent-Tiering, IA, Glacier).
-
Slide 3: Versioning & Lifecycle Diagram
Diagram and examples of versioning and lifecycle rules to transition objects to cheaper storage.
-
Slide 4: Hands-on: Create bucket, upload/download objects
Lab steps: create bucket, set bucket policy, upload/download via console and CLI, enable versioning.
-
Slide 5: Summary & Q&A
Recap and Q&A.
Day 4: Amazon S3 Advanced
-
Slide 1: Agenda & Objectives
Encryption, bucket policies, event notifications and integration with Lambda/SNS/SQS.
-
Slide 2: Encryption & Security (SSE-S3, SSE-KMS, ACL, Bucket Policy)
Explain server-side encryption options, KMS keys, ACLs vs bucket policies, and public access blocks.
-
Slide 3: Event Notifications Diagram (S3 → Lambda/SNS/SQS)
Diagram showing S3 event notification flows to Lambda, SNS, and SQS for processing pipelines.
-
Slide 4: Hands-on: Trigger Lambda on S3 upload
Lab: create Lambda function, add S3 trigger, upload object to test invocation, view CloudWatch logs.
-
Slide 5: Summary & Q&A
Recap and Q&A.
Day 5: Amazon RDS
-
Slide 1: Agenda & Objectives
Relational databases on AWS, engines, HA patterns, backups and restores.
-
Slide 2: RDS Overview (Engines, Multi-AZ, Read Replica)
Discuss supported engines (MySQL, PostgreSQL, Aurora), Multi-AZ, read replicas, and failover behavior.
-
Slide 3: Security & VPC integration Diagram
Diagram showing RDS inside VPC, subnets, SGs, route for application access, and IAM authentication options.
-
Slide 4: Hands-on: Launch RDS instance, connect & query
Lab: launch a small RDS instance (free tier if available), configure security group, connect via psql/mysql client, run sample queries.
-
Slide 5: Summary & Q&A
Recap and Q&A.
Day 6: AWS Glue Basics & Data Catalog
-
Slide 1: Agenda & Objectives
Intro to Glue, Data Catalog, Crawlers, Jobs and Studio.
-
Slide 2: Glue Architecture Diagram
Architecture showing Glue interacting with S3, Catalog, and compute (Glue jobs).
-
Slide 3: Glue Components (Catalog, Crawler, Jobs, Studio)
Explain each component and how they fit into ETL workflows.
-
Slide 4: Hands-on: Catalog S3 CSV/JSON → Glue table
Lab: create a Glue Crawler to catalogue S3 files and validate the Glue table schema.
-
Slide 5: Query with Athena
Show how to query Glue cataloged tables using Athena.
-
Slide 6: Summary & Q&A
Recap and Q&A.
Day 7: AWS Glue Advanced & PySpark ETL
-
Slide 1: Agenda & Objectives
Advanced Glue topics and PySpark-based ETL jobs.
-
Slide 2: DynamicFrame vs DataFrame Diagram
Explain differences, when to use DynamicFrame (schema flexibility) vs DataFrame (performance / Spark APIs).
-
Slide 3: PySpark ETL Transformations (filter, join, aggregate)
Common transformations with examples and notes about performance and partitioning.
-
Slide 4: Hands-on Demo: CSV → Parquet → RDS
Lab: run a PySpark job to convert CSV to Parquet, partition data, and (optionally) push results to RDS.
-
Slide 5: Sample PySpark ETL Job (code snippet)
Include a short PySpark snippet in the slide for students to review and run (full code in appendix).# PySpark (Glue) snippet - pseudocode df = spark.read.csv("s3://bucket/raw/data.csv", header=True) df = df.filter("status = 'active'") \ .withColumn("event_date", to_date(col("timestamp"))) df.write.partitionBy("event_date").parquet("s3://bucket/processed/")
-
Slide 6: Integration with Athena
Show how Athena can query the Parquet output using Glue catalog partitions.
-
Slide 7: Summary & Q&A
Recap and Q&A.
Day 8: Amazon Athena
-
Slide 1: Agenda & Objectives
Introduce Athena, cost model, and best practices for querying data lakes.
-
Slide 2: Athena Overview & Cost Model
Explain pay-per-query model (data scanned), partitioning, compression, and reducing cost.
-
Slide 3: Querying Glue tables (SELECT, GROUP BY, partitions)
Examples for common SQL queries over Glue catalog tables and partition-aware queries.
-
Slide 4: Hands-on: Athena SQL Queries
Lab: run sample queries, test performance, and measure scanned bytes for cost awareness.
-
Slide 5: Summary & Q&A
Recap and Q&A.
Day 9: AWS Lambda & CloudWatch
-
Slide 1: Agenda & Objectives
Serverless compute basics, event-driven architecture, monitoring & observability.
-
Slide 2: Lambda Lifecycle Diagram
Diagram: cold start, container reuse, concurrency limits.
-
Slide 3: Triggers: S3, Glue, RDS
Examples of event sources and patterns to invoke Lambda for ETL steps.
-
Slide 4: CloudWatch Metrics, Logs, Alarms
How to instrument Lambda with logs, custom metrics, and alarms for failure/latency.
-
Slide 5: Hands-on: Lambda triggered by S3
Lab: deploy a Python Lambda, configure S3 trigger, upload object to test, observe CloudWatch logs.
-
Slide 6: Sample Python Lambda Code
Example code snippet to include on slide:# sample lambda handler def handler(event, context): for record in event['Records']: key = record['s3']['object']['key'] # process object (e.g., read, transform, write) print(f"Processing {key}")
-
Slide 7: Summary & Q&A
Recap and Q&A.
Day 10: Capstone Project & Wrap-Up
-
Slide 1: Agenda & Objectives
Overview of final integrated pipeline and evaluation criteria for the capstone.
-
Slide 2: End-to-End ETL Pipeline Diagram (S3 → Glue → Athena → RDS)
A diagram showing full flow: data ingest → catalog → transform → query → store and monitor.
-
Slide 3: Step-by-Step Demo Script
Steps for the instructor & students to follow:
- Upload CSV to S3
- Glue Crawler → Catalog
- Glue PySpark ETL → Parquet
- Athena Queries
- Optional: Load into RDS
- CloudWatch Monitoring
-
Slide 4: Summary of Key Takeaways
Highlight the major learnings from the course and recommended next steps/resources.
-
Slide 5: Final Q&A
Open discussion, feedback, and next steps for continued learning.
No comments:
Post a Comment