Knowledge Check: Building Data Lakes on AWS (AWS Cloud Solutions Architect Professional Certificate) Answers 2025
1. Question 1
Function of a data lake as a centralized repository
- ❌ Store unstructured data from a single source
- ❌ Store structured data from any source
- ✅ Store structured and unstructured data from any source
- ❌ Store structured and unstructured data from a single source
Explanation:
A data lake ingests all data types (structured, semi-structured, unstructured) from any source in raw form.
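To make "any source, any type, raw form" concrete, here is a minimal sketch in Python. It uses a toy in-memory store; the `RawDataLake` class, the source names, and the file names are all hypothetical illustrations, not an AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class RawDataLake:
    """Toy in-memory stand-in for a data lake's raw zone (hypothetical)."""

    objects: dict = field(default_factory=dict)

    def ingest(self, source: str, name: str, payload: bytes) -> str:
        # Store the bytes untouched: no schema is enforced at write time.
        key = f"raw/{source}/{name}"
        self.objects[key] = payload
        return key


lake = RawDataLake()
# Structured data (tabular rows) from a database export.
lake.ingest("crm_db", "customers.csv", b"id,name\n1,Ana\n")
# Semi-structured data (JSON) from an application.
lake.ingest("web_app", "events.json", b'{"user": 1, "action": "click"}')
# Unstructured data (binary audio) from another system.
lake.ingest("support", "call_0001.wav", b"RIFF\x00\x00\x00\x00WAVE")

for key in sorted(lake.objects):
    print(key)
```

All three payloads land in the same repository regardless of format or origin, which is why the "single source" and "structured only" options are wrong.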
2. Question 2
Most cost-effective storage for a data lake
- ❌ Amazon EBS
- ✅ Amazon S3
- ❌ Amazon RDS
- ❌ Amazon Redshift
Explanation:
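Amazon S3 is the standard storage layer for data lakes on AWS: it offers low per-GB cost, is designed for 99.999999999% (11 nines) durability, scales without capacity provisioning, and integrates directly with analytics services such as Athena, Glue, and EMR. The other options serve different purposes: EBS is block storage attached to EC2 instances, RDS is a managed relational database, and Redshift is a data warehouse.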
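Objects in S3 are addressed by a bucket name plus a key. One common convention for laying out a data lake (an assumption here, not something the quiz prescribes) is Hive-style `name=value` partition prefixes, which Athena and Glue can recognize as partitions. The sketch below only builds the URI string; the bucket, zone, and dataset names are hypothetical.

```python
from datetime import date


def lake_object_uri(bucket: str, zone: str, dataset: str,
                    day: date, filename: str) -> str:
    """Build an S3 URI with Hive-style date partitions (illustrative layout)."""
    key = (
        f"{zone}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )
    return f"s3://{bucket}/{key}"


uri = lake_object_uri(
    "example-data-lake",   # hypothetical bucket name
    "raw",                 # hypothetical zone for untransformed data
    "clickstream",         # hypothetical dataset name
    date(2025, 1, 15),
    "events-0001.json",
)
print(uri)
# s3://example-data-lake/raw/clickstream/year=2025/month=01/day=15/events-0001.json
```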
3. Question 3 (Select TWO)
Services used in the processing layer of a data lake
- ❌ AWS Snowball
- ✅ AWS Glue
- ✅ Amazon EMR
- ❌ Amazon QuickSight
- ❌ Amazon Athena
Explanation:
Processing layer = ETL + big data processing
- AWS Glue → serverless ETL
- Amazon EMR → big data frameworks (Spark, Hadoop, Hive)

Athena belongs to the query layer, QuickSight is for visualization, and Snowball is for data transfer, not processing.
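The service-to-layer reasoning above can be written as a small lookup. This is a study aid only: the dictionary and the `services_in_layer` helper are hypothetical, and the layer labels simply restate the explanation.

```python
# Layer assignments as described in the explanation for Question 3.
LAYER_BY_SERVICE = {
    "AWS Snowball": "data transfer",
    "AWS Glue": "processing",
    "Amazon EMR": "processing",
    "Amazon Athena": "query",
    "Amazon QuickSight": "visualization",
}


def services_in_layer(layer: str) -> list:
    """Return the services mapped to the given layer, sorted by name."""
    return sorted(
        service
        for service, assigned in LAYER_BY_SERVICE.items()
        if assigned == layer
    )


print(services_in_layer("processing"))
# ['AWS Glue', 'Amazon EMR']
```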
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Store structured & unstructured data from any source | Data lake definition |
| 2 | Amazon S3 | Data lake storage |
| 3 | AWS Glue & Amazon EMR | Data processing layer |