Knowledge Check: Building Data Lakes on AWS (AWS Cloud Solutions Architect Professional Certificate) Answers 2025
1. Question 1 (Select TWO)
Which data formats improve query performance and reduce analytics cost?
- ❌ CSV
- ❌ JSON
- ✅ Apache Parquet
- ✅ Apache ORC
- ❌ XML
Explanation:
Parquet and ORC are columnar, compressed, optimized file formats.
They:
- reduce storage costs through compression
- let query engines scan only the needed columns → faster queries and less data scanned
- work efficiently with Athena, EMR, Redshift Spectrum, and Glue
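CSV, JSON, and XML are row-based text formats that are usually stored uncompressed, so a query has to read every field of every record, which makes them slower and more expensive to query.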
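The difference in bytes scanned can be illustrated with a small, standard-library-only sketch. This is a toy model, not the actual Parquet or ORC file format: it assumes a hypothetical three-column dataset (`order_id`, `region`, `amount`), stores it once as a single CSV blob and once as separately compressed per-column blobs, then answers the same "sum of `amount`" query against both.

```python
import csv
import io
import zlib

# Hypothetical toy dataset: 1,000 rows, three columns (names are illustrative).
columns = ["order_id", "region", "amount"]
regions = ["us-east", "eu-west", "ap-south"]
rows = [
    {"order_id": str(i), "region": regions[i % 3], "amount": str(i % 50)}
    for i in range(1000)
]

# Row-based layout: one CSV blob. Any query has to read all of it.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
row_blob = buf.getvalue().encode("utf-8")

# Toy columnar layout: each column is stored and compressed on its own.
# (Real columnar formats are far more elaborate; this only mimics the idea.)
column_blobs = {
    name: zlib.compress("\n".join(r[name] for r in rows).encode("utf-8"))
    for name in columns
}


def total_amount_from_rows(blob):
    """Sum 'amount' from the CSV blob; returns (total, bytes_scanned)."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    return sum(int(r["amount"]) for r in reader), len(blob)


def total_amount_from_columns(blobs):
    """Sum 'amount' reading only that column; returns (total, bytes_scanned)."""
    blob = blobs["amount"]
    values = zlib.decompress(blob).decode("utf-8").split("\n")
    return sum(int(v) for v in values), len(blob)


row_total, row_bytes = total_amount_from_rows(row_blob)
col_total, col_bytes = total_amount_from_columns(column_blobs)
print(f"row-based: total={row_total}, bytes scanned={row_bytes}")
print(f"columnar:  total={col_total}, bytes scanned={col_bytes}")
```

Both layouts return the same total, but the columnar version touches only the compressed `amount` column. Services that bill by data scanned, such as Athena, benefit from the same effect.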
2. Question 2
Which tool is best for cleansing and normalizing data without writing code?
- ❌ AWS Glue Data Catalog
- ✅ AWS Glue DataBrew
- ❌ Amazon Athena
- ❌ AWS Glue ETL (typically involves writing PySpark or Scala scripts)
Explanation:
AWS Glue DataBrew is a no-code, visual data preparation tool for cleaning, normalizing, and transforming large datasets.
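For intuition about what "cleansing" and "normalizing" mean, here is a hand-written sketch of the kind of steps a tool like DataBrew lets you apply visually. It is plain Python, not DataBrew's API, and the record fields and helper names (`clean_record`, `min_max_normalize`) are hypothetical.

```python
def clean_record(record):
    """Trim whitespace, treat blank strings as missing, lowercase the email."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        cleaned[key] = None if value == "" else value
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def min_max_normalize(values):
    """Rescale numbers to the range [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical messy input records.
raw = [
    {"name": "  Ana ", "email": "ANA@Example.com ", "score": 40},
    {"name": "Bo", "email": "", "score": 70},
    {"name": " Cy", "email": "cy@example.com", "score": 100},
]

cleaned = [clean_record(r) for r in raw]
scores = min_max_normalize([r["score"] for r in cleaned])
for record, score in zip(cleaned, scores):
    record["score_normalized"] = score

for record in cleaned:
    print(record)
```

Writing and maintaining steps like these by hand is the coding effort that a visual, no-code tool is meant to remove.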
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Apache Parquet, Apache ORC | Cost-efficient, optimized data formats |
| 2 | AWS Glue DataBrew | No-code data preparation |