
Knowledge Check: Building Data Lakes on AWS (AWS Cloud Solutions Architect Professional Certificate) Answers 2025

1. Question 1 (Select TWO)

Which data formats improve query performance and reduce analytics cost?

  • ❌ CSV

  • ❌ JSON

  • Apache Parquet

  • Apache ORC

  • ❌ XML

Explanation:
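Parquet and ORC are columnar file formats: values are stored column by column and compressed.
They:

  • reduce storage costs, because similar values stored together compress well

  • let query engines read only the columns a query needs → less data scanned, so queries are faster and cheaper (Amazon Athena, for example, bills by the amount of data scanned)

  • work efficiently with Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and AWS Glue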

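CSV, JSON, and XML are row-oriented text formats, often stored uncompressed. A query has to read every field of every row, even when it needs only one or two columns, so more data is scanned and each query costs more.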
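To make the "read only the needed columns" point concrete, here is a toy Python model using only the standard library. It is not Parquet or ORC. It compares how many bytes a reader would touch when the same hypothetical data is stored row by row as CSV versus column by column, with `zlib` standing in for the compression that real columnar formats apply. The dataset, column names, and helper functions are made up for illustration.

```python
import csv
import io
import zlib

# Hypothetical toy dataset: 1,000 orders with 5 columns (illustration only).
ROWS = [
    {
        "order_id": i,
        "customer": f"cust-{i % 50}",
        "region": ["us", "eu", "ap"][i % 3],
        "amount": round(i * 1.25, 2),
        "notes": "free-text comment " * 3,
    }
    for i in range(1000)
]
COLUMNS = list(ROWS[0])


def row_layout(rows: list[dict]) -> bytes:
    """Serialize as CSV: all fields of a row are stored next to each other."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def column_layout(rows: list[dict]) -> dict[str, bytes]:
    """Store each column as its own contiguous chunk (the core columnar idea)."""
    return {col: "\n".join(str(r[col]) for r in rows).encode("utf-8") for col in COLUMNS}


def bytes_read_row_based(blob: bytes) -> int:
    # A row-based reader has to scan the whole file, even for a query on one column.
    return len(blob)


def bytes_read_columnar(chunks: dict[str, bytes], needed: list[str]) -> int:
    # A columnar reader only touches the chunks for the requested columns.
    return sum(len(chunks[c]) for c in needed)


def bytes_read_columnar_compressed(chunks: dict[str, bytes], needed: list[str]) -> int:
    # Each column chunk is compressed on its own; zlib stands in for the
    # encodings and codecs (dictionary, RLE, Snappy, ZSTD, ...) real formats use.
    return sum(len(zlib.compress(chunks[c])) for c in needed)


if __name__ == "__main__":
    needed = ["region", "amount"]  # e.g. SELECT region, SUM(amount) ... GROUP BY region
    blob, chunks = row_layout(ROWS), column_layout(ROWS)
    print("row-based (CSV) bytes read:", bytes_read_row_based(blob))
    print("columnar bytes read:       ", bytes_read_columnar(chunks, needed))
    print("columnar + compressed:     ", bytes_read_columnar_compressed(chunks, needed))
```

In this model the CSV reader pays for the wide `notes` column even though the query never uses it, while the columnar reader skips it entirely. With real data you would typically convert files using an AWS Glue job or an Athena CREATE TABLE AS SELECT (CTAS) query rather than hand-rolled code.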


2. Question 2

Which service is best for cleansing and normalizing data without writing code?

  • ❌ AWS Glue Data Catalog

  • AWS Glue DataBrew

  • ❌ Amazon Athena

  • ❌ AWS Glue ETL (jobs are typically written as PySpark or Scala scripts)

Explanation:
AWS Glue DataBrew is a no-code, visual data preparation tool for cleaning, normalizing, and transforming large datasets.
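For contrast, here is a minimal hand-written Python sketch of the kind of cleansing and normalizing steps a DataBrew recipe applies through the visual editor, such as trimming whitespace, standardizing case, filling missing values, and removing duplicate rows. The records, field names, and rules are hypothetical, and this is not DataBrew's API; it only illustrates the code you would otherwise have to write and maintain yourself.

```python
from typing import Optional

# Hypothetical raw records with messy formatting (illustration only).
RAW = [
    {"email": "  Alice@Example.COM ", "country": "us", "age": "34"},
    {"email": "bob@example.com", "country": " US", "age": ""},
    {"email": "alice@example.com", "country": "US ", "age": "34"},
    {"email": "Carol@example.com", "country": "de", "age": "29"},
]


def clean_record(rec: dict, default_age: Optional[int] = None) -> dict:
    """Trim whitespace, normalize case, and convert or fill the age field."""
    email = rec["email"].strip().lower()      # trim + lowercase
    country = rec["country"].strip().upper()  # trim + uppercase country code
    age_text = rec["age"].strip()
    age = int(age_text) if age_text else default_age  # fill missing values
    return {"email": email, "country": country, "age": age}


def clean_dataset(rows: list[dict], default_age: Optional[int] = None) -> list[dict]:
    """Clean every record, then drop exact duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for rec in rows:
        out = clean_record(rec, default_age)
        key = tuple(sorted(out.items()))  # hashable fingerprint of the cleaned row
        if key not in seen:
            seen.add(key)
            cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    for row in clean_dataset(RAW, default_age=0):
        print(row)
```

Note that the first and third raw records only become duplicates after normalization, which is why cleaning runs before deduplication. In DataBrew, each of these operations would be a recipe step picked in the console instead of code.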


🧾 Summary Table

Q# | Correct Answer             | Key Concept
1  | Apache Parquet, Apache ORC | Cost-efficient, optimized data formats
2  | AWS Glue DataBrew          | No-code data preparation