
Graded Quiz: Data Handling: AI Capstone Project with Deep Learning (IBM AI Engineering Professional Certificate) Answers 2025

 

1. How should Sarah decide between bulk loading and sequential loading?

❌ Use bulk loading to minimize memory usage and maximize training speed.
❌ Choose bulk loading to ensure both low memory usage and slow training speed.
✅ Select sequential loading for lower memory usage but potentially slower training speed.
❌ Sequential loading is preferred for faster training speeds in all scenarios.

Explanation:
Sequential loading streams data batch-by-batch, saving memory but potentially slowing training. Bulk loading is faster but consumes more RAM.
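
The tradeoff can be illustrated with a small library-free sketch. The helper `read_sample` is a hypothetical stand-in for reading one image from disk, and the sample counts are arbitrary; this is not the course's loading code.

```python
from typing import Iterator, List


def read_sample(index: int) -> List[float]:
    """Hypothetical stand-in for reading and decoding one image from disk."""
    return [float(index)] * 4


def load_bulk(num_samples: int) -> List[List[float]]:
    """Bulk loading: every sample is read up front and kept in memory."""
    return [read_sample(i) for i in range(num_samples)]


def load_sequential(num_samples: int, batch_size: int) -> Iterator[List[List[float]]]:
    """Sequential loading: samples are read one batch at a time, on demand."""
    for start in range(0, num_samples, batch_size):
        stop = min(start + batch_size, num_samples)
        yield [read_sample(i) for i in range(start, stop)]


all_samples = load_bulk(10)
batch_sizes = [len(batch) for batch in load_sequential(10, batch_size=4)]

print(len(all_samples))  # 10 samples held in memory at once
print(batch_sizes)       # [4, 4, 2], so at most 4 samples are held at once
```

With bulk loading the read cost is paid once before training starts. With sequential loading it is paid again on every pass over the data, which is where the potential slowdown comes from.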


2. What is a key consideration for data augmentation in Keras?

❌ Ensure that the augmented data does not exceed the original dataset size.
✅ Augmentation techniques should be relevant and enhance the model’s ability to generalize.
❌ Keras automatically handles all augmentation processes without any user input.
❌ Data augmentation should be applied after the model training is complete.

Explanation:
Augmentation should meaningfully represent real-world variations so the model generalizes better.


3. What should Emily consider when using Dataset + DataLoader in PyTorch?

✅ DataLoader can be used to manage memory usage by controlling batch size and data shuffling.
❌ The DataLoader class automatically determines the optimal batch size.
❌ The Dataset class is responsible for the model’s training speed.
❌ The Dataset class should be used for data augmentation only.

Explanation:
DataLoader handles batching, parallelism, and shuffling, which impact memory efficiency and training behavior.
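
To make the division of work concrete, here is a rough pure-Python sketch. `ToyDataset` and `toy_loader` are hypothetical names that only imitate the idea; they are not PyTorch classes, and the real DataLoader adds worker processes, collation, and more.

```python
import math
import random
from typing import Iterator, List, Tuple


class ToyDataset:
    """Imitates the map-style dataset protocol: __len__ plus __getitem__."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, index: int) -> Tuple[int, int]:
        # One (feature, label) pair; the values are made up for illustration.
        return index, index % 2


def toy_loader(dataset: ToyDataset, batch_size: int, shuffle: bool,
               seed: int = 0) -> Iterator[List[Tuple[int, int]]]:
    """Groups samples into batches; the caller picks batch_size and shuffle."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset = ToyDataset(10)
batches = list(toy_loader(dataset, batch_size=4, shuffle=True))

print(len(batches))                       # 3
print(math.ceil(len(dataset) / 4))        # 3, i.e. ceil(10 / 4)
print(max(len(b) for b in batches))       # 4, the largest batch in memory
```

The dataset only knows how to return one sample by index. Batch size and shuffling are choices made by whoever builds the loader, which is why neither class picks an optimal batch size for you.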


4. Why is os.path.join() preferred for building file paths?

✅ It ensures platform-independent path separators.
❌ It converts relative paths to absolute paths.
❌ It automatically downloads the file.
❌ It validates that the file exists.

Explanation:
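os.path.join() inserts the separator of the operating system the code runs on (\ on Windows, / on Linux and macOS), so the same code builds valid paths everywhere. It does not make the path absolute or check that the file exists.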
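
A quick way to see both behaviors on one machine is to call the platform-specific modules that back `os.path`. The folder and file names below are made up for illustration.

```python
import ntpath      # the Windows flavor of os.path
import os
import posixpath   # the Linux/macOS flavor of os.path

parts = ("data", "train", "cat.jpg")  # hypothetical path components

posix_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)
native_path = os.path.join(*parts)

print(posix_path)    # data/train/cat.jpg
print(windows_path)  # data\train\cat.jpg

# On any OS, os.path.join uses that OS's own separator.
print(native_path == os.sep.join(parts))  # True

# Joining does not turn a relative path into an absolute one.
print(os.path.isabs(native_path))         # False
```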


5. Fill in the blank: The generator reshuffles indices at the start of _________.

❌ The script
❌ Validation only
✅ Each epoch
❌ Each batch

Explanation:
Reshuffling at the start of each epoch changes the batch order from one epoch to the next, while every sample is still visited exactly once per epoch.
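
A minimal sketch of such a generator, assuming a fixed number of samples and a seeded random generator. The function name `batch_generator` and its parameters are illustrative, not the course's code.

```python
import random
from typing import Dict, Iterator, List, Tuple


def batch_generator(num_samples: int, batch_size: int, num_epochs: int,
                    seed: int = 0) -> Iterator[Tuple[int, List[int]]]:
    """Yields (epoch, batch_indices), reshuffling once per epoch."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for epoch in range(num_epochs):
        rng.shuffle(indices)  # reshuffle at the start of the epoch only
        for start in range(0, num_samples, batch_size):
            yield epoch, indices[start:start + batch_size]


seen: Dict[int, List[int]] = {0: [], 1: []}
for epoch, batch in batch_generator(num_samples=6, batch_size=2, num_epochs=2):
    seen[epoch].extend(batch)

# Each epoch covers every index exactly once, in its own order.
print(sorted(seen[0]) == list(range(6)))  # True
print(sorted(seen[1]) == list(range(6)))  # True
```

Shuffling per batch instead would not change which samples land in which batch, and shuffling only once at script start would replay the same order every epoch.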


6. Which normalization does the denormalization step img = img / 2 + 0.5 undo?

✅ Normalize(mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
❌ Normalize(mean=[0.5], std=[2.0])
❌ No normalization was applied
❌ Normalize(mean=[0.0], std=[1.0])

Explanation:
Normalize with mean 0.5 and std 0.5 maps each pixel x to (x - 0.5) / 0.5. Inverting that gives img * 0.5 + 0.5, which is the same as img / 2 + 0.5.
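
The round trip can be checked on a few scalar pixel values in plain Python. The helper names are illustrative; the arithmetic is the same per-channel formula the answer refers to.

```python
from typing import List


def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-pixel normalization: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y: float) -> float:
    """The step from the question: img / 2 + 0.5."""
    return y / 2 + 0.5


pixels: List[float] = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [denormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

Pixels in [0, 1] land in [-1, 1] after normalization, and the denormalization step brings them back to the original range for display.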


7. How does ImageFolder determine class labels?

✅ The alphabetical order of immediate subfolder names
❌ Reading a CSV file called labels.csv
❌ EXIF metadata inside each image
❌ Hashing file names

Explanation:
torchvision's ImageFolder treats each immediate subfolder of the root directory as one class, sorts the folder names, and assigns class indices 0, 1, 2, ... in that sorted order.
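
The mapping can be reproduced with the standard library alone. This sketch mirrors the behavior described above rather than calling torchvision, and the folder names are made up.

```python
import os
import tempfile
from typing import Dict


def class_to_index(root: str) -> Dict[str, int]:
    """Sorted immediate subfolder names become class indices 0, 1, 2, ..."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Folders are created out of order on purpose.
    for name in ("dog", "cat", "bird"):
        os.makedirs(os.path.join(root, name))
    # A stray file in the root is not a folder, so it defines no class.
    open(os.path.join(root, "labels.csv"), "w").close()
    mapping = class_to_index(root)

print(mapping)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Because the order comes from a plain string sort, renaming or adding a folder can shift the index of every class after it, so it is worth saving the mapping alongside a trained model.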


🧾 Summary Table

| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Sequential loading | Memory vs speed tradeoff |
| 2 | Relevant augmentation for generalization | Proper augmentation usage |
| 3 | DataLoader manages batching & shuffling | Efficient PyTorch pipelines |
| 4 | Platform-independent paths | Using os.path.join() |
| 5 | Each epoch | Shuffling strategy |
| 6 | Normalize(mean=0.5, std=0.5) | Normalization/denormalization |
| 7 | Alphabetical subfolders | ImageFolder label mapping |