Graded Quiz: Data Handling: AI Capstone Project with Deep Learning (IBM AI Engineering Professional Certificate) Answers 2025
1. How should Sarah decide between bulk loading and sequential loading?
❌ Use bulk loading to minimize memory usage and maximize training speed.
❌ Choose bulk loading to ensure both low memory usage and slow training speed.
✅ Select sequential loading for lower memory usage but potentially slower training speed.
❌ Sequential loading is preferred for faster training speeds in all scenarios.
Explanation:
Sequential loading streams data batch-by-batch, saving memory but potentially slowing training. Bulk loading is faster but consumes more RAM.
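The tradeoff can be sketched with a plain-Python batch generator. The `load_sample` function here is a hypothetical stand-in for real file I/O (e.g. decoding one image from disk):

```python
def sequential_batches(paths, batch_size, load_sample):
    """Yield batches one at a time; only `batch_size` samples are in memory."""
    for start in range(0, len(paths), batch_size):
        yield [load_sample(p) for p in paths[start:start + batch_size]]

def bulk_load(paths, load_sample):
    """Load everything up front: fastest access, highest peak memory."""
    return [load_sample(p) for p in paths]

# Toy demonstration with a fake loader in place of real file reads.
paths = [f"img_{i}.png" for i in range(10)]
fake_loader = lambda p: p.upper()

batches = list(sequential_batches(paths, batch_size=4, load_sample=fake_loader))
everything = bulk_load(paths, fake_loader)
```

With sequential loading, peak memory is one batch; with bulk loading, it is the whole dataset, which is why bulk loading trades RAM for speed.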
2. Key consideration for data augmentation in Keras?
❌ Ensure that the augmented data does not exceed the original dataset size.
✅ Augmentation techniques should be relevant and enhance the model’s ability to generalize.
❌ Keras automatically handles all augmentation processes without any user input.
❌ Data augmentation should be applied after the model training is complete.
Explanation:
Augmentation should meaningfully represent real-world variations so the model generalizes better.
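In Keras this is typically done with preprocessing layers such as `RandomFlip` or `RandomRotation`; the underlying idea (label-preserving, realistic transforms) can be sketched framework-free with NumPy:

```python
import numpy as np

def augment_flip(img, rng):
    """Randomly mirror the image horizontally. For most natural images a
    horizontal flip preserves the label, so it teaches the model a real
    invariance rather than adding noise."""
    return img[:, ::-1] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)     # toy "image"
out = augment_flip(img, rng)
```

The key property: the output has the same shape and the same label as the input, it just represents a plausible real-world variation.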
3. What should Emily consider when using Dataset + DataLoader in PyTorch?
✅ DataLoader can be used to manage memory usage by controlling batch size and data shuffling.
❌ The DataLoader class automatically determines the optimal batch size.
❌ The Dataset class is responsible for the model’s training speed.
❌ The Dataset class should be used for data augmentation only.
Explanation:
DataLoader handles batching, parallelism, and shuffling, which impact memory efficiency and training behavior.
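A minimal sketch of what `DataLoader` does over a `Dataset` (batching plus optional shuffling), written framework-free so the mechanics are visible; the real `torch.utils.data.DataLoader` additionally offers worker processes, pinned memory, and collation:

```python
import random

class ToyDataset:
    """Mimics torch.utils.data.Dataset: defines __len__ and __getitem__."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, i):
        return self.items[i]

def toy_loader(dataset, batch_size, shuffle=False, seed=None):
    """Mimics DataLoader: batch_size bounds memory, shuffle controls order."""
    idx = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(idx)
    for s in range(0, len(idx), batch_size):
        yield [dataset[i] for i in idx[s:s + batch_size]]

ds = ToyDataset(list(range(10)))
batches = list(toy_loader(ds, batch_size=3, shuffle=True, seed=42))
```

Note the division of labor: the dataset only knows how to fetch one sample; batching and shuffling live entirely in the loader.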
4. Why is os.path.join() preferred?
✅ It ensures platform-independent path separators.
❌ It converts relative paths to absolute paths.
❌ It automatically downloads the file.
❌ It validates that the file exists.
Explanation:
os.path.join() inserts the correct separator (/ or \) for the current OS, whether Windows, Linux, or macOS.
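A quick illustration (the file name is made up for the example):

```python
import os

# os.path.join inserts the separator for the current OS, so the same
# code works on Windows ("\\") and on Linux/macOS ("/").
path = os.path.join("data", "train", "cat_001.png")
```

Hard-coding `"data/train/cat_001.png"` would break on Windows tools that expect backslashes; `os.path.join` sidesteps that entirely.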
5. Fill in the blank: The generator reshuffles indices at the start of _________.
❌ The script
❌ Validation only
✅ Each epoch
❌ Each batch
Explanation:
Reshuffling once per epoch gives each epoch a fresh random order while keeping the order stable within the epoch.
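A minimal generator with this shuffling strategy, in plain Python:

```python
import random

def batch_generator(n_samples, batch_size, seed=None):
    """Reshuffle the index list once at the START of each epoch,
    then walk through it batch by batch."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    while True:                       # each loop iteration is one epoch
        rng.shuffle(indices)          # reshuffle at the start of the epoch
        for s in range(0, n_samples, batch_size):
            yield indices[s:s + batch_size]

gen = batch_generator(8, batch_size=4, seed=0)
epoch1 = [next(gen), next(gen)]       # 2 batches of 4 == 1 full epoch
epoch2 = [next(gen), next(gen)]
```

Each epoch still visits every sample exactly once; only the order changes between epochs.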
6. Which normalization matches img = img / 2 + 0.5?
✅ Normalize(mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
❌ Normalize(mean=[0.5], std=[2.0])
❌ No normalization was applied
❌ Normalize(mean=[0.0], std=[1.0])
Explanation:
If the data was normalized as (x - 0.5)/0.5, denormalization is x*0.5 + 0.5; since x*0.5 is the same as x/2, this is exactly img/2 + 0.5.
7. How does ImageFolder determine class labels?
✅ The alphabetical order of immediate subfolder names
❌ Reading a CSV file called labels.csv
❌ EXIF metadata inside each image
❌ Hashing file names
Explanation:
PyTorch assigns class indices based strictly on alphabetical folder names inside the top-level directory.
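The mapping can be reproduced in plain Python: torchvision's `ImageFolder` sorts the immediate subfolder names and enumerates them, exposing the result as `class_to_idx`. A sketch using a throwaway directory:

```python
import os
import tempfile

# Build a toy dataset root with three class subfolders.
root = tempfile.mkdtemp()
for name in ["dog", "cat", "bird"]:
    os.makedirs(os.path.join(root, name))

# ImageFolder derives labels the same way: sort the immediate
# subfolder names alphabetically, then enumerate them.
classes = sorted(d for d in os.listdir(root)
                 if os.path.isdir(os.path.join(root, d)))
class_to_idx = {c: i for i, c in enumerate(classes)}
# class_to_idx == {'bird': 0, 'cat': 1, 'dog': 2}
```

Note that "bird" gets index 0 even though the "dog" folder was created first: creation order and file contents are irrelevant, only alphabetical order matters.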
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Sequential loading | Memory vs speed tradeoff |
| 2 | Relevant augmentation for generalization | Proper augmentation usage |
| 3 | DataLoader manages batching & shuffling | Efficient PyTorch pipelines |
| 4 | Platform-independent paths | Using os.path.join() |
| 5 | Each epoch | Shuffling strategy |
| 6 | Normalize(mean=0.5, std=0.5) | Normalization/denormalization |
| 7 | Alphabetical subfolders | ImageFolder label mapping |