Graded Quiz: Data Handling: AI Capstone Project with Deep Learning (IBM AI Engineering Professional Certificate) Answers 2025
1. How should Sarah decide between bulk loading and sequential loading?
❌ Use bulk loading to minimize memory usage and maximize training speed.
❌ Choose bulk loading to ensure both low memory usage and slow training speed.
✅ Select sequential loading for lower memory usage but potentially slower training speed.
❌ Sequential loading is preferred for faster training speeds in all scenarios.
Explanation:
Sequential loading streams data batch-by-batch, saving memory but potentially slowing training. Bulk loading is faster but consumes more RAM.
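The tradeoff can be sketched with a plain-Python batch generator. The `load_sample` function here is a hypothetical stand-in for real file I/O (e.g. decoding one image from disk):

```python
def sequential_batches(paths, batch_size, load_sample):
    """Yield batches one at a time; only `batch_size` samples are in memory."""
    for start in range(0, len(paths), batch_size):
        yield [load_sample(p) for p in paths[start:start + batch_size]]

def bulk_load(paths, load_sample):
    """Load everything up front: fastest access, highest peak memory."""
    return [load_sample(p) for p in paths]

# Toy demonstration with a fake loader in place of real file reads.
paths = [f"img_{i}.png" for i in range(10)]
fake_loader = lambda p: p.upper()

batches = list(sequential_batches(paths, batch_size=4, load_sample=fake_loader))
everything = bulk_load(paths, fake_loader)
```

With sequential loading, peak memory is one batch; with bulk loading, it is the whole dataset, which is why bulk loading trades RAM for speed.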
2. Key consideration for data augmentation in Keras?
❌ Ensure that the augmented data does not exceed the original dataset size.
✅ Augmentation techniques should be relevant and enhance the model’s ability to generalize.
❌ Keras automatically handles all augmentation processes without any user input.
❌ Data augmentation should be applied after the model training is complete.
Explanation:
Augmentation should meaningfully represent real-world variations so the model generalizes better.
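In Keras this is typically done with preprocessing layers such as `RandomFlip` or `RandomRotation`; the underlying idea (label-preserving, realistic transforms) can be sketched framework-free with NumPy:

```python
import numpy as np

def augment_flip(img, rng):
    """Randomly mirror the image horizontally. For most natural images a
    horizontal flip preserves the label, so it teaches the model a real
    invariance rather than adding noise."""
    return img[:, ::-1] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)     # toy "image"
out = augment_flip(img, rng)
```

The key property: the output has the same shape and the same label as the input, it just represents a plausible real-world variation.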
3. What should Emily consider when using Dataset + DataLoader in PyTorch?
✅ DataLoader can be used to manage memory usage by controlling batch size and data shuffling.
❌ The DataLoader class automatically determines the optimal batch size.
❌ The Dataset class is responsible for the model’s training speed.
❌ The Dataset class should be used for data augmentation only.
Explanation:
DataLoader handles batching, parallelism, and shuffling, which impact memory efficiency and training behavior.
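A minimal sketch of what `DataLoader` does over a `Dataset` (batching plus optional shuffling), written framework-free so the mechanics are visible; the real `torch.utils.data.DataLoader` additionally offers worker processes, pinned memory, and collation:

```python
import random

class ToyDataset:
    """Mimics torch.utils.data.Dataset: defines __len__ and __getitem__."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, i):
        return self.items[i]

def toy_loader(dataset, batch_size, shuffle=False, seed=None):
    """Mimics DataLoader: batch_size bounds memory, shuffle controls order."""
    idx = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(idx)
    for s in range(0, len(idx), batch_size):
        yield [dataset[i] for i in idx[s:s + batch_size]]

ds = ToyDataset(list(range(10)))
batches = list(toy_loader(ds, batch_size=3, shuffle=True, seed=42))
```

Note the division of labor: the dataset only knows how to fetch one sample; batching and shuffling live entirely in the loader.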
4. Why is os.path.join() preferred?
✅ It ensures platform-independent path separators.
❌ It converts relative paths to absolute paths.
❌ It automatically downloads the file.
❌ It validates that the file exists.
Explanation:
os.path.join() inserts the correct separator (/ or \) for the current OS, whether Windows, Linux, or macOS.
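A quick illustration (the file name is made up for the example):

```python
import os

# os.path.join inserts the separator for the current OS, so the same
# code works on Windows ("\\") and on Linux/macOS ("/").
path = os.path.join("data", "train", "cat_001.png")
```

Hard-coding `"data/train/cat_001.png"` would break on Windows tools that expect backslashes; `os.path.join` sidesteps that entirely.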
5. Fill in the blank: The generator reshuffles indices at the start of _________.
❌ The script
❌ Validation only
✅ Each epoch
❌ Each batch
Explanation:
Reshuffling once per epoch gives each epoch a fresh random order while keeping the order stable within the epoch.
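A minimal generator with this shuffling strategy, in plain Python:

```python
import random

def batch_generator(n_samples, batch_size, seed=None):
    """Reshuffle the index list once at the START of each epoch,
    then walk through it batch by batch."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    while True:                       # each loop iteration is one epoch
        rng.shuffle(indices)          # reshuffle at the start of the epoch
        for s in range(0, n_samples, batch_size):
            yield indices[s:s + batch_size]

gen = batch_generator(8, batch_size=4, seed=0)
epoch1 = [next(gen), next(gen)]       # 2 batches of 4 == 1 full epoch
epoch2 = [next(gen), next(gen)]
```

Each epoch still visits every sample exactly once; only the order changes between epochs.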
6. Which normalization matches img = img / 2 + 0.5?
✅ Normalize(mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
❌ Normalize(mean=[0.5], std=[2.0])
❌ No normalization was applied
❌ Normalize(mean=[0.0], std=[1.0])
Explanation:
If the data was normalized as (x - 0.5)/0.5, denormalization is x*0.5 + 0.5; since x*0.5 is the same as x/2, this is exactly img/2 + 0.5.
7. How does ImageFolder determine class labels?
✅ The alphabetical order of immediate subfolder names
❌ Reading a CSV file called labels.csv
❌ EXIF metadata inside each image
❌ Hashing file names
Explanation:
PyTorch assigns class indices based strictly on alphabetical folder names inside the top-level directory.
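The mapping can be reproduced in plain Python: torchvision's `ImageFolder` sorts the immediate subfolder names and enumerates them, exposing the result as `class_to_idx`. A sketch using a throwaway directory:

```python
import os
import tempfile

# Build a toy dataset root with three class subfolders.
root = tempfile.mkdtemp()
for name in ["dog", "cat", "bird"]:
    os.makedirs(os.path.join(root, name))

# ImageFolder derives labels the same way: sort the immediate
# subfolder names alphabetically, then enumerate them.
classes = sorted(d for d in os.listdir(root)
                 if os.path.isdir(os.path.join(root, d)))
class_to_idx = {c: i for i, c in enumerate(classes)}
# class_to_idx == {'bird': 0, 'cat': 1, 'dog': 2}
```

Note that "bird" gets index 0 even though the "dog" folder was created first: creation order and file contents are irrelevant, only alphabetical order matters.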
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Sequential loading | Memory vs speed tradeoff |
| 2 | Relevant augmentation for generalization | Proper augmentation usage |
| 3 | DataLoader manages batching & shuffling | Efficient PyTorch pipelines |
| 4 | Platform-independent paths | Using os.path.join() |
| 5 | Each epoch | Shuffling strategy |
| 6 | Normalize(mean=0.5, std=0.5) | Normalization/denormalization |
| 7 | Alphabetical subfolders | ImageFolder label mapping |