Graded Quiz Checklist: Vision Transformers Using PyTorch (AI Capstone Project with Deep Learning, IBM AI Engineering Professional Certificate) Answers 2025
1. Did you implement the ConvNet class for the CNN backbone and instantiate it with num_classes?
✔️ Yes
❌ No
Explanation:
Building the CNN backbone establishes the feature extractor whose outputs serve as input to the transformer.
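A minimal sketch of what such a backbone might look like; the class name matches the quiz, but the layer sizes and `num_classes` default here are illustrative assumptions, not the course's actual code.

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """Hypothetical small CNN backbone with a classification head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x32 input -> two 2x2 pools -> 8x8 spatial map with 32 channels
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)              # (B, 32, H/4, W/4)
        return self.classifier(x.flatten(1))

model = ConvNet(num_classes=2)
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 2])
```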
2. Did you successfully load the stored state_dict and set the CNN to evaluation mode?
✔️ Yes
❌ No
Explanation:
Loading the state dict restores the learned weights, and model.eval() disables dropout and switches batch norm to its running statistics, giving consistent inference behavior.
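A round-trip sketch of the save/restore/eval pattern; an in-memory buffer stands in here for the stored weights file, and the tiny model is a placeholder for the CNN.

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5))
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)        # in practice: a saved .pt file path

restored = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5))
buffer.seek(0)
restored.load_state_dict(torch.load(buffer))  # restores the learned weights
restored.eval()                               # dropout off, batchnorm uses running stats

print(restored.training)  # False
```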
3. Did you code the PatchEmbed layer inheriting from nn.Module?
✔️ Yes
❌ No
Explanation:
Patch embedding converts CNN feature maps into patch sequences compatible with transformer attention layers.
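One common way to write such a layer, shown as a sketch: a strided convolution both cuts the feature map into patches and projects each patch to the embedding dimension. The constructor defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Hypothetical patch embedding: strided conv, then flatten to a token sequence."""
    def __init__(self, in_channels=32, embed_dim=64, patch_size=2):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

pe = PatchEmbed(in_channels=32, embed_dim=64, patch_size=2)
tokens = pe(torch.randn(1, 32, 8, 8))
print(tokens.shape)  # torch.Size([1, 16, 64])
```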
4. Did you validate that PatchEmbed outputs (batch, num_patches, embed_dim)?
✔️ Yes
❌ No
Explanation:
Validating the output shape ensures that the MHSA and transformer blocks receive properly formed token sequences.
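A quick shape check might look like the following sketch; the dimensions are illustrative, and the strided conv stands in for the patch-embedding layer.

```python
import torch
import torch.nn as nn

batch, channels, h, w = 2, 32, 8, 8
embed_dim, patch = 64, 2

# Strided conv acting as a patch embedding, then flatten to tokens.
proj = nn.Conv2d(channels, embed_dim, kernel_size=patch, stride=patch)
out = proj(torch.randn(batch, channels, h, w)).flatten(2).transpose(1, 2)

num_patches = (h // patch) * (w // patch)
assert out.shape == (batch, num_patches, embed_dim)  # (2, 16, 64)
```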
5. Did you implement MHSA with Q, K, V projections?
✔️ Yes
❌ No
Explanation:
MHSA allows the model to capture global dependencies between image patches.
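A minimal MHSA sketch with explicit Q, K, V projections (here fused into one linear layer); the course's class may organize the heads and projections differently.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Hypothetical multi-head self-attention with joint Q/K/V projection."""
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # Q, K, V in one projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                       # attention over all patches
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)

mhsa = MHSA(dim=64, num_heads=4)
y = mhsa(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```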
6. Did you define the full TransformerBlock with MHSA, LayerNorm, and MLP + dropout?
✔️ Yes
❌ No
Explanation:
This block forms the fundamental unit of transformer-based deep feature processing.
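A pre-norm transformer block sketched with `nn.MultiheadAttention` standing in for a hand-written MHSA; the hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Hypothetical block: LayerNorm -> MHSA -> residual, LayerNorm -> MLP -> residual."""
    def __init__(self, dim=64, num_heads=4, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout,
                                          batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(dim * mlp_ratio, dim), nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        return x + self.mlp(self.norm2(x))                 # residual around MLP

block = TransformerBlock()
z = block(torch.randn(2, 16, 64))
print(z.shape)  # torch.Size([2, 16, 64])
```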
7. Did you assemble the full ViT model with depth, heads, and embedding dimension?
✔️ Yes
❌ No
Explanation:
Complete ViT assembly generates a full transformer suitable for downstream classification.
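An illustrative assembly sketch: a CLS token, positional embeddings, a stack of `depth` encoder blocks with the chosen `num_heads` and `embed_dim`, and a classification head. `nn.TransformerEncoderLayer` stands in for the hand-written block; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    """Hypothetical ViT over pre-computed patch tokens."""
    def __init__(self, num_patches=16, embed_dim=64, depth=4,
                 num_heads=4, num_classes=2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):                 # tokens: (B, num_patches, embed_dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.norm(self.blocks(x))
        return self.head(x[:, 0])              # classify from the CLS token

vit = ViT()
logits = vit(torch.randn(2, 16, 64))
print(logits.shape)  # torch.Size([2, 2])
```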
8. Did you wrap the CNN backbone and ViT into CNN_ViT_Hybrid and expose a forward() method?
✔️ Yes
❌ No
Explanation:
Integrating CNN + ViT combines local convolutional features with global transformer reasoning.
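A minimal hybrid sketch with the quiz's class name: conv features become a token sequence that a transformer encoder attends over globally. The layer choices and shapes are illustrative, not the course implementation.

```python
import torch
import torch.nn as nn

class CNN_ViT_Hybrid(nn.Module):
    """Hypothetical hybrid: CNN features -> tokens -> transformer -> classifier."""
    def __init__(self, embed_dim=64, num_heads=4, depth=2, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(            # local convolutional features
            nn.Conv2d(3, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(embed_dim, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                         # x: (B, 3, H, W)
        f = self.backbone(x)                      # (B, embed_dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)     # (B, num_patches, embed_dim)
        tokens = self.encoder(tokens)             # global attention over patches
        return self.head(tokens.mean(dim=1))      # mean-pool tokens, then classify

hybrid = CNN_ViT_Hybrid()
preds = hybrid(torch.randn(2, 3, 32, 32))
print(preds.shape)  # torch.Size([2, 2])
```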
9. Did you implement the train() function to loop batches, compute loss, and update weights?
✔️ Yes
❌ No
Explanation:
A training loop is essential for gradient-based optimization across epochs.
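The loop's three core steps can be sketched as follows; the model, loss, optimizer, and synthetic data here are placeholders.

```python
import torch
import torch.nn as nn

def train(model, loader, criterion, optimizer, epochs=1):
    """Generic training-loop sketch: forward, backward, update per batch."""
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)   # forward pass + loss
            loss.backward()                   # backpropagate gradients
            optimizer.step()                  # update weights

# Tiny synthetic run to show the loop executing:
model = nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
train(model, loader, nn.CrossEntropyLoss(),
      torch.optim.SGD(model.parameters(), lr=0.1))
```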
10. Did you implement evaluate() under @torch.no_grad() to measure validation metrics?
✔️ Yes
❌ No
Explanation:
This makes evaluation faster and memory-efficient by disabling gradient tracking.
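A sketch of such an evaluation function, using plain accuracy as the metric; model and data are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()                      # no gradient tracking inside this function
def evaluate(model, loader):
    """Hypothetical eval loop returning accuracy over a loader."""
    model.eval()
    correct = total = 0
    for xb, yb in loader:
        preds = model(xb).argmax(dim=1)
        correct += (preds == yb).sum().item()
        total += yb.numel()
    return correct / total

model = nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
acc = evaluate(model, loader)
print(0.0 <= acc <= 1.0)  # True
```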
11. Did you complete and download the JupyterLab notebook?
✔️ Yes
❌ No
Explanation:
Downloading the notebook is required for final submission and verifies all tasks are completed.
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Yes | CNN backbone implementation |
| 2 | Yes | Restoring weights & eval mode |
| 3 | Yes | Patch embedding design |
| 4 | Yes | Shape validation for ViT |
| 5 | Yes | Multi-head self-attention |
| 6 | Yes | Transformer block structure |
| 7 | Yes | Full ViT assembly |
| 8 | Yes | Hybrid CNN-ViT model |
| 9 | Yes | Training loop |
| 10 | Yes | Evaluation loop (no gradients) |
| 11 | Yes | Notebook completion |