Graded Quiz: Checklist: Vision Transformers Using PyTorch: AI Capstone Project with Deep Learning (IBM AI Engineering Professional Certificate) Answers 2025

1. Did you implement the ConvNet class for the CNN backbone and instantiate it with num_classes?

✔️ Yes
❌ No

Explanation:
Building the CNN backbone establishes the feature extractor whose outputs serve as input to the transformer.
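A minimal sketch of such a backbone is shown below; the layer widths, input size, and `num_classes` default are illustrative assumptions, not the course's exact architecture.

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """Small CNN backbone: conv feature extractor + linear classifier (sketch)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two 2x2 poolings halve 224 twice -> 56x56 feature maps
        self.classifier = nn.Linear(64 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)            # (B, 64, 56, 56) for 224x224 input
        return self.classifier(x.flatten(1))

model = ConvNet(num_classes=2)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2])
```

The `features` module is what later feeds the transformer; the classifier head exists only for the standalone CNN stage.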


2. Did you successfully load the stored state_dict and set the CNN to evaluation mode?

✔️ Yes
❌ No

Explanation:
Loading the state dict restores the learned weights, and model.eval() disables dropout and switches batchnorm to its stored running statistics, so inference is deterministic and consistent.
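A small round-trip sketch of this step, using an in-memory buffer instead of a file so it stays self-contained (the course notebook would use a saved `.pth` path instead):

```python
import io
import torch
import torch.nn as nn

# Source model whose weights we pretend were trained and saved
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
buf = io.BytesIO()
torch.save(model.state_dict(), buf)   # stands in for torch.save(..., "cnn.pth")
buf.seek(0)

# Fresh instance with the same architecture, weights restored from the buffer
restored = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
restored.load_state_dict(torch.load(buf))
restored.eval()  # dropout off, batchnorm uses running stats
print(restored.training)  # False
```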


3. Did you code the PatchEmbed layer inheriting from nn.Module?

✔️ Yes
❌ No

Explanation:
Patch embedding converts CNN feature maps into patch sequences compatible with transformer attention layers.
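One common way to write this layer, sketched below: a strided `Conv2d` both slices the feature map into non-overlapping patches and projects each patch to `embed_dim`. The channel count, patch size, and embedding width are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Project (B, C, H, W) feature maps into (B, num_patches, embed_dim) tokens."""
    def __init__(self, in_chans: int = 64, patch_size: int = 4, embed_dim: int = 128):
        super().__init__()
        # kernel == stride == patch_size -> non-overlapping patches
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)

pe = PatchEmbed()
tokens = pe(torch.randn(2, 64, 16, 16))
print(tokens.shape)  # torch.Size([2, 16, 128])
```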


4. Did you validate that PatchEmbed outputs (batch, num_patches, embed_dim)?

✔️ Yes
❌ No

Explanation:
Validating the output shape ensures the MHSA and transformer blocks receive properly shaped token sequences.
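The check itself is a one-line assertion against the expected `(batch, num_patches, embed_dim)` tuple; the sizes below are illustrative:

```python
import torch
import torch.nn as nn

batch, in_chans, h, w = 2, 64, 16, 16
patch_size, embed_dim = 4, 128

# Same strided-conv patch projection as the PatchEmbed layer
proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
out = proj(torch.randn(batch, in_chans, h, w)).flatten(2).transpose(1, 2)

num_patches = (h // patch_size) * (w // patch_size)
assert out.shape == (batch, num_patches, embed_dim), out.shape
print("OK:", tuple(out.shape))  # OK: (2, 16, 128)
```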


5. Did you implement MHSA with Q, K, V projections?

✔️ Yes
❌ No

Explanation:
MHSA (multi-head self-attention) lets the model capture global dependencies between image patches, since every token attends to every other token.
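A compact sketch with explicit Q, K, V projections (fused into one linear layer, a common idiom); head count and width are assumptions:

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Multi-head self-attention over token sequences (B, N, dim)."""
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # fused Q, K, V projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, heads, N, hd)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                    # attention weights
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)

y = MHSA()(torch.randn(2, 16, 128))
print(y.shape)  # torch.Size([2, 16, 128])
```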


6. Did you define the full TransformerBlock with MHSA, LayerNorm, and MLP + dropout?

✔️ Yes
❌ No

Explanation:
This block forms the fundamental unit of transformer-based deep feature processing.
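A pre-norm block sketch using PyTorch's built-in `nn.MultiheadAttention` in place of a hand-rolled MHSA; the MLP ratio and dropout rate are assumptions:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """LayerNorm -> MHSA -> residual, then LayerNorm -> MLP + dropout -> residual."""
    def __init__(self, dim=128, num_heads=4, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(dim * mlp_ratio, dim), nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        return x + self.mlp(self.norm2(x))                 # residual around MLP

y = TransformerBlock()(torch.randn(2, 16, 128))
print(y.shape)  # torch.Size([2, 16, 128])
```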


7. Did you assemble the full ViT model with depth, heads, and embedding dimension?

✔️ Yes
❌ No

Explanation:
Complete ViT assembly generates a full transformer suitable for downstream classification.
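A sketch of the assembly step: a CLS token and positional embeddings are prepended to the patch tokens, a stack of encoder layers (PyTorch's built-in layer, standing in for custom blocks) processes them, and the CLS token is classified. The depth, head count, and embedding dimension here are assumptions.

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    """ViT over pre-computed patch tokens: CLS + pos embed -> encoder -> head."""
    def __init__(self, num_patches=16, embed_dim=128, depth=4,
                 num_heads=4, num_classes=2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                       # x: (B, num_patches, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.norm(self.blocks(x))
        return self.head(x[:, 0])               # classify from the CLS token

logits = ViT()(torch.randn(2, 16, 128))
print(logits.shape)  # torch.Size([2, 2])
```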


8. Did you wrap the CNN backbone and ViT into CNN_ViT_Hybrid and expose a forward() method?

✔️ Yes
❌ No

Explanation:
Integrating CNN + ViT combines local convolutional features with global transformer reasoning.
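The wrapping step can be sketched as below: the CNN backbone produces local feature maps, a patch projection tokenizes them, and a transformer encoder reasons globally before a classification head. All module shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNN_ViT_Hybrid(nn.Module):
    """CNN stem for local features, transformer encoder for global reasoning."""
    def __init__(self, embed_dim=128, num_heads=4, depth=2, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(       # local convolutional features
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.patch_embed = nn.Conv2d(64, embed_dim, kernel_size=4, stride=4)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.backbone(x)                 # (B, 64, H/2, W/2)
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, embed_dim)
        x = self.encoder(x)                  # global attention over patches
        return self.head(x.mean(dim=1))      # mean-pool tokens, then classify

logits = CNN_ViT_Hybrid()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```

Mean-pooling the tokens is one simple readout choice; a CLS token as in the standalone ViT works equally well.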


9. Did you implement the train() function to loop batches, compute loss, and update weights?

✔️ Yes
❌ No

Explanation:
A training loop is essential for gradient-based optimization across epochs.
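A minimal `train()` sketch over a synthetic loader; the model, data, optimizer choice, and hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=2, lr=1e-3):
    """Loop over batches, compute loss, backprop, and update weights."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    mean_loss = 0.0
    for epoch in range(epochs):
        total = 0.0
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()              # compute gradients
            optimizer.step()             # update weights
            total += loss.item()
        mean_loss = total / len(loader)
        print(f"epoch {epoch}: loss {mean_loss:.4f}")
    return mean_loss

model = nn.Linear(10, 2)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(4)]
final_loss = train(model, loader)
```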


10. Did you implement evaluate() under @torch.no_grad() to measure validation metrics?

✔️ Yes
❌ No

Explanation:
This makes evaluation faster and memory-efficient by disabling gradient tracking.
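A matching `evaluate()` sketch, again over synthetic data; the accuracy metric and loader shape are assumptions.

```python
import torch
import torch.nn as nn

@torch.no_grad()                 # no gradient tracking during evaluation
def evaluate(model, loader):
    """Return classification accuracy over a validation loader."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total

model = nn.Linear(10, 2)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(3)]
acc = evaluate(model, loader)
print(f"accuracy: {acc:.2f}")
```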


11. Did you complete and download the JupyterLab notebook?

✔️ Yes
❌ No

Explanation:
Downloading the notebook is required for final submission and verifies all tasks are completed.


🧾 Summary Table

Q#  | Correct Answer | Key Concept
1   | Yes | CNN backbone implementation
2   | Yes | Restoring weights & eval mode
3   | Yes | Patch embedding design
4   | Yes | Shape validation for ViT
5   | Yes | Multi-head self-attention
6   | Yes | Transformer block structure
7   | Yes | Full ViT assembly
8   | Yes | Hybrid CNN-ViT model
9   | Yes | Training loop
10  | Yes | Evaluation loop (no gradients)
11  | Yes | Notebook completion