Graded Quiz Checklist: Vision Transformers Using PyTorch (AI Capstone Project with Deep Learning, IBM AI Engineering Professional Certificate) Answers 2025
1. Did you implement the ConvNet class for the CNN backbone and instantiate it with num_classes?
✔️ Yes
❌ No
Explanation:
Building the CNN backbone establishes the feature extractor whose outputs serve as input to the transformer.
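A minimal sketch of what such a backbone might look like; the class name matches the quiz, but the layer sizes and `num_classes` default here are illustrative assumptions, not the course's actual code.

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """Hypothetical small CNN backbone with a classification head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x32 input -> two 2x2 pools -> 8x8 spatial map with 32 channels
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)              # (B, 32, H/4, W/4)
        return self.classifier(x.flatten(1))

model = ConvNet(num_classes=2)
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 2])
```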
2. Did you successfully load the stored state_dict and set the CNN to evaluation mode?
✔️ Yes
❌ No
Explanation:
Loading the state dict restores the learned weights, and model.eval() disables dropout and switches batch norm to its running statistics, giving consistent inference behavior.
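A round-trip sketch of the save/restore/eval pattern; an in-memory buffer stands in here for the stored weights file, and the tiny model is a placeholder for the CNN.

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5))
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)        # in practice: a saved .pt file path

restored = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5))
buffer.seek(0)
restored.load_state_dict(torch.load(buffer))  # restores the learned weights
restored.eval()                               # dropout off, batchnorm uses running stats

print(restored.training)  # False
```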
3. Did you code the PatchEmbed layer inheriting from nn.Module?
✔️ Yes
❌ No
Explanation:
Patch embedding converts CNN feature maps into patch sequences compatible with transformer attention layers.
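One common way to write such a layer, shown as a sketch: a strided convolution both cuts the feature map into patches and projects each patch to the embedding dimension. The constructor defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Hypothetical patch embedding: strided conv, then flatten to a token sequence."""
    def __init__(self, in_channels=32, embed_dim=64, patch_size=2):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

pe = PatchEmbed(in_channels=32, embed_dim=64, patch_size=2)
tokens = pe(torch.randn(1, 32, 8, 8))
print(tokens.shape)  # torch.Size([1, 16, 64])
```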
4. Did you validate that PatchEmbed outputs (batch, num_patches, embed_dim)?
✔️ Yes
❌ No
Explanation:
Validating the output shape ensures that the MHSA and transformer blocks receive properly formed token sequences.
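A quick shape check might look like the following sketch; the dimensions are illustrative, and the strided conv stands in for the patch-embedding layer.

```python
import torch
import torch.nn as nn

batch, channels, h, w = 2, 32, 8, 8
embed_dim, patch = 64, 2

# Strided conv acting as a patch embedding, then flatten to tokens.
proj = nn.Conv2d(channels, embed_dim, kernel_size=patch, stride=patch)
out = proj(torch.randn(batch, channels, h, w)).flatten(2).transpose(1, 2)

num_patches = (h // patch) * (w // patch)
assert out.shape == (batch, num_patches, embed_dim)  # (2, 16, 64)
```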
5. Did you implement MHSA with Q, K, V projections?
✔️ Yes
❌ No
Explanation:
MHSA allows the model to capture global dependencies between image patches.
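A minimal MHSA sketch with explicit Q, K, V projections (here fused into one linear layer); the course's class may organize the heads and projections differently.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Hypothetical multi-head self-attention with joint Q/K/V projection."""
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # Q, K, V in one projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                       # attention over all patches
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)

mhsa = MHSA(dim=64, num_heads=4)
y = mhsa(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```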
6. Did you define the full TransformerBlock with MHSA, LayerNorm, and MLP + dropout?
✔️ Yes
❌ No
Explanation:
This block forms the fundamental unit of transformer-based deep feature processing.
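A pre-norm transformer block sketched with `nn.MultiheadAttention` standing in for a hand-written MHSA; the hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Hypothetical block: LayerNorm -> MHSA -> residual, LayerNorm -> MLP -> residual."""
    def __init__(self, dim=64, num_heads=4, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout,
                                          batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(dim * mlp_ratio, dim), nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        return x + self.mlp(self.norm2(x))                 # residual around MLP

block = TransformerBlock()
z = block(torch.randn(2, 16, 64))
print(z.shape)  # torch.Size([2, 16, 64])
```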
7. Did you assemble the full ViT model with depth, heads, and embedding dimension?
✔️ Yes
❌ No
Explanation:
Complete ViT assembly generates a full transformer suitable for downstream classification.
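An illustrative assembly sketch: a CLS token, positional embeddings, a stack of `depth` encoder blocks with the chosen `num_heads` and `embed_dim`, and a classification head. `nn.TransformerEncoderLayer` stands in for the hand-written block; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    """Hypothetical ViT over pre-computed patch tokens."""
    def __init__(self, num_patches=16, embed_dim=64, depth=4,
                 num_heads=4, num_classes=2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):                 # tokens: (B, num_patches, embed_dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.norm(self.blocks(x))
        return self.head(x[:, 0])              # classify from the CLS token

vit = ViT()
logits = vit(torch.randn(2, 16, 64))
print(logits.shape)  # torch.Size([2, 2])
```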
8. Did you wrap the CNN backbone and ViT into CNN_ViT_Hybrid and expose a forward() method?
✔️ Yes
❌ No
Explanation:
Integrating CNN + ViT combines local convolutional features with global transformer reasoning.
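A minimal hybrid sketch with the quiz's class name: conv features become a token sequence that a transformer encoder attends over globally. The layer choices and shapes are illustrative, not the course implementation.

```python
import torch
import torch.nn as nn

class CNN_ViT_Hybrid(nn.Module):
    """Hypothetical hybrid: CNN features -> tokens -> transformer -> classifier."""
    def __init__(self, embed_dim=64, num_heads=4, depth=2, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(            # local convolutional features
            nn.Conv2d(3, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(embed_dim, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                         # x: (B, 3, H, W)
        f = self.backbone(x)                      # (B, embed_dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)     # (B, num_patches, embed_dim)
        tokens = self.encoder(tokens)             # global attention over patches
        return self.head(tokens.mean(dim=1))      # mean-pool tokens, then classify

hybrid = CNN_ViT_Hybrid()
preds = hybrid(torch.randn(2, 3, 32, 32))
print(preds.shape)  # torch.Size([2, 2])
```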
9. Did you implement the train() function to loop batches, compute loss, and update weights?
✔️ Yes
❌ No
Explanation:
A training loop is essential for gradient-based optimization across epochs.
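The loop's three core steps can be sketched as follows; the model, loss, optimizer, and synthetic data here are placeholders.

```python
import torch
import torch.nn as nn

def train(model, loader, criterion, optimizer, epochs=1):
    """Generic training-loop sketch: forward, backward, update per batch."""
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)   # forward pass + loss
            loss.backward()                   # backpropagate gradients
            optimizer.step()                  # update weights

# Tiny synthetic run to show the loop executing:
model = nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
train(model, loader, nn.CrossEntropyLoss(),
      torch.optim.SGD(model.parameters(), lr=0.1))
```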
10. Did you implement evaluate() under @torch.no_grad() to measure validation metrics?
✔️ Yes
❌ No
Explanation:
This makes evaluation faster and memory-efficient by disabling gradient tracking.
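A sketch of such an evaluation function, using plain accuracy as the metric; model and data are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()                      # no gradient tracking inside this function
def evaluate(model, loader):
    """Hypothetical eval loop returning accuracy over a loader."""
    model.eval()
    correct = total = 0
    for xb, yb in loader:
        preds = model(xb).argmax(dim=1)
        correct += (preds == yb).sum().item()
        total += yb.numel()
    return correct / total

model = nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
acc = evaluate(model, loader)
print(0.0 <= acc <= 1.0)  # True
```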
11. Did you complete and download the JupyterLab notebook?
✔️ Yes
❌ No
Explanation:
Downloading the notebook is required for final submission and verifies all tasks are completed.
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Yes | CNN backbone implementation |
| 2 | Yes | Restoring weights & eval mode |
| 3 | Yes | Patch embedding design |
| 4 | Yes | Shape validation for ViT |
| 5 | Yes | Multi-head self-attention |
| 6 | Yes | Transformer block structure |
| 7 | Yes | Full ViT assembly |
| 8 | Yes | Hybrid CNN-ViT model |
| 9 | Yes | Training loop |
| 10 | Yes | Evaluation loop (no gradients) |
| 11 | Yes | Notebook completion |