Graded Quiz: Parameter-Efficient Fine-Tuning (PEFT) | Generative AI Engineering and Fine-Tuning Transformers (IBM AI Engineering Professional Certificate) Answers 2025
1. What does QLoRA use to minimize memory during fine-tuning?
❌ Zero-shot inference
❌ Few-shot inference
✅ 4-bit quantization
❌ LoRA adaptation
Explanation:
QLoRA quantizes the frozen base model's weights to 4-bit precision, drastically reducing GPU memory usage while the LoRA adapters are fine-tuned on top in higher precision.
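For context, this is roughly how 4-bit loading looks with the HuggingFace `transformers` + `bitsandbytes` stack. A minimal sketch, assuming `transformers`, `bitsandbytes`, and `accelerate` are installed and a CUDA GPU is available; the checkpoint name is just a placeholder, not from the course.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization: the configuration QLoRA is built on
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

# Placeholder checkpoint for illustration; any causal LM works the same way
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)
```

The base model stays frozen in 4-bit; only the LoRA adapters attached afterwards are trained.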
2. Which technique adds low-rank matrices to reduce trainable parameters?
❌ Soft prompts
❌ Full fine-tuning
✅ LoRA
❌ Additive fine-tuning
Explanation:
LoRA uses trainable low-rank matrices to update only a tiny subset of parameters while keeping the original model frozen.
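To make the mechanism concrete, here is a from-scratch PyTorch sketch of a LoRA-wrapped linear layer. The class name `LoRALinear` and the `r`/`alpha` defaults are illustrative, not from the course.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = x W^T + scale * x (B A)^T."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d_in
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))        # d_out x r, zero-init
        self.scale = alpha / r  # standard LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction; B = 0 at init, so behavior starts unchanged
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale
```

Wrapping a layer this way trains only r × (d_in + d_out) parameters per layer instead of d_in × d_out.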
3. How does adding low-rank matrices affect parameter efficiency?
❌ Replaces full weight matrices
✅ Adds a very small number of trainable parameters to the existing weights
❌ Tracks original parameter count
❌ Increases trainable parameters with high-rank matrices
Explanation:
LoRA keeps the original weights W frozen and adds a minimal low-rank update BA, so the effective weight is W + BA while only the small factors B and A are trained, which makes it extremely parameter-efficient.
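A quick back-of-the-envelope check makes the savings concrete. The numbers assume a BERT-base-sized 768×768 projection and an illustrative rank of 8:

```python
d = 768   # hidden size of a BERT-base attention projection
r = 8     # illustrative LoRA rank

full_ft = d * d            # params updated when fine-tuning W directly
lora = r * d + d * r       # params in the A (r x d) and B (d x r) factors

print(full_ft, lora, f"{lora / full_ft:.1%}")  # 589824 12288 2.1%
```

Roughly 2% of the original per-layer parameter count is trained, and it shrinks further as the model grows.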
4. Why apply LoRA to a BERT-like model on HuggingFace?
✅ LoRA integrates low-rank matrices into selected modules via PEFT, configured using TrainingArguments.
❌ Tailors all model layers & disables dropout
❌ Trains from scratch
❌ Removes tokenization & transformer blocks
Explanation:
LoRA via HuggingFace PEFT modifies only selected modules (e.g., the attention query/value projections): the adapters are configured with a LoraConfig, applied with get_peft_model, and then trained through the standard Trainer/TrainingArguments workflow, keeping fine-tuning memory-efficient.
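A minimal sketch of that setup, assuming the `peft` and `transformers` packages; `r=8` and `lora_alpha=16` are illustrative hyperparameters, not prescribed values.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# LoRA itself is configured via LoraConfig; TrainingArguments later drives the training loop
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT's attention projection module names
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the tiny trainable fraction
```

The resulting model drops into the usual `Trainer(..., args=TrainingArguments(...))` workflow unchanged.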
5. What mechanism enables fast & memory-efficient LoRA fine-tuning?
❌ Quantization (used by QLoRA; LoRA on its own does not quantize)
❌ Maintains full network from scratch
❌ Small batch engineering
✅ Freezes the main weights and updates only the added low-rank matrices
Explanation:
LoRA’s core idea: freeze the large pretrained model and train only tiny low-rank adapters.
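You can verify the freeze/train split directly. This snippet repeats the Q4 setup so it runs on its own; with `TaskType.SEQ_CLS`, PEFT also keeps the new classification head trainable.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model = get_peft_model(
    model,
    LoraConfig(task_type=TaskType.SEQ_CLS, r=8, target_modules=["query", "value"]),
)

# Only the injected lora_A / lora_B tensors (plus the classification head,
# which SEQ_CLS keeps trainable) carry gradients; the pretrained backbone is frozen.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
```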
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | 4-bit quantization | QLoRA memory savings |
| 2 | LoRA | Low-rank training |
| 3 | Add small low-rank matrices | Parameter efficiency |
| 4 | LoRA via PEFT | Efficient fine-tuning |
| 5 | Freeze weights + train low-rank updates | LoRA mechanism |