Graded Quiz: Parameter-Efficient Fine-Tuning (PEFT) – Generative AI Engineering and Fine-Tuning Transformers (IBM AI Engineering Professional Certificate) Answers 2025

1. What does QLoRA use to minimize memory during fine-tuning?

❌ Zero-shot inference
❌ Few-shot inference
✅ 4-bit quantization
❌ LoRA adaptation

Explanation:
QLoRA uses 4-bit quantization to compress model weights and drastically reduce GPU memory usage during fine-tuning.
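
For reference, here is a minimal sketch of how 4-bit quantization is typically enabled for QLoRA-style fine-tuning with the Hugging Face transformers, bitsandbytes, and peft libraries. The base checkpoint name and the exact settings below are illustrative assumptions, not something specified by the quiz.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Store weights in 4-bit NF4, compute in bfloat16, and use double
# quantization to save a little more memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# "facebook/opt-350m" is just an illustrative base checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model so LoRA adapters can be trained on top
# of the frozen 4-bit weights.
model = prepare_model_for_kbit_training(model)
```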


2. Which technique adds low-rank matrices to reduce trainable parameters?

❌ Soft prompts
❌ Full fine-tuning
✅ LoRA
❌ Additive fine-tuning

Explanation:
LoRA uses trainable low-rank matrices to update only a tiny subset of parameters while keeping the original model frozen.
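
To make the idea concrete, here is a from-scratch sketch of a LoRA-style linear layer. This is a simplified illustration, not the PEFT library's actual implementation, and the rank r=8 and alpha=16 values are arbitrary example choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA wrapper: output = base(x) + (alpha / r) * x @ A.T @ B.T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight (and bias); they are never updated.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # The only trainable parameters: two small low-rank matrices.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus the trainable low-rank update B @ A.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
y = layer(torch.randn(4, 768))   # same output shape as the original layer
```

Because lora_B starts at zero, the wrapped layer initially behaves exactly like the frozen base layer, and training only moves the small low-rank update.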


3. How does adding low-rank matrices affect parameter efficiency?

❌ Replaces full weight matrices
✅ Adds a very small number of trainable parameters to the existing weights
❌ Tracks original parameter count
❌ Increases trainable parameters with high-rank matrices

Explanation:
LoRA keeps the original weights frozen and adds a minimal low-rank update (the product of two small matrices), so the number of trainable parameters is a tiny fraction of the original.
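
A quick back-of-the-envelope calculation shows how small that update is. The 768×768 projection size (as in BERT-base) and the ranks below are illustrative assumptions:

```python
d = 768                   # hidden size of one attention projection
full = d * d              # 589,824 frozen parameters in the original weight
for r in (4, 8, 16):      # typical LoRA ranks
    lora = 2 * d * r      # A is (r x d), B is (d x r)
    print(f"r={r}: {lora:,} trainable params ({100 * lora / full:.2f}% of the original)")
```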


4. Why apply LoRA to a BERT-like model on HuggingFace?

✅ LoRA integrates low-rank matrices into selected modules via PEFT, configured using TrainingArguments.
❌ Tailors all model layers & disables dropout
❌ Trains from scratch
❌ Removes tokenization & transformer blocks

Explanation:
LoRA via HuggingFace PEFT modifies only certain layers (e.g., attention projections), enabling memory-efficient fine-tuning.
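
Here is a hedged sketch of what this looks like with the Hugging Face peft library. The bert-base-uncased checkpoint, the target modules, and the hyperparameters are illustrative assumptions; note that LoRA itself is configured through LoraConfig, while TrainingArguments configures the surrounding Trainer run.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

# Any BERT-like checkpoint works the same way; this one is just an example.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projections
)

# Wrap the base model so only the LoRA adapters (and the new
# classification head) are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```

Targeting only the attention query/value projections is a common default; other modules can be added to target_modules if more capacity is needed.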


5. What mechanism enables fast & memory-efficient LoRA fine-tuning?

❌ Quantization (QLoRA uses this; LoRA alone does not)
❌ Maintains full network from scratch
❌ Small batch engineering
✅ Freezes the main weights and updates only the added low-rank matrices

Explanation:
LoRA’s core idea: freeze the large pretrained model and train only tiny low-rank adapters.
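
To see the freezing mechanism directly, this sketch (reusing the same hypothetical BERT setup as in question 4) lists which parameters actually receive gradients. Only the injected LoRA matrices, plus the new classification head that PEFT keeps trainable for classification tasks, have requires_grad set:

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

model = get_peft_model(
    AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2),
    LoraConfig(task_type=TaskType.SEQ_CLS, r=8, target_modules=["query", "value"]),
)

# Every original BERT weight is frozen; gradients flow only through
# the small lora_A / lora_B adapter matrices (and the task head).
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(len(trainable), "trainable tensors, e.g.", trainable[:2])
print(len(frozen), "frozen tensors")
```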


🧾 Summary Table

| Q# | Correct Answer | Key Concept |
|----|----------------|-------------|
| 1 | 4-bit quantization | QLoRA memory savings |
| 2 | LoRA | Low-rank training |
| 3 | Add small low-rank matrices | Parameter efficiency |
| 4 | LoRA via PEFT | Efficient fine-tuning |
| 5 | Freeze weights + train low-rank updates | LoRA mechanism |