Graded Quiz: Evaluating and Validating Machine Learning Models: Machine Learning with Python (IBM AI Engineering Professional Certificate) Answers 2025
1. Question 1
In medical diagnosis, which metric is most important?
- ✅ Recall
- ❌ Accuracy
- ❌ F1 Score
- ❌ Precision
Explanation:
Recall is the fraction of actual positive cases that the model correctly detects: TP / (TP + FN).
In healthcare, a false negative is a missed diagnosis, which can be dangerous, so recall is critical.
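To make the recall explanation concrete, here is a minimal sketch that computes accuracy, recall, and precision from raw predictions. The patient labels and the helper `confusion_counts` are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical screening data: 4 sick patients (1) and 16 healthy ones (0).
y_true = [1, 1, 1, 1] + [0] * 16
# Hypothetical model that flags only 1 of the 4 sick patients.
y_pred = [1, 0, 0, 0] + [0] * 16

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # 17 / 20 = 0.85
recall = tp / (tp + fn)             # 1 / 4 = 0.25
precision = tp / (tp + fp)          # 1 / 1 = 1.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

In this toy case accuracy and precision both look good, yet three of the four sick patients are missed, which only recall reveals.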
2. Question 2
Which regression metric is the square root of MSE?
- ❌ R-squared
- ❌ MAE
- ✅ Root Mean Squared Error (RMSE)
Explanation:
RMSE = √MSE, giving error in the same unit as the target variable.
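The relationship above can be sketched in a few lines. The numbers are hypothetical, and the targets are assumed to be in kilograms so that the unit difference is visible: MSE comes out in kg², RMSE in kg.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical targets and predictions, assumed to be in kilograms.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [4.0, 28.0, 30.0, 40.0]  # residuals: 6, -8, 0, 0

print(mse(y_true, y_pred))   # 25.0 (kg squared)
print(rmse(y_true, y_pred))  # 5.0  (kg)
```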
3. Question 3
Best metric to evaluate cluster separation?
- ✅ Silhouette score
- ❌ Elbow method
- ❌ Evaluation method
- ❌ Davies-Bouldin Index
Explanation:
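Silhouette score compares cohesion (how close a point is to its own cluster) with separation (how far it is from the nearest other cluster).
It ranges from -1 to 1; higher values mean better-separated clusters.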
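As an illustration of the cohesion-versus-separation idea, the sketch below computes a mean silhouette score by hand for four hypothetical 2-D points. The function name `silhouette` and the data are assumptions made for this example; in practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    Assumes at least two clusters; singleton clusters score 0 by convention.
    """
    def mean_dist(i, cluster):
        # Average distance from point i to the other members of `cluster`.
        members = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: distance to own cluster
        b = min(mean_dist(i, c) for c in set(labels) if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two hypothetical, well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette(points, [0, 0, 1, 1])  # labels match the groups, roughly 0.90
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups, roughly -0.45
print(good, bad)
```

A clustering that matches the visible groups scores close to 1, while one that mixes them scores below 0.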
4. Question 4
Model performs well on training but poorly on test data → problem?
- ✅ Overfitting
- ❌ Cross-validation
- ❌ Train-test split
- ❌ Data snooping
Explanation:
Overfitting = model memorizes training data but fails to generalize.
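One way to see memorization in action is a 1-nearest-neighbour regressor, which simply returns the target of the closest training point. The sketch below is a toy illustration under assumed conditions: the data generator `make_data`, the noise level, and the sample sizes are all hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Hypothetical noisy linear data: y = 2x + Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 2) for x in xs]
    return xs, ys


def predict_1nn(x, train_x, train_y):
    """Return the target of the closest training point (pure memorization)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(xs, ys, train_x, train_y):
    """Mean squared error of the 1-NN predictions on (xs, ys)."""
    errors = [(predict_1nn(x, train_x, train_y) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

train_mse = mse(train_x, train_y, train_x, train_y)  # each point finds itself
test_mse = mse(test_x, test_y, train_x, train_y)     # unseen points do not

print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The training error is zero because every training point is its own nearest neighbour, while the test error is clearly larger. That gap between training and test performance is the signature of overfitting.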
5. Question 5
Difference between Lasso and Ridge?
- ❌ Lasso only for feature selection
- ✅ Lasso uses L1 penalty, Ridge uses L2 penalty
- ❌ Lasso uses larger datasets
- ❌ Ridge uses L1
Explanation:
Lasso (L1) → can shrink coefficients to zero (feature selection).
Ridge (L2) → shrinks coefficients toward zero, but generally not exactly to zero.
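The difference in shrinkage behaviour can be sketched with the closed-form solutions that hold in one simplified setting. The assumptions here are orthonormal features and the objectives ½‖y − Xβ‖² + λ‖β‖₁ (Lasso) and ½‖y − Xβ‖² + (λ/2)‖β‖² (Ridge). Under those assumptions, Ridge rescales each least-squares coefficient and Lasso soft-thresholds it. The coefficient values and λ below are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under the stated assumptions: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso under the stated assumptions: soft-thresholding at lam."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


# Hypothetical least-squares coefficients and penalty strength.
ols = [3.0, -1.5, 0.4]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print(ridge)  # every coefficient is smaller, none is exactly zero
print(lasso)  # [2.5, -1.0, 0.0]: the small coefficient is set exactly to zero
```

With general (non-orthonormal) features there is no such simple formula for Lasso, but the qualitative behaviour is the same: the L1 penalty can zero out coefficients, the L2 penalty only shrinks them.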
6. Question 6
How to mitigate data leakage?
- ❌ Use single feature
- ✅ Avoid including features derived from the entire dataset
- ❌ Shuffle data
- ❌ Ensure proper splits only
Explanation:
If a feature is computed from the entire dataset (e.g., an overall average that includes test rows), it leaks test information into training.
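A common instance of this is feature scaling. The sketch below contrasts a leaky standardization (statistics taken from all rows) with a safe one (statistics taken from the training rows only). The values and the train/test split are hypothetical.

```python
import statistics

# Hypothetical feature values; the last row is held out as test data.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]


def standardize(xs, mean, std):
    """Scale values to zero mean and unit variance using the given statistics."""
    return [(x - mean) / std for x in xs]


# Leaky: mean and standard deviation are computed on train AND test rows.
leaky_mean = statistics.mean(values)   # 22.0, dominated by the test row
leaky_std = statistics.pstdev(values)
leaky_train = standardize(train, leaky_mean, leaky_std)

# Safe: statistics come from the training rows only, then get reused on test.
safe_mean = statistics.mean(train)     # 2.5
safe_std = statistics.pstdev(train)
safe_train = standardize(train, safe_mean, safe_std)
safe_test = standardize(test, safe_mean, safe_std)

print(leaky_train)
print(safe_train)
```

In the leaky version the training features already reflect the extreme test value, so any model fit on them has indirectly seen the test set.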
7. Question 7
Interpreting feature importance without checking relationships leads to:
- ❌ Missing scaling
- ❌ Using minimal-target features
- ❌ Using uncorrelated features only
- ✅ Overlooking correlated features in importance scores
Explanation:
If features are correlated, importance scores can be misleading (importance gets split among correlated features).
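The splitting effect can be shown with a toy permutation-importance calculation. Everything here is a hypothetical setup: the target is exactly `2 * x1`, `x2` is a perfect copy of `x1`, and the two hand-written "models" stand in for fitted models. Reversing the column is used as a deterministic stand-in for a random shuffle.

```python
def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = list(x1)                  # perfectly correlated duplicate of x1
y = [2.0 * v for v in x1]      # target depends on the shared signal


def model_single(a):
    """Hypothetical model that uses one feature: y = 2 * a."""
    return [2.0 * v for v in a]


def model_pair(a, b):
    """Hypothetical model that spreads the weight over both copies: y = a + b."""
    return [u + v for u, v in zip(a, b)]


permuted = x1[::-1]  # deterministic "shuffle" of the column

# Importance = increase in error when one column is permuted.
imp_single = mse(model_single(permuted), y) - mse(model_single(x1), y)
imp_x1 = mse(model_pair(permuted, x2), y) - mse(model_pair(x1, x2), y)
imp_x2 = mse(model_pair(x1, permuted), y) - mse(model_pair(x1, x2), y)

print(imp_single, imp_x1, imp_x2)
```

Each duplicated feature receives a much smaller importance than the single feature does, even though the underlying signal is identical. Reading either score in isolation would understate how much the model depends on that signal.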
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | Recall | Avoid missing true positives |
| 2 | RMSE | Square root of MSE |
| 3 | Silhouette score | Cluster separation |
| 4 | Overfitting | Poor generalization |
| 5 | Lasso uses L1 penalty, Ridge uses L2 penalty | Lasso vs Ridge regularization |
| 6 | Avoid features derived from the entire dataset | Data leakage prevention |
| 7 | Overlooking correlated features | Feature importance interpretation |