Graded Quiz: Model Evaluation and Refinement: Data Analysis with Python (IBM Data Analyst Professional Certificate) Answers 2025
1. Question 1
What does cross_val_predict(lr_model, X_train, y_train, cv=3) return?
- ❌ Predicted values of the test set
- ❌ List of residual errors
- ❌ Average R² score
- ✅ Predicted values for each training point using cross-validation
Explanation:
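cross_val_predict returns one cross-validated prediction per training point, each produced by a model fitted on the folds that exclude that point. It does not return scores or residuals.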
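A minimal sketch of this behaviour, assuming scikit-learn is installed. The synthetic data below is invented for illustration; only the names lr_model, X_train, and y_train come from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic training data (an assumption for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 2))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + rng.normal(scale=0.1, size=30)

lr_model = LinearRegression()

# Each row is predicted by a model fitted on the other two folds.
y_hat = cross_val_predict(lr_model, X_train, y_train, cv=3)

# One prediction per training row, not a score and not a residual.
print(y_hat.shape, y_train.shape)
```

The result has the same length as y_train, which is what separates it from cross_val_score (one score per fold).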
2. Question 2
Correct way to define a list of alpha values for Ridge regression grid search:
- ✅ parameter = [{'alpha': [1, 10, 100]}]
- ❌ grid = alpha:[1,10,100]
- ❌ alpha = Ridge([1, 10, 100])
- ❌ parameter = [alpha: 1, 10, 100]
Explanation:
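GridSearchCV takes the grid as a dictionary (or a list of dictionaries) that maps each parameter name to a list of candidate values. The other options are not valid Python or do not build a grid.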
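A short sketch of how such a grid might be passed to GridSearchCV, assuming scikit-learn is installed. The data and the cv=4 setting are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# The grid from the correct answer: parameter name -> candidate values.
parameter = [{'alpha': [1, 10, 100]}]

grid = GridSearchCV(Ridge(), parameter, cv=4)
grid.fit(X, y)

best_alpha = grid.best_params_['alpha']
print("candidates tried:", grid.cv_results_['params'])
print("best alpha:", best_alpha)
```

Which alpha wins depends on the data, so the sketch prints the result rather than assuming it.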
3. Question 3
A 100-degree polynomial model reaches R² = 0.99 on the training data. How do you check for overfitting?
- ✅ Evaluate the model on the test dataset
- ❌ Reduce features first
- ❌ Use cross_val_predict on training
- ❌ Select model based only on training score
Explanation:
Overfitting is detected when training score is high but test score is low.
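The train-versus-test comparison can be sketched as follows, assuming scikit-learn is installed. The noisy sine data is invented for illustration, and degree 12 is used instead of 100 to keep the fit numerically manageable on a small sample.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples of a smooth function (an assumption for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
y = np.sin(3.0 * x).ravel() + rng.normal(scale=0.2, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# A high-degree polynomial with almost as many terms as training points.
model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
model.fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)  # R² on data the model has seen
test_r2 = model.score(X_test, y_test)     # R² on held-out data

print(f"train R²: {train_r2:.3f}")
print(f"test  R²: {test_r2:.3f}")
```

A large gap between the two scores is the signal to look for. The training score alone cannot reveal it.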
4. Question 4
Why choose Ridge Regression?
- ✅ To reduce overfitting by penalizing large coefficients
- ❌ To remove irrelevant features
- ❌ To increase flexibility
- ❌ To reduce complexity only
Explanation:
Ridge adds L2 regularization, shrinking coefficients to prevent overfitting.
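A small sketch of the shrinkage effect, assuming scikit-learn is installed. The data, including the nearly duplicated column, and the value alpha=10.0 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with two almost identical columns, a setting where
# ordinary least squares tends to produce large, unstable coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X[:, 4] = X[:, 3] + rng.normal(scale=0.01, size=40)
y = X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.5, size=40)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The L2 penalty pulls the coefficient vector toward zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print(f"OLS   coefficient norm: {ols_norm:.3f}")
print(f"Ridge coefficient norm: {ridge_norm:.3f}")
```

Ridge shrinks coefficients but does not set them exactly to zero, which is why "remove irrelevant features" is not the right answer.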
5. Question 5
(Image: the blue curve follows the noise in the data; the orange curve is the true function.)
- ❌ Good fit
- ❌ No conclusion
- ✅ It displays overfitting
- ❌ Underfitting
Explanation:
The blue curve “wiggles” excessively and follows noise → classic overfitting.
🧾 Summary Table
| Q | Correct Answer | Key Concept |
|---|---|---|
| 1 | Predicted CV values | cross_val_predict output |
| 2 | [{'alpha': [1, 10, 100]}] | GridSearchCV parameter grid |
| 3 | Evaluate on test set | Detecting overfitting |
| 4 | Penalize large coefficients | Ridge regression benefit |
| 5 | Overfitting | Model follows noise |