Quiz 4: Practical Machine Learning (Data Science Specialization) Answers 2025
1. Question 1
Train Random Forest (RF) and Boosted Tree (GBM) models on the vowel dataset (classification problem).
✅ RF Accuracy = 0.6082, GBM Accuracy = 0.5152, Agreement Accuracy = 0.6361
❌ RF Accuracy = 0.9987, GBM Accuracy = 0.5152, Agreement Accuracy = 0.9985
❌ RF Accuracy = 0.6082, GBM Accuracy = 0.5152, Agreement Accuracy = 0.5152
❌ RF Accuracy = 0.9881, GBM Accuracy = 0.8371, Agreement Accuracy = 0.9983
Explanation:
When both models (rf and gbm via caret::train) are trained on vowel data with seed 33833:
- RF ≈ 0.6082 accuracy
- GBM ≈ 0.5152 accuracy
- On the subset where both models agree, accuracy ≈ 0.6361
Restricting attention to the cases where the two models agree yields higher accuracy than either model's predictions alone.
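The numbers above come from the standard caret workflow for this question. A minimal sketch, assuming the ElemStatLearn package (which hosts the vowel data, now archived on CRAN) and caret are installed:

```r
# Assumes ElemStatLearn (archived on CRAN) and caret are installed
library(ElemStatLearn)
library(caret)
data(vowel.train)
data(vowel.test)

# The outcome must be a factor for classification
vowel.train$y <- factor(vowel.train$y)
vowel.test$y  <- factor(vowel.test$y)

set.seed(33833)
fit_rf  <- train(y ~ ., data = vowel.train, method = "rf")
fit_gbm <- train(y ~ ., data = vowel.train, method = "gbm", verbose = FALSE)

pred_rf  <- predict(fit_rf,  vowel.test)
pred_gbm <- predict(fit_gbm, vowel.test)

confusionMatrix(pred_rf,  vowel.test$y)$overall["Accuracy"]   # ~0.61
confusionMatrix(pred_gbm, vowel.test$y)$overall["Accuracy"]   # ~0.52

# Accuracy on the subset where the two models agree
agree <- pred_rf == pred_gbm
confusionMatrix(pred_rf[agree], vowel.test$y[agree])$overall["Accuracy"]  # ~0.64
```

Exact accuracies can drift slightly across package versions even with the seed fixed, which is why the answer options are spaced far apart.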
2. Question 2
Alzheimer’s disease dataset — models: RF, GBM, LDA; stacked using RF.
✅ Stacked Accuracy: 0.80 — better than random forests and lda and the same as boosting.
❌ Stacked Accuracy: 0.80 is better than all three
❌ Stacked Accuracy: 0.76 is better than RF and boosting but not lda
❌ Stacked Accuracy: 0.76 is better than lda but not RF or boosting
Explanation:
Stacking combines model strengths.
- RF ≈ 0.74
- GBM ≈ 0.80
- LDA ≈ 0.76
- Stacked RF ≈ 0.80 — matching GBM and outperforming RF and LDA individually.
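The stacking workflow can be sketched as follows, assuming the AppliedPredictiveModeling and caret packages and the seeds specified in the quiz:

```r
# Assumes AppliedPredictiveModeling and caret are installed
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData   <- data.frame(diagnosis, predictors)
inTrain  <- createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training <- adData[inTrain, ]
testing  <- adData[-inTrain, ]

set.seed(62433)
fit_rf  <- train(diagnosis ~ ., data = training, method = "rf")
fit_gbm <- train(diagnosis ~ ., data = training, method = "gbm", verbose = FALSE)
fit_lda <- train(diagnosis ~ ., data = training, method = "lda")

pred_rf  <- predict(fit_rf,  testing)
pred_gbm <- predict(fit_gbm, testing)
pred_lda <- predict(fit_lda, testing)

# Stack the three sets of test-set predictions with a random forest
stack_df   <- data.frame(pred_rf, pred_gbm, pred_lda,
                         diagnosis = testing$diagnosis)
fit_stack  <- train(diagnosis ~ ., data = stack_df, method = "rf")
pred_stack <- predict(fit_stack, stack_df)
confusionMatrix(pred_stack, testing$diagnosis)$overall["Accuracy"]  # ~0.80
```

The stacked model uses the base models' predictions as its only features, so it can at best exploit where the base learners disagree; here it ties the strongest base learner (GBM).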
3. Question 3
Concrete dataset — Lasso model (lasso via caret or elasticnet).
✅ Cement
❌ CoarseAggregate
❌ Water
❌ Age
Explanation:
In the lasso model, as λ increases, coefficients shrink to zero.
The last coefficient to be driven to zero along the penalty path (i.e., the strongest effect) is Cement, the most predictive variable for compressive strength.
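The penalty path can be inspected directly by plotting the fitted lasso object. A sketch, assuming AppliedPredictiveModeling, caret, and elasticnet are installed:

```r
# Assumes AppliedPredictiveModeling, caret, and elasticnet are installed
library(caret)
library(AppliedPredictiveModeling)
library(elasticnet)
set.seed(3523)
data(concrete)
inTrain  <- createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training <- concrete[inTrain, ]
testing  <- concrete[-inTrain, ]

set.seed(233)
fit_lasso <- train(CompressiveStrength ~ ., data = training, method = "lasso")

# Coefficient paths vs. penalty: Cement's coefficient is the last to hit zero
plot.enet(fit_lasso$finalModel, xvar = "penalty", use.color = TRUE)
```

Reading the plot from right to left (increasing penalty), the Cement curve is the last one remaining away from zero.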
4. Question 4
Tumblr blog visitors — bats() time series forecast model.
✅ 96%
❌ 92%
❌ 94%
❌ 100%
Explanation:
After fitting the bats() model on 2011 and earlier data and forecasting for 2012, about 96% of test observations fall within the 95% prediction interval. This indicates an accurate model with well-calibrated uncertainty.
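The coverage check above can be computed directly from the forecast object. A sketch, assuming the quiz-provided `gaData.csv` (with a `date` column and `visitsTumblr` counts) plus the lubridate and forecast packages:

```r
# Assumes the quiz-provided gaData.csv, plus lubridate and forecast
library(lubridate)
library(forecast)
dat      <- read.csv("gaData.csv")
training <- dat[year(dat$date) < 2012, ]
testing  <- dat[year(dat$date) > 2011, ]
tstrain  <- ts(training$visitsTumblr)

fit   <- bats(tstrain)
fcast <- forecast(fit, h = nrow(testing), level = 95)

# Fraction of 2012 observations inside the 95% prediction interval
mean(testing$visitsTumblr >= fcast$lower &
     testing$visitsTumblr <= fcast$upper)   # ~0.96
```

Note the distinction being tested: a well-calibrated 95% interval should cover roughly 95% of future observations, and ~96% coverage is consistent with that.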
5. Question 5
Concrete dataset — Support Vector Machine (e1071::svm) regression model, RMSE on test set.
✅ 6.72
❌ 11543.39
❌ 45.09
❌ 6.93
Explanation:
Fitting an SVM with the default radial kernel and default parameters yields a test-set RMSE ≈ 6.72 — strong predictive performance compared with simpler linear models.
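A sketch of the fit and RMSE computation, assuming e1071 plus the same AppliedPredictiveModeling/caret concrete split used in Question 3:

```r
# Assumes e1071, caret, and AppliedPredictiveModeling are installed
library(e1071)
library(caret)
library(AppliedPredictiveModeling)
set.seed(3523)
data(concrete)
inTrain  <- createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training <- concrete[inTrain, ]
testing  <- concrete[-inTrain, ]

set.seed(325)
fit_svm <- svm(CompressiveStrength ~ ., data = training)  # radial kernel by default
pred    <- predict(fit_svm, testing)

# Root mean squared error on the held-out test set
sqrt(mean((pred - testing$CompressiveStrength)^2))   # ~6.72
```

The implausible distractor (11543.39) is roughly the mean *squared* error scale — forgetting the square root is the classic mistake this question probes.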
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | RF=0.6082, GBM=0.5152, Agreement=0.6361 | Comparing ensemble accuracy |
| 2 | Stacked Accuracy=0.80, matches GBM | Model stacking improves over base learners |
| 3 | Cement | Lasso shrinkage path variable retention |
| 4 | 96% | Forecast coverage accuracy with bats() |
| 5 | 6.72 | SVM regression RMSE on test data |