Quiz 1: Practical Machine Learning (Data Science Specialization) Answers 2025
1. Question 1
Which of the following are components in building a machine learning algorithm?
✅ Statistical inference
✅ Collecting data to answer the question
✅ Training and test sets
❌ Artificial intelligence
❌ Machine learning
Explanation:
Building a machine learning algorithm involves statistical inference, collecting data to answer the question, and splitting the data into training and test sets. "Artificial intelligence" and "machine learning" are umbrella terms for the field as a whole, not components of an individual algorithm.
2. Question 2
Suppose we build a prediction algorithm that is 100% accurate on the training data. Why might it not work well on new data?
✅ Our algorithm may be overfitting the training data, predicting both the signal and the noise.
❌ We have used neural networks which have notoriously bad performance.
❌ We may be using bad variables that don’t explain the outcome.
❌ We are not asking a relevant question that can be answered with machine learning.
Explanation:
Overfitting occurs when the model memorizes noise in the training data instead of learning general patterns — resulting in poor generalization on new data.
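A toy sketch of this idea, assuming hypothetical data where the labels are pure noise: a model that simply memorizes the training set scores 100% on training data but can do no better than chance on fresh data.

```python
import random

random.seed(0)

# Labels are random coin flips, so there is no real signal to learn.
train = {x: random.choice([0, 1]) for x in range(50)}

def memorizer(x):
    # Perfectly "fits" the training data by lookup; guesses 0 for unseen inputs.
    return train.get(x, 0)

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
print(train_acc)  # 1.0 — perfect on training data

# On new data drawn the same way, accuracy hovers around chance (~0.5).
test = {x: random.choice([0, 1]) for x in range(50, 100)}
test_acc = sum(memorizer(x) == y for x, y in test.items()) / len(test)
print(test_acc)
```

The gap between training and test accuracy is exactly the generalization failure the answer describes.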
3. Question 3
What are typical sizes for the training and test sets?
✅ 80% training set, 20% test set
✅ 90% training set, 10% test set
❌ 0% training set, 100% test set
❌ 50% in the training set, 50% in the testing set
Explanation:
A typical split keeps most data (80–90%) for training to fit the model and 10–20% for testing to evaluate performance. Exact ratios vary by dataset size.
4. Question 4
What are common error metrics for predicting binary variables?
✅ Accuracy
❌ Root mean squared error
❌ Correlation
❌ Median absolute deviation
❌ R²
Explanation:
For binary outcomes (yes/no, clicked/didn’t click), typical performance metrics include accuracy, sensitivity, specificity, precision, and AUC — not regression-based metrics like RMSE or R².
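These metrics are all derived from the confusion-matrix counts. A minimal sketch with hypothetical labels (1 = clicked):

```python
# Hypothetical actual vs. predicted labels for a binary classifier.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

# Confusion-matrix counts.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy    = (tp + tn) / len(actual)  # fraction of all correct calls
sensitivity = tp / (tp + fn)           # true positive rate (recall)
specificity = tn / (tn + fp)           # true negative rate
print(accuracy, sensitivity, specificity)  # 0.8 0.75 0.8333...
```

Note that RMSE, correlation, and R² assume a continuous outcome, which is why they don't apply here.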
5. Question 5
A link is clicked on 1 in 1,000 visits (prevalence = 0.001). A model predicts clicks with 99% sensitivity and 99% specificity.
If the model predicts "clicked," what is the probability the link actually was clicked?
✅ 9%
❌ 89.9%
❌ 90%
❌ 50%
Explanation:
Using Bayes’ theorem:
P(Clicked | Predicted Click) = (0.99 × 0.001) / (0.99 × 0.001 + 0.01 × 0.999) ≈ 0.09
So the positive predictive value (PPV) ≈ 9%. Even with high sensitivity/specificity, rare events yield many false positives.
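The calculation can be checked directly with the numbers from the question:

```python
# Bayes' theorem for the click example.
prevalence  = 0.001  # 1 click per 1,000 visits
sensitivity = 0.99   # P(predict click | clicked)
specificity = 0.99   # P(predict no click | not clicked)

true_positive_rate  = sensitivity * prevalence               # 0.00099
false_positive_rate = (1 - specificity) * (1 - prevalence)   # 0.00999

# PPV = P(clicked | predicted click)
ppv = true_positive_rate / (true_positive_rate + false_positive_rate)
print(round(ppv, 3))  # 0.09 — about 9%
```

The false positives from the 999 non-clicks per 1,000 visits swamp the single true click, which is the base-rate effect in action.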
🧾 Summary Table
| Q# | ✅ Correct Answer(s) | Key Concept |
|---|---|---|
| 1 | Statistical inference; Data collection; Training/test sets | ML building components |
| 2 | Overfitting explains poor generalization | Overfitting and generalization |
| 3 | 80/20 or 90/10 split | Common dataset partition ratios |
| 4 | Accuracy | Binary classification error metric |
| 5 | 9% | Bayes theorem & base rate effect |