Quiz 1: Practical Machine Learning (Data Science Specialization) Answers 2025
1. Question 1
Which of the following are components in building a machine learning algorithm?
✅ Statistical inference
✅ Collecting data to answer the question
✅ Training and test sets
❌ Artificial intelligence
❌ Machine learning
Explanation:
Building a machine learning algorithm involves statistical inference, collecting data to answer the question, and splitting the data into training and test sets. "Artificial intelligence" and "machine learning" are umbrella terms for the field as a whole, not components of an individual algorithm.
2. Question 2
Suppose we build a prediction algorithm that is 100% accurate on the training data. Why might it not work well on new data?
✅ Our algorithm may be overfitting the training data, predicting both the signal and the noise.
❌ We have used neural networks which have notoriously bad performance.
❌ We may be using bad variables that don’t explain the outcome.
❌ We are not asking a relevant question that can be answered with machine learning.
Explanation:
Overfitting occurs when the model memorizes noise in the training data instead of learning general patterns — resulting in poor generalization on new data.
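A toy sketch of this idea, assuming hypothetical data where the labels are pure noise: a model that simply memorizes the training set scores 100% on training data but can do no better than chance on fresh data.

```python
import random

random.seed(0)

# Labels are random coin flips, so there is no real signal to learn.
train = {x: random.choice([0, 1]) for x in range(50)}

def memorizer(x):
    # Perfectly "fits" the training data by lookup; guesses 0 for unseen inputs.
    return train.get(x, 0)

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
print(train_acc)  # 1.0 — perfect on training data

# On new data drawn the same way, accuracy hovers around chance (~0.5).
test = {x: random.choice([0, 1]) for x in range(50, 100)}
test_acc = sum(memorizer(x) == y for x, y in test.items()) / len(test)
print(test_acc)
```

The gap between training and test accuracy is exactly the generalization failure the answer describes.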
3. Question 3
What are typical sizes for the training and test sets?
✅ 80% training set, 20% test set
✅ 90% training set, 10% test set
❌ 0% training set, 100% test set
❌ 50% in the training set, 50% in the testing set
Explanation:
A typical split keeps most data (80–90%) for training to fit the model and 10–20% for testing to evaluate performance. Exact ratios vary by dataset size.
4. Question 4
What are common error metrics for predicting binary variables?
✅ Accuracy
❌ Root mean squared error
❌ Correlation
❌ Median absolute deviation
❌ R²
Explanation:
For binary outcomes (yes/no, clicked/didn’t click), typical performance metrics include accuracy, sensitivity, specificity, precision, and AUC — not regression-based metrics like RMSE or R².
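These metrics are all derived from the confusion-matrix counts. A minimal sketch with hypothetical labels (1 = clicked):

```python
# Hypothetical actual vs. predicted labels for a binary classifier.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

# Confusion-matrix counts.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy    = (tp + tn) / len(actual)  # fraction of all correct calls
sensitivity = tp / (tp + fn)           # true positive rate (recall)
specificity = tn / (tn + fp)           # true negative rate
print(accuracy, sensitivity, specificity)  # 0.8 0.75 0.8333...
```

Note that RMSE, correlation, and R² assume a continuous outcome, which is why they don't apply here.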
5. Question 5
A link is clicked on 1 in 1,000 visits (prevalence = 0.001). A model predicts clicks with 99% sensitivity and 99% specificity.
If the model predicts "clicked," what is the probability the link actually was clicked?
✅ 9%
❌ 89.9%
❌ 90%
❌ 50%
Explanation:
Using Bayes’ theorem:
P(Clicked | Predicted Click) = (0.99 × 0.001) / (0.99 × 0.001 + 0.01 × 0.999) ≈ 0.09
So the positive predictive value (PPV) ≈ 9%. Even with high sensitivity/specificity, rare events yield many false positives.
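The calculation can be checked directly with the numbers from the question:

```python
# Bayes' theorem for the click example.
prevalence  = 0.001  # 1 click per 1,000 visits
sensitivity = 0.99   # P(predict click | clicked)
specificity = 0.99   # P(predict no click | not clicked)

true_positive_rate  = sensitivity * prevalence               # 0.00099
false_positive_rate = (1 - specificity) * (1 - prevalence)   # 0.00999

# PPV = P(clicked | predicted click)
ppv = true_positive_rate / (true_positive_rate + false_positive_rate)
print(round(ppv, 3))  # 0.09 — about 9%
```

The false positives from the 999 non-clicks per 1,000 visits swamp the single true click, which is the base-rate effect in action.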
🧾 Summary Table
| Q# | ✅ Correct Answer(s) | Key Concept |
|---|---|---|
| 1 | Statistical inference; Data collection; Training/test sets | ML building components |
| 2 | Overfitting explains poor generalization | Overfitting and generalization |
| 3 | 80/20 or 90/10 split | Common dataset partition ratios |
| 4 | Accuracy | Binary classification error metric |
| 5 | 9% | Bayes theorem & base rate effect |