Final Exam: Machine Learning with Python (IBM Data Analyst Professional Certificate) Answers 2025
1️⃣ Question 1
SVM with binary decisions extended to 3 classes:
- ❌ Combining supervised + unsupervised
- ❌ One classifier per class (one-vs-rest)
- ❌ Single classifier
- ✅ One classifier per pair of classes (one-vs-one)
Explanation:
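SVM commonly uses one-vs-one for multi-class classification: one binary classifier is trained per pair of classes (3 classifiers for 3 classes), and the final class is chosen by voting.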
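
As a rough illustration of the one-vs-one idea, the sketch below wraps a binary SVM in scikit-learn's `OneVsOneClassifier` and counts the pairwise classifiers it trains. The Iris dataset and the linear kernel are assumptions chosen for the example, not part of the exam question.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# Iris has 3 classes (an assumed stand-in for the question's 3-class problem).
X, y = load_iris(return_X_y=True)

# Wrap a binary SVM so that one classifier is trained per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear"))
ovo.fit(X, y)

n_classes = len(ovo.classes_)
n_pairwise = len(ovo.estimators_)  # k * (k - 1) / 2 classifiers for k classes
print(n_classes, n_pairwise)
```

With k classes, one-vs-one trains k(k-1)/2 classifiers, whereas one-vs-rest trains k.
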
2️⃣ Question 2
Why use median instead of mean for skewed salary data?
- ❌ Minimizes MSE
- ✅ Reduces impact of extreme values
- ❌ Mean is inaccurate
- ❌ Mean is hard to compute
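
A small numeric sketch can make this concrete. The salary figures below are hypothetical, chosen only to show how a single extreme value pulls the mean but barely moves the median.

```python
import numpy as np

# Hypothetical salaries: nine typical values and one extreme outlier.
salaries = np.array([42_000, 45_000, 47_000, 50_000, 52_000,
                     55_000, 58_000, 60_000, 63_000, 1_000_000])

mean_salary = salaries.mean()          # pulled upward by the outlier
median_salary = np.median(salaries)    # middle of the sorted values
print(mean_salary, median_salary)
```

Here the mean lands far above what most people in the sample earn, while the median stays near the typical salary.
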
3️⃣ Question 3
A decision tree's complexity increases → what happens?
- ✅ Bias decreases, variance increases
- ❌ Bias increases, variance decreases
- ❌ Both constant
- ❌ Both decrease
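
The trade-off can be observed by comparing a shallow tree with a fully grown one. This is a minimal sketch on synthetic data; the dataset parameters (`n_samples`, `flip_y`, the seeds) are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (flip_y randomly reassigns a share of the labels).
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for depth in (2, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(depth, scores[depth])
```

The unrestricted tree fits the training set at least as well as the shallow one (lower bias), and a gap between its training and test scores is the usual sign of higher variance.
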
4️⃣ Question 4
Best ML task for detecting unusual banking transactions:
- ❌ Predicting monthly trends
- ✅ Identifying patterns that deviate from normal transactions (anomaly detection)
- ❌ Predefined risk classification
- ❌ Grouping transactions
5️⃣ Question 5
Productivity increases, slows, levels off:
- ❌ Exponential
- ❌ Logarithmic
- ✅ Polynomial regression
- ❌ Linear
Explanation:
Nonlinear trend that increases then stabilizes → polynomial regression.
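
To see how a polynomial fit handles a curve that a straight line misses, the sketch below compares a degree-1 and a degree-2 least-squares fit. The productivity numbers are hypothetical and only mimic a rise that flattens out.

```python
import numpy as np

# Hypothetical data: productivity rises quickly, then flattens out.
hours = np.arange(1, 11, dtype=float)
productivity = np.array([10, 18, 25, 30, 34, 37, 39, 40, 40.5, 41])

def sse(degree):
    """Sum of squared errors of a polynomial fit of the given degree."""
    coeffs = np.polyfit(hours, productivity, deg=degree)
    residuals = productivity - np.polyval(coeffs, hours)
    return float(np.sum(residuals ** 2))

linear_sse, quadratic_sse = sse(1), sse(2)
print(linear_sse, quadratic_sse)
```

The quadratic fit leaves a smaller error because it can bend with the data, which is the property the answer relies on.
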
6️⃣ Question 6
Binary classification based on proximity to neighbors:
- ❌ Logistic regression
- ❌ Decision tree
- ✅ K-nearest neighbors (KNN)
- ❌ SVM
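
A minimal sketch of proximity-based classification with scikit-learn's `KNeighborsClassifier` follows. The six training points and the choice of `n_neighbors=3` are assumptions made for the example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points in two well-separated groups.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# Each prediction is a majority vote among the 3 closest training points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)
```

Each query point takes the label of the group it sits closest to.
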
7️⃣ Question 7
Benefit of PCA before clustering:
- ✅ Transforms features into principal axes with highest variance
- ❌ Removes all less important features
- ❌ Automatically segments customers
- ❌ Reduces to one component
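
The following sketch shows PCA projecting data onto its highest-variance axes before any clustering step. The synthetic features (two correlated columns plus a low-variance noise column) are an assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: two strongly correlated features plus one noise feature.
a = rng.normal(size=200)
X = np.column_stack([
    a,
    2 * a + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

pca = PCA(n_components=2).fit(X)
ratios = pca.explained_variance_ratio_   # share of variance per principal axis
X_reduced = pca.transform(X)             # data expressed in the new axes
print(ratios, X_reduced.shape)
```

The components are ordered by explained variance, so the first axis captures most of the spread in this data and the reduced matrix can be passed to a clustering algorithm.
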
8️⃣ Question 8
Fast, scalable logistic regression training method:
- ❌ Backpropagation
- ❌ Grid search
- ❌ Least squares
- ✅ Stochastic Gradient Descent (SGD)
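
To illustrate why SGD scales well, here is a bare-bones NumPy sketch of logistic regression trained one sample at a time. The data, learning rate, and epoch count are assumptions picked for the example; it is not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical linearly separable data: label is 1 when x0 + x1 > 0.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(5):
    for i in rng.permutation(len(X)):            # visit samples in random order
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predicted probability
        grad = p - y[i]                          # gradient of log loss w.r.t. the logit
        w -= lr * grad * X[i]                    # update from a single sample
        b -= lr * grad

accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(accuracy)
```

Each update touches a single sample, so the cost per step does not grow with the size of the dataset.
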
9️⃣ Question 9
Model misclassifies loyal customers as churn risks:
- ❌ PCA
- ❌ SVM
- ❌ More churn data
- ✅ Adjust the classification threshold
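
The effect of moving the threshold can be sketched with a handful of made-up probabilities. The values below and the raised threshold of 0.62 are hypothetical, chosen only to show false positives dropping.

```python
import numpy as np

# Hypothetical predicted churn probabilities and true labels (1 = churned).
proba = np.array([0.15, 0.35, 0.55, 0.60, 0.65, 0.80, 0.90])
y_true = np.array([0, 0, 0, 0, 1, 1, 1])

def false_positives(threshold):
    """Count loyal customers (label 0) flagged as churn at this threshold."""
    y_pred = (proba >= threshold).astype(int)
    return int(np.sum((y_pred == 1) & (y_true == 0)))

fp_default = false_positives(0.5)
fp_raised = false_positives(0.62)
print(fp_default, fp_raised)
```

Raising the threshold makes the model more conservative about predicting churn, so fewer loyal customers are flagged. In general this can also miss some true churners, which is the trade-off to weigh.
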
🔟 Question 10
Clustering method starting with individuals → merging:
- ❌ Density-based
- ❌ Divisive
- ✅ Agglomerative clustering
- ❌ Partition-based
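
A short sketch of bottom-up merging with scikit-learn's `AgglomerativeClustering` follows. The six points and `n_clusters=2` are assumptions for illustration.

```python
from sklearn.cluster import AgglomerativeClustering

# Hypothetical points: two tight groups far apart from each other.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

# Every point starts as its own cluster; the closest clusters are merged
# repeatedly until only n_clusters remain.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)
```

The first three points end up in one cluster and the last three in the other.
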
1️⃣1️⃣ Question 11
Why is DBSCAN ideal for the marketing team’s use case?
- ❌ Daily travel routines
- ❌ Purchase trends forecasting
- ❌ Detect satellite green cover
- ✅ Isolate rare sensor events in IoT data
Explanation:
DBSCAN labels points in low-density regions as noise, which makes it well suited to isolating rare events. It is not a forecasting method.
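
As a minimal sketch of this behavior, the code below runs scikit-learn's `DBSCAN` on a few made-up sensor readings. The readings and the `eps` / `min_samples` values are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical sensor readings: a dense group plus one far-away event.
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
              [1.0, 0.9], [1.05, 1.05], [8.0, 8.0]])

# Points with too few neighbors within eps are labeled -1 (noise).
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)
```

The dense readings share one cluster label, and the isolated reading receives the noise label `-1`.
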
1️⃣2️⃣ Question 12
Method preserving local + global structure:
- ❌ PCA
- ✅ UMAP
- ❌ Not used
- ❌ t-SNE
Explanation:
t-SNE mainly preserves local neighborhood structure;
UMAP aims to preserve local structure while retaining more of the global structure.
1️⃣3️⃣ Question 13
Tool for interactive ML dashboards:
- ❌ Pandas
- ✅ Matplotlib
- ❌ Scikit-learn
- ❌ NumPy
Explanation:
This question is tricky because none of the listed options is a dedicated dashboard tool. Pandas and NumPy handle data, scikit-learn builds models, and Matplotlib plots are static by default. Matplotlib is the only visualization library among the options, so it is the intended answer in the course context.
1️⃣4️⃣ Question 14
Difference between ML & traditional programming:
- ❌ Writes code faster
- ❌ Generates random rules
- ✅ Learns from data to make predictions
- ❌ Uses hand-coded rules
1️⃣5️⃣ Question 15
Matrix operations and linear algebra:
- ❌ Scikit-learn
- ❌ Pandas
- ❌ Matplotlib
- ✅ NumPy
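
For a quick sense of what this looks like in practice, the sketch below multiplies a matrix and solves a small linear system with NumPy. The matrix and vector values are arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A               # matrix multiplication
x = np.linalg.solve(A, b)     # solves A x = b
print(product)
print(x)
```

`np.linalg.solve` is generally preferred over computing the inverse explicitly when the goal is only to solve the system.
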
🧾 Summary Table
| Q | Correct Answer |
|---|---|
| 1 | One-vs-one classifiers |
| 2 | Median reduces extreme value impact |
| 3 | Bias ↓, Variance ↑ |
| 4 | Anomaly detection |
| 5 | Polynomial regression |
| 6 | KNN |
| 7 | PCA extracts variance-rich axes |
| 8 | Stochastic Gradient Descent |
| 9 | Adjust threshold |
| 10 | Agglomerative clustering |
| 11 | Isolate rare sensor events |
| 12 | UMAP |
| 13 | Matplotlib |
| 14 | ML learns from data |
| 15 | NumPy |