Final Exam: Machine Learning with Python (IBM AI Engineering Professional Certificate) Answers 2025
1. Question 1
SVM multi-class classification strategy?
- ❌ Combine supervised + unsupervised
- ❌ One classifier per class
- ❌ Single combined classifier
- ✅ One classifier per pair of classes (One-vs-One)
Explanation:
Binary SVMs → use One-vs-One for multi-class problems.
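A minimal sketch of the One-vs-One strategy with scikit-learn's `OneVsOneClassifier` (the Iris dataset and linear kernel are illustrative choices, not part of the question):

```python
# One-vs-One trains one binary SVM per PAIR of classes:
# for k classes that is k*(k-1)/2 classifiers (3 for the 3-class Iris data).
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
n_pairs = len(ovo.estimators_)  # 3 binary classifiers for 3 classes
```

Note that `SVC` applies this pairwise scheme internally as well; wrapping it in `OneVsOneClassifier` just makes the strategy explicit.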
2. Question 2
Why use median at leaf nodes with skewed salary data?
- ❌ Minimizes MSE
- ✅ Reduces impact of extreme values
- ❌ Mean is inaccurate
- ❌ Mean is hard to compute
Explanation:
Median is robust against outliers in skewed data.
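A quick sketch of why the median is preferred at a leaf (the salary figures below are made up for illustration):

```python
import numpy as np

# Hypothetical leaf node: most salaries cluster near 50k, one extreme outlier.
salaries = np.array([48_000, 50_000, 52_000, 51_000, 1_000_000])

mean_pred = salaries.mean()        # dragged far upward by the single outlier
median_pred = np.median(salaries)  # stays at the typical value, 51_000
```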
3. Question 3
Effect of increasing decision tree complexity?
- ✅ Bias decreases, variance increases
- ❌ Bias increases
- ❌ Both constant
- ❌ Both decrease
Explanation:
More complex trees → fit training data better → overfitting.
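A sketch of the bias side of the trade-off: as `max_depth` grows, a tree fits its training data ever more closely (the synthetic dataset is illustrative):

```python
# Deeper trees -> lower training error (bias down); the growing gap between
# train and test performance is what "variance up" refers to.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

train_scores = {}
for depth in (1, 20):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(Xtr, ytr)
    train_scores[depth] = tree.score(Xtr, ytr)  # training accuracy
```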
4. Question 4
Finding unusual transactions?
- ❌ Predicting trends
- ✅ Identifying patterns that deviate from normal transactions (Anomaly Detection)
- ❌ Predefined classification
- ❌ Simple grouping
Explanation:
Goal = detect outliers/anomalies.
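One common anomaly-detection approach is scikit-learn's `IsolationForest`; the "transaction amounts" below are synthetic and the `contamination` value is an assumed tuning choice:

```python
# IsolationForest isolates points that deviate from the bulk of the data
# and labels them -1 (anomaly) vs 1 (normal).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=50, scale=5, size=(200, 1))  # typical amounts
fraud = np.array([[500.0], [480.0]])                 # extreme outliers
X = np.vstack([normal, fraud])

labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
```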
5. Question 5
Productivity increases → slows → stabilizes. Best regression?
- ❌ Exponential
- ❌ Logarithmic
- ✅ Polynomial regression
- ❌ Linear
Explanation:
Nonlinear curve with rise → plateau → polynomial fits well.
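A sketch of polynomial regression on a rise-then-plateau curve (the synthetic "productivity" data is made up to mimic the scenario):

```python
# A degree-2 polynomial captures curvature that a straight line cannot.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

hours = np.linspace(0, 10, 50).reshape(-1, 1)
productivity = 10 * hours.ravel() - 0.8 * hours.ravel() ** 2  # rises, then levels off

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(hours, productivity)
r2 = model.score(hours, productivity)  # near-perfect fit on this curve
```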
6. Question 6
Binary classification based on proximity?
- ❌ Logistic regression
- ❌ Decision tree
- ✅ K-nearest neighbors (KNN)
- ❌ SVM
Explanation:
KNN classifies based on nearest neighbors.
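A minimal KNN sketch (the 2-D points are made up; each query point takes the majority label of its 3 nearest neighbors):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]  # two tight groups
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])  # labeled by proximity: [0, 1]
```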
7. Question 7
Advantage of PCA before clustering?
- ✅ Transforms into high-variance principal axes revealing key features
- ❌ Removes all unimportant features
- ❌ Automatically segments
- ❌ Reduces to one component
Explanation:
PCA keeps maximum variance directions → simplifies clustering.
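A sketch of PCA keeping the high-variance directions before clustering (the synthetic data is illustrative, with one dominant direction):

```python
# PCA projects onto orthogonal axes ordered by variance; the top components
# retain most of the spread, simplifying downstream clustering.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10  # one direction dominates the variance

pca = PCA(n_components=2).fit(X)
explained = pca.explained_variance_ratio_.sum()  # share of variance retained
```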
8. Question 8
Faster alternative to gradient descent for large datasets?
- ❌ Backpropagation
- ❌ Grid search
- ❌ Least squares
- ✅ Stochastic Gradient Descent (SGD)
Explanation:
SGD updates weights per sample (or small mini-batch) instead of the full dataset → much faster per update.
9. Question 9
Model misclassifies loyal customers as churn risks — fix?
- ❌ PCA
- ❌ Use SVM
- ❌ Add more churn data
- ✅ Adjust the classification threshold
Explanation:
Shifting threshold reduces false positives.
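A sketch of threshold adjustment with `predict_proba` (the dataset and the 0.8 cutoff are illustrative assumptions):

```python
# Raising the threshold above the default 0.5 makes the model more
# conservative about flagging "churn", cutting false positives.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)[:, 1]    # P(churn) per customer
default_flags = (proba >= 0.5).sum()  # default threshold
strict_flags = (proba >= 0.8).sum()   # stricter threshold flags fewer customers
```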
10. Question 10
Start with each customer as its own cluster → merge upward?
- ❌ Density-based
- ❌ Divisive
- ✅ Agglomerative clustering
- ❌ Partition-based
Explanation:
Agglomerative = bottom-up clustering.
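A minimal sketch with scikit-learn's `AgglomerativeClustering` (the four points are made up to form two obvious groups):

```python
# Agglomerative clustering starts with every point as its own cluster
# and repeatedly merges the closest pair, bottom-up, until n_clusters remain.
from sklearn.cluster import AgglomerativeClustering

X = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
```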
11. Question 11
Why is DBSCAN ideal?
- ❌ Daily travel routines
- ❌ Forecast purchase trends
- ❌ Satellite green cover
- ✅ To isolate rare sensor events in IoT data
Explanation:
DBSCAN excels at detecting outliers/anomalies.
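A sketch of DBSCAN isolating a rare event (the "sensor readings" are synthetic, and `eps`/`min_samples` are assumed tuning values):

```python
# DBSCAN groups dense regions into clusters and labels sparse points
# as noise (-1) -- exactly what isolating rare events needs.
import numpy as np
from sklearn.cluster import DBSCAN

dense = np.random.default_rng(0).normal(0, 0.2, size=(50, 2))  # routine readings
rare = np.array([[5.0, 5.0]])                                  # isolated event
X = np.vstack([dense, rare])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # rare point -> -1
```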
12. Question 12
Preserve local AND global structure in high-dimensional data?
- ❌ PCA
- ✅ UMAP
- ❌ Dimensionality reduction not used
- ❌ t-SNE
Explanation:
UMAP preserves both local + global structure better than t-SNE.
13. Question 13
Tool for visualizing ML insights?
- ❌ Pandas
- ✅ Matplotlib
- ❌ Scikit-learn
- ❌ NumPy
Explanation:
Matplotlib is the core Python visualization library.
14. Question 14
How is ML different from traditional programming?
- ❌ Writes code faster
- ❌ Generates random rules
- ✅ Learns from data to make predictions
- ❌ Hand-coded trees
Explanation:
ML learns patterns instead of using explicit rules.
15. Question 15
Library for matrix operations and linear algebra?
- ❌ Scikit-learn
- ❌ Pandas
- ❌ Matplotlib
- ✅ NumPy
Explanation:
NumPy = fast vectorized operations + linear algebra.
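A minimal sketch of the matrix operations in question (the 2×2 system is made up for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([1, 1])

product = A @ b                    # matrix-vector product -> [3, 7]
inverse = np.linalg.inv(A)         # matrix inverse
solution = np.linalg.solve(A, b)   # solve the linear system A x = b
```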
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | One-vs-One | SVM multi-class |
| 2 | Median reduces impact of outliers | Skewed data |
| 3 | Bias ↓, Variance ↑ | Overfitting trees |
| 4 | Detect unusual transactions | Anomaly detection |
| 5 | Polynomial regression | Nonlinear productivity curve |
| 6 | KNN | Proximity-based classification |
| 7 | PCA finds variance-rich axes | Dimensionality reduction |
| 8 | SGD | Fast optimization |
| 9 | Adjust threshold | Reduce false positives |
| 10 | Agglomerative | Hierarchical clustering |
| 11 | Isolate rare sensor events | DBSCAN |
| 12 | UMAP | Preserve local + global structure |
| 13 | Matplotlib | Visualization |
| 14 | Learns from data | ML vs Programming |
| 15 | NumPy | Matrix & algebra operations |