Graded Quiz: Data Wrangling: Data Analysis with Python (IBM Data Analyst Professional Certificate) Answers 2025
1. Question 1
Method to replace missing values for continuous attributes:
- ❌ Educated guess
- ❌ Mean square error
- ❌ Min–max difference
- ✅ Use the average of the other values in the column
Explanation:
For continuous variables, a common imputation method is to replace each missing value with the mean (average) of the observed values in that column.
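As a minimal sketch of mean imputation in pandas, assuming a hypothetical `horsepower` column with made-up values (neither comes from the quiz):

```python
import numpy as np
import pandas as pd

# Hypothetical continuous column with two missing entries
df = pd.DataFrame({"horsepower": [110.0, np.nan, 150.0, 100.0, np.nan]})

# Mean of the observed values; NaN entries are skipped by default
mean_hp = df["horsepower"].mean()

# Replace every missing value with that mean
df["horsepower"] = df["horsepower"].fillna(mean_hp)
```

The mean is computed before filling, so the imputed values do not influence the average they are drawn from.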
2. Question 2
First step when deciding bin values for continuous data:
- ❌ Divide average by standard deviation
- ✅ Visualize the distribution (e.g., histogram)
- ❌ Convert object types
- ❌ Use IQR
Explanation:
Understand the distribution before choosing bin edges: a histogram shows where the values cluster and how wide the range is.
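One way this could look in practice, assuming a hypothetical series of prices and three illustrative bin labels. `np.histogram` is used here in place of a plotted histogram so the sketch needs no plotting library; it returns the same counts a chart would display.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous values (not from the quiz)
prices = pd.Series([5000, 7000, 9000, 12000, 15000, 18000, 30000, 45000])

# Step 1: inspect the distribution as counts per equal-width interval
counts, edges = np.histogram(prices, bins=3)

# Step 2: with the distribution in mind, pick bin edges and bin the data.
# Equal-width edges are an assumption here; skewed data may call for others.
bin_edges = np.linspace(prices.min(), prices.max(), 4)
labels = ["low", "medium", "high"]  # illustrative labels
price_binned = pd.cut(prices, bins=bin_edges, labels=labels, include_lowest=True)
```

With this sample, most values fall into the first interval, which is exactly the kind of skew that inspecting the distribution first is meant to reveal.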
3. Question 3
Most appropriate data type for city names like “N.Y.”, “Ny”, “New York”:
- ✅ object
- ❌ float
- ❌ DataFrame
- ❌ int
Explanation:
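pandas stores non-numeric text values such as these with the object dtype by default (newer releases may instead report a dedicated string dtype). The inconsistent spellings still need to be standardized as a separate cleaning step.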
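A short sketch of inspecting such a column and standardizing the spellings. The mapping dictionary is a hypothetical choice made for illustration, not something the quiz specifies.

```python
import pandas as pd

# The three spellings from the question, held as a text column
cities = pd.Series(["N.Y.", "Ny", "New York"])

# The dtype pandas assigned to the text column (a non-numeric dtype)
city_dtype = cities.dtype

# Hypothetical mapping from variant spellings to one canonical form
canonical = {"N.Y.": "New York", "Ny": "New York"}
cities_clean = cities.replace(canonical)
```

After the replacement, every entry reads "New York", so grouping or counting by city treats the rows as one category.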
4. Question 4
Primary purpose of normalization:
- ❌ Make all features identical
- ❌ Remove outliers
- ✅ Ensure features have similar ranges for fair comparison
- ❌ Remove missing values
Explanation:
Normalization adjusts scale, preventing large-range features from dominating.
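To illustrate, here is a sketch of min-max scaling, which is one of several normalization methods (simple feature scaling and z-scores are others). The column names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical features on very different scales
df = pd.DataFrame({
    "length": [150.0, 175.0, 200.0],
    "price": [5000.0, 20000.0, 45000.0],
})

# Min-max scaling: (x - min) / (max - min), applied column by column,
# maps every feature onto the range [0, 1]
df_scaled = (df - df.min()) / (df.max() - df.min())
```

After scaling, both columns span 0 to 1, so `price` no longer outweighs `length` purely because of its units.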
5. Question 5
Purpose of encoding a categorical variable:
- ❌ Convert numerical to categorical
- ✅ Turns categorical values into numerical values
- ❌ Bin values
- ❌ Change data type manually
Explanation:
Encoding transforms categories into numeric values that machine-learning models can take as input.
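As one simple illustration, the sketch below assigns an integer code to each category using the pandas categorical dtype. The `body_style` values are hypothetical, and integer codes are only one of several encoding schemes.

```python
import pandas as pd

# Hypothetical categorical column
body_style = pd.Series(["sedan", "wagon", "sedan", "hatchback"])

# Convert to the categorical dtype, then read the integer code per row.
# Categories are sorted alphabetically: hatchback=0, sedan=1, wagon=2
codes = body_style.astype("category").cat.codes
```

Integer codes imply an ordering between categories, which can mislead some models; one-hot encoding avoids that at the cost of extra columns.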
6. Question 6
First step in data preparation:
- ✅ Cleaning missing or inconsistent values
- ❌ Normalizing values
- ❌ Running ML models
- ❌ Encoding categorical variables
Explanation:
Cleaning comes first because later steps such as normalization and encoding give unreliable results when applied to missing or inconsistent values.
7. Question 7
Prepare “fuel type” column (gas/diesel) for model training:
- ❌ cut()
- ✅ get_dummies()
- ❌ dropna()
- ❌ astype()
Explanation:
get_dummies() performs one-hot encoding for categorical variables.
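A minimal sketch of applying it to a gas/diesel column. The `fuel-type` column name and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical fuel-type column with the two categories from the question
df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per category; dtype=int yields 0/1 instead of booleans
dummies = pd.get_dummies(df["fuel-type"], dtype=int)

# Attach the indicator columns alongside the original data
df = pd.concat([df, dummies], axis=1)
```

Each row has a 1 in exactly one of the `diesel` and `gas` columns, so the original text column can be dropped before training.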
8. Question 8
Convert “N/A” text entries into actual NaN values:
- ✅ replace()
- ❌ astype()
- ❌ dropna()
- ❌ fillna()
Explanation:
Use replace("N/A", np.nan) to convert string placeholders to true missing values.
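A short sketch of that call in context, assuming a hypothetical `price` column that was read in as text because of the placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical column where "N/A" forced the values to be read as text
df = pd.DataFrame({"price": ["13495", "N/A", "16500"]})

# Turn the string placeholder into a true missing value
df["price"] = df["price"].replace("N/A", np.nan)
n_missing = df["price"].isna().sum()

# With the placeholder gone, the column can be converted to a numeric type
df["price"] = df["price"].astype(float)
```

Once the placeholder is a real NaN, methods such as `isna()`, `dropna()`, and `fillna()` recognize it; they would have ignored the literal string "N/A".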
🧾 Summary Table
| Q | Correct Answer | Key Concept |
|---|---|---|
| 1 | Mean imputation | Continuous missing value handling |
| 2 | Visualize distribution | Bin selection |
| 3 | object | Text data storage |
| 4 | Similar feature ranges | Normalization purpose |
| 5 | Encode categories → numbers | ML preprocessing |
| 6 | Clean data first | Data pipeline |
| 7 | get_dummies() | One-hot encoding |
| 8 | replace() | Converting placeholders to NaN |