Graded Quiz: Data Wrangling: IBM Data Analyst Capstone Project (IBM Data Analyst Professional Certificate) Answers 2025
1. Question 1
Identify the number of duplicate rows:
- ❌ df.sum_duplicates()
- ❌ df.duplicates().sum()
- ✅ df.duplicated().sum()
- ❌ df.find_duplicates()
Explanation:
df.duplicated() returns a Boolean Series that is True for each row repeating an earlier row (keep='first' by default) → .sum() counts the True values.
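A minimal sketch on a small, made-up DataFrame (the values are hypothetical; only the pandas calls come from the quiz) showing what duplicated() flags and what .sum() counts:

```python
import pandas as pd

# Hypothetical survey extract: row 2 repeats row 0 exactly
df = pd.DataFrame({
    "ResponseId": [1, 2, 1, 3],
    "MainBranch": ["Dev", "Student", "Dev", "Dev"],
})

# duplicated() flags each row that repeats an earlier row (keep='first' by default)
flags = df.duplicated()
print(flags.tolist())  # [False, False, True, False]

# Summing the Boolean Series counts the True values
n_dupes = df.duplicated().sum()
print(n_dupes)  # 1
```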
2. Question 2
Remove duplicates based on specific columns:
- ✅ df.drop_duplicates(subset=['ResponseId', 'MainBranch'])
- ❌ df.drop_duplicates()
- ❌ df.drop_duplicates(['ResponseId', 'MainBranch'])
- ❌ df.drop_duplicates(columns=…)
Explanation:
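Pass the column list to subset= so that only those columns are compared when deciding which rows are duplicates. (subset is also the first positional parameter of drop_duplicates, so the unnamed list form behaves the same; subset= is the explicit form the quiz expects.)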
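A small sketch contrasting a full-row comparison with a subset comparison. The Age column and all values are hypothetical, chosen so that two rows agree on ResponseId and MainBranch but differ elsewhere:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share ResponseId and MainBranch but differ in Age
df = pd.DataFrame({
    "ResponseId": [1, 1, 2],
    "MainBranch": ["Dev", "Dev", "Student"],
    "Age": ["25-34", "35-44", "18-24"],
})

# Without subset, every column is compared, so nothing is dropped
all_cols = df.drop_duplicates()
print(len(all_cols))  # 3

# With subset, only the listed columns decide what counts as a duplicate;
# the first occurrence is kept by default (keep='first')
by_keys = df.drop_duplicates(subset=["ResponseId", "MainBranch"])
print(len(by_keys))  # 2
```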
3. Question 3
Identify columns that hold the same values across duplicate rows:
- ❌ df.loc[df.duplicated(keep=False)].nunique()
- ❌ duplicated().unique()
- ❌ nunique(axis=1)
- ✅ df.loc[df.duplicated(keep=False)].nunique(axis=0)
Explanation:
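keep=False flags every copy of a duplicated row, and nunique(axis=0) counts the distinct values in each column across those rows; a column with a count of 1 holds the same value in all of them. Because axis=0 is the default for nunique(), the first option returns the same result; the expected answer simply states the axis explicitly.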
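A minimal sketch, assuming a made-up frame with two groups of exact duplicates (Country and YearsCode are hypothetical column names), that picks out the columns holding a single value across all flagged rows:

```python
import pandas as pd

# Hypothetical data with two groups of exact duplicates (rows 0/2 and rows 1/3)
df = pd.DataFrame({
    "MainBranch": ["Dev", "Dev", "Dev", "Dev", "Student"],
    "Country": ["US", "UK", "US", "UK", "DE"],
    "YearsCode": [5, 8, 5, 8, 1],
})

# keep=False flags every copy of a duplicated row, not just the later repeats
dupes = df.loc[df.duplicated(keep=False)]

# Count distinct values per column (axis=0) across the flagged rows
counts = dupes.nunique(axis=0)
print(counts)

# Columns with a single distinct value hold the same value in every duplicated row
same_cols = counts[counts == 1].index.tolist()
print(same_cols)  # ['MainBranch']
```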
4. Question 4
Verify that duplicates were removed:
- ❌ Compare row counts after dropna()
- ❌ Check df.drop_duplicates() returns zero
- ✅ Re-run df.duplicated().sum() and ensure it equals zero
- ❌ Check df.isnull()
Explanation:
If no duplicates remain → .sum() will be 0.
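A short sketch of the check on hypothetical data: count duplicates, drop them, then re-run the same count and assert that it is zero:

```python
import pandas as pd

# Hypothetical data with one exact duplicate row (rows 1 and 2)
df = pd.DataFrame({
    "ResponseId": [1, 2, 2],
    "RemoteWork": ["Remote", "Hybrid", "Hybrid"],
})
print(df.duplicated().sum())  # 1

# drop_duplicates() returns a new DataFrame; reassign to keep the result
df = df.drop_duplicates()

# Re-running the check should now report zero duplicates
remaining = df.duplicated().sum()
assert remaining == 0, f"{remaining} duplicate rows remain"
print(remaining)  # 0
```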
5. Question 5
Replace missing values with the most frequent value:
- ❌ fillna(0)
- ✅ df['column'].fillna(df['column'].mode()[0])
- ❌ replace(0)
- ❌ fillna(mean)
Explanation:
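.mode() returns a Series, because several values can tie for most frequent, so .mode()[0] takes the first of them. fillna() returns a new Series, so assign it back (df['column'] = ...) to keep the change.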
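A minimal sketch of mode imputation on a hypothetical EdLevel column (name and values are made up for illustration):

```python
import pandas as pd

# Hypothetical column with missing values; "Bachelor" is the most frequent value
df = pd.DataFrame({"EdLevel": ["Bachelor", None, "Master", "Bachelor", None]})

# mode() ignores missing values and returns a Series, so [0] takes the first mode
most_frequent = df["EdLevel"].mode()[0]
print(most_frequent)  # Bachelor

# fillna() returns a new Series; assign it back to update the DataFrame
df["EdLevel"] = df["EdLevel"].fillna(most_frequent)
print(df["EdLevel"].tolist())  # ['Bachelor', 'Bachelor', 'Master', 'Bachelor', 'Bachelor']
```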
6. Question 6
Purpose of df.describe(include='all'):
- ❌ Remove duplicates
- ❌ Count missing values
- ❌ Identify missing values
- ✅ Display summary statistics for all columns including categorical
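A small sketch on hypothetical data (YearsCode is a made-up column) comparing the default describe(), which summarises numeric columns only, with describe(include='all'), which adds count, unique, top and freq for categorical columns:

```python
import pandas as pd

# Hypothetical mix of numeric and categorical columns
df = pd.DataFrame({
    "YearsCode": [2, 5, 10, 5],
    "MainBranch": ["Dev", "Student", "Dev", "Dev"],
})

# Default describe() summarises numeric columns only (count, mean, std, ...)
numeric_summary = df.describe()
print(numeric_summary.columns.tolist())  # ['YearsCode']

# include='all' also covers categorical columns via count, unique, top and freq
full_summary = df.describe(include="all")
print(full_summary.columns.tolist())  # ['YearsCode', 'MainBranch']
print(full_summary.loc["top", "MainBranch"], full_summary.loc["freq", "MainBranch"])  # Dev 3
```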
7. Question 7
Fill missing values with the most frequent value:
- ❌ df.fillna(mode())
- ✅ df['column'].fillna(df['column'].mode()[0])
- ❌ Random text
- ❌ df.mode().fillna()
8. Question 8
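Replace NaN in 'RemoteWork' with a specific value:
- ✅ df['RemoteWork'].fillna('value', inplace=True)
- ❌ dropna
- ❌ fillna(df.mean())
- ❌ replace()
Explanation:
fillna() with a constant replaces every NaN in the column with that value. In recent pandas versions, calling an inplace method on a selected column like this is chained assignment: it raises a FutureWarning and, under Copy-on-Write, does not update df. Assigning the result back, df['RemoteWork'] = df['RemoteWork'].fillna('value'), is the safer form.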
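A minimal sketch of two ways to fill RemoteWork with a constant without calling an inplace method on a selected column. The placeholder "Unknown" and the YearsCode column are hypothetical:

```python
import pandas as pd


def make_frame():
    # Hypothetical data: RemoteWork and YearsCode both have missing entries
    return pd.DataFrame({
        "RemoteWork": ["Remote", None, "In-person", None],
        "YearsCode": [3.0, None, 7.0, 1.0],
    })


# Option 1: assign the filled Series back to the column
df_a = make_frame()
df_a["RemoteWork"] = df_a["RemoteWork"].fillna("Unknown")

# Option 2: call fillna on the DataFrame itself with a {column: value} dict;
# inplace=True acts on the frame directly, so it is not chained assignment
df_b = make_frame()
df_b.fillna({"RemoteWork": "Unknown"}, inplace=True)

print(df_a["RemoteWork"].tolist())  # ['Remote', 'Unknown', 'In-person', 'Unknown']
# Columns not named in the dict keep their NaNs
print(df_b["YearsCode"].isna().sum())  # 1
```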
9. Question 9
Function of MinMaxScaler:
- ✅ Scales data to be between 0 and 1
- ❌ Drops duplicates
- ❌ Replaces NaN
- ❌ Normalizes around mean 0
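Explanation:
MinMaxScaler (from scikit-learn) rescales each column using that column's own minimum and maximum:
x_scaled = (x - min) / (max - min)
Centering data on mean 0 with unit variance is what StandardScaler does, which is why the last option is wrong.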
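A sketch of the formula applied column by column in plain pandas, on hypothetical columns (YearsCode, CompTotal) with very different scales. It mirrors MinMaxScaler's default feature_range=(0, 1); with scikit-learn installed, MinMaxScaler().fit_transform(df) should produce the same values as a NumPy array:

```python
import pandas as pd

# Hypothetical numeric columns on very different scales
df = pd.DataFrame({
    "YearsCode": [1, 5, 9],
    "CompTotal": [20000, 60000, 100000],
})

# x_scaled = (x - min) / (max - min), computed per column;
# assumes no column is constant, otherwise max - min would be 0
scaled = (df - df.min()) / (df.max() - df.min())

print(scaled["YearsCode"].tolist())  # [0.0, 0.5, 1.0]
print(scaled["CompTotal"].tolist())  # [0.0, 0.5, 1.0]
```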
10. Question 10
Function that provides summary statistics (mean, count, etc.):
- ❌ isnull()
- ❌ dropna()
- ❌ fillna()
- ✅ df.describe()
🧾 Summary Table
| Q | Correct Answer | Concept |
|---|---|---|
| 1 | df.duplicated().sum() | Count duplicates |
| 2 | drop_duplicates(subset=…) | Remove selected duplicates |
| 3 | duplicated + nunique(axis=0) | Analyze duplicate columns |
| 4 | duplicated().sum()==0 | Verify removal |
| 5 | fillna(mode) | Frequent value imputation |
| 6 | describe(include='all') | Summary stats incl. categorical columns |
| 7 | fillna(mode) | Replace with mode |
| 8 | fillna(value, inplace=True) | Replace NaN |
| 9 | MinMaxScaler | Scale 0 to 1 |
| 10 | describe() | Summary statistics |