Graded Quiz: Data Wrangling: IBM Data Analyst Capstone Project (IBM Data Analyst Professional Certificate) Answers 2025

1. Question 1

Identify number of duplicate rows:

  • ❌ df.sum_duplicates()

  • ❌ df.duplicates().sum()

  • df.duplicated().sum()

  • ❌ df.find_duplicates()

Explanation:

df.duplicated() returns a Boolean Series that is True for every row repeating an earlier row; .sum() counts the True values, which gives the number of duplicate rows.
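As a minimal sketch of the counting step, assuming a small made-up DataFrame (the rows and values are hypothetical, not taken from the course dataset):

```python
import pandas as pd

# Hypothetical sample data; row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "ResponseId": [1, 2, 1, 3],
    "MainBranch": ["dev", "student", "dev", "dev"],
})

# duplicated() marks each repeat of an earlier row; the first occurrence stays False.
dup_mask = df.duplicated()

# Summing a Boolean Series counts the True values.
num_duplicates = int(dup_mask.sum())
print(num_duplicates)
```

Only the repeated row is counted, not the first occurrence, because the default is keep='first'.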


2. Question 2

Remove duplicates based on specific columns:

  • df.drop_duplicates(subset=['ResponseId', 'MainBranch'])

  • ❌ df.drop_duplicates()

  • ❌ df.drop_duplicates(['ResponseId', 'MainBranch'])

  • ❌ df.drop_duplicates(columns=…)

Explanation:

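The subset= parameter limits the duplicate check to the listed columns, so rows that match on ResponseId and MainBranch are treated as duplicates even if their other columns differ.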
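The difference between a full-row check and a subset check can be sketched as follows. The Country column and all values are hypothetical, added only for illustration:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 match on ResponseId and MainBranch but differ in Country.
df = pd.DataFrame({
    "ResponseId": [1, 1, 2, 2],
    "MainBranch": ["dev", "dev", "student", "dev"],
    "Country": ["US", "CA", "IN", "IN"],
})

# Full-row comparison: no two rows are identical, so nothing is dropped.
full_row = df.drop_duplicates()

# Subset comparison: only ResponseId and MainBranch are compared, so row 1 is dropped.
deduped = df.drop_duplicates(subset=["ResponseId", "MainBranch"])

print(len(full_row), len(deduped))
```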


3. Question 3

Identify columns with same values in duplicate rows:

  • ❌ df.loc[df.duplicated(keep=False)].nunique()

  • ❌ duplicated().unique()

  • ❌ nunique(axis=1)

  • df.loc[df.duplicated(keep=False)].nunique(axis=0)

Explanation:

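df.duplicated(keep=False) selects every row that belongs to a duplicate group, and nunique(axis=0) then counts the distinct values in each column of those rows. Note that axis=0 is also the pandas default for nunique(); the quiz expects the option that states it explicitly.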
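A rough sketch of this check, assuming a hypothetical DataFrame with two groups of duplicated rows:

```python
import pandas as pd

# Hypothetical data: rows 0-1 are one duplicate pair, rows 2-3 another; row 4 is unique.
df = pd.DataFrame({
    "ResponseId": [1, 1, 2, 2, 3],
    "MainBranch": ["dev", "dev", "dev", "dev", "student"],
})

# keep=False marks every member of a duplicate group, not just the repeats.
dup_rows = df.loc[df.duplicated(keep=False)]

# Distinct values per column among the duplicated rows.
unique_per_column = dup_rows.nunique(axis=0)
print(unique_per_column)
```

In this sketch a count of 1 (MainBranch) means the column holds the same value in every duplicated row, while a higher count (ResponseId) means the duplicate groups differ in that column.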


4. Question 4

Verify duplicates were removed:

  • ❌ Compare row counts after dropna()

  • ❌ Check df.drop_duplicates() returns zero

  • Re-run df.duplicated().sum() and ensure it equals zero

  • ❌ Check isnull

Explanation:

If no duplicates remain, df.duplicated().sum() returns 0.
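One way to sketch the verification, using hypothetical data and variable names (before, after) chosen for illustration:

```python
import pandas as pd

# Hypothetical data containing one repeated row.
df = pd.DataFrame({
    "ResponseId": [1, 2, 2],
    "MainBranch": ["dev", "student", "student"],
})

before = int(df.duplicated().sum())  # duplicates present before cleaning

# drop_duplicates() returns a new DataFrame, so assign the result back.
df = df.drop_duplicates()

after = int(df.duplicated().sum())   # should be 0 once duplicates are gone
print(before, after)
```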


5. Question 5

Replace missing values with the most frequent value:

  • ❌ fillna(0)

  • df['column'].fillna(df['column'].mode()[0])

  • ❌ replace(0)

  • ❌ fillna(mean)

Explanation:

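.mode() returns a Series of the most frequent value(s), since several values can tie for most frequent. Indexing with [0] picks the first of them, which is then passed to fillna().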
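A minimal sketch of mode imputation, assuming a hypothetical column with two missing entries. The result is assigned back to the column because fillna() returns a new Series by default:

```python
import numpy as np
import pandas as pd

# Hypothetical column: "a" is the most frequent value; two entries are missing.
df = pd.DataFrame({"column": ["a", "b", "a", np.nan, "a", np.nan]})

# mode() ignores NaN by default and returns a Series; [0] takes the first mode.
most_frequent = df["column"].mode()[0]

# Assign the filled Series back to the column.
df["column"] = df["column"].fillna(most_frequent)
print(df["column"].tolist())
```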


6. Question 6

Purpose of df.describe(include='all'):

  • ❌ Remove duplicates

  • ❌ Count missing values

  • ❌ Identify missing values

  • Display summary statistics for all columns including categorical
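To illustrate the difference from a plain describe() call, here is a small sketch with one numeric and one categorical column (the Age values and RemoteWork labels are hypothetical):

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({
    "Age": [25, 32, 47],
    "RemoteWork": ["Remote", "Hybrid", "Remote"],
})

# By default, describe() summarizes only the numeric columns of a mixed DataFrame.
numeric_summary = df.describe()

# include="all" adds non-numeric columns, with rows such as unique, top, and freq.
full_summary = df.describe(include="all")
print(full_summary)
```

Statistics that do not apply to a column (for example, mean for RemoteWork) appear as NaN in the combined summary.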


7. Question 7

Fill missing values with most frequent value:

  • ❌ df.fillna(mode())

  • df['column'].fillna(df['column'].mode()[0])

  • ❌ Random text

  • ❌ df.mode().fillna()


8. Question 8

Replace NaN in 'RemoteWork' with a specific value:

  • df['RemoteWork'].fillna('value', inplace=True)

  • ❌ dropna

  • ❌ fillna(df.mean())

  • ❌ replace()
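The quiz answer calls fillna() with inplace=True on a selected column. Recent pandas versions warn that this chained in-place pattern may not update the original DataFrame, so the sketch below assigns the result back instead. The placeholder 'Unknown' and the sample values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical RemoteWork column with two missing entries.
df = pd.DataFrame({"RemoteWork": ["Remote", np.nan, "Hybrid", np.nan]})

# Fill NaN with a chosen placeholder and assign the result back to the column.
df["RemoteWork"] = df["RemoteWork"].fillna("Unknown")
print(df["RemoteWork"].tolist())
```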


9. Question 9

Function of MinMaxScaler:

  • Scales data to be between 0 and 1

  • ❌ Drops duplicates

  • ❌ Replaces NaN

  • ❌ Normalizes around mean 0

Explanation:

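With its default feature_range of (0, 1), MinMaxScaler transforms each feature (column) as follows, where min and max are that column's minimum and maximum: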

x_scaled = (x - min) / (max - min)
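The formula can be applied by hand with pandas, as in the sketch below. This is not a call to scikit-learn's MinMaxScaler itself; it reproduces the same arithmetic for the default 0-to-1 range on a hypothetical Salary column:

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"Salary": [20000.0, 50000.0, 80000.0]})

col = df["Salary"]

# x_scaled = (x - min) / (max - min), computed for the whole column at once.
df["Salary_scaled"] = (col - col.min()) / (col.max() - col.min())
print(df["Salary_scaled"].tolist())
```

The smallest value maps to 0 and the largest to 1. A column whose values are all equal would make the denominator zero, so that case needs separate handling.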

10. Question 10

Function that provides summary statistics (mean, count, etc.):

  • ❌ isnull()

  • ❌ dropna()

  • ❌ fillna()

  • df.describe()


🧾 Summary Table

| Q | Correct Answer | Concept |
|---|---|---|
| 1 | df.duplicated().sum() | Count duplicates |
| 2 | drop_duplicates(subset=…) | Remove selected duplicates |
| 3 | duplicated + nunique(axis=0) | Analyze duplicate columns |
| 4 | duplicated().sum()==0 | Verify removal |
| 5 | fillna(mode) | Frequent value imputation |
| 6 | describe(include='all') | Summary stats |
| 7 | fillna(mode) | Replace with mode |
| 8 | fillna(value, inplace=True) | Replace NaN |
| 9 | MinMaxScaler | Scale 0 to 1 |
| 10 | describe() | Summary statistics |