Graded Quiz: Exploratory Data Analysis: IBM Data Analyst Capstone Project (IBM Data Analyst Professional Certificate) Answers 2025
1. Question 1
Which function identifies missing values in each column?
- ❌ df.info()
- ❌ df.missing_values()
- ✅ df.isnull().sum()
- ❌ df.describe()
Explanation:
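df.isnull() returns a DataFrame of booleans (True where a value is missing), and .sum() adds up the True values in each column, giving the number of missing values per column.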
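A minimal sketch of this check on a small hypothetical DataFrame (the column names and values are made up for illustration and are not taken from the lab dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; None / np.nan mark missing entries
df = pd.DataFrame({
    "Employment": ["Employed, full-time", None, "Student", "Employed, full-time"],
    "ConvertedCompYearly": [60000.0, np.nan, np.nan, 85000.0],
    "YearsCodePro": [5, 2, np.nan, 10],
})

# isnull() flags each missing cell as True; sum() counts the Trues in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)
```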
2. Question 2
Which plot visualizes the distribution of a categorical variable?
- ❌ scatterplot
- ❌ lineplot
- ❌ histplot (better suited to numeric variables)
- ✅ countplot
Explanation:
sns.countplot() draws one bar per category, with each bar's height equal to the number of rows in that category.
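A minimal sketch, assuming seaborn and matplotlib are installed; the `RemoteWork` values below are hypothetical toy data. The bar heights read back at the end are the same counts that `value_counts()` would return:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical column (illustrative values only)
df = pd.DataFrame({"RemoteWork": ["Remote", "Hybrid", "Remote", "In-person", "Remote", "Hybrid"]})

order = ["Remote", "Hybrid", "In-person"]
# countplot draws one bar per category; bar height = number of rows in that category
ax = sns.countplot(data=df, x="RemoteWork", order=order)
ax.set_title("Respondents by work arrangement")
plt.tight_layout()

# Read the bar heights back to confirm they are plain category counts
bar_heights = [p.get_height() for p in ax.patches]
print(dict(zip(order, bar_heights)))
```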
3. Question 3
Which function creates cross-tabulations?
- ❌ pd.correlation()
- ❌ pd.merge()
- ✅ pd.crosstab()
- ❌ pd.groupby()
Explanation:
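pd.crosstab() builds a contingency table: by default it counts how many rows fall into each combination of categories of two (or more) variables.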
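A short sketch on hypothetical data (the `Employment` and `RemoteWork` columns and their values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical survey-style data
df = pd.DataFrame({
    "Employment": ["Full-time", "Full-time", "Part-time", "Full-time", "Part-time"],
    "RemoteWork": ["Remote", "Hybrid", "Remote", "Remote", "Hybrid"],
})

# Rows = Employment categories, columns = RemoteWork categories,
# cells = number of respondents with that combination
table = pd.crosstab(df["Employment"], df["RemoteWork"])
print(table)
```

Passing `margins=True` to `pd.crosstab()` adds row and column totals, and `normalize=True` turns the counts into proportions.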
4. Question 4
What is the median ConvertedCompYearly value in the dataset used in the lab?
- ❌ 55,000
- ❌ 50,000
- ❌ 65,000
- ✅ 60,000
Explanation:
The median yearly compensation in the lab dataset is approximately 60,000.
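The calculation itself is a single `median()` call. The sketch below uses made-up values, not the lab data, and includes one extreme salary to show why the median is a useful summary for compensation:

```python
import numpy as np
import pandas as pd

# Hypothetical compensation values; NaN marks respondents who did not answer
df = pd.DataFrame({"ConvertedCompYearly": [40000, 55000, np.nan, 60000, 72000, 1500000]})

# median() skips NaN by default (skipna=True)
median_comp = df["ConvertedCompYearly"].median()
print(median_comp)
```

On this toy column the median is 60,000, while the mean of the same values is 345,400, pulled up by the single 1,500,000 entry.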
5. Question 5
Which method detects outliers using the range between the 25th and 75th percentiles?
- ❌ Standard deviation
- ✅ Interquartile Range (IQR)
- ❌ Mean absolute deviation
- ❌ Z-score
Explanation:
IQR = Q3 − Q1. Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are commonly flagged as outliers.
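A minimal sketch of IQR-based detection on a hypothetical salary series, using the common 1.5 × IQR fences:

```python
import pandas as pd

# Hypothetical salaries with one extreme value
comp = pd.Series([40000, 52000, 55000, 60000, 63000, 70000, 75000, 900000],
                 name="ConvertedCompYearly")

q1 = comp.quantile(0.25)  # 25th percentile
q3 = comp.quantile(0.75)  # 75th percentile
iqr = q3 - q1

# Tukey's fences: 1.5 x IQR beyond each quartile
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = comp[(comp < lower) | (comp > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, bounds=({lower}, {upper})")
print(outliers)
```

For this toy series the fences are 28,750 and 96,750, so only the 900,000 value is flagged. The 1.5 multiplier is a convention rather than a fixed rule.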
6. Question 6
Which function calculates skewness?
- ❌ df.corr()
- ❌ df.describe()
- ❌ df.var()
- ✅ df.skew()
Explanation:
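df.skew() returns the sample skewness of each numeric column, a measure of how asymmetric its distribution is: positive values indicate a longer right tail, negative values a longer left tail.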
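A small sketch contrasting a symmetric column with a right-skewed one (both columns are hypothetical):

```python
import pandas as pd

# Hypothetical columns: one symmetric around its mean, one with a long right tail
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right_skewed": [1, 1, 2, 2, 3, 4, 20],
})

# ~0 = roughly symmetric, > 0 = longer right tail, < 0 = longer left tail
skewness = df.skew()
print(skewness)
```

On a DataFrame that also contains text columns, recent pandas versions (2.x) raise a `TypeError` unless you pass `numeric_only=True`.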
7. Question 7
What is the best practice for handling extreme outliers in compensation data?
- ✅ Remove the outliers to prevent skewing the analysis
- ❌ Replace them with NaN
- ❌ Set them to the maximum value within 1.5 × IQR
- ❌ Ignore them
Explanation:
Compensation data is highly right-skewed, so removing extreme outliers keeps a few very large values from distorting summary statistics such as the mean.
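A sketch of removal with the 1.5 × IQR fences, assuming that is the rule applied (the question does not name one) and using hypothetical data:

```python
import pandas as pd

# Hypothetical compensation column with one extreme value
df = pd.DataFrame({"ConvertedCompYearly": [40000, 52000, 55000, 60000, 63000, 70000, 75000, 900000]})
comp = df["ConvertedCompYearly"]

q1, q3 = comp.quantile(0.25), comp.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only rows whose compensation lies inside the fences (between() includes both ends)
cleaned = df[comp.between(lower, upper)]

print(len(df), "rows ->", len(cleaned), "rows")
print("mean before:", comp.mean())
print("mean after: ", round(cleaned["ConvertedCompYearly"].mean(), 2))
```

With this toy data the 900,000 row is dropped and the mean falls from 164,375 to about 59,286, close to the median.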
8. Question 8
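How do you identify the median compensation for full-time employees?
- ❌ Use mode
- ❌ Use mean
- ✅ Filter for full-time employees → then calculate median
- ❌ Remove all outliers first
Explanation:
Filtering first restricts the rows to full-time employees, so the median is computed for that group only.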
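A minimal sketch of the filter-then-median approach. The category label `"Employed, full-time"` and all values are assumptions, so check the exact labels in your copy of the data (for example with `df["Employment"].unique()`):

```python
import pandas as pd

# Hypothetical data; the Employment labels are illustrative assumptions
df = pd.DataFrame({
    "Employment": ["Employed, full-time", "Employed, part-time", "Employed, full-time",
                   "Independent contractor", "Employed, full-time"],
    "ConvertedCompYearly": [70000, 20000, 90000, 45000, 60000],
})

# Step 1: a boolean mask keeps only the full-time rows
full_time = df[df["Employment"] == "Employed, full-time"]

# Step 2: take the median of the filtered compensation column
median_full_time = full_time["ConvertedCompYearly"].median()
print(median_full_time)
```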
9. Question 9
What does the correlation between Age and WorkExp indicate?
- ❌ Age has no impact on work experience
- ❌ Work experience is unrelated to age
- ❌ Work experience decreases with age
- ✅ There is a strong relationship, but it is not perfect
Explanation:
As age increases, work experience also tends to increase, so the correlation is strongly positive. It is below 1.0 because people start working at different ages.
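A sketch with hypothetical numeric `Age` and `WorkExp` values (this assumes both columns are already numeric; the lab data may need conversion first):

```python
import pandas as pd

# Hypothetical ages and years of work experience
df = pd.DataFrame({
    "Age":     [22, 25, 30, 35, 40, 45, 50],
    "WorkExp": [1, 2, 8, 10, 18, 20, 22],
})

# Series.corr() computes the Pearson correlation coefficient by default
r = df["Age"].corr(df["WorkExp"])
print(round(r, 3))  # strongly positive, but not exactly 1.0
```

`Series.corr()` also accepts `method="spearman"`, which is an option when the relationship is monotonic but not linear.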
10. Question 10
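What is the purpose of removing outliers before analyzing salary trends?
- ❌ Ensure unique data
- ✅ Focus on common salary values & reduce skewness
- ❌ Decrease dataset size
- ❌ Increase median salary
Explanation:
Removing extreme values keeps the analysis focused on typical salaries and reduces the skew that a few very high earners introduce.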
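To see the effect on skewness, the sketch below compares `skew()` before and after trimming a hypothetical salary series with the 1.5 × IQR fences (the fence rule is an assumption, as the question does not specify one):

```python
import pandas as pd

# Hypothetical right-skewed salaries with one extreme earner
comp = pd.Series([40000, 52000, 55000, 60000, 63000, 70000, 75000, 900000])

q1, q3 = comp.quantile(0.25), comp.quantile(0.75)
iqr = q3 - q1
# Keep values inside the 1.5 x IQR fences
trimmed = comp[comp.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

skew_before = comp.skew()
skew_after = trimmed.skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```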
🧾 Summary Table
| Q | Correct Answer | Key Concept |
|---|---|---|
| 1 | df.isnull().sum() | Missing values |
| 2 | countplot | Categorical visualization |
| 3 | pd.crosstab() | Cross-tabs |
| 4 | 60,000 | Median compensation |
| 5 | IQR | Outlier detection |
| 6 | df.skew() | Skewness |
| 7 | Remove outliers | Clean analysis |
| 8 | Filter full-time → median | Filtering |
| 9 | Strong but imperfect | Correlation |
| 10 | Reduce skewness | Outlier removal |