Final Exam: Data Analysis with Python (IBM Data Analyst Professional Certificate) Answers 2025
1️⃣ Question 1
Which describes a file with plain text, rows, and columns?
-
❌ Key–value text
-
❌ Comma-separated array
-
❌ Excel spreadsheet
-
✅ A text file that saves data in tables
Explanation:
A plain-text tabular file (e.g., .txt or .csv) stores each record as a row and separates the columns with a delimiter such as a comma or tab.
2️⃣ Question 2
Python library for email spam classification?
-
❌ Fast array processing
-
❌ Exploratory data analysis
-
❌ Matrix operations
-
✅ Statistical modeling including regression/classification (scikit-learn)
Explanation:
Scikit-learn provides classification algorithms.
3️⃣ Question 3
Most important factors for reading data with Pandas:
-
❌ Encoding + file path
-
❌ File types + formats
-
❌ Format + file path
-
✅ File types and encoding scheme
Explanation:
Pandas must know file type (CSV/JSON/Excel) and encoding (UTF-8, ISO-8859-1).
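To illustrate the explanation above, here is a minimal sketch of what happens when the encoding is wrong. The CSV content, column names, and values are made up for the example; only pd.read_csv and its encoding parameter come from pandas.

```python
import io
import pandas as pd

# Hypothetical CSV bytes, saved as ISO-8859-1 (Latin-1) rather than UTF-8.
raw_bytes = "make,price\nCitroën,13495\nRenault,9295\n".encode("latin-1")

# read_csv assumes UTF-8 by default, so the Latin-1 byte for "ë" cannot be decoded.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
except UnicodeDecodeError as err:
    print("Default UTF-8 read failed:", err)

# Naming the file type (CSV -> read_csv) and the encoding makes the read succeed.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(df)
```

The same idea applies to other file types: pd.read_json and pd.read_excel are the matching readers for JSON and Excel files.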
4️⃣ Question 4
Why use Python’s DB API?
-
❌ Autogenerate interfaces
-
❌ Bypass SQL
-
❌ Convert output to JSON
-
✅ Allows consistent querying/connection across SQL systems
Explanation:
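The Python DB API defines a standard interface (connections, cursors, execute, fetch), so the same code pattern works across different SQL database systems.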
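As a rough sketch of that standard pattern, the example below uses sqlite3 from the standard library, which follows the DB API. The table name and rows are hypothetical, and other drivers may use a different placeholder style than the ? shown here.

```python
import sqlite3

# DB API pattern: connect -> cursor -> execute -> fetch -> close.
conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE cars (make TEXT, price REAL)")
cur.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [("audi", 13950.0), ("bmw", 16430.0)],
)
conn.commit()

# Parameterized query; sqlite3 uses "?" placeholders.
cur.execute("SELECT make, price FROM cars WHERE price > ?", (14000,))
rows = cur.fetchall()
print(rows)

conn.close()
```

With another DB API driver, the connect call and placeholder style change, but the cursor, execute, and fetch steps keep the same shape.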
5️⃣ Question 5
Regression model of: ŷ = b₀ + b₁x
-
✅ Linear regression
-
❌ Polynomial
-
❌ Multiple linear
-
❌ Exponential
6️⃣ Question 6
Why do predictions become negative?
-
❌ Coefficients uninterpreted
-
✅ Model extrapolates beyond realistic ranges
-
❌ Regression line always increases
-
❌ Low R²
Explanation:
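A fitted line keeps going past the range of the training data, so for inputs far outside that range it can predict impossible values such as negative prices.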
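A small sketch of this effect, using made-up numbers (the highway_mpg and price values are hypothetical, chosen so the line is easy to check by hand):

```python
import numpy as np

# Hypothetical training data: price drops as highway mpg rises.
highway_mpg = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
price = np.array([30000.0, 24000.0, 18000.0, 12000.0, 6000.0])

# Fit y_hat = b0 + b1 * x; polyfit returns the slope first for deg=1.
b1, b0 = np.polyfit(highway_mpg, price, deg=1)

# Inside the observed range (20 to 40) the prediction is sensible.
inside = b0 + b1 * 30.0

# At 50 mpg the model extrapolates and the predicted price goes below zero.
predicted_price = b0 + b1 * 50.0
print(round(inside), round(predicted_price))
```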
7️⃣ Question 7
Challenge of a single train–test split:
-
❌ Invalid R²
-
❌ Less accuracy due to less training data
-
✅ Generalization error may change with each split
-
❌ Decline to adapt to hidden features
Explanation:
One split may not represent the dataset; different splits yield different performance.
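The sketch below shows the variation on synthetic data. The data, the helper holdout_r2, and the 25% test fraction are all assumptions made for the illustration; it uses plain NumPy rather than any particular library's split function.

```python
import numpy as np

# Synthetic data: a noisy straight line (values are made up).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = 3.0 * x + 5.0 + rng.normal(0, 4.0, size=40)

def holdout_r2(seed, test_fraction=0.25):
    """Fit on one random train split and return R^2 on the held-out part."""
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    b1, b0 = np.polyfit(x[train_idx], y[train_idx], deg=1)
    residuals = y[test_idx] - (b0 + b1 * x[test_idx])
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Same data, same model, five different splits -> five different scores.
scores = [holdout_r2(seed) for seed in range(5)]
print(scores)
print("mean over splits:", np.mean(scores))
```

Averaging over several splits, which is the idea behind cross-validation, gives a steadier estimate than any single split.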
8️⃣ Question 8
Code effect:
mean = df["price"].mean()
df["price"].replace(np.nan, mean)
-
❌ Only calculates mean
-
✅ Fills missing values with column mean
-
❌ Normalizes data
-
❌ Drops missing rows
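Explanation:
The code replaces NaN values with the column mean. One detail worth checking: replace() returns a new Series, so as written the result is not stored back into df. A minimal sketch with a made-up three-row DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({"price": [10000.0, np.nan, 14000.0]})

mean = df["price"].mean()  # NaN is skipped, so the mean is 12000.0

# replace() returns a filled copy; df itself is left unchanged.
filled = df["price"].replace(np.nan, mean)
still_missing = df["price"].isna().sum()  # 1: the original still has the NaN

# To keep the result, assign it back (fillna is an equivalent spelling).
df["price"] = df["price"].fillna(mean)
print(df)
```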
9️⃣ Question 9
Purpose of binning car prices:
-
❌ Randomizes prices
-
❌ Filters values
-
✅ Creates labeled segments for price intervals
-
❌ Equalizes values
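Explanation:
Binning groups a continuous value into a few labeled ranges. A minimal sketch with pd.cut, assuming three equal-width bins and hypothetical prices and labels:

```python
import numpy as np
import pandas as pd

# Hypothetical car prices.
prices = pd.Series([5000, 9000, 15000, 22000, 45000], name="price")

# Four equally spaced edges give three equal-width bins.
bins = np.linspace(prices.min(), prices.max(), 4)
labels = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum price inside the first bin.
price_binned = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)
print(price_binned.tolist())  # ['Low', 'Low', 'Low', 'Medium', 'High']
```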
🔟 Question 10
Method dividing by standard deviation:
-
❌ Simple scaling
-
✅ Z-score standardization
-
❌ Feature binning
-
❌ Min-max scaling
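Explanation:
Z-score standardization subtracts the mean and divides by the standard deviation. The sketch below compares it with the other two scaling options, using a made-up length column:

```python
import pandas as pd

# Hypothetical column of car lengths.
length = pd.Series([150.0, 160.0, 170.0, 180.0, 190.0])

# Simple scaling: divide by the maximum, values end up in (0, 1].
simple = length / length.max()

# Min-max scaling: subtract the minimum, divide by the range, values in [0, 1].
min_max = (length - length.min()) / (length.max() - length.min())

# Z-score: subtract the mean, divide by the standard deviation.
z_score = (length - length.mean()) / length.std()

print(z_score.round(3).tolist())
```

After z-score standardization the column has mean 0 and standard deviation 1, so values are typically small and centered around zero.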
1️⃣1️⃣ Question 11
Method to evaluate column statistics/types:
-
❌ dataframe.values()
-
❌ dataframe.rename()
-
❌ dataframe.astype("int")
-
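❌ dataframe.dtypes("int")
Correct: the intended answer is dataframe.dtypes, which reports the data type of each column. The last option is written incorrectly (dtypes is an attribute and takes no argument), but it is the intended choice.
So the correct choice is:
-
✅ dataframe.dtypes("int") (intention: check data types)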
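A short sketch of how the two inspection tools behave, using a hypothetical three-column DataFrame. Note that dtypes is accessed without parentheses, while describe() is a method:

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame(
    {"make": ["audi", "bmw"], "price": [13950.0, 16430.0], "doors": [4, 2]}
)

# dtypes is an attribute: one data type per column.
column_types = df.dtypes
print(column_types)

# describe() summarizes numeric columns only by default (count, mean, std, ...).
summary = df.describe()
print(summary)
```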
1️⃣2️⃣ Question 12
What is EDA?
-
✅ Reviewing key characteristics and uncovering patterns
-
❌ Segmenting dataset
-
❌ Minimizing dimensions
-
❌ Training models
1️⃣3️⃣ Question 13
Negative linear relationship means:
-
❌ Output doesn’t explain input
-
❌ Decreases at increasing rate
-
✅ With increase in input, output decreases at about the same rate
-
❌ Output increases
1️⃣4️⃣ Question 14
Method to find outliers:
-
❌ Scatter plot
-
✅ Box plot
-
❌ Histogram via describe
-
❌ value_counts
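Explanation:
A box plot marks points beyond its whiskers as outliers. A common convention places the whiskers at 1.5 times the interquartile range (IQR) from the quartiles. The sketch below applies that rule numerically to made-up prices; the 1.5 factor is an assumption about the plotting convention, not something stated in the question.

```python
import pandas as pd

# Hypothetical prices with one unusually expensive car.
price = pd.Series([9000, 10000, 11000, 12000, 13000, 14000, 45000])

# Quartiles and interquartile range (the "box" of a box plot).
q1, q3 = price.quantile(0.25), price.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR convention.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the limits would be drawn as an individual outlier point.
outliers = price[(price < lower) | (price > upper)]
print(outliers.tolist())  # [45000]
```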
1️⃣5️⃣ Question 15
Method to compare average price across drive types:
-
✅ Group the data using category values
-
❌ Numeric filters
-
❌ Row filters
-
❌ Combine datasets
Explanation:
Use groupby().
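For example, a minimal sketch with a hypothetical drive_wheels column and made-up prices:

```python
import pandas as pd

# Hypothetical data: drive type and price for five cars.
df = pd.DataFrame(
    {
        "drive_wheels": ["fwd", "rwd", "fwd", "4wd", "rwd"],
        "price": [9000.0, 20000.0, 11000.0, 8000.0, 24000.0],
    }
)

# Group rows by category, then average the price within each group.
avg_price = df.groupby("drive_wheels")["price"].mean()
print(avg_price)
```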
1️⃣6️⃣ Question 16
Role of independent variables:
-
❌ Summarize performance
-
❌ Define accuracy metric
-
❌ Compare models
-
✅ Serve as inputs to estimate output
1️⃣7️⃣ Question 17
Curved residual pattern implies:
-
❌ Linear relationship
-
❌ Uniformly low errors
-
❌ Randomly distributed
-
✅ Model may be inaccurate → nonlinear relationship
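Explanation:
If the residuals follow a curve instead of scattering randomly around zero, the straight line is missing a nonlinear pattern. A sketch on synthetic data, where the true relationship is deliberately quadratic:

```python
import numpy as np

# Synthetic data with a purely quadratic relationship (an assumption for the demo).
x = np.linspace(-3, 3, 13)
y = x ** 2

# Fit a straight line and compute residuals = actual - predicted.
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# The residuals are positive at both ends and negative in the middle: a U shape.
print(residuals.round(2))

# A second-degree fit captures the curve, leaving residuals near zero.
coeffs = np.polyfit(x, y, deg=2)
residuals_quadratic = y - np.polyval(coeffs, x)
```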
1️⃣8️⃣ Question 18
Truth about noise:
-
❌ Accounted with a parameter
-
❌ No noise if testing fits
-
❌ No noise if training fits
-
✅ Noise is random and cannot be predicted
1️⃣9️⃣ Question 19
Large alpha in ridge regression:
-
❌ Lower-order function required
-
✅ Model is underfitted
-
❌ Overfitted
-
❌ Higher alpha → better fit
Explanation:
High alpha shrinks coefficients too much → underfitting.
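A small sketch of that shrinkage with scikit-learn's Ridge on synthetic data. The data and the alpha values are made up, and the R² here is measured on the training data only to show how the fit degrades:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: y depends on x with a true slope of 4 plus some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

# Fit the same model with a small, moderate, and very large alpha.
results = {}
for alpha in (0.01, 1.0, 10000.0):
    model = Ridge(alpha=alpha).fit(X, y)
    results[alpha] = (model.coef_[0], model.score(X, y))
    print(alpha, results[alpha])
```

With the very large alpha the coefficient is pushed toward zero and R² drops sharply, which is what underfitting looks like.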
2️⃣0️⃣ Question 20
Argument passed to GridSearchCV():
-
❌ Dictionary of columns
-
✅ Dictionary of parameters and values
-
❌ Dataframe of models
-
❌ Normalized features
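Explanation:
GridSearchCV takes an estimator and a dictionary that maps parameter names to the values to try. A minimal sketch, assuming a Ridge model, synthetic data, and an arbitrary list of alpha values:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.3, size=60)

# Dictionary of parameters and candidate values: key = parameter name.
param_grid = {"alpha": [0.001, 0.1, 1, 10, 100]}

# Each alpha is scored with 4-fold cross-validation; the best one is kept.
search = GridSearchCV(Ridge(), param_grid, cv=4)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```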
🧾 Summary Table
| Q | Correct Answer |
|---|---|
| 1 | Text file storing data in tables |
| 2 | Statistical modeling (scikit-learn) |
| 3 | File types + encoding scheme |
| 4 | Consistent SQL querying |
| 5 | Linear regression |
| 6 | Extrapolation beyond data |
| 7 | Generalization error varies |
| 8 | Fill missing values with mean |
| 9 | Create labeled price bins |
| 10 | Z-score standardization |
| 11 | dataframe.dtypes |
| 12 | Review key characteristics |
| 13 | Output decreases as input increases |
| 14 | Box plot |
| 15 | Group by categories |
| 16 | Inputs to estimate output |
| 17 | Model inaccurate / nonlinear |
| 18 | Noise is random |
| 19 | Underfitted model |
| 20 | Dictionary of parameters |