1️⃣ Question 1

Which describes a file with plain text, rows, and columns?

❌ Key–value text
❌ Comma-separated array
❌ Excel spreadsheet
✅ A text file that saves data in tables

Explanation:
A plain text table (e.g., .txt or .csv) is stored as rows/columns.
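
As a small illustration of that layout, the sketch below parses a few lines of comma-separated text with Python's standard csv module. The contents of raw_text are made up for the example and are not taken from the quiz.

```python
import csv
import io

# Hypothetical file contents: plain text, one row per line,
# with columns separated by commas.
raw_text = "make,price\naudi,13950\nbmw,16430\n"

# csv.reader splits every line into a list of column values.
rows = list(csv.reader(io.StringIO(raw_text)))
header, records = rows[0], rows[1:]

print(header)   # column names
print(records)  # one list per data row (values are still strings)
```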

2️⃣ Question 2

Python library for email spam classification?

❌ Fast array processing
❌ Exploratory data analysis
❌ Matrix operations
✅ Statistical modeling including regression/classification (scikit-learn)

Explanation:
Spam detection is a classification task, and scikit-learn provides statistical modeling tools, including classification algorithms.
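
A minimal sketch of what that could look like, assuming a bag-of-words representation and a naive Bayes classifier. Both the tiny message list and the choice of algorithm are illustrative assumptions; the quiz does not prescribe a specific model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages and labels, for illustration only.
messages = [
    "win a free prize now",
    "free cash offer",
    "meeting at noon tomorrow",
    "see you at lunch",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into word counts, then fit a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

prediction = model.predict(["free prize offer"])
print(prediction[0])
```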

3️⃣ Question 3

Most important factors for reading data with Pandas:

❌ Encoding + file path
❌ File types + formats
❌ Format + file path
✅ File types and encoding scheme

Explanation:
Pandas must know file type (CSV/JSON/Excel) and encoding (UTF-8, ISO-8859-1).
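
The sketch below shows both factors in one call, assuming a small in-memory CSV encoded as ISO-8859-1. The sample bytes are hypothetical; with a real file you would pass its path instead of the BytesIO object.

```python
import io

import pandas as pd

# Hypothetical CSV content saved in ISO-8859-1 (Latin-1) rather than UTF-8.
raw_bytes = "make,city\nCitroën,Besançon\n".encode("ISO-8859-1")

# The reader function (read_csv, read_json, read_excel, ...) matches the
# file type; the encoding argument tells pandas how to decode the bytes.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")

print(df)
```

Reading the same bytes with the default UTF-8 encoding would fail to decode the accented characters, which is why the encoding matters as much as the file type.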

4️⃣ Question 4

Why use Python’s DB API?

❌ Autogenerate interfaces
❌ Bypass SQL
❌ Convert output to JSON
✅ Allows consistent querying/connection across SQL systems

Explanation:
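The Python DB API (PEP 249) defines a common interface for connections and cursors, so code written against one SQL database looks much the same for another.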
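
The connect, cursor, execute, fetch pattern can be sketched with sqlite3, a DB API module from the standard library. The table and values are hypothetical. Other drivers follow the same pattern, although the placeholder style (here "?") can differ between them.

```python
import sqlite3

# Open a connection; ":memory:" creates a temporary in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE cars (make TEXT, price REAL)")
cur.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [("audi", 13950.0), ("bmw", 16430.0)],
)
conn.commit()

# Run a parameterized query and fetch the result rows as tuples.
cur.execute("SELECT make FROM cars WHERE price > ?", (15000,))
rows = cur.fetchall()
conn.close()

print(rows)
```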

5️⃣ Question 5

Regression model of: ŷ = b₀ + b₁x

✅ Linear regression
❌ Polynomial
❌ Multiple linear
❌ Exponential

6️⃣ Question 6

Why do predictions become negative?

❌ Coefficients uninterpreted
✅ Model extrapolates beyond realistic ranges
❌ Regression line always increases
❌ Low R²

Explanation:
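A linear model keeps following its fitted line outside the range of the training data, so predictions there can be unrealistic, including negative values for quantities that cannot be negative.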
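
A minimal sketch of this effect, fitting ŷ = b₀ + b₁x by ordinary least squares on made-up data with a downward trend. The numbers are hypothetical and chosen so the arithmetic is easy to follow.

```python
# Hypothetical data: x ranges from 10 to 40, and y falls as x rises.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [40.0, 30.0, 20.0, 10.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope (b1) and intercept (b0) for y_hat = b0 + b1 * x.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b0 = mean_y - b1 * mean_x


def predict(x):
    """Return the fitted value b0 + b1 * x."""
    return b0 + b1 * x


inside = predict(25.0)   # within the observed range of x
outside = predict(60.0)  # far beyond the observed range

print(b0, b1, inside, outside)
```

Inside the observed range the prediction is plausible, but at x = 60 the same line has already crossed zero and returns a negative value.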

7️⃣ Question 7

Challenge of a single train–test split:

❌ Invalid R²
❌ Less accuracy due to less training data
✅ Generalization error may change with each split
❌ Decline to adapt to hidden features

Explanation:
One split may not represent the dataset; different splits yield different performance.
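
To see the variation, the sketch below scores the same model on several random splits of synthetic data, then runs 5-fold cross-validation as one common way to average over splits. The data, noise level, and seeds are all assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: a noisy linear relationship (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 4.0, size=60)

# Score the same model on five different random train/test splits.
split_scores = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))  # R^2 on the test part

# Cross-validation evaluates on every fold instead of a single split.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)

print(split_scores)
print(cv_scores.mean())
```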

8️⃣ Question 8

Code effect:

❌ Only calculates mean
✅ Fills missing values with column mean
❌ Normalizes data
❌ Drops missing rows
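
The snippet from the question is not shown here, so the following is only an assumed illustration of the pattern the correct answer describes. The column name and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"horsepower": [100.0, np.nan, 140.0]})

# mean() skips NaN by default, so this averages the known values.
mean_hp = df["horsepower"].mean()

# Replace each missing entry with the column mean.
df["horsepower"] = df["horsepower"].fillna(mean_hp)

print(df)
```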

9️⃣ Question 9

Purpose of binning car prices:

❌ Randomizes prices
❌ Filters values
✅ Creates labeled segments for price intervals
❌ Equalizes values
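
One way to build such segments is pandas.cut with equal-width bin edges. The prices and the three labels below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical car prices.
prices = pd.Series([5000, 12000, 20000, 45000])

# Four equally spaced edges define three equal-width bins.
bin_edges = np.linspace(prices.min(), prices.max(), 4)
labels = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum price inside the first bin.
price_binned = pd.cut(prices, bins=bin_edges, labels=labels, include_lowest=True)

print(price_binned.tolist())
```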

🔟 Question 10

Method dividing by standard deviation:

❌ Simple scaling
✅ Z-score standardization
❌ Feature binning
❌ Min-max scaling
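
The three scaling options can be compared side by side on a made-up column. The values are hypothetical. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical numeric column.
lengths = pd.Series([2.0, 4.0, 6.0])

# Simple scaling: divide by the maximum value.
simple = lengths / lengths.max()

# Min-max scaling: shift by the minimum, divide by the range.
min_max = (lengths - lengths.min()) / (lengths.max() - lengths.min())

# Z-score standardization: subtract the mean, divide by the standard deviation.
z_scores = (lengths - lengths.mean()) / lengths.std()

print(z_scores.tolist())
```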

1️⃣1️⃣ Question 11

Method to evaluate column statistics/types:

❌ dataframe.values()
❌ dataframe.rename()
❌ dataframe.astype("int")
✅ dataframe.dtypes("int")

Explanation:
The intended answer is the dtypes option, which reports each column's data type. In pandas, dtypes is an attribute rather than a method, so the working form is dataframe.dtypes with no parentheses or arguments; the option as written would raise a TypeError.
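
A short sketch of checking types and summary statistics on a hypothetical two-column frame. dtypes is accessed as an attribute, while describe() is a method that summarizes numeric columns by default.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})

# Attribute access, no parentheses: one data type per column.
column_types = df.dtypes

# Summary statistics (count, mean, std, ...) for the numeric columns.
summary = df.describe()

print(column_types)
print(summary)
```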

1️⃣2️⃣ Question 12

What is EDA?

✅ Reviewing key characteristics and uncovering patterns
❌ Segmenting dataset
❌ Minimizing dimensions
❌ Training models

1️⃣3️⃣ Question 13

Negative linear relationship means:

❌ Output doesn’t explain input
❌ Decreases at increasing rate
✅ With increase in input, output decreases at about the same rate
❌ Output increases

1️⃣4️⃣ Question 14

Method to find outliers:

❌ Scatter plot
✅ Box plot
❌ Histogram via describe
❌ value_counts
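
A box plot usually marks as outliers the points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that convention numerically, without drawing the plot. The prices are hypothetical.

```python
import pandas as pd

# Hypothetical prices with one unusually large value.
prices = pd.Series([10, 12, 13, 14, 15, 16, 60])

# Quartiles and interquartile range.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(outliers.tolist())
```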

1️⃣5️⃣ Question 15

Method to compare average price across drive types:

✅ Group the data using category values
❌ Numeric filters
❌ Row filters
❌ Combine datasets

Explanation:
Use groupby().
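
For example, assuming a frame with hypothetical "drive-wheels" and "price" columns, the average price per drive type could be computed like this:

```python
import pandas as pd

# Hypothetical cars with a categorical drive type and a numeric price.
df = pd.DataFrame(
    {
        "drive-wheels": ["fwd", "rwd", "fwd", "4wd"],
        "price": [10000.0, 20000.0, 12000.0, 9000.0],
    }
)

# Group rows by category, then average the price within each group.
avg_price = df.groupby("drive-wheels")["price"].mean()

print(avg_price)
```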

1️⃣6️⃣ Question 16

Role of independent variables:

❌ Summarize performance
❌ Define accuracy metric
❌ Compare models
✅ Serve as inputs to estimate output

1️⃣7️⃣ Question 17

Curved residual pattern implies:

❌ Linear relationship
❌ Uniformly low errors
❌ Randomly distributed
✅ The linear model may be inaccurate; the relationship is likely nonlinear
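
To make the pattern concrete, the sketch below fits a straight line to made-up data that actually follows a curve (y = x²) and inspects the residuals. The data are hypothetical. A U-shaped run of residuals, positive at both ends and negative in the middle, is the kind of curvature the question refers to.

```python
import numpy as np

# Hypothetical data generated from a quadratic relationship.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(x, y, 1)

# Residual = observed value minus fitted value.
residuals = y - (slope * x + intercept)

print(np.round(residuals, 3))
```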

1️⃣8️⃣ Question 18

Truth about noise:

❌ Accounted with a parameter
❌ No noise if testing fits
❌ No noise if training fits
✅ Noise is random and cannot be predicted

1️⃣9️⃣ Question 19

Large alpha in ridge regression:

❌ Lower-order function required
✅ Model is underfitted
❌ Overfitted
❌ Higher alpha → better fit

Explanation:
High alpha shrinks coefficients too much → underfitting.
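
The shrinkage can be observed directly by fitting Ridge with a small and a very large alpha on synthetic data. The data and both alpha values are assumptions chosen to exaggerate the effect.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with a true slope of 5 (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 5.0 * X[:, 0] + rng.normal(0, 0.5, size=50)

# Same model, two regularization strengths.
small_alpha = Ridge(alpha=0.01).fit(X, y)
large_alpha = Ridge(alpha=10000.0).fit(X, y)

# The large alpha pushes the coefficient toward zero and lowers R^2.
print(small_alpha.coef_[0], small_alpha.score(X, y))
print(large_alpha.coef_[0], large_alpha.score(X, y))
```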

2️⃣0️⃣ Question 20

Argument passed to GridSearchCV():

❌ Dictionary of columns
✅ Dictionary of parameters and values
❌ Dataframe of models
❌ Normalized features
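
A minimal sketch of that argument in use, assuming a Ridge model and a hypothetical list of alpha values to try. The synthetic data exist only to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, size=40)

# Dictionary mapping each parameter name to the values to try.
parameters = {"alpha": [0.01, 1, 100]}

# GridSearchCV fits the model for every value using 4-fold cross-validation.
grid = GridSearchCV(Ridge(), parameters, cv=4)
grid.fit(X, y)

print(grid.best_params_)
```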

🧾 Summary Table

Q	Correct Answer
1	Text file storing data in tables
2	Statistical modeling (scikit-learn)
3	File types + encoding scheme
4	Consistent SQL querying
5	Linear regression
6	Extrapolation beyond data
7	Generalization error varies
8	Fill missing values with mean
9	Create labeled price bins
10	Z-score standardization
11	dataframe.dtypes
12	Review key characteristics
13	Output decreases as input increases
14	Box plot
15	Group by categories
16	Inputs to estimate output
17	Model inaccurate / nonlinear
18	Noise is random
19	Underfitted model
20	Dictionary of parameters

Final Exam: Data Analysis with Python (IBM Data Analyst Professional Certificate) Answers 2025