Final Exam: Data Analysis with Python (IBM Data Analyst Professional Certificate) Answers 2025

1️⃣ Question 1

Which describes a file with plain text, rows, and columns?

  • ❌ Key–value text

  • ❌ Comma-separated array

  • ❌ Excel spreadsheet

  • A text file that saves data in tables

Explanation:
A plain text table (e.g., .txt or .csv) is stored as rows/columns.


2️⃣ Question 2

Python library for email spam classification?

  • ❌ Fast array processing

  • ❌ Exploratory data analysis

  • ❌ Matrix operations

  • Statistical modeling including regression/classification (scikit-learn)

Explanation:
Scikit-learn provides classification algorithms.


3️⃣ Question 3

Most important factors for reading data with Pandas:

  • ❌ Encoding + file path

  • ❌ File types + formats

  • ❌ Format + file path

  • File types and encoding scheme

Explanation:
Pandas must know file type (CSV/JSON/Excel) and encoding (UTF-8, ISO-8859-1).
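
As a rough illustration of why the encoding matters, the sketch below builds a tiny CSV in memory (the sample rows are made up for this example) and reads it with the encoding stated explicitly. An in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data, encoded as ISO-8859-1 (Latin-1) rather than UTF-8.
raw = "make,price\nCitroën,13495\nRenault,9295\n".encode("iso-8859-1")

# read_csv handles the CSV file type; the encoding argument tells pandas
# how to decode the bytes into text.
df = pd.read_csv(io.BytesIO(raw), encoding="iso-8859-1")
print(df)
```

Reading the same bytes as UTF-8 would fail to decode the "ë", which is why the encoding is listed alongside the file type.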


4️⃣ Question 4

Why use Python’s DB API?

  • ❌ Autogenerate interfaces

  • ❌ Bypass SQL

  • ❌ Convert output to JSON

  • Allows consistent querying/connection across SQL systems

Explanation:
Python DB API standardizes database connections.
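
A minimal sketch of the connect / cursor / execute / fetch pattern that the DB API defines, using the standard-library sqlite3 driver. The table and rows are invented for illustration; other drivers follow the same pattern but may use a different placeholder style than "?".

```python
import sqlite3

# Connection object: the entry point defined by the DB API.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE cars (make TEXT, price REAL)")
cur.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [("audi", 13950.0), ("bmw", 16430.0)],
)
conn.commit()

# Parameterized query, then fetch the result rows as tuples.
cur.execute("SELECT make, price FROM cars WHERE price > ?", (14000,))
rows = cur.fetchall()
print(rows)

conn.close()
```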


5️⃣ Question 5

Regression model of: ŷ = b₀ + b₁x

  • Linear regression

  • ❌ Polynomial

  • ❌ Multiple linear

  • ❌ Exponential
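
One way to see the formula in action is to fit it with NumPy. The data below is made up so that the true values are b₀ = 1 and b₁ = 2. The exam itself does not prescribe this function.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x

# A degree-1 polynomial fit is simple linear regression.
# polyfit returns the highest-degree coefficient first: slope, then intercept.
b1, b0 = np.polyfit(x, y, 1)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```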


6️⃣ Question 6

Why do predictions become negative?

  • ❌ Coefficients uninterpreted

  • Model extrapolates beyond realistic ranges

  • ❌ Regression line always increases

  • ❌ Low R²

Explanation:
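A linear model extends its fitted line without limit, so inputs far outside the range of the training data can produce unrealistic values such as negative prices.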
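
A small sketch of the effect, using invented mileage and price figures: the fitted line has a negative slope, and asking for a prediction well beyond the observed mileage range gives a negative price.

```python
import numpy as np

# Hypothetical training data: price falls as highway mpg rises.
mpg = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
price = np.array([30000.0, 24000.0, 18000.0, 12000.0, 6000.0])

slope, intercept = np.polyfit(mpg, price, 1)

# 50 mpg lies outside the observed range of 20 to 40.
extrapolated = intercept + slope * 50.0
print(extrapolated)  # below zero, which is not a realistic price
```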


7️⃣ Question 7

Challenge of a single train–test split:

  • ❌ Invalid R²

  • ❌ Less accuracy due to less training data

  • Generalization error may change with each split

  • ❌ Decline to adapt to hidden features

Explanation:
One split may not represent the dataset; different splits yield different performance.
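
The sketch below, on synthetic data, scores the same model on several different random splits and then with 5-fold cross-validation. The data, seeds, and split sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic noisy linear data (assumed for this example).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 4, size=60)

# R^2 on the test set for five different random splits.
split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    split_scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))
print(split_scores)

# Cross-validation averages over several folds instead of relying on one split.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(cv_scores.mean())
```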


8️⃣ Question 8

Code effect:

mean = df["price"].mean()
df["price"].replace(np.nan, mean)

  • ❌ Only calculates mean

  • Fills missing values with column mean

  • ❌ Normalizes data

  • ❌ Drops missing rows
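
A runnable version of the snippet, with made-up prices. One detail worth noting: replace() returns a new Series, so the result has to be assigned back for the DataFrame itself to change.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"price": [10000.0, np.nan, 14000.0]})

mean = df["price"].mean()  # NaN is skipped when computing the mean

filled = df["price"].replace(np.nan, mean)  # returns a new Series
missing_before = int(df["price"].isna().sum())  # df is still unchanged here

df["price"] = filled  # assign back to keep the filled values
print(df)
```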


9️⃣ Question 9

Purpose of binning car prices:

  • ❌ Randomizes prices

  • ❌ Filters values

  • Creates labeled segments for price intervals

  • ❌ Equalizes values
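
A possible way to bin prices into labeled segments with pandas. The prices, the three equal-width bins, and the label names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical car prices.
prices = pd.Series([5000, 12000, 18000, 30000, 45000])

# Four edges give three equal-width intervals between min and max.
bins = np.linspace(prices.min(), prices.max(), 4)
labels = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum price inside the first bin.
binned = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)
print(binned.value_counts())
```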


🔟 Question 10

Method dividing by standard deviation:

  • ❌ Simple scaling

  • Z-score standardization

  • ❌ Feature binning

  • ❌ Min-max scaling
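
For comparison, the three scaling methods named in the options can be sketched on a made-up column as follows. Only the z-score divides by the standard deviation.

```python
import pandas as pd

# Hypothetical feature values.
length = pd.Series([160.0, 170.0, 180.0])

# Simple feature scaling: divide by the maximum.
simple = length / length.max()

# Min-max scaling: map the range onto [0, 1].
min_max = (length - length.min()) / (length.max() - length.min())

# Z-score: subtract the mean, divide by the standard deviation.
z_score = (length - length.mean()) / length.std()
print(z_score)
```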


1️⃣1️⃣ Question 11

Method to evaluate column statistics/types:

  • ❌ dataframe.values()

  • ❌ dataframe.rename()

  • ❌ dataframe.astype("int")

  • dataframe.dtypes("int")

Explanation:
The option is written incorrectly in the exam: dtypes is an attribute, so the working pandas syntax is dataframe.dtypes, with no parentheses or argument. It is still the intended choice because it reports the data type of each column.
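
A short sketch of how the types and summary statistics are usually inspected, on an invented DataFrame. describe() is included here as the usual companion for column statistics. It is not one of the exam's options.

```python
import pandas as pd

# Hypothetical data with text, float, and integer columns.
df = pd.DataFrame(
    {
        "make": ["audi", "bmw"],
        "price": [13950.0, 16430.0],
        "doors": [4, 2],
    }
)

types = df.dtypes      # attribute, not a method call
stats = df.describe()  # count, mean, std, quartiles for numeric columns
print(types)
print(stats)
```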


1️⃣2️⃣ Question 12

What is EDA?

  • Reviewing key characteristics and uncovering patterns

  • ❌ Segmenting dataset

  • ❌ Minimizing dimensions

  • ❌ Training models


1️⃣3️⃣ Question 13

Negative linear relationship means:

  • ❌ Output doesn’t explain input

  • ❌ Decreases at increasing rate

  • With increase in input, output decreases at about the same rate

  • ❌ Output increases


1️⃣4️⃣ Question 14

Method to find outliers:

  • ❌ Scatter plot

  • Box plot

  • ❌ Histogram via describe

  • ❌ value_counts
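
A box plot typically flags points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically to made-up prices rather than drawing the plot.

```python
import pandas as pd

# Hypothetical prices with one unusually high value.
prices = pd.Series([9000, 10000, 11000, 12000, 13000, 45000])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the common 1.5 * IQR convention.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)
```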


1️⃣5️⃣ Question 15

Method to compare average price across drive types:

  • Group the data using category values

  • ❌ Numeric filters

  • ❌ Row filters

  • ❌ Combine datasets

Explanation:
Use groupby().
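
A minimal sketch of that call, assuming a column named "drive-wheels" and invented prices:

```python
import pandas as pd

# Hypothetical data: drive type and price per car.
df = pd.DataFrame(
    {
        "drive-wheels": ["fwd", "rwd", "fwd", "4wd", "rwd"],
        "price": [8000.0, 20000.0, 10000.0, 12000.0, 24000.0],
    }
)

# Group rows by category, then average the price within each group.
avg_price = df.groupby("drive-wheels")["price"].mean()
print(avg_price)
```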


1️⃣6️⃣ Question 16

Role of independent variables:

  • ❌ Summarize performance

  • ❌ Define accuracy metric

  • ❌ Compare models

  • Serve as inputs to estimate output


1️⃣7️⃣ Question 17

Curved residual pattern implies:

  • ❌ Linear relationship

  • ❌ Uniformly low errors

  • ❌ Randomly distributed

  • Model may be inaccurate → nonlinear relationship
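
To illustrate, the sketch below fits a straight line to data that is actually quadratic (an assumption made for this example). The residuals are positive at both ends and negative in the middle, which is the curved pattern the question describes.

```python
import numpy as np

# Hypothetical nonlinear data: y = x^2.
x = np.arange(-3.0, 4.0)  # -3, -2, ..., 3
y = x**2

# Fit a straight line anyway.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# A U-shaped residual sequence signals a missed nonlinear relationship.
print(np.round(residuals, 2))
```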


1️⃣8️⃣ Question 18

Truth about noise:

  • ❌ Accounted with a parameter

  • ❌ No noise if testing fits

  • ❌ No noise if training fits

  • Noise is random and cannot be predicted


1️⃣9️⃣ Question 19

Large alpha in ridge regression:

  • ❌ Lower-order function required

  • Model is underfitted

  • ❌ Overfitted

  • ❌ Higher alpha → better fit

Explanation:
High alpha shrinks coefficients too much → underfitting.
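
A rough demonstration on synthetic data, with arbitrarily chosen alpha values: as alpha grows, the fitted coefficient shrinks toward zero and the training R² drops.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with a true slope of 5 (assumed for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 5.0 * X[:, 0] + rng.normal(0, 0.5, size=50)

results = {}
for alpha in (0.01, 1.0, 1000.0, 100000.0):
    model = Ridge(alpha=alpha).fit(X, y)
    # Store the coefficient and the R^2 on the training data.
    results[alpha] = (model.coef_[0], model.score(X, y))
    print(alpha, results[alpha])
```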


2️⃣0️⃣ Question 20

Argument passed to GridSearchCV():

  • ❌ Dictionary of columns

  • Dictionary of parameters and values

  • ❌ Dataframe of models

  • ❌ Normalized features
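
A sketch of that dictionary in use, here tuning Ridge's alpha on synthetic data. The estimator, the candidate values, and the fold count are illustrative choices, not taken from the exam.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumed for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.3, size=40)

# Keys are parameter names of the estimator; values are the candidates to try.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(Ridge(), param_grid, cv=4)
search.fit(X, y)
print(search.best_params_)
```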


🧾 Summary Table

Q Correct Answer
1 Text file storing data in tables
2 Statistical modeling (scikit-learn)
3 File types + encoding scheme
4 Consistent SQL querying
5 Linear regression
6 Extrapolation beyond data
7 Generalization error varies
8 Fill missing values with mean
9 Create labeled price bins
10 Z-score standardization
11 dataframe.dtypes
12 Review key characteristics
13 Output decreases as input increases
14 Box plot
15 Group by categories
16 Inputs to estimate output
17 Model inaccurate / nonlinear
18 Noise is random
19 Underfitted model
20 Dictionary of parameters