
Final Exam: Data Analysis with Python (Applied Data Science Specialization) Answers 2025

1. Question 1

Which describes a file with plain text, rows, and columns?

  • ❌ A text file containing key-value pairs

  • ❌ An array of values separated by a comma

  • ❌ A Microsoft Excel spreadsheet

  • A text file that saves data in tables

Explanation:
A file that stores rows and columns of data as plain text (for example CSV or TSV) is a text file that saves data in tables.
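A minimal sketch, assuming pandas is installed: it parses a small comma-separated table with pandas.read_csv, then parses the same table with tabs as the separator via sep="\t". The car data is invented, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# A made-up plain-text table: one row per line, columns separated by commas.
raw = """make,body-style,price
audi,sedan,13950
bmw,sedan,16430
"""

# read_csv uses the first line as column names and each later line as a row.
df = pd.read_csv(io.StringIO(raw))
print(df.shape)  # (2, 3)

# The same table with tab separators (TSV) only needs a different `sep`.
tsv = raw.replace(",", "\t")
df_tsv = pd.read_csv(io.StringIO(tsv), sep="\t")
```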


2. Question 2

Library for classification like spam detection?

  • ❌ Fast array processing

  • ❌ Exploratory data analysis

  • ❌ Operations on matrices

  • Statistical modeling, including regression and classification

Explanation:
This describes scikit-learn.
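To make this concrete, here is a toy spam classifier built with scikit-learn, assuming it is installed. The six messages and their labels are invented. CountVectorizer turns text into word counts and MultinomialNB (Naive Bayes, one of several classifiers scikit-learn offers) learns from them.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: 1 = spam, 0 = not spam.
messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting moved to monday",
    "lunch with the team today",
    "please review the report",
]
labels = [1, 1, 1, 0, 0, 0]

# Convert text to word counts, then fit a Naive Bayes classifier on the counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

predictions = model.predict(["free prize now", "team meeting on monday"])
print(predictions)  # [1 0]
```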


3. Question 3

Reading a CSV from a remote server: which two factors are important?

  • ❌ Encoding scheme and file path

  • ❌ File types and formats

  • ❌ Format and file path

  • File types and encoding scheme

Explanation:
Correct file type (CSV) + encoding (UTF-8 etc.) matter most.
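A sketch of the arguments that typically matter in pandas.read_csv: where the file lives (a local path, or a URL for a remote file) and the encoding used to decode its bytes. The file name and contents are made up; a temporary Latin-1 file stands in for the remote one, so no network access is needed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file: a tiny CSV saved in Latin-1, where "é" is the single byte 0xE9.
path = os.path.join(tempfile.mkdtemp(), "cars_latin1.csv")
with open(path, "w", encoding="latin-1") as f:
    f.write("city,price\nMontréal,100\n")

# First argument: the file location (a local path here; a URL works for a remote file).
# encoding: must match how the bytes were written.
df = pd.read_csv(path, encoding="latin-1")
print(df.loc[0, "city"])  # Montréal

# Decoding the same bytes as UTF-8 fails, because 0xE9 does not start a valid UTF-8 sequence here.
try:
    pd.read_csv(path, encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```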


4. Question 4

Why use Python’s DB API?

  • ❌ It autogenerates UIs

  • ❌ It bypasses SQL

  • ❌ It transforms output to JSON

  • It allows consistent query and connection across SQL systems

Explanation:
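The DB API (PEP 249) defines one standard set of calls (connect, cursor, execute, fetch) that database drivers implement, so the same code pattern works across SQL systems.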
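Python's built-in sqlite3 module implements the DB-API 2.0 interface, so it can illustrate the shared connect, cursor, execute, fetch pattern. The table and rows below are hypothetical.

```python
import sqlite3

# connect -> cursor -> execute -> fetch -> commit/close: the DB-API pattern.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cars (make TEXT, price REAL)")

# Placeholders keep values out of the SQL string.
cur.executemany("INSERT INTO cars VALUES (?, ?)", [("audi", 13950.0), ("bmw", 16430.0)])
conn.commit()

cur.execute("SELECT make, price FROM cars ORDER BY price")
rows = cur.fetchall()
print(rows)  # [('audi', 13950.0), ('bmw', 16430.0)]
conn.close()
```

Other DB-API drivers expose the same calls, although the placeholder style (the module's paramstyle, e.g. ? versus %s) can differ between them.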


5. Question 5

Model: ŷ = b₀ + b₁x

  • Linear regression

  • ❌ Polynomial regression

  • ❌ Multiple linear regression

  • ❌ Exponential regression

Explanation:
Single feature + straight line = simple linear regression.
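A quick check with scikit-learn's LinearRegression, using made-up data generated exactly from b₀ = 2 and b₁ = 3: the fitted intercept_ and coef_ recover those two parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One feature; scikit-learn expects a 2-D array of shape (n_samples, n_features).
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 + 3.0 * x.ravel()  # invented target: y = 2 + 3x exactly

lm = LinearRegression().fit(x, y)
b0, b1 = lm.intercept_, lm.coef_[0]  # intercept b0 and slope b1
print(round(b0, 6), round(b1, 6))  # 2.0 3.0
```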


6. Question 6

Why does the model give unrealistic negative predictions at extreme MPG values?

  • ❌ Coefficients are uninterpreted

  • The model extrapolates beyond realistic data ranges

  • ❌ Regression line always goes up

  • ❌ Low R² values

Explanation:
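A fitted line keeps the same slope indefinitely, so predicting far outside the range of the training data (extrapolation) can produce impossible values such as negative predictions.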
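A sketch of the extrapolation problem with invented MPG and price values: a line fitted with numpy.polyfit behaves sensibly inside the training range but returns a negative value once MPG moves far beyond it.

```python
import numpy as np

# Made-up training data: price drops by 1000 for each extra MPG.
mpg = np.array([20.0, 30.0, 40.0])
price = np.array([30000.0, 20000.0, 10000.0])

# Degree-1 fit; polyfit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(mpg, price, 1)

pred_in = slope * 35 + intercept   # inside the training range: about 15000
pred_out = slope * 60 + intercept  # far outside it: about -10000, an impossible price
print(pred_in, pred_out)
```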


7. Question 7

What is the challenge with a single train-test split?

  • ❌ R² becomes invalid

  • ❌ Model lacks accuracy due to decreased training data

  • Generalization error may change with each split

  • ❌ Model cannot adapt to hidden features

Explanation:
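Each random split puts different rows in the test set, so the estimated generalization error changes from split to split. Cross-validation averages over several splits to reduce this variation.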
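A small experiment on synthetic data, assuming scikit-learn: the test R² from train_test_split changes with random_state, while cross_val_score returns one score per fold that can be averaged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: a linear trend plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3 * X.ravel() + rng.normal(0, 2, size=60)

# Single splits: the test R^2 depends on which rows land in the test set.
single_scores = []
for seed in (0, 1, 2):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    single_scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))
print(single_scores)

# 5-fold cross-validation: one R^2 per fold, summarized by mean and spread.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(cv_scores.mean(), cv_scores.std())
```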


8. Question 8

What does the code do?

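import numpy as np
mean = df["price"].mean()  # average of the non-missing prices
df["price"] = df["price"].replace(np.nan, mean)  # swap NaN for the mean, assign back
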
  • ❌ Calculates the mean only

  • Fills missing values in “price” with the mean

  • ❌ Replaces with normalized values

  • ❌ Drops rows

Explanation:
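This is mean imputation: every NaN in "price" is replaced with the column's average. Series.replace returns a new Series, so the result has to be assigned back to df["price"] for the DataFrame to change.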
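A runnable version of the same idea on a hypothetical three-row DataFrame; df["price"].fillna(mean) would give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing price.
df = pd.DataFrame({"price": [10000.0, np.nan, 14000.0]})

mean = df["price"].mean()  # NaN is skipped: (10000 + 14000) / 2 = 12000
df["price"] = df["price"].replace(np.nan, mean)  # assign the result back

print(df["price"].tolist())  # [10000.0, 12000.0, 14000.0]
```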


9. Question 9

Purpose of binning?

  • ❌ Randomizes price

  • ❌ Filters values not fitting

  • Creates labeled segments for price intervals

  • ❌ Equalizes price values

Explanation:
Binning groups continuous values into intervals.
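A sketch of binning with pandas.cut on made-up prices, using three equal-width bins whose edges come from numpy.linspace. The labels Low, Medium and High are illustrative choices.

```python
import numpy as np
import pandas as pd

# Made-up prices.
price = pd.Series([5000, 10000, 15000, 20000, 35000])

# Three equal-width bins need four edges from min to max: 5000, 15000, 25000, 35000.
edges = np.linspace(price.min(), price.max(), 4)
labels = ["Low", "Medium", "High"]

# Bins are right-closed; include_lowest=True keeps the minimum (5000) in the first bin.
price_binned = pd.cut(price, bins=edges, labels=labels, include_lowest=True)
print(price_binned.tolist())  # ['Low', 'Low', 'Low', 'Medium', 'High']
```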


10. Question 10

Which method subtracts the mean and then divides by the standard deviation?

  • ❌ Simple scaling

  • Z-score standardization

  • ❌ Feature binning

  • ❌ Min-max scaling

Explanation:
Standard score: z = (x − μ) / σ, where μ is the mean and σ is the standard deviation.
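The formula in code, on made-up values. One detail worth knowing: pandas' .std() uses the sample standard deviation (ddof=1) by default, whereas scikit-learn's StandardScaler divides by the population standard deviation (ddof=0), so the two give slightly different z-scores on small samples.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Subtract the mean, then divide by the (sample, ddof=1) standard deviation.
z = (x - x.mean()) / x.std()
print(z.round(4).tolist())  # [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
```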


11. Question 11

Check data types of each column?

  • ❌ dataframe.values()

  • ❌ dataframe.rename()

  • ❌ dataframe.astype("int")

  • dataframe.dtypes("int") (closest match; the working pandas form is dataframe.dtypes)

Explanation:
The intended answer is dataframe.dtypes. In pandas, dtypes is an attribute, not a method: df.dtypes (no parentheses) returns a Series that maps each column name to its data type, while df.dtypes("int") raises a TypeError because a Series cannot be called.
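A minimal sketch on a hypothetical DataFrame, showing dtypes used as an attribute next to astype(), which converts a column rather than reporting its type.

```python
import pandas as pd

# Hypothetical DataFrame with text, integer and float columns.
df = pd.DataFrame({"make": ["audi", "bmw"], "doors": [4, 2], "price": [13950.0, 16430.0]})

# dtypes is an attribute (no parentheses): a Series of column name -> data type.
col_types = df.dtypes
print(col_types)

# astype() converts a column to another type; it does not report types.
df["price"] = df["price"].astype("int")
print(df["price"].dtype)  # an integer dtype such as int64
```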


12. Question 12

What is EDA?

  • Reviewing key characteristics and uncovering patterns

  • ❌ Segmenting data

  • ❌ Minimizing dimensionality

  • ❌ Training models

Explanation:
EDA helps understand structure and patterns.
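One common first step of EDA, sketched on made-up prices: DataFrame.describe() summarizes the count, mean, spread and quartiles of each numeric column.

```python
import pandas as pd

# Made-up prices.
df = pd.DataFrame({"price": [13950, 16430, 16925, 5151, 6295]})

# Key characteristics at a glance: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)
```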


13. Question 13

Negative linear relationship means:

  • ❌ Output does not explain input

  • ❌ Output decreases at increasing rate

  • As the input increases, the output decreases at a constant rate

  • ❌ Output increases

Explanation:
The slope is negative, so each one-unit increase in the input lowers the output by the same fixed amount.
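A quick numerical illustration with invented numbers: when y = 50 − 4x, every one-unit step in x lowers y by the same amount, and the Pearson correlation from numpy.corrcoef is −1 up to rounding.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 50.0 - 4.0 * x  # made-up: each unit of x lowers y by exactly 4

print(np.diff(y))  # [-4. -4. -4. -4.]  constant decrease
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to -1: a perfect negative linear relationship
```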


14. Question 14

Detect outliers in engine size?

  • ❌ Scatter plot

  • Box plot

  • ❌ Describe for histogram

  • ❌ value_counts

Explanation:
A box plot draws values beyond its whiskers (by default 1.5 × IQR past the quartiles) as individual points, so outliers stand out.
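The rule a box plot applies can also be computed directly. This sketch, on made-up engine sizes, flags values more than 1.5 × IQR beyond the quartiles, which is where matplotlib's boxplot places its whiskers by default (whis=1.5).

```python
import pandas as pd

# Made-up engine sizes with one unusually large value.
engine_size = pd.Series([100, 102, 104, 106, 108, 110, 112, 300])

q1, q3 = engine_size.quantile(0.25), engine_size.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits

# Values outside the whisker limits are the ones a box plot draws individually.
outliers = engine_size[(engine_size < lower) | (engine_size > upper)]
print(outliers.tolist())  # [300]
```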


15. Question 15

Study average price per drive type:

  • Group the data using category values

  • ❌ Use numeric filter

  • ❌ Filter rows

  • ❌ Combine datasets

Explanation:
Group the rows by drive type with groupby(), then average the price within each group.
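A sketch with pandas groupby on a hypothetical DataFrame. The column name drive-wheels is an assumption modelled on the course's car dataset; substitute whatever the categorical column is called.

```python
import pandas as pd

# Hypothetical data; "drive-wheels" is an assumed column name.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "fwd", "4wd", "rwd"],
    "price": [10000, 20000, 12000, 15000, 24000],
})

# One group per category value, then the mean price of each group.
avg_price = df.groupby("drive-wheels")["price"].mean()
print(avg_price)  # means: 4wd 15000, fwd 11000, rwd 22000
```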


16. Question 16

Role of independent variables?

  • ❌ Summarize performance

  • ❌ Define accuracy

  • ❌ Compare models

  • They serve as inputs to estimate the output

Explanation:
Independent variables predict the target.
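A sketch with made-up data showing the split between inputs and output: the two columns of X are the independent variables, and y, generated as y = 1 + 2a + 3b, is the dependent variable the model estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variables: columns a and b of X. Dependent variable: y.
X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]  # invented relationship

lm = LinearRegression().fit(X, y)  # the inputs X are used to estimate the output y
print(lm.intercept_, lm.coef_)     # approximately 1.0 and [2. 3.]
print(lm.predict([[4.0, 0.0]]))    # approximately [9.]
```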


17. Question 17

Residuals show curved pattern:

  • ❌ Linear relationship

  • Model may be inaccurate

  • ❌ Prediction errors low

  • ❌ Residuals random

Explanation:
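Residuals that follow a curve instead of scattering randomly around zero mean the linear model misses nonlinear structure; a nonlinear (e.g. polynomial) model may fit better.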
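A minimal illustration with invented data following y = x²: a straight-line fit leaves residuals that are positive at both ends and negative in the middle, while a degree-2 fit removes the pattern.

```python
import numpy as np

# Made-up data with a curved relationship: y = x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Fit a straight line anyway and compute residuals (actual minus predicted).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
print(residuals)  # positive at both ends, negative in the middle: a curved pattern

# A degree-2 polynomial captures the curvature and leaves near-zero residuals.
coeffs2 = np.polyfit(x, y, 2)
residuals2 = y - np.polyval(coeffs2, x)
```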


18. Question 18

True about noise?

  • ❌ Accounted by parameter

  • ❌ No noise if testing fits well

  • It is random and cannot be predicted

  • ❌ No noise if training fits well

Explanation:
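Noise is the random part of the data that no model can predict, so some error remains even when the model fits the training or test data well.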
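A simulation on synthetic data, assuming scikit-learn: even a model of exactly the right form (a straight line) cannot predict the random term, so the residual spread stays close to the noise's standard deviation of 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(10_000, 1))
noise = rng.normal(0, 1.0, size=10_000)  # random term with standard deviation 1
y = 2.0 * x.ravel() + noise

lm = LinearRegression().fit(x, y)
residual_std = np.std(y - lm.predict(x))
print(lm.coef_[0], residual_std)  # slope near 2; residual spread stays near 1
```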


19. Question 19

Large alpha in ridge regression:

  • ❌ Lower order needed

  • Model is underfitted

  • ❌ Overfitted

  • ❌ Higher alpha = better fit

Explanation:
Large alpha shrinks coefficients too much → underfitting.
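A sketch of the shrinkage effect with scikit-learn's Ridge on made-up data whose true slope is 3: as alpha grows, the fitted coefficient is pulled toward zero and the model stops following the data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up data on a perfect line with slope 3.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * x.ravel()

coefs = {}
for alpha in (0.001, 1.0, 1e6):
    coefs[alpha] = Ridge(alpha=alpha).fit(x, y).coef_[0]
    print(alpha, round(coefs[alpha], 4))
# The slope shrinks toward 0 as alpha grows; at 1e6 the fit is nearly flat (underfit).
```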


20. Question 20

Argument required in GridSearchCV?

  • ❌ Dictionary of columns

  • Dictionary of parameters and values

  • ❌ Dataframe of models

  • ❌ Normalized feature value

Explanation:
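Besides the estimator, GridSearchCV requires param_grid, a dictionary that maps each hyperparameter name to a list of candidate values, e.g. param_grid = {"alpha": [0.1, 1, 10]}.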
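A minimal sketch, assuming scikit-learn and synthetic data: GridSearchCV receives an estimator and a param_grid dictionary, evaluates each candidate alpha with cross-validation, and exposes the winner through best_params_ and best_estimator_.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data: three features with known weights plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=50)

# param_grid maps each hyperparameter name to the candidate values to try.
param_grid = {"alpha": [0.001, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=4)
search.fit(X, y)

print(search.best_params_)  # the alpha with the best mean cross-validated R^2
best_model = search.best_estimator_  # refit on all data with that alpha
```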


🧾 SUMMARY TABLE

| Q | Answer | Key concept |
|---|---|---|
| 1 | Text file with tables | File formats |
| 2 | Statistical modeling | ML libraries |
| 3 | File type + encoding | Data loading |
| 4 | Consistent SQL interface | DB API |
| 5 | Linear regression | Model type |
| 6 | Extrapolation issue | Prediction limits |
| 7 | Split variance | Generalization error |
| 8 | Fill NaN with mean | Imputation |
| 9 | Labeled intervals | Binning |
| 10 | Z-score | Normalization |
| 11 | dataframe.dtypes | Data types |
| 12 | Pattern discovery | EDA |
| 13 | Output decreases | Negative linearity |
| 14 | Box plot | Outlier detection |
| 15 | Group by category | Aggregation |
| 16 | Inputs to model | Role of independent variables |
| 17 | Model inaccurate | Residual patterns |
| 18 | Random, unpredictable | Noise |
| 19 | Underfitting | Ridge alpha |
| 20 | Parameter dictionary | Grid search |