Logistic Regression Quiz: Fitting Statistical Models to Data with Python (Statistics with Python Specialization) Answers 2025
1. Which collected variables could be predicted using a logistic regression model?
(Recall logistic regression predicts a binary outcome / probability of a dichotomous event.)
- ❌ Sex (male vs. female): No. (Sex is binary, so it could serve as a logistic regression outcome in principle; here it is treated as a predictor rather than a response, so it is not selected.)
- ✅ Whether a shot on goal traveled more than 20 feet: Yes. (Binary outcome: more than 20 ft, yes or no.)
- ❌ Height: No. (Continuous; better predicted with linear regression.)
- ✅ Scoring a soccer goal on a given shot: Yes. (Binary outcome: goal vs. no goal.)
- ❌ Age (years): No. (Continuous; linear regression is appropriate.)
Explanation: Logistic regression is for binary outcomes (or probabilities of categories). Choose variables that are binary as the response.
2. Which of the illustrated graphs could be a possible form/shape for a logistic regression model?
(Select the sigmoid (S-shaped), monotone curve bounded between 0 and 1.)
- ✅ Graph that shows a monotone S-shaped (sigmoid) curve bounded between 0 and 1: Yes.
- ❌ Graphs that are linear, U-shaped, or unbounded: No.
Explanation: The logistic function maps real-valued inputs to probabilities in (0,1) and produces an S-shaped, monotone curve. (Pick the graph that is a bounded sigmoid; the other shapes are not logistic.)
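To make the shape concrete, here is a minimal sketch of the logistic (sigmoid) function evaluated at a few inputs. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the quiz.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs (assumed for this sketch, not taken from the quiz).
zs = [-6, -3, 0, 3, 6]
probs = [sigmoid(z) for z in zs]

for z, p in zip(zs, probs):
    print(f"z = {z:+d}  ->  p = {p:.4f}")
```

The printed probabilities rise steadily with z and stay strictly between 0 and 1, which is the bounded, monotone S-shape the question asks for.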
3. Of the two logit-transformed values, which corresponds to a higher original probability?
- ❌ -2
- ✅ 0.25
- ❌ They are the same
- ❌ Can’t tell
Explanation: The logit function is monotone increasing: a larger logit corresponds to a larger probability. 0.25 > −2, so 0.25 maps to the higher probability.
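As a quick check, the two logit values can be converted back to probabilities with the inverse logit. This is a small sketch; the helper name `inv_logit` is an assumption made for illustration.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse of the logit: converts log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

p_low = inv_logit(-2)     # about 0.119
p_high = inv_logit(0.25)  # about 0.562

print(f"logit = -2   -> p = {p_low:.3f}")
print(f"logit = 0.25 -> p = {p_high:.3f}")
```

The larger logit (0.25) maps to the larger probability, as the monotonicity argument predicts.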
4. Interpretation of coefficient 0.0037 (single-variable logistic model with BMI predicting smoking 100+ cigarettes)
- ❌ For each increase by one in BMI, the probability increases by about 0.0037.
- ❌ For each increase by one in BMI, the odds increase by about 0.0037.
- ✅ For each increase by one in BMI, the log odds of smoking 100 cigarettes increases by about 0.0037, on average.
- ❌ For each increase of one in BMI, the odds increase multiplicatively by about 0.0037.
Explanation: In logistic regression the raw coefficient is the change in log odds per one-unit increase in the predictor.
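The difference between the log-odds scale and the odds scale can be seen by exponentiating the coefficient. The sketch below uses the 0.0037 value quoted in the question; the variable names are illustrative.

```python
import math

beta_bmi = 0.0037  # log-odds change per one-unit BMI increase (from the question)

# Exponentiating a log-odds coefficient gives the multiplicative change in odds.
odds_ratio_per_unit = math.exp(beta_bmi)
# A 10-unit increase adds 10 * beta on the log-odds scale.
odds_ratio_per_10 = math.exp(10 * beta_bmi)

print(f"Odds ratio per +1 BMI:  {odds_ratio_per_unit:.4f}")
print(f"Odds ratio per +10 BMI: {odds_ratio_per_10:.4f}")
```

The odds are multiplied by roughly 1.0037 per unit of BMI, not by 0.0037, which is why the "multiplicatively by about 0.0037" option is wrong.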
5. Interpretation of coefficient 0.0169 for Age in the model with BMI and Age
- ❌ For BMI → odds change 0.0169
- ❌ For Age → odds change 0.0169
- ✅ For each increase of one in Age, the log odds of smoking 100 cigarettes increases by about 0.0169 while holding BMI constant, on average.
- ❌ For each increase of one in Age, the log odds increases by 0.0169 (without conditioning)
Explanation: In a multivariable logistic model the coefficient for Age is the change in log odds per year of age controlling for BMI.
6. At two-sided 10% significance level, which coefficients are statistically significant?
- ❌ Both coefficients are significant
- ❌ Neither coefficient is significant
- ❌ Only the coefficient for BMI is significant
- ✅ Only the coefficient for Age is significant
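Explanation: The 95% CI for Age (0.014, 0.020) does not include 0, so Age is significant at the 5% level and therefore also at the 10% level. The 95% CI for BMI (−0.005, 0.011) includes 0, which rules out significance at 5% but does not by itself settle the 10% level. Here the interval implies a standard error of roughly 0.004, so the z statistic is below 1, well short of the 1.645 needed at the two-sided 10% level. BMI is therefore not significant.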
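The significance check can be sketched numerically by backing out a standard error from each 95% interval. This assumes the intervals are symmetric Wald intervals based on the normal distribution and uses the rounded endpoints quoted in this answer key, so the results are approximate. The helper name `wald_summary` is hypothetical.

```python
from statistics import NormalDist

def wald_summary(ci_low: float, ci_high: float, level: float = 0.95):
    """Recover estimate, standard error, z statistic, and two-sided p-value
    from a symmetric normal-based confidence interval (an assumption here)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    estimate = (ci_low + ci_high) / 2                # interval midpoint
    se = (ci_high - ci_low) / (2 * z_crit)           # half-width / critical value
    z = estimate / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, se, z, p_value

# Rounded 95% intervals as quoted in the answer key.
_, _, z_age, p_age = wald_summary(0.014, 0.020)
_, se_bmi, z_bmi, p_bmi = wald_summary(-0.005, 0.011)

alpha = 0.10
print(f"Age: z = {z_age:.2f}, p = {p_age:.4f}, significant = {p_age < alpha}")
print(f"BMI: z = {z_bmi:.2f}, p = {p_bmi:.4f}, significant = {p_bmi < alpha}")
```

Under these assumptions only the Age coefficient clears the 10% threshold.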
7. If the 95% CI for Age is (0.014, 0.020), how would a 90% CI change?
- ✅ It would be narrower
- ❌ It would be wider
- ❌ It would stay the same
- ❌ Can’t tell
Explanation: A 90% confidence interval uses a smaller critical value and therefore is narrower than a 95% interval (less coverage → less width).
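To see how much narrower, the 95% interval can be rescaled by the ratio of critical values. This is a rough sketch that assumes a symmetric normal-based interval; the function name `rescale_ci` is illustrative.

```python
from statistics import NormalDist

def rescale_ci(ci_low: float, ci_high: float,
               old_level: float = 0.95, new_level: float = 0.90):
    """Convert a symmetric normal-based CI from one confidence level to another."""
    z_old = NormalDist().inv_cdf(0.5 + old_level / 2)  # about 1.960
    z_new = NormalDist().inv_cdf(0.5 + new_level / 2)  # about 1.645
    center = (ci_low + ci_high) / 2
    half_width = (ci_high - ci_low) / 2 * (z_new / z_old)
    return center - half_width, center + half_width

low90, high90 = rescale_ci(0.014, 0.020)
print(f"Approximate 90% CI for Age: ({low90:.4f}, {high90:.4f})")
```

The resulting interval sits inside the original (0.014, 0.020), consistent with the "narrower" answer.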
8. Predicted log odds for BMI = 22 and Age = 45 using the model with BMI and Age (pick the closest)
- ✅ -0.417
- ❌ 0.8265
- ❌ 0.327
- ❌ -0.7367
- ❌ Can’t tell
Explanation: Using the model intercept and coefficients reported in the output (intercept ≈ −1.259, BMI ≈ 0.0037, Age ≈ 0.0169), the linear predictor ≈ −1.259 + 0.0037·22 + 0.0169·45 ≈ −0.417 (closest choice).
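The arithmetic above can be reproduced in a few lines. The coefficient values are the approximate ones quoted in this explanation; the variable names are illustrative, and the conversion to a probability is an extra step not asked for in the question.

```python
import math

# Approximate values quoted in the explanation above.
intercept = -1.259
beta_bmi = 0.0037
beta_age = 0.0169

bmi, age = 22, 45

# Linear predictor = predicted log odds.
log_odds = intercept + beta_bmi * bmi + beta_age * age

# Optional: convert log odds to a predicted probability.
prob = 1.0 / (1.0 + math.exp(-log_odds))

print(f"Predicted log odds:    {log_odds:.4f}")
print(f"Predicted probability: {prob:.4f}")
```

With these inputs the log odds come out at about −0.417, which corresponds to a predicted probability a little under 0.40.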
9. Is that predicted log odds for BMI=22, Age=45 trustworthy as interpolation or extrapolation?
- ❌ No, this is extrapolation
- ❌ No, this is interpolation
- ❌ Yes, this is extrapolation
- ✅ Yes, this is interpolation
Explanation: The sample covers Age 20–80 and BMI 14.5–64.6, so Age=45 and BMI=22 fall well within the observed ranges — this is interpolation and thus the prediction is reasonable (subject to model assumptions).
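A simple range check captures the interpolation test described above. This sketch only compares each variable with its own observed range, so it does not confirm that the combination of Age and BMI is well represented in the data. The names `observed_ranges` and `is_interpolation` are hypothetical.

```python
# Observed ranges as stated in the explanation above.
observed_ranges = {"Age": (20, 80), "BMI": (14.5, 64.6)}

def is_interpolation(point: dict, ranges: dict) -> bool:
    """Return True if every value lies within its variable's observed range."""
    return all(ranges[name][0] <= value <= ranges[name][1]
               for name, value in point.items())

inside = is_interpolation({"Age": 45, "BMI": 22}, observed_ranges)
outside = is_interpolation({"Age": 90, "BMI": 22}, observed_ranges)

print(f"Age=45, BMI=22 within observed ranges: {inside}")
print(f"Age=90, BMI=22 within observed ranges: {outside}")
```

The second point is an illustrative extrapolation case: an Age of 90 falls outside the 20 to 80 range covered by the sample.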
10. Fill in the blanks. With 95% confidence, the increase in log odds of smoking 100+ cigarettes for each +1 BMI (holding Age constant) is between ____ and ____.
- ❌ -1.2435 and 0.149
- ❌ 0.014 and 0.020
- ❌ -1.535 and -0.952
- ✅ -0.005 and 0.011
- ❌ Can’t tell
Explanation: The reported 95% CI for the BMI coefficient in the multivariable model is (−0.005, 0.011), which contains 0 and therefore indicates no statistically significant BMI effect at the 5% level.
🧾 Summary Table
| Q# | Answer (selected) | Key point |
|---|---|---|
| 1 | Whether >20 ft; Scoring a goal ✅ | Logistic for binary outcomes |
| 2 | The S-shaped, monotone sigmoid graph ✅ | Logistic maps to (0,1) |
| 3 | 0.25 ✅ | Logit is monotone ↑ → larger logit → larger p |
| 4 | Log-odds increases by 0.0037 ✅ | Coefficient = change in log odds |
| 5 | Age coef = log-odds change of 0.0169 (holding BMI) ✅ | Multivariable interpretation |
| 6 | Only Age significant ✅ | Age CI excludes 0; BMI CI includes 0 |
| 7 | 90% CI is narrower ✅ | Less coverage → narrower |
| 8 | Predicted log odds ≈ −0.417 ✅ | Using intercept and coefficients (closest choice) |
| 9 | Yes — interpolation ✅ | Values lie inside observed ranges |
| 10 | BMI 95% CI = (−0.005, 0.011) ✅ | Contains 0 → not significant |