Week 3 Python Assessment: Fitting Statistical Models to Data with Python (Statistics with Python Specialization) Answers 2025
1. What is clustered data?
- ❌ Clustered data is when observations are the exact same.
- ❌ Data has low variance.
- ❌ One group is over-represented.
- ✅ Data is considered clustered when observations are correlated within groups, sometimes related to study designs.
Explanation:
Clustered (or grouped) data arise when subjects in the same group tend to resemble one another more than subjects in different groups, which produces intra-cluster correlation.
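To make intra-cluster correlation concrete, here is a minimal simulation sketch. The cluster counts, the variance values, and the helper name `anova_icc` are illustrative assumptions, not part of the course material; the estimator is the standard one-way ANOVA formula for balanced clusters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed simulation settings (hypothetical): equal between- and within-cluster
# standard deviations, so the true intra-cluster correlation is 1 / (1 + 1) = 0.5.
n_clusters, cluster_size = 200, 10
sigma_between, sigma_within = 1.0, 1.0

# Each cluster gets one shared random effect; each observation gets its own noise.
cluster_effect = rng.normal(0.0, sigma_between, size=(n_clusters, 1))
noise = rng.normal(0.0, sigma_within, size=(n_clusters, cluster_size))
y_clustered = cluster_effect + noise          # shape: (n_clusters, cluster_size)
y_independent = rng.normal(size=(n_clusters, cluster_size))


def anova_icc(y):
    """One-way ANOVA estimate of the intra-cluster correlation (balanced clusters)."""
    k, m = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = m * np.sum((cluster_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((y - cluster_means[:, None]) ** 2) / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


icc_clustered = anova_icc(y_clustered)
icc_independent = anova_icc(y_independent)
print(f"ICC with shared cluster effect: {icc_clustered:.3f}")
print(f"ICC with independent data:      {icc_independent:.3f}")
```

With a shared cluster effect the estimate should land near the assumed true value of 0.5, while the independent data should give an estimate near zero.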
2. Which feature has the highest correlation between two observations in the same cluster?
- ❌ BPXSY1
- ✅ SDMVSTRA
- ❌ RIDAGEYR
- ❌ BMXBMI
- ❌ smq
Explanation:
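SDMVSTRA is the stratum indicator used in the NHANES sample design.
NHANES clusters (primary sampling units) are nested within strata, so every member of a cluster shares the same SDMVSTRA value. A variable that is constant within each cluster has all of its variation between clusters, so its within-cluster correlation is effectively 1.0, the maximum possible.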
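The reasoning can be checked on synthetic data. The sketch below does not use NHANES; `label_like` and `measurement_like` are hypothetical stand-ins for a cluster-level label and an individual-level measurement, and the helper `within_pair_corr` is an illustrative name. It pairs two members from each cluster and computes their Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, cluster_size = 500, 5

# Hypothetical cluster-level label: one value per cluster, copied to every member.
labels = rng.integers(1, 50, size=n_clusters).astype(float)
label_like = np.tile(labels[:, None], (1, cluster_size))

# Hypothetical individual-level measurement with no cluster structure at all.
measurement_like = rng.normal(125.0, 18.0, size=(n_clusters, cluster_size))


def within_pair_corr(values):
    """Pearson correlation between the first and second member of each cluster."""
    return np.corrcoef(values[:, 0], values[:, 1])[0, 1]


label_corr = within_pair_corr(label_like)
measurement_corr = within_pair_corr(measurement_like)
print(f"cluster-level label:          {label_corr:.3f}")
print(f"individual-level measurement: {measurement_corr:.3f}")
```

Because both members of a pair carry the identical label, the first correlation is 1 up to floating-point error; the unstructured measurement gives a correlation near zero.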
3. What is true about multiple linear regression vs marginal linear models when dependence is present?
- ❌ Standard errors tend to be the same
- ❌ Multiple linear regression is theoretically justified with dependent data
- ✅ Marginal linear model estimates and SEs are meaningful when dependence is strictly within groups
- ❌ Standard error in multiple linear regression tends to be higher
Explanation:
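Multiple linear regression (MLR) assumes independent observations, so its standard errors are unreliable (often too small) when observations within a cluster are correlated. Marginal models, typically fit with generalized estimating equations (GEE), report robust standard errors that account for intra-cluster dependence.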
4. Multilevel models are expressed in terms of ________.
- ❌ Mixed effects
- ❌ Correlation coefficients
- ❌ Fixed effects
- ✅ Random effects
Explanation:
Multilevel (hierarchical) models are formally expressed through random effects, though they also include fixed effects.
Their defining feature is the random-effects structure, which lets quantities such as the intercept vary from cluster to cluster.
5. Which is NOT true about reasons why we fit marginal models?
- ❌ Quicker computation
- ❌ Robust SEs for correlated data
- ❌ Easier accommodation of non-normal outcomes
- ✅ All the above are true
Explanation:
All of the listed statements are genuine reasons to fit marginal models (for example, with GEE), so none of them is "not true" and the correct answer is "All the above are true."
🧾 Summary Table
| Q# | Correct Answer | Key Idea |
|---|---|---|
| 1 | Observations correlated within groups | Definition of clustering |
| 2 | SDMVSTRA | Constant within each cluster → max correlation |
| 3 | Marginal SEs valid for clustered dependence | MLR SEs unreliable under dependence |
| 4 | Random effects | Core of multilevel model formulation |
| 5 | All the above are true | All listed reasons are correct |