1. What is clustered data?

❌ Clustered data is when observations are the exact same.
❌ Data has low variance.
❌ One group is over-represented.
✅ Data is considered clustered when observations are correlated within groups, sometimes related to study designs.

Explanation:
Clustered (or grouped) data means that subjects within the same group tend to resemble one another more than subjects from different groups, which produces intra-cluster correlation.
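
A minimal sketch of what intra-cluster correlation looks like numerically, using simulated data rather than anything from the quiz. The cluster counts, variances, and the helper name `within_cluster_correlation` are all assumptions made for illustration. It pairs up every two distinct observations from the same cluster and computes the ordinary Pearson correlation across those pairs.

```python
import random
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_cluster_correlation(values, clusters):
    """Correlation over all ordered pairs of distinct observations that share a cluster."""
    by_cluster = {}
    for v, c in zip(values, clusters):
        by_cluster.setdefault(c, []).append(v)
    first, second = [], []
    for members in by_cluster.values():
        for a, b in permutations(members, 2):
            first.append(a)
            second.append(b)
    return pearson(first, second)


rng = random.Random(0)           # fixed seed so the run is repeatable
n_clusters, cluster_size = 200, 5

clusters, clustered, independent = [], [], []
for g in range(n_clusters):
    shared = rng.gauss(0, 1)     # cluster-level effect shared by all members
    for _ in range(cluster_size):
        clusters.append(g)
        clustered.append(shared + rng.gauss(0, 1))   # shared + individual noise
        independent.append(rng.gauss(0, sqrt(2)))    # same total variance, no shared part

icc_clustered = within_cluster_correlation(clustered, clusters)
icc_independent = within_cluster_correlation(independent, clusters)
print(round(icc_clustered, 2), round(icc_independent, 2))
```

With equal shared and individual variances the first value should land near 0.5, while the second should sit close to zero, since nothing ties members of a cluster together.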

2. Which feature has the highest correlation between two observations in the same cluster?

❌ BPXSY1
✅ SDMVSTRA
❌ RIDAGEYR
❌ BMXBMI
❌ smq

Explanation:
SDMVSTRA is the stratum indicator used in the NHANES sample design.
Clusters are nested within strata, so every member of a cluster has the same SDMVSTRA value. The correlation between two observations in the same cluster is therefore effectively 1.0, the maximum possible.
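
The same point can be checked on a toy example. The sketch below uses made-up numbers (not NHANES values) and a hypothetical helper, `anova_icc`, that applies the standard one-way ANOVA estimate of the intra-cluster correlation for equal-sized clusters. A variable that is constant within each cluster has zero within-cluster variance, so its estimate is exactly 1.

```python
from statistics import mean


def anova_icc(values, clusters):
    """One-way ANOVA estimate of the ICC, assuming equal-sized clusters."""
    groups = {}
    for v, c in zip(values, clusters):
        groups.setdefault(c, []).append(v)
    k = len(next(iter(groups.values())))            # common cluster size
    if any(len(m) != k for m in groups.values()):
        raise ValueError("this sketch only handles balanced clusters")
    n = len(groups)
    grand = mean(values)
    # Between-cluster and within-cluster mean squares.
    msb = k * sum((mean(m) - grand) ** 2 for m in groups.values()) / (n - 1)
    msw = sum((v - mean(m)) ** 2 for m in groups.values() for v in m) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Made-up toy data: three clusters of three people each.
cluster_id = [1, 1, 1, 2, 2, 2, 3, 3, 3]
stratum = [10, 10, 10, 11, 11, 11, 12, 12, 12]      # constant within each cluster
age = [34, 61, 47, 25, 58, 40, 66, 31, 52]          # varies within clusters

icc_stratum = anova_icc(stratum, cluster_id)
icc_age = anova_icc(age, cluster_id)
print(icc_stratum, icc_age)
```

The design-type variable reaches the ceiling of 1, whereas a person-level variable such as the toy `age` column falls well below it.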

3. What is true about multiple linear regression vs marginal linear models when dependence is present?

❌ Standard errors tend to be the same
❌ Multiple linear regression is theoretically justified with dependent data
✅ Marginal linear model estimates and SEs are meaningful when dependence is strictly within groups
❌ Standard error in multiple linear regression tends to be higher

Explanation:
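MLR assumes independent observations, so its standard errors are unreliable (often too small) when observations within a cluster are correlated. Marginal models, usually fit with GEE, account for intra-cluster dependence and report SEs that remain valid as long as the dependence is confined to within groups.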
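
To see why the standard errors differ, here is a deliberately simplified sketch. It is not a full GEE fit: it uses the simplest possible "regression", estimating a single mean, on simulated clustered data. The cluster counts and variances are assumptions chosen for illustration. The naive SE treats every observation as independent, while the cluster-robust (sandwich) SE sums residuals within each cluster first, which is the same idea behind the robust SEs that GEE software typically reports.

```python
import random
from math import sqrt

rng = random.Random(1)                 # fixed seed so the run is repeatable
n_clusters, cluster_size = 30, 20

# Each cluster shares one random effect, so members are positively correlated.
data = []
for _ in range(n_clusters):
    shared = rng.gauss(0, 1)
    data.append([shared + rng.gauss(0, 1) for _ in range(cluster_size)])

values = [y for cluster in data for y in cluster]
n = len(values)
ybar = sum(values) / n

# Naive SE: pretends all n observations are independent.
s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
naive_se = sqrt(s2 / n)

# Cluster-robust SE: residuals are summed within clusters before squaring,
# so the within-cluster correlation is carried into the variance estimate.
cluster_sums = [sum(y - ybar for y in cluster) for cluster in data]
robust_se = sqrt(sum(s ** 2 for s in cluster_sums)) / n

print(round(naive_se, 3), round(robust_se, 3))
```

With positive within-cluster correlation the robust SE should come out noticeably larger than the naive one, which is why treating clustered data as independent tends to overstate precision.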

4. Multilevel models are expressed in terms of ________.

❌ Mixed effects
❌ Correlation coefficients
❌ Fixed effects
✅ Random effects

Explanation:
Multilevel (hierarchical) models are formally expressed through random effects, though they also include fixed effects.
The defining feature is the random-effects structure.
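
As an illustration, the simplest multilevel model is a random-intercept model. The notation below is an assumption introduced here, not taken from the quiz: $i$ indexes observations, $j$ indexes clusters, $\beta_0$ and $\beta_1$ are fixed effects, and $u_j$ is the random effect for cluster $j$.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\operatorname{Corr}(y_{ij}, y_{i'j} \mid x) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\quad (i \neq i')
```

Because two observations in the same cluster share $u_j$, the random effect is what induces the within-cluster correlation shown in the second line.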

5. Which is NOT true about reasons why we fit marginal models?

❌ Quicker computation
❌ Robust SEs for correlated data
❌ Easier accommodation of non-normal outcomes
✅ All the above are true

Explanation:
All of the listed statements are genuine advantages of marginal models (typically fit with GEE), so none of them is "not true" and the correct answer is "All the above are true".

🧾 Summary Table

Q#	Correct Answer	Key Idea
1	Observations correlated within groups	Definition of clustering
2	SDMVSTRA	Identical within stratum → max correlation
3	Marginal SEs valid for clustered dependence	MLR SEs unreliable under dependence
4	Random effects	Core of multilevel model formulation
5	All the above are true	All listed reasons are correct