Week 1 Quiz: Reproducible Research (Data Science Specialization): Answers 2025

Question 1

Suppose I conduct a study and publish my findings. Which of the following is an example of a replication of my study?

✅ An investigator at another institution conducts a study addressing the same question, collects her own data, analyzes it separately from me, and publishes her own findings.
❌ I take my own data, analyze it again, and publish new findings.
❌ An investigator at another institution conducts a study addressing a different scientific question.
❌ I give my data to another investigator, and she gets the same results.

Explanation:
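Replication means that an independent investigator collects new data to address the same scientific question and checks whether the findings hold. Re-analyzing the original data, even when someone else does it, is reproduction rather than replication.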

Question 2

Which of the following is a requirement for a published data analysis to be reproducible?

✅ The investigator makes the analytic data publicly available.
❌ The data analysis is conducted using R.
❌ It must run on Unix.
❌ The investigator makes his original computer available.

Explanation:
For reproducibility, others must be able to access the same analytic data and code so they can re-create the published results. The particular software, operating system, or physical machine is not part of the requirement.

Question 3

Which of the following is an example of a reproducible study?

✅ The study’s analytic data and computer code for the data analysis are publicly available. When the code is run on the analytic data, the findings are identical to the published results.
❌ The data and code are not publicly available, but the results were replicated manually.
❌ The authors re-run their own code.
❌ Only the data are available, not the code.

Explanation:
Reproducibility means that an independent party can re-run the same code on the same analytic data and obtain results identical to those published.
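
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_analysis`, `is_reproduced`, and the sample numbers are hypothetical and do not come from the course; the analysis is a stand-in that computes a mean and standard deviation.

```python
import statistics

def run_analysis(analytic_data):
    """Hypothetical published analysis: sample size, mean, and sample SD."""
    return {
        "n": len(analytic_data),
        "mean": round(statistics.mean(analytic_data), 6),
        "sd": round(statistics.stdev(analytic_data), 6),
    }

def is_reproduced(analytic_data, published_results):
    """True when re-running the shared code on the shared data
    gives exactly the published results."""
    return run_analysis(analytic_data) == published_results

# Made-up analytic data; `published` stands in for the numbers in the paper.
analytic_data = [4.1, 5.0, 5.9, 6.2, 4.8]
published = run_analysis(analytic_data)

print(is_reproduced(analytic_data, published))
```

If either the data or the code is withheld, this comparison cannot be run by anyone outside the original team, which is why both must be available.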

Question 4

Which of the following is a reason that a study might NOT be fully replicated?

✅ The original study was very expensive and there is no money to repeat it in a different setting.
❌ The original study had null findings.
❌ The original study was done by a famous investigator.
❌ The investigator refuses to share data.

Explanation:
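Replication can be blocked by practical constraints such as cost, logistics, or feasibility, not only by scientific factors. Refusing to share data hinders reproducing the analysis, but it does not stop another investigator from collecting new data.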

Question 5

Why is publishing reproducible research increasingly important?

✅ New technologies are increasing the rate of data collection, creating datasets that are more complex and extremely high dimensional.
❌ Computing power is limited.
❌ Statistical methods can be described in plain language.
❌ Studies are small and easily replicated.

Explanation:
As datasets grow larger and analyses more complex, a written description alone rarely conveys exactly what was done. Sharing the data and code keeps the results transparent and checkable.

Question 6

What is the role of processing code in the research pipeline?

✅ It transforms the measured data into analytic data.
❌ It transforms analytic data into computational results.
❌ It makes plots/tables.
❌ It performs the final statistical analysis.

Explanation:
Processing code cleans and reshapes the raw measured data into analytic data, the preprocessing step that comes before the statistical analysis.
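
As a rough illustration of processing code, the Python sketch below turns a few raw records into analytic records. The field names (`site`, `temp_f`, `temp_c`), the `NA` missing-value marker, and the sample rows are all assumptions made up for this example.

```python
def process(measured_rows):
    """Turn raw measured records into clean analytic records."""
    analytic = []
    for row in measured_rows:
        raw_value = row.get("temp_f", "").strip()
        if raw_value in ("", "NA"):
            continue  # drop records with a missing measurement
        analytic.append({
            # normalize inconsistent site labels
            "site": row["site"].strip().lower(),
            # convert Fahrenheit text to a Celsius number
            "temp_c": round((float(raw_value) - 32) * 5 / 9, 2),
        })
    return analytic

# Made-up measured data with stray whitespace, mixed case, and a missing value.
measured = [
    {"site": " A ", "temp_f": "68.0"},
    {"site": "B", "temp_f": "NA"},
    {"site": "b", "temp_f": " 50 "},
]
analytic = process(measured)
print(analytic)
```

Keeping this step in code, rather than editing the raw file by hand, is what lets someone else trace how the analytic data were derived.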

Question 7

Which is a goal of literate statistical programming?

✅ Combine explanatory text and data analysis code in a single document.
❌ Ensure output in PDF only.
❌ Separate figures and tables.
❌ Require LaTeX.

Explanation:
Literate programming integrates code and narrative, ensuring transparency and readability.

Question 8

What does it mean to weave a literate statistical program?

✅ Transform the literate program into a human readable document.
❌ Compress the program.
❌ Make it machine-readable only.
❌ Convert from R to Python.

Explanation:
Weaving creates an output report (HTML, PDF, etc.) from the literate source file, showing the text, code, and results together. The complementary step, tangling, extracts only the machine-readable code.
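
To make the idea concrete, here is a toy weave and tangle in Python. It is a sketch, not how Sweave or knitr is implemented: it borrows the noweb-style chunk delimiters `<<>>=` and `@`, assumes each chunk holds a single Python expression, and the function names `parse`, `weave`, and `tangle` are chosen for this example.

```python
def parse(source):
    """Split a literate source into ("text", body) and ("code", body) blocks."""
    blocks, buffer, kind = [], [], "text"
    for line in source.splitlines():
        if kind == "text" and line.strip() == "<<>>=":
            if buffer:
                blocks.append(("text", "\n".join(buffer)))
            buffer, kind = [], "code"
        elif kind == "code" and line.strip() == "@":
            blocks.append(("code", "\n".join(buffer)))
            buffer, kind = [], "text"
        else:
            buffer.append(line)
    if buffer:
        blocks.append((kind, "\n".join(buffer)))
    return blocks

def weave(source):
    """Build a human-readable report: prose, each chunk, and its result."""
    env, out = {}, []
    for kind, body in parse(source):
        if kind == "text":
            out.append(body)
        else:
            out.append("    " + body)
            # Toy assumption: every chunk is one Python expression.
            out.append("    ## " + repr(eval(body, env)))
    return "\n".join(out)

def tangle(source):
    """Extract only the code chunks (the machine-readable part)."""
    return "\n".join(body for kind, body in parse(source) if kind == "code")

source = "\n".join([
    "The mean of the three values is computed below.",
    "<<>>=",
    "sum([1, 2, 3]) / 3",
    "@",
    "That is the whole analysis.",
])
print(weave(source))
print(tangle(source))
```

The woven output interleaves the prose with the chunk and its evaluated result, while the tangled output contains the code alone.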

Question 9

Which of the following is required to implement a literate programming system?

✅ A programming language like R.
❌ A Unix system.
❌ A web server.
❌ A PDF viewer.

Explanation:
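Literate programming integrates narrative with code execution, so it needs a programming language (such as R) alongside a documentation language for the text. No particular operating system, server, or viewer is required.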

Question 10

What is one way in which the knitr system differs from Sweave?

✅ knitr allows for the use of markdown instead of LaTeX.
❌ knitr is written in Python.
❌ knitr was developed by Friedrich Leisch.
❌ knitr lacks caching features.

Explanation:
knitr (by Yihui Xie) builds on the ideas of Sweave (by Friedrich Leisch), adding Markdown as a documentation language, built-in caching, and support for languages other than R.

🧾 Summary Table

Q#	✅ Correct Answer	Key Concept
1	Independent investigator repeats study with new data	Definition of replication
2	Data publicly available	Requirement for reproducibility
3	Data + code public → identical results	Reproducible study example
4	Study too expensive to repeat	Barriers to replication
5	Big data complexity	Importance of reproducibility
6	Transforms measured → analytic data	Role of processing code
7	Combine text + code	Literate programming goal
8	Weave = readable output	Weaving concept
9	Programming language (e.g., R)	Required component
10	knitr supports Markdown	Difference between knitr and Sweave