Week 2 Quiz: Getting and Cleaning Data (Data Science Specialization): Answers 2025
Question 1
Register an app and query https://api.github.com/users/jtleek/repos. What time was the datasharing repo created?
✅ 2013-11-07T13:25:07Z
❌ 2014-03-05T16:11:46Z
❌ 2013-08-28T18:18:50Z
❌ 2012-06-20T18:39:06Z
Explanation: The created_at timestamp for datasharing in the jtleek repos is 2013-11-07T13:25:07Z.
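The lookup itself can be sketched offline in Python, as a stand-in for the R workflow the quiz expects. The payload below is a trimmed, hypothetical sample rather than the real API response: only the datasharing timestamp comes from the answer above, the other entry is a placeholder, and the helper name created_at_for is invented for illustration.

```python
import json

# Hypothetical, trimmed stand-in for the JSON list returned by
# https://api.github.com/users/jtleek/repos (the real response has many more fields).
SAMPLE_PAYLOAD = """
[
  {"name": "example-repo", "created_at": "2000-01-01T00:00:00Z"},
  {"name": "datasharing", "created_at": "2013-11-07T13:25:07Z"}
]
"""


def created_at_for(repos, repo_name):
    """Return the created_at value of the repo whose name matches, or None."""
    for repo in repos:
        if repo.get("name") == repo_name:
            return repo.get("created_at")
    return None


repos = json.loads(SAMPLE_PAYLOAD)
print(created_at_for(repos, "datasharing"))
```

Against the live API, the same search would run over the parsed response body instead of SAMPLE_PAYLOAD.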
Question 2
Which sqldf command selects only the pwgtp1 values for rows where AGEP is less than 50?
❌ sqldf("select * from acs where AGEP < 50 and pwgtp1")
❌ sqldf("select * from acs")
✅ sqldf("select pwgtp1 from acs where AGEP < 50")
❌ sqldf("select pwgtp1 from acs")
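Explanation: The correct query names only pwgtp1 in the SELECT list and restricts the rows with WHERE AGEP < 50. The select * options return every column, and select pwgtp1 from acs has no WHERE clause, so it returns all ages.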
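The query can be tried without R by running it against an in-memory SQLite table, sketched here in Python. The four rows are made-up illustration data, not values from the ACS file; only the table name, the column names, and the query text come from the question.

```python
import sqlite3

# Tiny hypothetical stand-in for the acs data frame (values are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acs (AGEP INTEGER, pwgtp1 INTEGER)")
conn.executemany(
    "INSERT INTO acs VALUES (?, ?)",
    [(23, 87), (67, 30), (49, 118), (50, 64)],
)

# The quiz's correct query: one column, rows filtered by age.
rows = conn.execute("select pwgtp1 from acs where AGEP < 50").fetchall()
weights = [row[0] for row in rows]
print(weights)
conn.close()
```

The row with AGEP equal to 50 is excluded because the comparison is strict.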
Question 3
Which sqldf command is equivalent to unique(acs$AGEP)?
✅ sqldf("select distinct AGEP from acs")
❌ sqldf("select unique AGEP from acs")
❌ sqldf("select distinct pwgtp1 from acs")
❌ sqldf("select AGEP where unique from acs")
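Explanation: SELECT DISTINCT AGEP FROM acs returns each AGEP value once, matching unique(acs$AGEP). SQL uses the DISTINCT keyword for this, and the other options either misuse unique or select the wrong column. Note that sqldf returns a one-column data frame, whereas unique() returns a vector.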
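A small Python sketch shows the deduplication, again using an in-memory SQLite table. The ages are invented sample values, and Python's set() plays the role that unique() plays in R.

```python
import sqlite3

# Hypothetical ages with repeats (not taken from the real acs data).
ages = [43, 42, 43, 16, 42]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acs (AGEP INTEGER)")
conn.executemany("INSERT INTO acs VALUES (?)", [(a,) for a in ages])

# The quiz's correct query: each AGEP value appears once in the result.
distinct_ages = [row[0] for row in conn.execute("select distinct AGEP from acs")]
print(sorted(distinct_ages))
conn.close()
```

SQL does not promise a row order without ORDER BY, so the sketch sorts before printing.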
Question 4
How many characters are in the 10th, 20th, 30th, and 100th lines of the HTML from http://biostat.jhsph.edu/~jleek/contact.html?
✅ 45 31 7 25
❌ 45 31 2 25
❌ 43 99 7 25
❌ 45 31 7 31
❌ 43 99 8 6
❌ 45 0 2 2
❌ 45 92 7 2
Explanation: Reading the page line by line (for example with readLines()) and applying nchar() to lines 10, 20, 30, and 100 produces 45 31 7 25.
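The counting step can be illustrated in Python on synthetic lines. The helper name line_lengths and the generated sample_lines are assumptions for demonstration only; they do not reproduce the page's content, so the printed numbers differ from the quiz answer.

```python
def line_lengths(lines, positions):
    """Character counts at 1-based line positions, like nchar(lines[positions]) in R."""
    return [len(lines[p - 1]) for p in positions]


# Hypothetical stand-in for the page: 100 lines whose lengths cycle from 0 to 6.
sample_lines = ["x" * (n % 7) for n in range(1, 101)]

print(line_lengths(sample_lines, [10, 20, 30, 100]))
```

With real page content, the lines should have their trailing newline removed before counting, since R's readLines() drops it as well.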
Question 5
Read the fixed-width file and sum the numbers in the fourth of its nine columns. What is the sum?
❌ 35824.9
❌ 28893.3
❌ 101.83
❌ 36.5
✅ 32426.7
❌ 222243.1
Explanation: Parsing the file with the correct column widths (for example with read.fwf()) and summing the fourth column yields 32426.7.
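Fixed-width parsing can be sketched in Python as follows. The layout in WIDTHS and the two sample lines are hypothetical, not the quiz file's actual format; the convention that a negative width skips characters mirrors R's read.fwf(). The helper names are invented for illustration.

```python
def split_fixed_width(line, widths):
    """Cut a line into fields; positive widths are kept, negative widths are skipped."""
    fields, pos = [], 0
    for width in widths:
        if width > 0:
            fields.append(line[pos:pos + width].strip())
        pos += abs(width)
    return fields


def column_sum(lines, widths, column):
    """Sum one 0-based column of a fixed-width table, treating each field as a float."""
    return sum(float(split_fixed_width(line, widths)[column]) for line in lines)


# Hypothetical layout: skip 1, take 4, skip 2, take 5, skip 2, take 5.
WIDTHS = [-1, 4, -2, 5, -2, 5]
SAMPLE = [
    " 2001  10.5   1.0",
    " 2002  20.25  2.0",
]

total = column_sum(SAMPLE, WIDTHS, 1)
print(total)
```

For the quiz file, the widths would need to describe all nine columns, and the column index would be 3 for the fourth column.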
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | 2013-11-07T13:25:07Z | GitHub API — created_at timestamp |
| 2 | select pwgtp1 from acs where AGEP < 50 | SQL query with WHERE filter |
| 3 | select distinct AGEP from acs | SQL DISTINCT = R unique() |
| 4 | 45 31 7 25 | HTML line reading + nchar() |
| 5 | 32426.7 | Fixed-width file parsing & column sum |