
Week 2 Quiz: Getting and Cleaning Data (Data Science Specialization): Answers 2025

Question 1

Register an app and query https://api.github.com/users/jtleek/repos. What time was the datasharing repo created?

✅ 2013-11-07T13:25:07Z
❌ 2014-03-05T16:11:46Z
❌ 2013-08-28T18:18:50Z
❌ 2012-06-20T18:39:06Z

Explanation: In the API response listing jtleek's repositories, the datasharing entry has a created_at value of 2013-11-07T13:25:07Z.
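The lookup can be sketched in R as follows. This is a minimal sketch, assuming the jsonlite package is installed. The inline payload is a trimmed, illustrative stand-in for the API response, and the second entry (`example-repo` and its timestamp) is hypothetical.

```r
library(jsonlite)

# Illustrative stand-in for the API response, trimmed to the two fields used.
# "example-repo" and its timestamp are made up for this sketch.
payload <- '[
  {"name": "example-repo", "created_at": "2000-01-01T00:00:00Z"},
  {"name": "datasharing",  "created_at": "2013-11-07T13:25:07Z"}
]'

# fromJSON() turns an array of flat objects into a data frame.
repos <- fromJSON(payload)

# Pick the created_at value of the row whose name is "datasharing".
created <- repos$created_at[repos$name == "datasharing"]
print(created)

# Against the live API (needs network access; unauthenticated calls are
# rate limited and results are paginated):
# repos <- fromJSON("https://api.github.com/users/jtleek/repos")
```

The quiz asks for a registered app, which mainly raises the rate limit. The field lookup itself is the same either way.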


Question 2

Which sqldf command selects only pwgtp1 for people younger than 50?

❌ sqldf("select * from acs where AGEP < 50 and pwgtp1")
❌ sqldf("select * from acs")
✅ sqldf("select pwgtp1 from acs where AGEP < 50")
❌ sqldf("select pwgtp1 from acs")

Explanation: The query must name only pwgtp1 in the SELECT list and filter rows with WHERE AGEP < 50: select pwgtp1 from acs where AGEP < 50.
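A small sketch of the query on toy data, assuming the sqldf package is installed. The four-row `acs` data frame below is made up for illustration and is not the survey data.

```r
library(sqldf)

# Toy stand-in for the acs data: ages and one weight column.
acs <- data.frame(
  AGEP   = c(23L, 67L, 45L, 50L),
  pwgtp1 = c(87L, 88L, 94L, 91L)
)

# Only pwgtp1 is returned, and only for rows with AGEP strictly below 50.
young <- sqldf("select pwgtp1 from acs where AGEP < 50")
print(young)
```

The row with AGEP equal to 50 is excluded because the comparison is strict.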


Question 3

Which sqldf command is equivalent to unique(acs$AGEP)?

✅ sqldf("select distinct AGEP from acs")
❌ sqldf("select unique AGEP from acs")
❌ sqldf("select distinct pwgtp1 from acs")
❌ sqldf("select AGEP where unique from acs")

Explanation: SELECT DISTINCT AGEP FROM acs returns each distinct value of AGEP once, matching unique(acs$AGEP).
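The equivalence can be checked on toy data. This is a sketch assuming the sqldf package is installed, and the `acs` values are made up. Note that sqldf returns a one-column data frame while unique() returns a vector, so the comparison is on the values.

```r
library(sqldf)

# Toy ages with repeats.
acs <- data.frame(AGEP = c(23L, 67L, 23L, 45L, 67L))

via_sql <- sqldf("select distinct AGEP from acs")  # data frame with one column
via_r   <- unique(acs$AGEP)                        # plain vector

# Same set of values, regardless of row order or container type.
same_values <- setequal(via_sql$AGEP, via_r)
print(same_values)
```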


Question 4

How many characters are in lines 10, 20, 30, and 100 of the HTML at http://biostat.jhsph.edu/~jleek/contact.html?

✅ 45 31 7 25
❌ 45 31 2 25
❌ 43 99 7 25
❌ 45 31 7 31
❌ 43 99 8 6
❌ 45 0 2 2
❌ 45 92 7 2

Explanation: Reading the page line by line (for example with readLines()) and applying nchar() to lines 10, 20, 30, and 100 gives 45 31 7 25.
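A minimal base R sketch of the technique. The helper name `line_lengths` and the toy page are hypothetical, and the live-page call is left commented out because it needs network access.

```r
# Return the character count of the selected lines of a character vector.
line_lengths <- function(lines, idx) {
  nchar(lines[idx])
}

# Toy page: one element per line, as readLines() would return.
toy_page <- c("<html>", "<body>", "<p>hello</p>", "</body>", "</html>")
print(line_lengths(toy_page, c(1, 3)))  # 6 12

# Against the live page:
# con <- url("http://biostat.jhsph.edu/~jleek/contact.html")
# html_lines <- readLines(con)
# close(con)
# line_lengths(html_lines, c(10, 20, 30, 100))
```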


Question 5

Read the fixed-width file and sum the numbers in the fourth of its nine columns. What is the sum?

❌ 35824.9
❌ 28893.3
❌ 101.83
❌ 36.5
✅ 32426.7
❌ 222243.1

Explanation: Parsing the file with the correct column widths (for example with read.fwf()) and summing the fourth column gives 32426.7.
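A sketch of fixed-width parsing with base R's read.fwf(), using a made-up two-row file. The layout here (one 10-character label followed by eight 6-character numbers) is an assumption for illustration. The quiz file's widths and any header lines to skip have to be read off the file itself.

```r
# Build a toy fixed-width file: a 10-character label, then eight numbers
# that are each 6 characters wide. All values are made up.
fmt <- paste0("%-10s", strrep("%6.1f", 8))
toy <- c(
  sprintf(fmt, "rowA", 1.0, 2.0, 10.5, 4.0, 5.0, 6.0, 7.0, 8.0),
  sprintf(fmt, "rowB", 1.0, 2.0, 20.5, 4.0, 5.0, 6.0, 7.0, 8.0)
)
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

# widths gives the character width of each of the nine columns.
df <- read.fwf(path, widths = c(10, rep(6, 8)))

# Column 4 is the third numeric field: 10.5 + 20.5.
total <- sum(df[, 4])
print(total)  # 31
```

If the widths are off by even one character, digits spill into the neighbouring column and the sum changes, which is a likely source of the wrong options above.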


🧾 Summary Table

Q# | ✅ Correct Answer | Key Concept
1 | 2013-11-07T13:25:07Z | GitHub API: created_at timestamp
2 | select pwgtp1 from acs where AGEP < 50 | SQL query with WHERE filter
3 | select distinct AGEP from acs | SQL DISTINCT = R unique()
4 | 45 31 7 25 | HTML line reading + nchar()
5 | 32426.7 | Fixed-width file parsing and column sum