Week 1 Quiz: Getting and Cleaning Data (Data Science Specialization): Answers 2025

Question 1

How many properties are worth $1,000,000 or more?

✅ 53
❌ 24
❌ 31
❌ 2076

Explanation:
In the codebook, VAL == 24 corresponds to properties worth $1,000,000 or more. Counting the rows with that value in R gives 53.
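
A minimal sketch of that count, assuming the survey data is already in a data frame with a VAL column. The toy data frame below is invented so the block runs on its own; the real file would typically be loaded with read.csv() first. The helper name count_million_plus is hypothetical.

```r
# Toy stand-in for the housing survey data (illustrative values only).
# With the real file you would load it first, e.g. dat <- read.csv(<path to CSV>).
dat <- data.frame(VAL = c(24, 3, NA, 24, 18, 24, NA, 1))

# Count rows whose VAL code is 24 (worth $1,000,000 or more per the codebook).
# na.rm = TRUE skips rows where VAL is missing; without it the sum would be NA.
count_million_plus <- function(df) {
  sum(df$VAL == 24, na.rm = TRUE)
}

count_million_plus(dat)  # 3 for the toy data
```

On the real dataset the same expression is what yields the quiz answer of 53.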

Question 2

Consider the variable FES. Which tidy data principle does it violate?

✅ Tidy data has one variable per column.
❌ Tidy data has one observation per row.
❌ Each tidy data table contains information about only one type of observation.
❌ Each variable in a tidy data set has been transformed to be interpretable.

Explanation:
FES ("Family type and employment status") packs two variables into one column: the family type (for example, married-couple family or other family) and the employment status of its members.
Hence, it violates the rule of “one variable per column”.
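
To make the violation concrete, here is a small sketch that splits one combined column into two, assuming FES mixes family type with employment status. The labels are simplified, made-up stand-ins rather than the codebook's exact categories, and the column names family_type and employment are hypothetical.

```r
# Made-up labels in the style of a combined "family type: employment status" code.
fes_label <- c(
  "Married-couple family: both in labor force",
  "Married-couple family: neither in labor force",
  "Other family: householder in labor force"
)

# Split each label at ": " into its two components.
parts <- strsplit(fes_label, ": ", fixed = TRUE)

# One variable per column: family type and employment status are now separate.
tidy <- data.frame(
  family_type = vapply(parts, function(p) p[1], character(1)),
  employment  = vapply(parts, function(p) p[2], character(1)),
  stringsAsFactors = FALSE
)

print(tidy)
```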

Question 3

What is the value of sum(dat$Zip*dat$Ext, na.rm=T) from the Excel dataset?

✅ 36534720
❌ 154339
❌ 33544718
❌ NA

Explanation:
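Read the required rows and columns of the Excel file into dat, then evaluate the expression. The na.rm=T argument drops any products that are NA.

Result = 36,534,720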
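
A sketch of the calculation, assuming dat holds the Excel subset with numeric Zip and Ext columns. The toy values are invented so the block runs on its own. For the real spreadsheet, one option is read.xlsx() from the xlsx package with its rowIndex and colIndex arguments; it appears only as a comment because the file path and index ranges are not given here.

```r
# With the real spreadsheet, something along these lines could load the subset
# (assumes the xlsx package is installed; path and index vectors are placeholders):
#   dat <- xlsx::read.xlsx(<path to .xlsx>, sheetIndex = 1,
#                          rowIndex = <rows>, colIndex = <cols>)

# Toy stand-in so the example is self-contained (illustrative values only).
dat <- data.frame(
  Zip = c(21201, 21202, NA, 21231),
  Ext = c(10, NA, 5, 2)
)

# Element-wise product, then sum; na.rm = TRUE drops products that are NA
# because either Zip or Ext is missing.
total <- sum(dat$Zip * dat$Ext, na.rm = TRUE)
total  # 254472 for the toy data
```

Writing na.rm = TRUE instead of na.rm = T is slightly safer, since T is an ordinary variable that can be reassigned.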

Question 4

How many restaurants have zipcode 21231?

✅ 127
❌ 17
❌ 100
❌ 156

Explanation:
Parse the restaurant XML file, extract every zipcode value, and count how many equal 21231.

Result = 127
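
One way to sketch this uses the XML package (an assumption; the original code is not shown). The inline document below is invented and only mimics a restaurant list with one zipcode element per entry, so the element names other than zipcode are hypothetical.

```r
library(XML)  # assumes the XML package is installed

# Toy document (invented) with a <zipcode> element per restaurant.
xml_text <- "<response>
  <row><name>A</name><zipcode>21231</zipcode></row>
  <row><name>B</name><zipcode>21201</zipcode></row>
  <row><name>C</name><zipcode>21231</zipcode></row>
</response>"

# Parse the string into an internal document that supports XPath queries.
doc <- xmlParse(xml_text, asText = TRUE)

# Pull the text of every <zipcode> node, wherever it sits in the tree.
zips <- xpathSApply(doc, "//zipcode", xmlValue)

# xmlValue returns character strings, so compare against "21231" as text.
n_21231 <- sum(zips == "21231")
n_21231  # 2 for the toy document
```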

Question 5

Using the data.table package, which is the fastest way to calculate the average of pwgtp15 by SEX?

✅ DT[, mean(pwgtp15), by=SEX]
❌ rowMeans(DT)[DT$SEX==1]; rowMeans(DT)[DT$SEX==2]
❌ mean(DT$pwgtp15,by=DT$SEX)
❌ mean(DT[DT$SEX==1,]$pwgtp15); mean(DT[DT$SEX==2,]$pwgtp15)
❌ tapply(DT$pwgtp15,DT$SEX,mean)
❌ sapply(split(DT$pwgtp15,DT$SEX),mean)

Explanation:
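The data.table syntax DT[, mean(pwgtp15), by=SEX] computes both group means in a single call, with the grouping done in optimized C code, so it is the expected fastest option compared to base R methods like tapply or split. Note that mean(DT$pwgtp15,by=DT$SEX) ignores by and returns the overall mean, so it is not a valid answer. On a table this small, timings measured with system.time() can vary from run to run.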
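
A small sketch of the grouped mean on an invented toy table, assuming the data.table package is installed. It also shows the base R equivalent and why the mean(..., by=...) option does not do what it appears to.

```r
library(data.table)  # assumes the data.table package is installed

# Toy table (invented values) with the two columns the question uses.
DT <- data.table(
  SEX     = c(1, 2, 1, 2, 1),
  pwgtp15 = c(10, 20, 30, 40, 20)
)

# Grouped mean: one row per SEX value, with the mean in column V1.
by_sex <- DT[, mean(pwgtp15), by = SEX]
print(by_sex)  # SEX 1 -> 20, SEX 2 -> 30

# Base R equivalent for comparison: same numbers, returned as a named vector.
base_means <- tapply(DT$pwgtp15, DT$SEX, mean)

# mean() has no 'by' argument, so the extra argument is ignored
# and the overall mean comes back instead of per-group means.
overall <- mean(DT$pwgtp15, by = DT$SEX)
overall  # 24
```

To compare speeds on the real data, each expression can be wrapped in system.time(), ideally repeated many times, since a single run on a small table is dominated by noise.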

🧾 Summary Table

Q#	✅ Correct Answer	Key Concept
1	53	Property count worth ≥ $1,000,000
2	One variable per column	Tidy data rule violation
3	36,534,720	Excel subset + sum calculation
4	127	XML parsing + filtering by zipcode
5	DT[, mean(pwgtp15), by=SEX]	Fastest data.table aggregation