Quiz 1: Getting Started Data Science Capstone (Data Science Specialization) Answers 2025

Question 1 – The en_US.blogs.txt file is how many megabytes?

✅ 200 MB
❌ 100 MB
❌ 150 MB
❌ 250 MB

Explanation:
en_US.blogs.txt is roughly 210 million bytes on disk, which comes to about 200 MB when a megabyte is counted as 2^20 bytes. It is the largest of the three English datasets (blogs, news, twitter).
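The size check above can be reproduced with a few lines of Python. This is a minimal sketch, not the course's own method: the helper name `size_in_mb` is made up for illustration, and the path in the note below assumes the dataset was unzipped into a `final/en_US/` folder.

```python
import os


def size_in_mb(path: str, binary: bool = True) -> float:
    """Return the size of the file at `path` in megabytes.

    binary=True divides the byte count by 2**20 (the convention under
    which the blogs file comes out near 200); binary=False divides by
    10**6 instead.
    """
    divisor = 2**20 if binary else 10**6
    return os.path.getsize(path) / divisor
```

Calling `size_in_mb("final/en_US/en_US.blogs.txt")` (assumed path) returns the figure to compare against the answer options, and passing `binary=False` shows why the same file can also be described as about 210 MB.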


Question 2 – The en_US.twitter.txt file has how many lines of text?

✅ Over 2 million
❌ Around 1 million
❌ Around 500 thousand
❌ Around 200 thousand

Explanation:
en_US.twitter.txt contains about 2,360,000 lines, and each line holds one tweet.
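One way to get the line count is to stream the file and count lines without loading it all into memory. The sketch below is an illustration in Python; `count_lines` is a hypothetical helper name, and the file is opened in binary mode so that only newline bytes split lines.

```python
def count_lines(path: str) -> int:
    """Count the lines in a file by streaming it in binary mode.

    Iterating over a binary file splits on b"\n" only, so stray
    carriage returns or other control characters inside a tweet do not
    create extra lines. A final line without a trailing newline is
    still counted.
    """
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

For a file that ends with a newline, the result matches what `wc -l` reports.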


Question 3 – Length of the longest line in any of the three en_US datasets?

✅ Over 40 thousand in the blogs data set
❌ Over 40 thousand in the news data set
❌ Over 11 thousand in the blogs data set
❌ Over 11 thousand in the news data set

Explanation:
The longest single line is in the blogs dataset and exceeds 40,000 characters, most likely a long post stored as one line with no internal line breaks. The longest line in the news dataset is only a little over 11,000 characters.
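The comparison across files can be sketched as follows. This is an illustrative Python version under stated assumptions: the helper names are invented here, length is measured in characters after UTF-8 decoding (not bytes), and the caller supplies the three file paths.

```python
def longest_line_length(path: str) -> int:
    """Return the length, in characters, of the longest line in a file."""
    longest = 0
    with open(path, "rb") as f:
        for raw in f:
            # Drop the line terminator, then decode so that multi-byte
            # UTF-8 characters count as one character each.
            text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
            longest = max(longest, len(text))
    return longest


def longest_by_file(paths: list[str]) -> dict[str, int]:
    """Map each path to the length of its longest line."""
    return {p: longest_line_length(p) for p in paths}
```

Passing the blogs, news, and twitter paths to `longest_by_file` gives one number per file, which is enough to tell both which dataset wins and by roughly how much.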


Question 4 – In the Twitter dataset, ratio of lines containing “love” vs “hate”?

✅ About 4
❌ 2
❌ 0.5
❌ 0.25

Explanation:
Count the lines that contain each word (a line counts once, however many times the word occurs in it), not the total number of occurrences:

  • Lines with “love”: ~91,000

  • Lines with “hate”: ~22,100

Ratio ≈ 4.1, so roughly 4.
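The line-based counting described above can be sketched in Python. The function names are hypothetical, and the match is assumed to be a case-sensitive substring test, so “Love” is not counted and “lovely” is.

```python
def count_lines_containing(path: str, needle: str) -> int:
    """Count lines that contain `needle` at least once (case-sensitive)."""
    needle_bytes = needle.encode("utf-8")
    with open(path, "rb") as f:
        # Each line contributes at most 1, no matter how often the
        # word appears in it.
        return sum(1 for line in f if needle_bytes in line)


def line_ratio(path: str, first: str, second: str) -> float:
    """Ratio of lines containing `first` to lines containing `second`.

    Raises ZeroDivisionError if no line contains `second`.
    """
    return count_lines_containing(path, first) / count_lines_containing(path, second)
```

With the Twitter file, `line_ratio(path, "love", "hate")` yields the ratio the question asks about.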


Question 5 – The one tweet containing “biostats” says what?

✅ They haven’t studied for their biostats exam
❌ It’s a tweet about Jeff Leek from one of his students in class
❌ They need biostats help on their project
❌ They just enrolled in a biostat program

Explanation:
The only line in en_US.twitter.txt that contains “biostats” reads:

“i know how you feel.. i have biostats on tuesday and i have yet to study =/”
The author has biostats coming up on Tuesday and has not studied yet, which matches the exam option and not the Jeff Leek one.
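A search for the matching tweet can be sketched like this. It is a minimal Python illustration, with `lines_containing` as a made-up helper name and a case-sensitive substring match assumed.

```python
def lines_containing(path: str, needle: str) -> list[str]:
    """Return every line of the file that contains `needle`."""
    matches = []
    # newline="\n" keeps Python from treating a lone "\r" as a line
    # break; undecodable bytes are replaced instead of raising.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as f:
        for line in f:
            if needle in line:
                matches.append(line.rstrip("\r\n"))
    return matches
```

Calling `lines_containing(path, "biostats")` on the Twitter file should return a single-element list, and reading that one tweet settles which option is correct.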


Question 6 – How many tweets have the exact text:

“A computer once beat me at chess, but it was no match for me at kickboxing”

✅ 3
❌ 0
❌ 1
❌ 2

Explanation:
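That quote is the complete text of exactly 3 lines in en_US.twitter.txt. The count can be checked with grep -Fx, which treats the pattern as a fixed string (-F) and matches only whole lines (-x); adding -c prints the number of matching lines.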
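The same whole-line, fixed-string match can be sketched in Python for readers without grep. `count_exact_lines` is a hypothetical name; the comparison is byte-for-byte after stripping the line terminator, so a tweet with any extra word, space, or punctuation does not count.

```python
def count_exact_lines(path: str, text: str) -> int:
    """Count lines whose entire content equals `text`.

    This mirrors the idea of a fixed-string, whole-line match: no
    pattern syntax is interpreted and partial matches are ignored.
    """
    target = text.encode("utf-8")
    with open(path, "rb") as f:
        return sum(1 for line in f if line.rstrip(b"\r\n") == target)
```

Passing the chess and kickboxing sentence as `text`, without the surrounding quotation marks, gives the count the question asks for.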


🧾 Summary Table

Q# | ✅ Correct Answer | Key Concept
1 | 200 MB ✅ | Blog dataset size
2 | Over 2 million ✅ | Tweet count
3 | Over 40,000 in blogs ✅ | Longest line length
4 | ≈ 4 ✅ | “Love” vs “Hate” ratio
5 | They haven’t studied for their biostats exam ✅ | Biostats tweet content
6 | 3 ✅ | Exact quote count