Quiz 1: Getting Started Data Science Capstone (Data Science Specialization) Answers 2025
Question 1 – The en_US.blogs.txt file is how many megabytes?
✅ 200 MB
❌ 100
❌ 150
❌ 250
Explanation: en_US.blogs.txt is approximately 200 MB (about 210 million bytes on disk, which is roughly 200 MB at 1 MB = 1,048,576 bytes). It’s the largest of the three English datasets (blogs, news, twitter).
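The size can be checked without opening the file. The sketch below is one way to do it in Python; the function name `file_size_mb` is illustrative, and it assumes a megabyte means 1024 × 1024 bytes.

```python
import os


def file_size_mb(path: str) -> float:
    """Return the size of the file at `path` in megabytes.

    Assumption: 1 MB = 1024 * 1024 bytes. With 1 MB = 1,000,000 bytes
    the same file reports a slightly larger number.
    """
    return os.path.getsize(path) / (1024 * 1024)
```

Calling `file_size_mb("en_US.blogs.txt")` from the folder that holds the unzipped data should give a value close to 200.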
Question 2 – The en_US.twitter.txt file has how many lines of text?
✅ Over 2 million
❌ Around 1 million
❌ Around 500 thousand
❌ Around 200 thousand
Explanation: en_US.twitter.txt contains about 2,360,000 lines. Each line is one tweet.
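A line count like this can be reproduced with a short script. The sketch below reads the file in binary mode so that stray bytes in the raw text cannot end the read early or be mistaken for line breaks; `count_lines` is an illustrative name, not part of the course material.

```python
def count_lines(path: str) -> int:
    """Count lines in a file, splitting only on the newline byte.

    Binary mode is used so that unusual bytes in the text do not
    affect how lines are split.
    """
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

Calling `count_lines("en_US.twitter.txt")` should return a number a little above 2.3 million.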
Question 3 – Length of the longest line in any of the three en_US datasets?
✅ Over 40 thousand in the blogs data set
❌ Over 40 thousand in the news data set
❌ Over 11 thousand in the blogs data set
❌ Over 11 thousand in the news data set
Explanation:
The longest single line is in the blogs dataset, at just over 40,000 characters. It is most likely a long post stored as one line with no internal line breaks.
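To compare the three files, the longest line in each can be measured in one pass. This is a minimal sketch that assumes the files are UTF-8 and that length means characters rather than bytes; `longest_line_length` is an illustrative name.

```python
def longest_line_length(path: str) -> int:
    """Return the length, in characters, of the longest line in the file.

    Assumptions: the file is UTF-8 (undecodable bytes are replaced),
    and the trailing line break is not counted.
    """
    longest = 0
    with open(path, "rb") as f:
        for raw in f:
            text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
            longest = max(longest, len(text))
    return longest
```

Running it on en_US.blogs.txt, en_US.news.txt, and en_US.twitter.txt and taking the largest result identifies which dataset holds the longest line.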
Question 4 – In the Twitter dataset, ratio of lines containing “love” vs “hate”?
✅ About 4
❌ 2
❌ 0.5
❌ 0.25
Explanation:
When counting lines that contain each word (not total word occurrences):
- Lines with “love”: ~91,000
- Lines with “hate”: ~22,100
Ratio ≈ 4.1, so roughly 4.
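The ratio can be recomputed with a per-line substring count. The sketch below assumes a case-sensitive, lowercase match that also counts longer words such as "lovely", which is how a plain substring search behaves; the function names are illustrative.

```python
def count_lines_containing(path: str, word: str) -> int:
    """Count lines that contain `word` at least once.

    The match is a case-sensitive substring test, so a line is counted
    once no matter how many times the word appears in it.
    """
    needle = word.encode("utf-8")
    with open(path, "rb") as f:
        return sum(1 for line in f if needle in line)


def line_ratio(path: str, word_a: str, word_b: str) -> float:
    """Return (lines containing word_a) / (lines containing word_b)."""
    count_b = count_lines_containing(path, word_b)
    if count_b == 0:
        raise ValueError(f"no lines contain {word_b!r}")
    return count_lines_containing(path, word_a) / count_b
```

Calling `line_ratio("en_US.twitter.txt", "love", "hate")` should return a value close to 4.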
Question 5 – The one tweet containing “biostats” says what?
✅ They haven’t studied for their biostats exam
❌ It’s a tweet about Jeff Leek from one of his students in class
❌ They need biostats help on their project
❌ They just enrolled in a biostat program
Explanation:
The one matching tweet reads:
“i know how you feel.. i have biostats on tuesday and i have yet to study =/”
So the author has a biostats exam coming up and hasn’t studied for it yet.
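Rather than relying on an answer key, the matching tweet can be read straight from the file. This is a minimal sketch of a substring search that returns the matching lines as text; `lines_containing` is an illustrative name and the file is assumed to be UTF-8.

```python
def lines_containing(path: str, word: str) -> list[str]:
    """Return every line that contains `word` (case-sensitive substring).

    Lines come back decoded as UTF-8 with the trailing line break removed.
    """
    needle = word.encode("utf-8")
    matches = []
    with open(path, "rb") as f:
        for raw in f:
            if needle in raw:
                text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
                matches.append(text)
    return matches
```

Calling `lines_containing("en_US.twitter.txt", "biostats")` should return a list with a single tweet, which can then be compared against the answer options.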
Question 6 – How many tweets have the exact text:
“A computer once beat me at chess, but it was no match for me at kickboxing”
✅ 3
❌ 0
❌ 1
❌ 2
Explanation:
That humorous quote appears as a complete, identical line exactly 3 times. Verified via grep -Fx on the dataset, which matches whole lines against a fixed string.
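The same whole-line check can be sketched in Python. It counts only lines whose entire text equals the quote, so tweets that merely include the sentence are ignored. One difference from grep to be aware of: this version strips a trailing carriage return before comparing. `count_exact_lines` is an illustrative name.

```python
def count_exact_lines(path: str, text: str) -> int:
    """Count lines whose full content equals `text` exactly.

    The trailing line break (LF or CRLF) is removed before comparing;
    any other difference, including extra spaces, is a non-match.
    """
    target = text.encode("utf-8")
    with open(path, "rb") as f:
        return sum(1 for line in f if line.rstrip(b"\r\n") == target)
```

Pass the quote as the second argument, for example `count_exact_lines("en_US.twitter.txt", "A computer once beat me at chess, but it was no match for me at kickboxing")`.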
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | 200 MB ✅ | Blog dataset size |
| 2 | Over 2 million ✅ | Tweet count |
| 3 | Over 40,000 in blogs ✅ | Longest line length |
| 4 | ≈ 4 ✅ | “Love” vs “Hate” ratio |
| 5 | Hasn’t studied for biostats exam ✅ | Biostats tweet content |
| 6 | 3 ✅ | Exact quote count |