Regular Expressions: Using Python to Access Web Data (Python for Everybody Specialization) Answers 2025
Q1
Which regex would extract uct.ac.za from the string using re.findall?
- ❌ F.+:
- ❌ @\S+
- ✅ @(\S+)
- ❌ ..@\S+..
Explanation: @\S+ would match @uct.ac.za (including the @). With a capturing group, @(\S+) makes re.findall() return only the captured part, uct.ac.za, which is what the question asks for.
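A quick way to see the difference is to run both patterns. The quiz's exact input string isn't reproduced above, so this sketch assumes the same mbox-style line that Q10 uses.

```python
import re

# Assumed input: the mbox line from Q10 (the Q1 string is not shown in the quiz text above).
line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

# No group: findall returns the whole match, so the "@" is included.
with_at = re.findall(r"@\S+", line)
# One capturing group: findall returns only the text inside the parentheses.
domain_only = re.findall(r"@(\S+)", line)

print(with_at)      # ['@uct.ac.za']
print(domain_only)  # ['uct.ac.za']
```

The parentheses don't change what has to match; they only change what findall hands back.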
Q2
Which is the way to match the start of a line in a regex?
- ✅ ^
- ❌ str.startswith()
- ❌ \linestart
- ❌ String.startsWith()
- ❌ variable[0:1]
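Explanation: The caret ^ anchors the pattern to the start of the line. By default it matches only at the start of the string; with the re.MULTILINE flag it also matches right after each newline. The other options are not regex syntax.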
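Here is a small sketch of ^ with and without re.MULTILINE. The three-line text is a made-up example.

```python
import re

# Hypothetical multi-line text.
text = "From alice\nTo bob\nFrom carol"

# Default: ^ matches only at the very beginning of the string.
start_only = re.findall(r"^From \S+", text)
# re.MULTILINE: ^ also matches right after every "\n".
per_line = re.findall(r"^From \S+", text, re.MULTILINE)

print(start_only)  # ['From alice']
print(per_line)    # ['From alice', 'From carol']
```

When you loop over a file one line at a time, as the course does, each string is a single line, so plain ^ is enough.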
Q3
What does [a-z0-9] mean in a regex?
- ❌ Match anything but a lowercase letter or digit
- ❌ Match any text that is surrounded by square braces
- ❌ Match an entire line as long as it is lowercase letters or digits
- ✅ Match a lowercase letter or a digit
- ❌ Match any number of lowercase letters followed by any number of digits
Explanation: Bracket expressions list possible single characters to match. [a-z0-9] matches one character that is either a lowercase letter a–z or a digit 0–9.
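A minimal sketch showing that the bracket expression consumes one character per match, and that a quantifier is needed to match a run. The sample strings are arbitrary.

```python
import re

# Each match of [a-z0-9] is exactly one character.
chars = re.findall(r"[a-z0-9]", "Ab9_z!")
# Adding + matches a run of one or more such characters instead.
runs = re.findall(r"[a-z0-9]+", "abc123 DEF x9")

print(chars)  # ['b', '9', 'z']   ("A", "_" and "!" are not in the class)
print(runs)   # ['abc123', 'x9']
```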
Q4
What type does re.findall() return?
- ❌ A boolean
- ✅ A list of strings
- ❌ A single character
- ❌ An integer
- ❌ A string
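Explanation: re.findall() always returns a list of all non-overlapping matches, and an empty list if nothing matches. With no capturing groups each element is the full matched string; with one group it is that group's text; with several groups each element is a tuple of strings, one per group.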
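The cases above can be checked directly. This sketch uses a made-up string and patterns chosen only to show the return shapes.

```python
import re

s = "x=1, y=22, z=333"  # hypothetical sample text

no_groups = re.findall(r"\w=\d+", s)       # full matches
one_group = re.findall(r"\w=(\d+)", s)     # just the group's text
two_groups = re.findall(r"(\w)=(\d+)", s)  # one tuple per match
nothing = re.findall(r"\d{4}", s)          # no 4-digit run: empty list, not None

print(type(no_groups))  # <class 'list'>
print(no_groups)        # ['x=1', 'y=22', 'z=333']
print(one_group)        # ['1', '22', '333']
print(two_groups)       # [('x', '1'), ('y', '22'), ('z', '333')]
print(nothing)          # []
```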
Q5
What is the regex “wild card” (matches any character)?
- ❌ +
- ❌ ^
- ❌ $
- ❌ *
- ❌ ?
- ✅ .
Explanation: The dot . matches any single character except a newline (unless the re.DOTALL flag is set). To match a literal dot, escape it as \.
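A short sketch of the wildcard, the newline exception, and the escaped literal dot. The strings are arbitrary examples.

```python
import re

s = "cat\ncot"

any_char = re.findall(r"c.t", s)                 # . stands for any one character
across_newline = re.findall(r"t.c", s)           # . will not match "\n" by default
with_dotall = re.findall(r"t.c", s, re.DOTALL)   # re.DOTALL lets . match "\n" too
literal_dots = re.findall(r"\.", "uct.ac.za")    # \. matches only a real dot

print(any_char)        # ['cat', 'cot']
print(across_newline)  # []
print(with_dotall)     # ['t\nc']
print(literal_dots)    # ['.', '.']
```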
Q6
What is the difference between + and * in a regex?
- ✅ + matches at least one and * matches zero or more
- ❌ + matches upper case etc.
- ❌ other incorrect options
Explanation: Both quantifiers apply to the item just before them. a+ requires one or more a characters; a* also succeeds with zero.
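A minimal sketch comparing a+b and a*b on a few hypothetical words. re.fullmatch is used so the whole word has to match.

```python
import re

words = ["b", "ab", "aab"]  # hypothetical test words
results = {}
for w in words:
    plus_ok = re.fullmatch(r"a+b", w) is not None  # needs at least one "a"
    star_ok = re.fullmatch(r"a*b", w) is not None  # zero "a"s is also fine
    results[w] = (plus_ok, star_ok)
    print(w, plus_ok, star_ok)
# b False True
# ab True True
# aab True True
```

Only "b" separates them: it has zero a characters, which * allows and + does not.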
Q7
What does [0-9]+ match?
- ❌ Several digits followed by a plus sign
- ❌ Any mathematical expression
- ✅ One or more digits
- ❌ Zero or more digits
- ❌ Any number of digits at the beginning of a line
Explanation: + means one or more, so [0-9]+ matches a sequence of one or more digits.
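To see this in practice, here is a sketch that pulls numbers out of a made-up string. It also shows that a literal plus sign would need to be escaped.

```python
import re

s = "Room 12, floor 3, code 0045"  # hypothetical text

numbers = re.findall(r"[0-9]+", s)
print(numbers)  # ['12', '3', '0045']

# findall returns strings, so convert before doing arithmetic.
total = sum(int(n) for n in numbers)
print(total)  # 60

# "Digits followed by a plus sign" would need an escaped \+ after the class.
plus_sign = re.findall(r"[0-9]+\+", "12+3")
print(plus_sign)  # ['12+']
```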
Q8
What does this print?
- ✅ ['From: Using the :']
- ❌ ^F.+:
- ❌ From:
- ❌ ['From:']
- ❌ :
Explanation: ^F.+: starts at the F, then .+ is greedy and stretches to the last : in the string, so the match is 'From: Using the :'. findall returns a list with that string.
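The quiz's own code isn't reproduced above, so this sketch assumes an input string that yields the same answer. Treat the string as a guess, not the original question.

```python
import re

# Assumed input: chosen so the result matches the answer above.
x = "From: Using the : character"
first = re.findall(r"^F.+:", x)
print(first)  # ['From: Using the :']

# With more colons, greedy .+ still backs off only as far as the LAST one.
y = "From: a: b: c"
second = re.findall(r"^F.+:", y)
print(second)  # ['From: a: b:']
```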
Q9
What do you add to + or * to make the match non-greedy?
- ❌ **
- ❌ \g
- ❌ ^
- ❌ $
- ✅ ?
- ❌ ++
Explanation: +? or *? makes the quantifier non-greedy (match as little as possible).
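A small sketch of greedy vs. non-greedy matching on a hypothetical HTML-like string.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical input

greedy = re.findall(r"<.+>", html)   # .+ runs on to the LAST ">" in the string
lazy = re.findall(r"<.+?>", html)    # .+? stops at the FIRST ">" that works

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# *? is the lazy form of *, so it may match nothing at all between < and >.
empty_ok = re.findall(r"<.*?>", "<><b>")
print(empty_ok)  # ['<>', '<b>']
```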
Q10
Given the line From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008,
what would '\S+?@\S+' match?
- ❌ \@\
- ✅ stephen.marquard@uct.ac.za
- ❌ d@uct.ac.za
- ❌ From
- ❌ marquard@uct
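Explanation: Matching starts at the leftmost position that can succeed. No match can start inside From, because a space comes before any @, so the match starts at the s of stephen. From there the non-greedy \S+? still has to expand through stephen.marquard to reach the @, and the greedy @\S+ then takes the rest of the non-whitespace, uct.ac.za. Non-greediness changes where a match ends, not where it starts, so the result is the full address stephen.marquard@uct.ac.za.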
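A sketch you can run to confirm this. The second pattern (hypothetical, not from the quiz) moves the ? to the end, which shows that laziness shortens the end of a match rather than its start.

```python
import re

line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

lazy_left = re.findall(r"\S+?@\S+", line)
m = re.search(r"\S+?@\S+", line)
print(lazy_left)           # ['stephen.marquard@uct.ac.za']
print(m.start(), m.end())  # 5 31  (the match begins right after "From ")

# Lazy quantifier AFTER the @: the match ends as early as possible.
lazy_right = re.findall(r"\S+@\S+?", line)
print(lazy_right)          # ['stephen.marquard@u']
```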
🧾 Summary Table
| Q | Correct answer | Key concept |
|---|---|---|
| 1 | @(\S+) | Use capturing group to extract substring without @ |
| 2 | ^ | Start-of-line anchor |
| 3 | [a-z0-9] → lowercase letter or digit | Character class matches one char |
| 4 | list of strings | re.findall() returns a list |
| 5 | . | Wildcard (any character) |
| 6 | + = 1+, * = 0+ | Quantifier difference |
| 7 | [0-9]+ = one or more digits | Digit quantifier |
| 8 | ['From: Using the :'] | Greedy .+ matches to last : |
| 9 | ? | Makes quantifiers non-greedy (+?, *?) |
| 10 | stephen.marquard@uct.ac.za | \S+?@\S+ matches email-like non-whitespace |