
Reading Web Data From Python: Using Python to Access Web Data (Python for Everybody Specialization) Answers 2025

Question 1

Which of the following Python data structures is most similar to the value returned in this line of Python:
x = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')

file handle
❌ regular expression
❌ dictionary
❌ list
❌ socket

Explanation:
urlopen() returns a file-like object — you can use .read() or iterate over it like a file handle.
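
A minimal sketch of how this file-like object is typically used, reading the romeo.txt URL from the question line by line:

import urllib.request

# urlopen() returns a file-like object that can be iterated like a file handle
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    # each line arrives as bytes and must be decoded to a string
    print(line.decode().strip())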


Question 2

In this Python code, which line actually reads the data?

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break

mysock.recv()
❌ socket.socket()
❌ mysock.close()
❌ mysock.connect()
❌ mysock.send()

Explanation:
recv(512) reads 512 bytes of data from the socket — that’s the actual “reading” operation.


Question 3

Which of the following regular expressions would extract the URL from this line of HTML?
<p>Please click <a href="http://www.dr-chuck.com">here</a></p>

href="(.+)"
href=".+"
http://.*
<.*>

Explanation:
href="(.+)" captures what’s inside the quotation marks — i.e., the URL.


Question 4

In this Python code, which line is most like the open() call to read a file?

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

socket.socket()
❌ mysock.connect()
❌ import socket
❌ mysock.recv()
❌ mysock.send()

Explanation:
socket.socket() creates a socket object (just like open() creates a file handle).
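
A minimal sketch of the whole socket flow from the course examples, assuming data.pr4e.org is reachable on port 80:

import socket

# socket.socket() creates the socket object, much as open() creates a file handle
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))

# send an HTTP GET request, then read the response 512 bytes at a time
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()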


Question 5

Which HTTP header tells the browser the kind of document that is being returned?

Content-Type:
❌ ETag:
❌ Document-Type:
❌ HTML-Document:
❌ Metadata:

Explanation:
Content-Type specifies the MIME type of the document, e.g., text/html, application/json.
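
A quick sketch of one way to inspect this header from Python, reusing the romeo.txt URL from Question 1:

import urllib.request

resp = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
# getheader() looks up a response header by name
print(resp.getheader('Content-Type'))   # e.g. text/plain; charset=UTF-8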


Question 6

What should you check before scraping a website?

That the website allows scraping
❌ That it supports GET
❌ That it only has internal links
❌ That it returns HTML

Explanation:
Always check the site’s robots.txt or terms of service to confirm that scraping is permitted.
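
One programmatic way to make this check is the standard urllib.robotparser module; the site and path below are only illustrative:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url('http://www.dr-chuck.com/robots.txt')
rp.read()
# can_fetch() reports whether the given user agent may crawl the URL
print(rp.can_fetch('*', 'http://www.dr-chuck.com/page1.htm'))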


Question 7

What is the purpose of the BeautifulSoup Python library?

It repairs and parses HTML to make it easier for a program to understand
❌ It optimizes file retrieval
❌ It animates web operations
❌ It builds word clouds
❌ It chooses attractive skins

Explanation:
BeautifulSoup parses broken or messy HTML into a structured form that Python can navigate easily.
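
A small sketch of that repair behaviour; the snippet below is made up for illustration and is missing its closing </a> tag:

from bs4 import BeautifulSoup

broken = '<p>Please click <a href="http://www.dr-chuck.com">here</p>'
soup = BeautifulSoup(broken, 'html.parser')
# the tag is still found and its attribute is still readable despite the broken markup
print(soup.a.get('href'))   # http://www.dr-chuck.com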


Question 8

What ends up in the variable x?

html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
x = soup('a')

A list of all the anchor tags (<a ...>) in the HTML
❌ True if any anchor tags exist
❌ All CSS files
❌ All paragraphs

Explanation:
soup('a') finds all <a> tags and returns them as a list of BeautifulSoup tag objects.
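
A sketch of the surrounding code, using a course-style URL purely as an illustration:

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm').read()
soup = BeautifulSoup(html, 'html.parser')

# soup('a') is shorthand for soup.find_all('a') and returns a list of tag objects
for tag in soup('a'):
    print(tag.get('href', None))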


Question 9

What is the most common Unicode encoding when moving data between systems?

UTF-8
❌ UTF-32
❌ UTF-128
❌ UTF-16
❌ UTF-64

Explanation:
UTF-8 is the standard for web and network data transmission — compact and backward compatible with ASCII.
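
A small illustration of why UTF-8 interoperates so well: ASCII text encodes to the same single bytes, while other characters expand to multi-byte sequences:

# plain ASCII characters are unchanged under UTF-8
print('Hello'.encode('utf-8'))   # b'Hello'

# non-ASCII characters become multi-byte sequences
print('café'.encode('utf-8'))    # b'caf\xc3\xa9'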


Question 10

What is the ASCII character with decimal value 42?

*
❌ +
❌ !
❌ /
❌ ^

Explanation:
In ASCII, decimal 42 corresponds to the asterisk (*).
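
This is easy to verify with Python's built-in chr() and ord() functions:

print(chr(42))    # *
print(ord('*'))   # 42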


Question 11

What word does this sequence represent in ASCII?
108, 105, 110, 101

line
❌ tree
❌ func
❌ ping
❌ lost

Explanation:
108→l, 105→i, 110→n, 101→e ⇒ line.
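
The same chr() built-in decodes the whole sequence:

codes = [108, 105, 110, 101]
print(''.join(chr(c) for c in codes))   # line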


Question 12

How are strings stored internally in Python 3?

Unicode
❌ ASCII
❌ EBCDIC
❌ UTF-8
❌ Byte Code

Explanation:
In Python 3, all strings are Unicode objects — the actual in-memory representation is abstracted.
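
A brief illustration that a Python 3 string is a sequence of Unicode code points, independent of any particular byte encoding:

s = 'héllo'
print(type(s))                  # <class 'str'>
print(len(s))                   # 5 code points
print(len(s.encode('utf-8')))   # 6 bytes once encoded as UTF-8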


Question 13

When reading data across the network in Python 3, what must be used to convert it to the internal string format?

decode()
❌ find()
❌ upper()
❌ trim()
❌ encode()

Explanation:
Data read from the network arrives as bytes; calling .decode() converts those bytes into a Unicode string.
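
A tiny sketch; the byte string here stands in for data received from a socket or urlopen():

data = b'Hello world'     # bytes, as received over the network
text = data.decode()      # decode() assumes UTF-8 by default
print(type(text), text)   # <class 'str'> Hello world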


🧾 Summary Table

| # | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | file handle | urlopen() returns a file-like object |
| 2 | mysock.recv() | Reads bytes from the socket |
| 3 | href="(.+)" | Regex capture group for the URL |
| 4 | socket.socket() | Equivalent to open() |
| 5 | Content-Type | MIME type header |
| 6 | Check that scraping is allowed | Respect robots.txt |
| 7 | Repairs and parses HTML | Purpose of BeautifulSoup |
| 8 | List of <a> tags | soup('a') returns anchor tags |
| 9 | UTF-8 | Universal web encoding |
| 10 | * | ASCII 42 |
| 11 | line | ASCII translation |
| 12 | Unicode | Python 3 strings |
| 13 | decode() | Convert bytes → string |