
Reading Web Data From Python: Using Python to Access Web Data (Python for Everybody Specialization) Answers 2025

Question 1

Which of the following Python data structures is most similar to the value returned in this line of Python:
x = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')

file handle
❌ regular expression
❌ dictionary
❌ list
❌ socket

Explanation:
urlopen() returns a file-like object — you can use .read() or iterate over it like a file handle.
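
A minimal sketch of how this file-like object is typically used, reading the romeo.txt URL from the question line by line:

import urllib.request

# urlopen() returns a file-like object that can be iterated like a file handle
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    # each line arrives as bytes and must be decoded to a string
    print(line.decode().strip())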


Question 2

In this Python code, which line actually reads the data?

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break

mysock.recv()
❌ socket.socket()
❌ mysock.close()
❌ mysock.connect()
❌ mysock.send()

Explanation:
recv(512) reads 512 bytes of data from the socket — that’s the actual “reading” operation.


Question 3

Which of the following regular expressions would extract the URL from this line of HTML?
<p>Please click <a href="http://www.dr-chuck.com">here</a></p>

href="(.+)"
href=".+"
http://.*
<.*>

Explanation:
href="(.+)" captures what’s inside the quotation marks — i.e., the URL.


Question 4

In this Python code, which line is most like the open() call to read a file?

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

socket.socket()
❌ mysock.connect()
❌ import socket
❌ mysock.recv()
❌ mysock.send()

Explanation:
socket.socket() creates a socket object (just like open() creates a file handle).
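
A minimal sketch of the whole socket flow from the course examples, assuming data.pr4e.org is reachable on port 80:

import socket

# socket.socket() creates the socket object, much as open() creates a file handle
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))

# send an HTTP GET request, then read the response 512 bytes at a time
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()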


Question 5

Which HTTP header tells the browser the kind of document that is being returned?

Content-Type:
❌ ETag:
❌ Document-Type:
❌ HTML-Document:
❌ Metadata:

Explanation:
Content-Type specifies the MIME type of the document, e.g., text/html, application/json.
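
A quick sketch of one way to inspect this header from Python, reusing the romeo.txt URL from Question 1:

import urllib.request

resp = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
# getheader() looks up a response header by name
print(resp.getheader('Content-Type'))   # e.g. text/plain; charset=UTF-8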


Question 6

What should you check before scraping a website?

That the website allows scraping
❌ That it supports GET
❌ That it only has internal links
❌ That it returns HTML

Explanation:
Always check the site’s robots.txt or terms of service to confirm that scraping is permitted.
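
One programmatic way to make this check is the standard urllib.robotparser module; the site and path below are only illustrative:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url('http://www.dr-chuck.com/robots.txt')
rp.read()
# can_fetch() reports whether the given user agent may crawl the URL
print(rp.can_fetch('*', 'http://www.dr-chuck.com/page1.htm'))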


Question 7

What is the purpose of the BeautifulSoup Python library?

It repairs and parses HTML to make it easier for a program to understand
❌ It optimizes file retrieval
❌ It animates web operations
❌ It builds word clouds
❌ It chooses attractive skins

Explanation:
BeautifulSoup parses broken or messy HTML into a structured form that Python can navigate easily.
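
A small sketch of that repair behaviour; the snippet below is made up for illustration and is missing its closing </a> tag:

from bs4 import BeautifulSoup

broken = '<p>Please click <a href="http://www.dr-chuck.com">here</p>'
soup = BeautifulSoup(broken, 'html.parser')
# the tag is still found and its attribute is still readable despite the broken markup
print(soup.a.get('href'))   # http://www.dr-chuck.com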


Question 8

What ends up in the variable x?

html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
x = soup('a')

A list of all the anchor tags (<a ...>) in the HTML
❌ True if any anchor tags exist
❌ All CSS files
❌ All paragraphs

Explanation:
soup('a') finds all <a> tags and returns them as a list of BeautifulSoup tag objects.
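
A sketch of the surrounding code, using a course-style URL purely as an illustration:

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm').read()
soup = BeautifulSoup(html, 'html.parser')

# soup('a') is shorthand for soup.find_all('a') and returns a list of tag objects
for tag in soup('a'):
    print(tag.get('href', None))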


Question 9

What is the most common Unicode encoding when moving data between systems?

UTF-8
❌ UTF-32
❌ UTF-128
❌ UTF-16
❌ UTF-64

Explanation:
UTF-8 is the standard for web and network data transmission — compact and backward compatible with ASCII.
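
A small illustration of why UTF-8 interoperates so well: ASCII text encodes to the same single bytes, while other characters expand to multi-byte sequences:

# plain ASCII characters are unchanged under UTF-8
print('Hello'.encode('utf-8'))   # b'Hello'

# non-ASCII characters become multi-byte sequences
print('café'.encode('utf-8'))    # b'caf\xc3\xa9'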


Question 10

What is the ASCII character with decimal value 42?

*
❌ +
❌ !
❌ /
❌ ^

Explanation:
In ASCII, decimal 42 corresponds to the asterisk (*).
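
This is easy to verify with Python's built-in chr() and ord() functions:

print(chr(42))    # *
print(ord('*'))   # 42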


Question 11

What word does this sequence represent in ASCII?
108, 105, 110, 101

line
❌ tree
❌ func
❌ ping
❌ lost

Explanation:
108→l, 105→i, 110→n, 101→e ⇒ line.
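
The same chr() built-in decodes the whole sequence:

codes = [108, 105, 110, 101]
print(''.join(chr(c) for c in codes))   # line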


Question 12

How are strings stored internally in Python 3?

Unicode
❌ ASCII
❌ EBCDIC
❌ UTF-8
❌ Byte Code

Explanation:
In Python 3, all strings are Unicode objects — the actual in-memory representation is abstracted.
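
A brief illustration that a Python 3 string is a sequence of Unicode code points, independent of any particular byte encoding:

s = 'héllo'
print(type(s))                  # <class 'str'>
print(len(s))                   # 5 code points
print(len(s.encode('utf-8')))   # 6 bytes once encoded as UTF-8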


Question 13

When reading data across the network in Python 3, what must be used to convert it to the internal string format?

decode()
❌ find()
❌ upper()
❌ trim()
❌ encode()

Explanation:
Data read from the network arrives as bytes; calling .decode() converts those bytes into a Unicode string.
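
A tiny sketch; the byte string here stands in for data received from a socket or urlopen():

data = b'Hello world'     # bytes, as received over the network
text = data.decode()      # decode() assumes UTF-8 by default
print(type(text), text)   # <class 'str'> Hello world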


🧾 Summary Table

| # | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | file handle | urlopen() returns a file-like object |
| 2 | mysock.recv() | Reads bytes from the socket |
| 3 | href="(.+)" | Regex capture group for the URL |
| 4 | socket.socket() | Equivalent to open() |
| 5 | Content-Type | MIME type header |
| 6 | Check that scraping is allowed | Respect robots.txt |
| 7 | Repairs and parses HTML | Purpose of BeautifulSoup |
| 8 | List of <a> tags | soup('a') returns anchor tags |
| 9 | UTF-8 | Universal web encoding |
| 10 | * | ASCII 42 |
| 11 | line | ASCII translation |
| 12 | Unicode | Python 3 strings |
| 13 | decode() | Convert bytes → string |