
Detection Algorithms: Convolutional Neural Networks (Deep Learning Specialization) Answers: 2025

Question 1

What should y be for the image below? (? = “don’t care”)

Options:

  • ❌ y = [?, ?, ?, ?, ?, ?, ?, ?]

  • y = [1, ?, ?, ?, ?, ?, 0, 0, 0]

  • ❌ y = [0, ?, ?, ?, ?, ?, ?, ?]

  • ❌ y = [1, ?, ?, ?, ?, ?, ?, ?, ?]

Explanation: The first element p_c indicates whether the image contains an object (1 = object present). Since the image shows an object, p_c = 1. In the marked option the bounding-box entries are “don’t care” and the class bits c_1, c_2, c_3 are explicitly 0, which matches the expected answer for this image in this question set.
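
As a sketch of how the “don’t care” entries are typically handled in training, a squared-error loss can be masked so only the supervised entries contribute. The layout y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3] follows the question; the helper name and numbers are illustrative:

```python
import numpy as np

def masked_loss(y_true, y_pred, care_mask):
    """Squared error summed only over supervised (non-"don't care") entries."""
    return float(np.sum(((y_true - y_pred) ** 2) * care_mask))

# Layout: y = [pc, bx, by, bh, bw, c1, c2, c3]; mask entry 0 marks "don't care"
y_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
mask   = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.3, 0.5, 0.5, 0.1, 0.0, 0.0])
print(masked_loss(y_true, y_pred, mask))  # only pc and the class bits contribute
```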


Question 2

The most adequate output for detecting a can in a factory image is y = [p_c, b_x, b_y, b_h, b_w, c_1]. Which statement do you agree with the most?

  • ❌ False, since we only need two values: c_1 for “no can” and c_2 for “can”.

  • True, since this is a localization problem.

  • ❌ False, we don’t need b_h, b_w since the cans are all the same size.

  • ❌ True, p_c indicates presence, b_x, b_y, b_h, b_w give the bounding box, and c_1 gives the class probability; this option restates the previous one and is not the best phrasing.

Explanation: The task requires both presence/absence and bounding-box localization when an object is present, so a localization output vector is appropriate. Even if the can size is constant, including b_h, b_w in the output is harmless; the canonical localization vector is the best match.
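
A minimal sketch of building the target vector y = [p_c, b_x, b_y, b_h, b_w, c_1] for the two cases (the helper name and coordinate values are illustrative):

```python
import numpy as np

def make_target(box=None):
    """Build y = [pc, bx, by, bh, bw, c1]; box = (bx, by, bh, bw) or None."""
    if box is None:
        # No can in the image: only pc = 0 matters; other entries are placeholders
        return np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    bx, by, bh, bw = box
    return np.array([1.0, bx, by, bh, bw, 1.0])

print(make_target((0.5, 0.5, 0.2, 0.1)))  # can present with its bounding box
print(make_target(None))                   # no can present
```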


Question 3

If the network outputs N landmarks (each with x and y coordinates) for a single face, how many output units are needed?

  • 2N

  • ❌ N

  • ❌ 3N

  • ❌ N^2

Explanation: Each landmark has two values (x and y), so the total number of output units is 2N.
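
A quick illustration of the 2N count, reshaping a flat output vector into N (x, y) pairs (the sizes are illustrative):

```python
import numpy as np

N = 32                           # number of facial landmarks (illustrative)
raw = np.random.rand(2 * N)      # the network's 2N output units
landmarks = raw.reshape(N, 2)    # one (x, y) pair per landmark
print(landmarks.shape)           # (32, 2)
```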


Question 4

You find many unlabeled internet cat photos. Which is true?

  • ❌ We can’t add the internet images unless they have bounding boxes.

  • We should add the internet images (without bounding boxes) to the train set.

  • ❌ We can’t use internet images because it changes the distribution.

  • ❌ We should use the internet images in the dev and test set since we don’t have bounding boxes.

Explanation: Unlabeled images (no bounding boxes) still provide useful training signal, e.g. as weakly labeled or semi-supervised examples. They are not appropriate for the dev and test sets if those sets should reflect the labeled production distribution.


Question 5

IoU between two boxes: areas 4 and 6, overlap area 1 → IoU = ?

  • 1/9

  • ❌ 1/10

  • ❌ 1/6

  • ❌ None of the above

Explanation: IoU = intersection / union = 1 / (4 + 6 − 1) = 1/9.
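
The arithmetic can be checked directly (the helper name is illustrative):

```python
def iou_from_areas(area_a, area_b, inter):
    """IoU = intersection / union, where union = area_a + area_b - intersection."""
    return inter / (area_a + area_b - inter)

print(iou_from_areas(4, 6, 1))  # 1/9 ≈ 0.111...
```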


Question 6

After NMS with discard threshold ≤ 0.7 and IoU threshold 0.5, only three boxes remain. True/False?

  • False

  • ❌ True

Explanation: The number of boxes that survive NMS depends on the specific scores and pairwise IoUs of the predicted boxes in the quiz figure, which are not reproduced here. Applying the thresholds to that figure does not leave exactly three boxes, so the statement is False.
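
For reference, a minimal greedy NMS sketch using these thresholds; the box coordinates and scores below are made up for illustration, not taken from the quiz figure:

```python
def nms(boxes, scores, score_thresh=0.7, iou_thresh=0.5):
    """Greedy non-max suppression on axis-aligned boxes (x1, y1, x2, y2)."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter) if inter else 0.0

    # Discard boxes with score <= threshold, then keep survivors greedily by score
    cands = sorted(
        [(s, b) for s, b in zip(scores, boxes) if s > score_thresh],
        key=lambda p: -p[0],
    )
    kept = []
    for s, b in cands:
        if all(iou(b, k) < iou_thresh for _, k in kept):
            kept.append((s, b))
    return [b for _, b in kept]

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2, 2), (3, 3, 5, 5)]
scores = [0.9, 0.8, 0.75]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```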


Question 7

About anchor boxes in YOLO — which apply? (check all that apply)

  • Each object is assigned to the grid cell that contains that object’s midpoint.

  • Each object is assigned to an anchor box with the highest IoU inside the assigned cell.

  • ❌ They prevent the bounding box from suffering from drifting.

  • ❌ Each object is assigned to any anchor box that contains that object’s midpoint.

Explanation: In YOLO-style grid/anchor schemes, each object is assigned to the cell containing its midpoint; within that cell the object is matched to the anchor box (prior) with the highest IoU against the ground-truth box. Anchors do not prevent bounding boxes from “drifting” (not a standard claim), and an object is not assigned to every anchor that contains its midpoint; usually only the best-matching anchor in that cell is used.
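
A sketch of this assignment rule, assuming anchor matching compares box shapes centered at a common point (all names, grid sizes, and anchor priors below are illustrative):

```python
def assign(obj_box, grid_size, anchors, iou_fn):
    """Assign a ground-truth box to (cell_row, cell_col, anchor_index).

    obj_box = (cx, cy, h, w) with coordinates normalized to [0, 1];
    anchors = list of (h, w) priors; iou_fn compares shapes at a shared center.
    """
    cx, cy, h, w = obj_box
    row, col = int(cy * grid_size), int(cx * grid_size)   # cell holding the midpoint
    best = max(range(len(anchors)),
               key=lambda i: iou_fn((h, w), anchors[i]))  # best-IoU anchor in that cell
    return row, col, best

def shape_iou(a, b):
    """IoU of two (h, w) boxes aligned at a common center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

anchors = [(0.2, 0.1), (0.1, 0.3)]  # tall-thin and short-wide priors
print(assign((0.55, 0.35, 0.18, 0.12), grid_size=3, anchors=anchors,
             iou_fn=shape_iou))
```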


Question 8

Is per-pixel tumor segmentation a localization problem? True/False?

  • True

  • ❌ False

Explanation: Labeling each pixel as tumor or not is semantic segmentation, a pixel-level form of localization, so the statement is True.


Question 9

Transpose convolution with padding = 1 and stride = 2 is applied to the 2×2 input [1 2; 3 4] with a given 3×3 filter. The quiz shows the 6×6 result with unknowns X, Y, Z. Which option gives X, Y, Z?

  • X = 2, Y = -6, Z = -4

  • ❌ X = -2, Y = -6, Z = -4

  • ❌ X = 2, Y = 6, Z = 4

  • ❌ X = 2, Y = -6, Z = 4

Explanation: Upsampling the input by inserting zeros (stride 2) and then convolving with the 3×3 filter under the specified padding yields interior values matching X = 2, Y = −6, Z = −4.
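
A generic scatter-then-crop transposed convolution can be sketched as follows. The 3×3 filter below is illustrative (the quiz's actual filter is not reproduced here), and the cropped output size follows this convention rather than the size shown in the quiz figure:

```python
import numpy as np

def transpose_conv2d(x, k, stride=2, pad=1):
    """Transposed conv: scatter k * x[i, j] at stride offsets, then crop the padding."""
    n, m = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (n - 1) + kh, stride * (m - 1) + kw))
    for i in range(n):
        for j in range(m):
            out[stride * i:stride * i + kh,
                stride * j:stride * j + kw] += x[i, j] * k
    return out[pad:out.shape[0] - pad, pad:out.shape[1] - pad]

x = np.array([[1., 2.], [3., 4.]])
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])  # illustrative filter, not the quiz's
print(transpose_conv2d(x, k, stride=2, pad=1))
```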


Question 10

U-Net input dimension is h × w × 3. What is the output dimension?

  • h × w × n, where n = number of output classes

  • ❌ h × w × n, where n = number of input channels

  • ❌ h × w × n, where n = number of filters used in the algorithm

  • ❌ h × w × n, where n = number of output channels

Explanation: U-Net is typically used for segmentation; it produces an output map with the same spatial size h × w and depth equal to the number of segmentation classes (per-pixel class logits). So n denotes the number of output classes.
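
The shape relationship can be illustrated with plain arrays (the sizes below are arbitrary, and the logits stand in for a U-Net's final layer):

```python
import numpy as np

h, w, n_classes = 4, 4, 3                  # illustrative sizes
logits = np.random.randn(h, w, n_classes)  # U-Net output: one score per class per pixel
seg_map = logits.argmax(axis=-1)           # per-pixel predicted class
print(logits.shape, seg_map.shape)         # (4, 4, 3) (4, 4)
```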


🧾 Summary Table

| Q # | Correct Answer(s) | Key concept |
| --- | --- | --- |
| 1 | [1, ?, ?, ?, ?, 0, 0, 0] | Mark presence (p_c = 1); other localization/class fields can be don’t-care per the question format. |
| 2 | True (localization output) | Use a localization vector to give presence plus bounding box. |
| 3 | 2N | Each landmark has x and y. |
| 4 | Add internet images to training (even if unlabeled) | Unlabeled images can still help training (weak/semi-supervised). |
| 5 | 1/9 | IoU = intersection / union = 1/9. |
| 6 | False | Cannot conclude the exact post-NMS count without concrete boxes/scores. |
| 7 | Object assigned to midpoint cell; best-IoU anchor chosen | YOLO assigns each object to the cell containing its midpoint and to the anchor with highest IoU. |
| 8 | True | Pixel-wise tumor labeling = segmentation/localization. |
| 9 | X = 2, Y = −6, Z = −4 | Result of stride-2 transpose convolution (upsample + conv). |
| 10 | h × w × n, where n = number of output classes | U-Net outputs per-pixel class logits at the same h × w size. |