Detection Algorithms: Convolutional Neural Networks (Deep Learning Specialization) Answers: 2025
Question 1
What should y be for the image below? (? = “don’t care”)
Options:
- ❌ y = [?, ?, ?, ?, ?, ?, ?, ?]
- ✅ y = [1, ?, ?, ?, ?, ?, 0, 0, 0]
- ❌ y = [0, ?, ?, ?, ?, ?, ?, ?]
- ❌ y = [1, ?, ?, ?, ?, ?, ?, ?, ?]
Explanation: The first element p_c indicates whether any object is present (1 = present). Since the image shows an object, p_c = 1. In this question’s format the localization fields are left as “don’t care” (?), while the class indicators that must be zero are written out explicitly as 0, 0, 0, which is exactly what the correct option shows.
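This target layout can be made concrete in code. Below is a minimal sketch, assuming the course’s usual convention y = [p_c, b_x, b_y, b_h, b_w, c_1, …, c_k], with NaN standing in for the “don’t care” entries; the helper name and field sizes are illustrative, not taken from the quiz.

```python
import numpy as np

def make_target(object_present, bbox=None, class_id=None, num_classes=3):
    """Build a target vector y = [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_k].

    Fields the loss ignores are filled with NaN to stand in for the
    quiz's "don't care" (?) entries.
    """
    y = np.full(5 + num_classes, np.nan)   # everything starts as "don't care"
    y[0] = 1.0 if object_present else 0.0  # p_c: is any object present?
    if object_present:
        y[1:5] = bbox                      # b_x, b_y, b_h, b_w
        y[5:] = 0.0                        # all class bits off ...
        y[5 + class_id] = 1.0              # ... except the true class
    return y

print(make_target(True, bbox=[0.5, 0.5, 0.4, 0.3], class_id=0))
print(make_target(False))  # only p_c = 0 matters; the rest stays NaN ("?")
```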
Question 2
The most adequate output for a factory can-detection system is y = [p_c, b_x, b_y, b_h, b_w, c_1]. Which statement do you agree with the most?
- ❌ False, since we only need two values: c_1 for no can and c_2 for can.
- ✅ True, since this is a localization problem.
- ❌ False, we don’t need b_h, b_w since the cans are all the same size.
- ❌ True, p_c indicates presence, b_x, b_y, b_h, b_w indicate the bounding box, and c_1 indicates the probability (redundant with the previous option and not the best phrasing).
Explanation: The problem requires both presence/absence and bounding-box localization when an object is present, so a localization output vector is appropriate. Even if the can size is constant, including b_h, b_w in the output is fine (and harmless); the canonical localization vector is the best match.
Question 3
If the network outputs N landmarks (each with x, y coordinates) for a single face, how many output units are needed?
- ✅ 2N
- ❌ N
- ❌ 3N
- ❌ N^2
Explanation: Each landmark has two values (x and y), so total outputs = 2 × N.
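As a quick sanity check, a minimal sketch (N = 68 is just an example value, e.g. for facial landmarks):

```python
import numpy as np

N = 68                            # e.g. 68 facial landmarks; N is arbitrary here
landmarks = np.random.rand(N, 2)  # one (x, y) pair per landmark
output = landmarks.reshape(-1)    # flattened into the network's output layer
assert output.shape == (2 * N,)   # 2N output units
```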
Question 4
You find many unlabeled internet cat photos. Which is true?
- ❌ We can’t add the internet images unless they have bounding boxes.
- ✅ We should add the internet images (without bounding boxes) to the train set.
- ❌ We can’t use internet images because it changes the distribution.
- ❌ We should use the internet images in the dev and test sets since we don’t have bounding boxes.
Explanation: Unlabeled images (no bounding boxes) still provide useful signal for training, e.g. as additional weakly labeled examples or via semi-supervised techniques. They are not appropriate for dev/test if you want those sets to reflect the labeled production distribution.
Question 5
IoU between two boxes: areas 4 and 6, overlap area 1 → IoU = ?
- ✅ 1/9
- ❌ 1/10
- ❌ 1/6
- ❌ None of the above
Explanation: IoU = intersection / union = 1 / (4 + 6 − 1) = 1/9.
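The computation generalizes to any pair of axis-aligned boxes. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corners; the example boxes are chosen to reproduce the quiz’s areas of 4 and 6 with overlap 1:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)  # union = A + B - intersection

# A 2x2 box and a 2x3 box sharing a 1x1 corner:
print(iou((0, 0, 2, 2), (1, 1, 3, 4)))  # 1 / (4 + 6 - 1) = 1/9
```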
Question 6
After running NMS (discarding boxes with score ≤ 0.7, then suppressing overlaps at an IoU threshold of 0.5), only three boxes remain. True/False?
- ✅ False
- ❌ True
Explanation: The claim cannot be verified as stated: without the concrete predicted boxes and their scores/IoUs, the exact number of boxes surviving NMS cannot be determined, so the assertion that exactly three boxes remain does not follow. The expected answer is False.
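For reference, the procedure itself is simple. A minimal sketch of greedy NMS, reusing the iou() helper from Question 5, with thresholds mirroring the question’s 0.7 score cutoff and 0.5 IoU:

```python
def nms(boxes, scores, score_thresh=0.7, iou_thresh=0.5):
    """Greedy non-max suppression (uses iou() from Question 5).

    Step 1: discard boxes with confidence score <= score_thresh.
    Step 2: repeatedly keep the highest-scoring remaining box and drop
            every other box that overlaps it with IoU > iou_thresh.
    """
    candidates = sorted(
        ((s, b) for s, b in zip(scores, boxes) if s > score_thresh),
        key=lambda sb: sb[0], reverse=True)
    kept = []
    while candidates:
        _, best = candidates.pop(0)
        kept.append(best)
        candidates = [(s, b) for s, b in candidates if iou(best, b) <= iou_thresh]
    return kept
```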
Question 7
About anchor boxes in YOLO — which apply? (check all that apply)
- ✅ Each object is assigned to the grid cell that contains that object’s midpoint.
- ✅ Each object is assigned to the anchor box with the highest IoU inside the assigned cell.
- ❌ They prevent the bounding box from drifting.
- ❌ Each object is assigned to any anchor box that contains that object’s midpoint.
Explanation: In YOLO-style grid/anchor schemes, each object is assigned to the cell containing its center; within that cell, the object is matched to the anchor box (prior) with the highest IoU against the ground-truth box. Anchors don’t prevent drifting (not a standard claim), and an object isn’t assigned to every anchor that contains its midpoint; usually only the best-matching anchor in that cell is used.
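The anchor-matching step can be sketched as follows; this assumes the iou() helper from Question 5, and the anchors are hypothetical (height, width) priors compared by shape only (both boxes centered at the origin):

```python
def assign_anchor(gt_box, anchors):
    """Match a ground-truth (height, width) to the anchor with highest IoU."""
    best_idx, best_iou = -1, 0.0
    gh, gw = gt_box
    centred_gt = (-gw / 2, -gh / 2, gw / 2, gh / 2)
    for i, (ah, aw) in enumerate(anchors):
        anchor_box = (-aw / 2, -ah / 2, aw / 2, ah / 2)
        overlap = iou(anchor_box, centred_gt)
        if overlap > best_iou:
            best_idx, best_iou = i, overlap
    return best_idx

# A tall thin object matches the tall thin anchor prior:
print(assign_anchor((4, 1), anchors=[(1, 4), (4, 1)]))  # -> 1
```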
Question 8
Is per-pixel tumor segmentation a localization problem? True/False?
- ✅ True
- ❌ False
Explanation: Pixel-wise tumor detection (labeling each pixel as tumor or not) is a localization problem, specifically semantic segmentation.
Question 9
Transpose convolution result (padding = 1, stride = 2): the input is the 2×2 matrix [1 2; 3 4] and a 3×3 filter is given. The result is a 6×6 grid containing unknowns X, Y, Z. Which option gives X, Y, Z?
- ✅ X = 2, Y = -6, Z = -4
- ❌ X = -2, Y = -6, Z = -4
- ❌ X = 2, Y = 6, Z = 4
- ❌ X = 2, Y = -6, Z = 4
Explanation (brief): Performing the stride-2 transpose convolution (upsample by inserting zeros, then convolve with the 3×3 filter under the specified padding) yields interior values matching X = 2, Y = −6, Z = −4.
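The mechanics fit in a few lines. A minimal NumPy sketch of the scatter view of transpose convolution (equivalent to upsample-with-zeros-then-convolve); the filter below is a placeholder since the quiz’s actual 3×3 filter is not reproduced above, so the printed output illustrates the procedure rather than the specific X, Y, Z values:

```python
import numpy as np

def conv_transpose2d(x, w, stride=2, padding=1):
    """Naive 2-D transpose convolution (scatter view): each input value
    stamps a scaled copy of the filter into the output at stride-spaced
    offsets; `padding` rows/columns are then cropped from every border."""
    ih, iw = x.shape
    kh, kw = w.shape
    oh, ow = (ih - 1) * stride + kh, (iw - 1) * stride + kw
    out = np.zeros((oh, ow))
    for i in range(ih):
        for j in range(iw):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * w
    return out[padding:oh-padding, padding:ow-padding]

x = np.array([[1, 2], [3, 4]])
w = np.array([[1, 0, -1],    # placeholder filter: the quiz's actual
              [1, 0, -1],    # 3x3 filter is not reproduced here
              [1, 0, -1]])
print(conv_transpose2d(x, w))
```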
Question 10
U-Net input dimension h × w × 3. What is the output dimension?
- ✅ h × w × n, where n = number of output classes
- ❌ h × w × n, where n = number of input channels
- ❌ h × w × n, where n = number of filters used in the algorithm
- ❌ h × w × n, where n = number of output channels (ambiguous)
Explanation: U-Net is typically used for segmentation; it produces an output map of the same spatial size h × w with depth equal to the number of segmentation classes (per-pixel class logits). So n denotes the number of output classes.
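Concretely, the class count enters only through the final 1×1 convolution. A minimal PyTorch sketch; the layer sizes here are illustrative assumptions, not the quiz’s architecture:

```python
import torch
import torch.nn as nn

num_classes = 3                                  # n = number of output classes
decoder_features = torch.randn(1, 64, 128, 128)  # (batch, channels, h, w); sizes illustrative
head = nn.Conv2d(in_channels=64, out_channels=num_classes, kernel_size=1)
logits = head(decoder_features)
print(logits.shape)  # torch.Size([1, 3, 128, 128]): per-pixel class logits, same h x w
```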
🧾 Summary Table
| Q # | Correct Answer(s) | Key concept |
|---|---|---|
| 1 | ✅ [1, ?, ?, ?, ?, ?, 0, 0, 0] | Mark presence (p_c = 1); other localization/class fields can be don’t-care per the question format. |
| 2 | ✅ True (localization output OK) | Use a localization vector to give presence + bounding box. |
| 3 | ✅ 2N | Each landmark has an x and a y. |
| 4 | ✅ Add internet images to training (even if unlabeled) | Unlabeled images can still help training (weak/semi-supervised). |
| 5 | ✅ 1/9 | IoU = intersection / union = 1/9. |
| 6 | ✅ False | Cannot conclude the exact post-NMS count without concrete boxes/scores. |
| 7 | ✅ Object assigned to midpoint cell; ✅ best-IoU anchor chosen | YOLO assigns each object to the cell containing its midpoint and to the anchor with highest IoU. |
| 8 | ✅ True | Pixel-wise tumor labeling = segmentation/localization. |
| 9 | ✅ X = 2, Y = -6, Z = -4 | Result of stride-2 transpose convolution (upsample + conv). |
| 10 | ✅ h × w × n, where n = # output classes | U-Net outputs per-pixel class logits at the same h × w size. |