The Basics of ConvNets: Convolutional Neural Networks (Deep Learning Specialization) Answers, 2025
Question 1
What do you think applying this filter to a grayscale image will do?
Kernel: $\begin{bmatrix}-1 & -1 & 2\\ -1 & 2 & 1\\ 2 & 1 & 1\end{bmatrix}$
- ❌ Detect vertical edges.
- ❌ Detecting image contrast.
- ❌ Detect horizontal edges.
- ✅ Detect 45-degree edges.
Explanation: The pattern of positive weights concentrated toward one corner and negative weights toward the opposite corner makes the kernel respond strongly to diagonal intensity changes (i.e., edges at about 45°). This kernel accentuates intensity differences along a diagonal orientation rather than purely vertical or horizontal changes.
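As a sanity check, the kernel's response can be compared on toy inputs. This is a minimal NumPy sketch; the 3×3 test patches are made up for illustration:

```python
import numpy as np

# The quiz kernel, as written in the question.
K = np.array([[-1, -1, 2],
              [-1,  2, 1],
              [ 2,  1, 1]])

def response(patch):
    """Cross-correlation response of the kernel on a single 3x3 patch."""
    return float(np.sum(patch * K))

# Toy patch with a 45-degree edge: bright in the lower-left triangle.
diag_edge = np.array([[0, 0, 9],
                      [0, 9, 9],
                      [9, 9, 9]])

# Toy patch with a vertical edge: bright on the right half.
vert_edge = np.array([[0, 9, 9],
                      [0, 9, 9],
                      [0, 9, 9]])

print(response(diag_edge))  # 81.0, strongest on the diagonal edge
print(response(vert_edge))  # 54.0
```

The positive weights line up with the bright diagonal region, so the diagonal patch produces the larger activation.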
Question 2
Input: 128 × 128 grayscale. First hidden layer: 256 fully-connected neurons. How many parameters in that hidden layer (including biases)?
- ❌ 12,582,912
- ✅ 4,194,560
- ❌ 4,194,304
- ❌ 12,583,168
Explanation: Input size = 128 × 128 = 16,384. Each neuron has 16,384 weights + 1 bias = 16,385 parameters. Total = 256 × 16,385 = 4,194,560.
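The same arithmetic in a few lines of Python (variable names are mine):

```python
# Fully connected layer: every neuron sees the whole flattened image.
input_size = 128 * 128               # 16,384 input pixels (grayscale)
neurons = 256
params = neurons * (input_size + 1)  # +1 for each neuron's bias
print(params)  # 4194560
```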
Question 3
Input: 300×300 color (RGB). Convolutional layer: 100 filters, each 5×5. How many parameters (including biases)?
- ✅ 7,600
- ❌ 2,600
- ❌ 7,500
- ❌ 2,501
Explanation: Each filter has 5 × 5 × 3 = 75 weights plus 1 bias = 76 parameters per filter. For 100 filters: 100 × 76 = 7,600.
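A quick sketch of the count (variable names are mine). Note that, unlike the fully connected case, the 300×300 input size never enters the formula:

```python
# Convolutional layer: parameter count depends only on the filters.
f, in_channels, n_filters = 5, 3, 100
params_per_filter = f * f * in_channels + 1  # 75 weights + 1 bias
total_params = n_filters * params_per_filter
print(total_params)  # 7600
```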
Question 4
Input volume: 121×121×16. Use 32 filters of 4×4, stride 3, no padding. Output volume?
- ❌ 118×118×16
- ✅ 40×40×32
- ❌ 40×40×16
- ❌ 118×118×32
Explanation: Spatial output size = ⌊(121 − 4)/3⌋ + 1 = 40. Depth = number of filters = 32. So the output is 40×40×32.
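The general formula is worth keeping as a helper; a minimal sketch:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1

print(conv_output_size(121, f=4, stride=3))  # 40
```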
Question 5
Input: 15×15×8, pad = 2. Dimension after padding?
- ❌ 17×17×10
- ✅ 19×19×8
- ❌ 17×17×8
- ❌ 19×19×12
Explanation: Padding 2 adds 2 pixels on each side: new width = 15 + 2×2 = 19, and likewise for height. Depth (channels) is unchanged = 8. So 19×19×8.
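As a one-line check (variable names are mine):

```python
# Padding p adds p pixels on all four sides; channels are untouched.
n, p, channels = 15, 2, 8
padded = n + 2 * p
print((padded, padded, channels))  # (19, 19, 8)
```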
Question 6
Input: 63×63×16, convolve with 7×7 filters, stride 1, want “same” convolution. What is padding?
- ✅ 3
- ❌ 7
- ❌ 1
- ❌ 2
Explanation: For a “same” convolution with odd filter size f, use p = (f − 1)/2. Here p = (7 − 1)/2 = 3.
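The rule as a tiny helper (a sketch; the function name is mine):

```python
def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd f)."""
    assert f % 2 == 1, "formula assumes an odd filter size"
    return (f - 1) // 2

print(same_padding(7))  # 3
```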
Question 7
Input: 32×32×16, apply max pooling with stride 2 and filter size 2. Output volume?
- ✅ 16×16×16
- ❌ 15×15×16
- ❌ 32×32×8
- ❌ 16×16×8
Explanation: With filter size 2 and stride 2, pooling halves each spatial dimension: 32/2 = 16. Depth is unchanged = 16.
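Pooling follows the same size formula as convolution, just without padding; a quick check (variable names are mine):

```python
# Output size of pooling: floor((n - f) / s) + 1; channels pass through.
n, f, s, channels = 32, 2, 2, 16
out = (n - f) // s + 1
print((out, out, channels))  # (16, 16, 16)
```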
Question 8
Which of the following are hyperparameters of the pooling layers? (Choose all that apply)
- ❌ $b^{[l]}$ bias.
- ❌ $W^{[l]}$ weights.
- ✅ Stride.
- ✅ Whether it is max or average.
Explanation: Pooling layers have no learnable weights or biases. Their hyperparameters include the pooling window size (not listed), the stride, and the type (max vs. average).
Question 9
Which statements about parameter sharing in ConvNets are true? (Check all that apply)
- ✅ It allows parameters learned for one task to be shared even for a different task (transfer learning).
- ✅ It reduces the total number of parameters, thus reducing overfitting.
- ✅ It allows a feature detector to be used in multiple locations throughout the whole input image/input volume.
- ❌ It allows gradient descent to set many of the parameters to zero, thus making the connections sparse.
Explanation:
- Sharing convolutional filters makes it possible to reuse learned features for other tasks (transfer learning).
- Weight sharing greatly reduces the number of parameters compared with fully connected layers, which helps generalization.
- The same filter is applied across all spatial locations, so a feature detector is used everywhere in the input.
- However, parameter sharing does not mean gradient descent sets many parameters to zero to create sparse connections; that is a separate concept (sparsity/regularization), so the last statement is false.
Question 10
The sparsity of connections and weight sharing are mechanisms that allow us to use fewer parameters in a convolutional layer, making it possible to train a network with smaller training sets. True/False?
- ❌ False
- ✅ True
Explanation: Convolutional layers use sparse connectivity (filters are small) and weight sharing (same filter at all locations). Together these reduce the number of parameters dramatically compared to fully-connected layers, which helps training with smaller datasets.
🧾 Summary Table
| Q # | Correct Answer(s) | Key concept |
|---|---|---|
| 1 | ✅ Detect 45-degree edges | Kernel emphasizes diagonal contrast → detects diagonal edges. |
| 2 | ✅ 4,194,560 | Params = neurons × (input_size + bias). |
| 3 | ✅ 7,600 | Conv params = filters × (f×f×depth + 1). |
| 4 | ✅ 40×40×32 | Output spatial size = ⌊(121−4)/3⌋ + 1 = 40, depth = 32. |
| 5 | ✅ 19×19×8 | Padding adds 2 on each side → 15+4=19. |
| 6 | ✅ 3 | “Same” conv padding p = (f−1)/2 for odd f. |
| 7 | ✅ 16×16×16 | Pool halves spatial dims with stride=2, same depth. |
| 8 | ✅ Stride; ✅ Max vs Average | Pooling has no weights/biases; hyperparams are stride/type/window. |
| 9 | ✅ (1,2,3 true; 4 false) | Parameter sharing reduces params, enables reuse/transfer, spatial sharing. |
| 10 | ✅ True | Sparse connections + weight sharing → fewer params → easier training with less data. |