The Basics of ConvNets: Convolutional Neural Networks (Deep Learning Specialization) Answers, 2025
Question 1
What do you think applying this filter to a grayscale image will do?
Kernel: $\begin{bmatrix}-1 & -1 & 2\\ -1 & 2 & 1\\ 2 & 1 & 1\end{bmatrix}$
- ❌ Detect vertical edges.
- ❌ Detecting image contrast.
- ❌ Detect horizontal edges.
- ✅ Detect 45-degree edges.
Explanation: The pattern of positive weights concentrated toward one corner and negative weights toward the opposite corner makes the kernel respond strongly to diagonal intensity changes (i.e., edges at about 45°). This kernel accentuates intensity differences along a diagonal orientation rather than purely vertical or horizontal changes.
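As a sanity check, the kernel's response can be compared on toy inputs. This is a minimal NumPy sketch; the 3×3 test patches are made up for illustration:

```python
import numpy as np

# The quiz kernel, as written in the question.
K = np.array([[-1, -1, 2],
              [-1,  2, 1],
              [ 2,  1, 1]])

def response(patch):
    """Cross-correlation response of the kernel on a single 3x3 patch."""
    return float(np.sum(patch * K))

# Toy patch with a 45-degree edge: bright in the lower-left triangle.
diag_edge = np.array([[0, 0, 9],
                      [0, 9, 9],
                      [9, 9, 9]])

# Toy patch with a vertical edge: bright on the right half.
vert_edge = np.array([[0, 9, 9],
                      [0, 9, 9],
                      [0, 9, 9]])

print(response(diag_edge))  # 81.0, strongest on the diagonal edge
print(response(vert_edge))  # 54.0
```

The positive weights line up with the bright diagonal region, so the diagonal patch produces the larger activation.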
Question 2
Input: 128 × 128 grayscale. First hidden layer: 256 fully-connected neurons. How many parameters in that hidden layer (including biases)?
- ❌ 12,582,912
- ✅ 4,194,560
- ❌ 4,194,304
- ❌ 12,583,168
Explanation: Input size = 128 × 128 = 16,384. Each neuron has 16,384 weights + 1 bias = 16,385 parameters. Total = 256 × 16,385 = 4,194,560.
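The same arithmetic in a few lines of Python (variable names are mine):

```python
# Fully connected layer: every neuron sees the whole flattened image.
input_size = 128 * 128               # 16,384 input pixels (grayscale)
neurons = 256
params = neurons * (input_size + 1)  # +1 for each neuron's bias
print(params)  # 4194560
```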
Question 3
Input: 300×300 color (RGB). Convolutional layer: 100 filters, each 5×5. How many parameters (including biases)?
- ✅ 7,600
- ❌ 2,600
- ❌ 7,500
- ❌ 2,501
Explanation: Each filter has 5 × 5 × 3 = 75 weights plus 1 bias = 76 parameters per filter. For 100 filters: 100 × 76 = 7,600.
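A quick sketch of the count (variable names are mine). Note that, unlike the fully connected case, the 300×300 input size never enters the formula:

```python
# Convolutional layer: parameter count depends only on the filters.
f, in_channels, n_filters = 5, 3, 100
params_per_filter = f * f * in_channels + 1  # 75 weights + 1 bias
total_params = n_filters * params_per_filter
print(total_params)  # 7600
```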
Question 4
Input volume: 121×121×16. Use 32 filters of 4×4, stride 3, no padding. Output volume?
- ❌ 118×118×16
- ✅ 40×40×32
- ❌ 40×40×16
- ❌ 118×118×32
Explanation: Spatial output size = ⌊(121 − 4)/3⌋ + 1 = 40. Depth = number of filters = 32. So the output is 40×40×32.
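The general formula is worth keeping as a helper; a minimal sketch:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1

print(conv_output_size(121, f=4, stride=3))  # 40
```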
Question 5
Input: 15×15×8, pad = 2. Dimension after padding?
- ❌ 17×17×10
- ✅ 19×19×8
- ❌ 17×17×8
- ❌ 19×19×12
Explanation: Padding 2 adds 2 pixels on each side: new width = 15 + 2×2 = 19, and likewise for height. Depth (channels) is unchanged = 8. So 19×19×8.
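As a one-line check (variable names are mine):

```python
# Padding p adds p pixels on all four sides; channels are untouched.
n, p, channels = 15, 2, 8
padded = n + 2 * p
print((padded, padded, channels))  # (19, 19, 8)
```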
Question 6
Input: 63×63×16, convolve with 7×7 filters, stride 1, want “same” convolution. What is padding?
- ✅ 3
- ❌ 7
- ❌ 1
- ❌ 2
Explanation: For a “same” convolution with odd filter size f, use p = (f − 1)/2. Here p = (7 − 1)/2 = 3.
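The rule as a tiny helper (a sketch; the function name is mine):

```python
def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd f)."""
    assert f % 2 == 1, "formula assumes an odd filter size"
    return (f - 1) // 2

print(same_padding(7))  # 3
```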
Question 7
Input: 32×32×16, apply max pooling with stride 2 and filter size 2. Output volume?
- ✅ 16×16×16
- ❌ 15×15×16
- ❌ 32×32×8
- ❌ 16×16×8
Explanation: With filter size 2 and stride 2, pooling halves each spatial dimension: 32/2 = 16. Depth is unchanged = 16.
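Pooling follows the same size formula as convolution, just without padding; a quick check (variable names are mine):

```python
# Output size of pooling: floor((n - f) / s) + 1; channels pass through.
n, f, s, channels = 32, 2, 2, 16
out = (n - f) // s + 1
print((out, out, channels))  # (16, 16, 16)
```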
Question 8
Which of the following are hyperparameters of the pooling layers? (Choose all that apply)
- ❌ $b^{[l]}$ bias.
- ❌ $W^{[l]}$ weights.
- ✅ Stride.
- ✅ Whether it is max or average.
Explanation: Pooling layers have no learnable weights or biases. Their hyperparameters include the pooling window size (not listed), the stride, and the type (max vs. average).
Question 9
Which statements about parameter sharing in ConvNets are true? (Check all that apply)
- ✅ It allows parameters learned for one task to be shared even for a different task (transfer learning).
- ✅ It reduces the total number of parameters, thus reducing overfitting.
- ✅ It allows a feature detector to be used in multiple locations throughout the whole input image/input volume.
- ❌ It allows gradient descent to set many of the parameters to zero, thus making the connections sparse.
Explanation:
- Sharing convolutional filters makes it possible to reuse learned features for other tasks (transfer learning).
- Weight sharing greatly reduces the number of parameters compared with fully connected layers, which helps generalization.
- The same filter is applied across all spatial locations, so a feature detector is used everywhere in the input.
- However, parameter sharing does not mean gradient descent sets many parameters to zero to create sparse connections; that is a separate concept (sparsity/regularization), so the last statement is false.
Question 10
The sparsity of connections and weight sharing are mechanisms that allow us to use fewer parameters in a convolutional layer, making it possible to train a network with smaller training sets. True/False?
- ❌ False
- ✅ True
Explanation: Convolutional layers use sparse connectivity (filters are small) and weight sharing (same filter at all locations). Together these reduce the number of parameters dramatically compared to fully-connected layers, which helps training with smaller datasets.
🧾 Summary Table
| Q # | Correct Answer(s) | Key concept |
|---|---|---|
| 1 | ✅ Detect 45-degree edges | Kernel emphasizes diagonal contrast → detects diagonal edges. |
| 2 | ✅ 4,194,560 | Params = neurons × (input_size + bias). |
| 3 | ✅ 7,600 | Conv params = filters × (f×f×depth + 1). |
| 4 | ✅ 40×40×32 | Output spatial size = ⌊(121−4)/3⌋ + 1 = 40, depth = 32. |
| 5 | ✅ 19×19×8 | Padding adds 2 on each side → 15+4=19. |
| 6 | ✅ 3 | “Same” conv padding p = (f−1)/2 for odd f. |
| 7 | ✅ 16×16×16 | Pool halves spatial dims with stride=2, same depth. |
| 8 | ✅ Stride; ✅ Max vs Average | Pooling has no weights/biases; hyperparams are stride/type/window. |
| 9 | ✅ (1,2,3 true; 4 false) | Parameter sharing reduces params, enables reuse/transfer, spatial sharing. |
| 10 | ✅ True | Sparse connections + weight sharing → fewer params → easier training with less data. |