
Quiz 2: Regression Models (Data Science Specialization) Answers 2025:


  1. Question 1
    Which of the following is stored in the ‘cache’ during forward propagation for later use in backward propagation?

✅ Z[l]
❌ b[l]
❌ W[l]

Explanation: The cache stores intermediate values computed during forward propagation that are needed again during backpropagation. For layer l, the key cached value is Z[l], which backprop uses to evaluate g[l]'(Z[l]); the parameters W[l] and b[l] live in the parameters dictionary rather than being the point of the cache.
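
A minimal sketch of how this works, assuming a ReLU activation and an illustrative function name (not the course's exact starter code):

    import numpy as np

    def forward_step(A_prev, W, b):
        # Linear part followed by a ReLU activation (assumed for illustration).
        Z = W @ A_prev + b
        A = np.maximum(0, Z)
        cache = Z              # Z[l] is saved so backprop can later compute g[l]'(Z[l])
        return A, cache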


  2. Question 2
    Among the following, which ones are “hyperparameters”? (Check all that apply.)

✅ size of the hidden layers n[l]
❌ activation values a[l]
❌ bias vectors b[l]
✅ number of layers L
❌ weight matrices W[l]
✅ learning rate α
✅ number of iterations

Explanation: Hyperparameters are settings chosen before training: architecture choices (number of layers, units per layer), the learning rate, and the number of training iterations. Activation values a[l] are computed during forward propagation, and the parameters W[l], b[l] are learned during training, so none of these are hyperparameters.
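
As a hypothetical illustration of the split (the names and values below are assumptions, not from the quiz): the hyperparameters are fixed up front, while W[l] and b[l] come out of gradient descent.

    hyperparameters = {
        "layer_dims": [2, 4, 3, 1],   # number of layers and units per layer
        "learning_rate": 0.0075,      # alpha
        "num_iterations": 2500,
    }
    # parameters = {"W1": ..., "b1": ...}  # learned during training, not chosen by hand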


  3. Question 3
    Which of the following is more likely related to the early layers of a deep neural network?

✅ Low-level features (edges, simple patterns)
❌ High-level semantics (faces, objects)
❌ Final classification scores
❌ Task-specific complex features

Explanation: Early layers of a deep network tend to learn generic, low-level features (edges, textures), while deeper layers compose these into higher-level, more task-specific representations.


  4. Question 4
    We cannot use vectorization to calculate dA[l] in backprop; we must use an explicit for-loop over all the training examples. True/False?

❌ True
✅ False

Explanation: False. Backpropagation can (and should) be vectorized across the whole set of training examples; vectorized matrix operations replace the explicit for-loop and are far more efficient.
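
For example, a vectorized gradient step for a sigmoid output layer handles all m examples in single matrix expressions; the shapes and values below are illustrative assumptions:

    import numpy as np

    m = 100
    A_prev = np.random.randn(4, m)              # activations from the previous layer
    W = np.random.randn(1, 4) * 0.01
    b = np.zeros((1, 1))
    Y = np.random.randint(0, 2, size=(1, m))    # labels for all m examples

    Z = W @ A_prev + b
    A = 1 / (1 + np.exp(-Z))                    # sigmoid output
    dZ = A - Y                                  # gradient for all m examples at once
    dW = (dZ @ A_prev.T) / m                    # (1, 4), no loop over examples
    db = np.sum(dZ, axis=1, keepdims=True) / m  # (1, 1)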


  5. Question 5
    Given layer_dims = [n_x, 4, 3, 2, 1], which loop initializes parameters correctly?

✅ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b'+str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

❌ for i in range(1, len(layer_dims)/2): …
❌ for i in range(1, len(layer_dims)/2): … (wrong indices)
❌ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i-1], layer_dims[i]) …

Explanation: The correct shapes are W[i] -> (layer_dims[i], layer_dims[i-1]) and b[i] -> (layer_dims[i], 1). Looping i over range(1, len(layer_dims)), i.e. i = 1 through len(layer_dims) - 1, is the standard pattern.
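
A runnable version of the correct option (the concrete layer_dims values are assumed for illustration, and the dictionary is named parameters here; zero-initializing the biases would give the same shapes):

    import numpy as np

    layer_dims = [5, 4, 3, 2, 1]   # [n_x, 4, 3, 2, 1] with n_x = 5 assumed
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
        parameters['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

    print(parameters['W1'].shape, parameters['b1'].shape)   # (4, 5) (4, 1)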


  6. Question 6
    Consider the neural network pictured. How many layers does this network have?

❌ The number of layers L is 6
✅ The number of layers L is 5.
❌ The number of layers L is 4.
❌ The number of layers L is 2.

Explanation: By the usual convention, the input layer is counted as layer 0 and is not included in L; L equals the number of hidden layers plus the output layer. For the pictured network this gives L = 5 (4 hidden layers plus the output layer).


  7. Question 7
    True/False: During forward propagation you compute A[l] = g[l](Z[l]); during backward propagation you calculate the gradients for layer l from the cached Z[l].

❌ False
✅ True

Explanation: True. Backward propagation reuses the cached Z[l]: dZ[l] = dA[l] * g[l]'(Z[l]), which then gives the gradients with respect to the parameters and the previous layer's activations.
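
A small sketch of how the cached Z[l] is reused, assuming a sigmoid activation and illustrative helper names:

    import numpy as np

    def sigmoid(Z):
        return 1 / (1 + np.exp(-Z))

    def sigmoid_backward(dA, Z_cache):
        # dZ[l] = dA[l] * g[l]'(Z[l]), using the Z[l] stored during forward prop
        s = sigmoid(Z_cache)
        return dA * s * (1 - s)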


  8. Question 8
    A shallow NN with one hidden layer and 6 units can compute any function that a NN with 2 hidden layers and 6 hidden units can compute. True/False?

❌ True
✅ False

Explanation: False. Depth adds representational power: with the same budget of 6 hidden units, a network with 2 hidden layers can represent some functions that a single 6-unit hidden layer cannot compute (or can only approximate far less compactly).


  9. Question 9
    Consider a 1-hidden-layer network with n[0] = 2 input features, n[1] = 4 hidden units, and n[2] = 1 output unit. Which of the following statements are True? (Check all that apply.)

✅ W[1] will have shape (4, 2)
❌ b[1] will have shape (2, 1)
✅ W[2] will have shape (1, 4)
❌ W[1] will have shape (2, 4)
✅ b[2] will have shape (1, 1)
✅ b[1] will have shape (4, 1)
❌ b[2] will have shape (4, 1)
❌ W[2] will have shape (4, 1)

Explanation: Shapes follow the rules W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1). With n[0] = 2, n[1] = 4, n[2] = 1 this gives W[1]: (4, 2), b[1]: (4, 1), W[2]: (1, 4), and b[2]: (1, 1); in particular b[1] is (4, 1), not (2, 1), because the hidden layer has 4 units.
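
These shapes can be checked directly, assuming the pictured network has 2 inputs, 4 hidden units, and 1 output (which is what the checked W shapes imply):

    import numpy as np

    n0, n1, n2 = 2, 4, 1                             # input, hidden, output sizes (assumed)
    W1, b1 = np.zeros((n1, n0)), np.zeros((n1, 1))   # (4, 2) and (4, 1)
    W2, b2 = np.zeros((n2, n1)), np.zeros((n2, 1))   # (1, 4) and (1, 1)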


  10. Question 10
    What are the dimensions of Z[1] and A[1] for this network when training on m examples?

❌ Z[1] and A[1] are (4, 1)
❌ Z[1] and A[1] are (2, m)
✅ Z[1] and A[1] are (4, m)
❌ Z[1] and A[1] are (2, 1)

Explanation: Z[1] = W[1]X + b[1] and A[1] = g[1](Z[1]). With 4 hidden units and m training examples, both have shape (4, m); the bias b[1] of shape (4, 1) is broadcast across the m columns.
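
A quick shape check with an assumed m = 10 examples:

    import numpy as np

    m = 10
    X = np.random.randn(2, m)          # (n[0], m) = (2, m)
    W1 = np.random.randn(4, 2) * 0.01
    b1 = np.zeros((4, 1))
    Z1 = W1 @ X + b1                   # b1 broadcasts across the m columns
    A1 = np.tanh(Z1)                   # any elementwise activation keeps the shape
    print(Z1.shape, A1.shape)          # (4, 10) (4, 10), i.e. (4, m)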


🧾 Summary Table

Q# | ✅ Correct Answer | Key Concept
1 | Z[l] | Cache stores the intermediate Z[l] for backprop
2 | size of hidden layers; number of layers; learning rate; number of iterations | Hyperparameters (architecture & training settings)
3 | Low-level features (edges) | Early layers learn generic low-level features
4 | False | Backprop can be vectorized across examples
5 | for i in range(1, len(layer_dims)): W shape (layer_dims[i], layer_dims[i-1]) | Proper parameter-initialization shapes
6 | L = 5 | Layer count = hidden layers + output layer (input not counted)
7 | True | Backprop reuses the cached Z[l] (through g[l]')
8 | False | Depth can increase representational power
9 | W[1] (4, 2); b[1] (4, 1); W[2] (1, 4); b[2] (1, 1) | Parameter shapes from W[l]: (n[l], n[l-1]), b[l]: (n[l], 1)
10 | (4, m) | Dimensions of Z[1]/A[1] for a 4-unit hidden layer with m examples