
Quiz 2: Regression Models (Data Science Specialization) Answers 2025:


  1. Question 1
    Which of the following is stored in the ‘cache’ during forward propagation for later use in backward propagation?

✅ Z[l]
❌ b[l]
❌ W[l]

Explanation: The cache stores intermediate values computed during forward propagation that are needed again during backpropagation. For layer l, the key cached value is Z[l], which backprop uses to evaluate g[l]'(Z[l]); the parameters W[l] and b[l] live in the parameters dictionary rather than being the point of the cache.
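
A minimal sketch of how this works, assuming a ReLU activation and an illustrative function name (not the course's exact starter code):

    import numpy as np

    def forward_step(A_prev, W, b):
        # Linear part followed by a ReLU activation (assumed for illustration).
        Z = W @ A_prev + b
        A = np.maximum(0, Z)
        cache = Z              # Z[l] is saved so backprop can later compute g[l]'(Z[l])
        return A, cache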


  2. Question 2
    Among the following, which ones are “hyperparameters”? (Check all that apply.)

✅ size of the hidden layers n[l]
❌ activation values a[l]
❌ bias vectors b[l]
✅ number of layers L
❌ weight matrices W[l]
✅ learning rate α
✅ number of iterations

Explanation: Hyperparameters are settings chosen before training: architecture choices (number of layers, units per layer), the learning rate, and the number of training iterations. Activation values a[l] are computed during forward propagation, and the parameters W[l], b[l] are learned during training, so none of these are hyperparameters.
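
As a hypothetical illustration of the split (the names and values below are assumptions, not from the quiz): the hyperparameters are fixed up front, while W[l] and b[l] come out of gradient descent.

    hyperparameters = {
        "layer_dims": [2, 4, 3, 1],   # number of layers and units per layer
        "learning_rate": 0.0075,      # alpha
        "num_iterations": 2500,
    }
    # parameters = {"W1": ..., "b1": ...}  # learned during training, not chosen by hand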


  3. Question 3
    Which of the following is more likely related to the early layers of a deep neural network?

✅ Low-level features (edges, simple patterns)
❌ High-level semantics (faces, objects)
❌ Final classification scores
❌ Task-specific complex features

Explanation: Early layers of a deep network tend to learn generic, low-level features (edges, textures), while deeper layers compose these into higher-level, more task-specific representations.


  4. Question 4
    We cannot use vectorization to calculate dA[l] in backprop; we must use an explicit for-loop over all the training examples. True/False?

❌ True
✅ False

Explanation: False. Backpropagation can (and should) be vectorized across the whole set of training examples; vectorized matrix operations replace the explicit for-loop and are far more efficient.
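
For example, a vectorized gradient step for a sigmoid output layer handles all m examples in single matrix expressions; the shapes and values below are illustrative assumptions:

    import numpy as np

    m = 100
    A_prev = np.random.randn(4, m)              # activations from the previous layer
    W = np.random.randn(1, 4) * 0.01
    b = np.zeros((1, 1))
    Y = np.random.randint(0, 2, size=(1, m))    # labels for all m examples

    Z = W @ A_prev + b
    A = 1 / (1 + np.exp(-Z))                    # sigmoid output
    dZ = A - Y                                  # gradient for all m examples at once
    dW = (dZ @ A_prev.T) / m                    # (1, 4), no loop over examples
    db = np.sum(dZ, axis=1, keepdims=True) / m  # (1, 1)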


  5. Question 5
    Given layer_dims = [n_x, 4, 3, 2, 1], which loop initializes parameters correctly?

✅ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b'+str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

❌ for i in range(1, len(layer_dims)/2): …
❌ for i in range(1, len(layer_dims)/2): … (wrong indices)
❌ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i-1], layer_dims[i]) …

Explanation: The correct shapes are W[i] -> (layer_dims[i], layer_dims[i-1]) and b[i] -> (layer_dims[i], 1). Looping i over range(1, len(layer_dims)), i.e. i = 1 through len(layer_dims) - 1, is the standard pattern.
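
A runnable version of the correct option (the concrete layer_dims values are assumed for illustration, and the dictionary is named parameters here; zero-initializing the biases would give the same shapes):

    import numpy as np

    layer_dims = [5, 4, 3, 2, 1]   # [n_x, 4, 3, 2, 1] with n_x = 5 assumed
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
        parameters['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

    print(parameters['W1'].shape, parameters['b1'].shape)   # (4, 5) (4, 1)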


  6. Question 6
    Consider the neural network pictured. How many layers does this network have?

❌ The number of layers L is 6
✅ The number of layers L is 5.
❌ The number of layers L is 4.
❌ The number of layers L is 2.

Explanation: By the usual convention, the input layer is counted as layer 0 and is not included in L; L equals the number of hidden layers plus the output layer. For the pictured network this gives L = 5 (4 hidden layers plus the output layer).


  7. Question 7
    True/False: During forward propagation you compute A[l] = g[l](Z[l]); during backward propagation you calculate the gradients for layer l from the cached Z[l].

❌ False
✅ True

Explanation: True. Backward propagation reuses the cached Z[l]: dZ[l] = dA[l] * g[l]'(Z[l]), which then gives the gradients with respect to the parameters and the previous layer's activations.
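
A small sketch of how the cached Z[l] is reused, assuming a sigmoid activation and illustrative helper names:

    import numpy as np

    def sigmoid(Z):
        return 1 / (1 + np.exp(-Z))

    def sigmoid_backward(dA, Z_cache):
        # dZ[l] = dA[l] * g[l]'(Z[l]), using the Z[l] stored during forward prop
        s = sigmoid(Z_cache)
        return dA * s * (1 - s)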


  8. Question 8
    A shallow NN with one hidden layer and 6 units can compute any function that a NN with 2 hidden layers and 6 hidden units can compute. True/False?

❌ True
✅ False

Explanation: False. Depth adds representational power: with the same budget of 6 hidden units, a network with 2 hidden layers can represent some functions that a single 6-unit hidden layer cannot compute (or can only approximate far less compactly).


  9. Question 9
    Consider a 1-hidden-layer network with n[0] = 2 input features, n[1] = 4 hidden units, and n[2] = 1 output unit. Which of the following statements are True? (Check all that apply.)

✅ W[1] will have shape (4, 2)
❌ b[1] will have shape (2, 1)
✅ W[2] will have shape (1, 4)
❌ W[1] will have shape (2, 4)
✅ b[2] will have shape (1, 1)
✅ b[1] will have shape (4, 1)
❌ b[2] will have shape (4, 1)
❌ W[2] will have shape (4, 1)

Explanation: Shapes follow the rules W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1). With n[0] = 2, n[1] = 4, n[2] = 1 this gives W[1]: (4, 2), b[1]: (4, 1), W[2]: (1, 4), and b[2]: (1, 1); in particular b[1] is (4, 1), not (2, 1), because the hidden layer has 4 units.
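
These shapes can be checked directly, assuming the pictured network has 2 inputs, 4 hidden units, and 1 output (which is what the checked W shapes imply):

    import numpy as np

    n0, n1, n2 = 2, 4, 1                             # input, hidden, output sizes (assumed)
    W1, b1 = np.zeros((n1, n0)), np.zeros((n1, 1))   # (4, 2) and (4, 1)
    W2, b2 = np.zeros((n2, n1)), np.zeros((n2, 1))   # (1, 4) and (1, 1)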


  10. Question 10
    What are the dimensions of Z[1] and A[1] for this network when training on m examples?

❌ Z[1] and A[1] are (4, 1)
❌ Z[1] and A[1] are (2, m)
✅ Z[1] and A[1] are (4, m)
❌ Z[1] and A[1] are (2, 1)

Explanation: Z[1] = W[1]X + b[1] and A[1] = g[1](Z[1]). With 4 hidden units and m training examples, both have shape (4, m); the bias b[1] of shape (4, 1) is broadcast across the m columns.
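
A quick shape check with an assumed m = 10 examples:

    import numpy as np

    m = 10
    X = np.random.randn(2, m)          # (n[0], m) = (2, m)
    W1 = np.random.randn(4, 2) * 0.01
    b1 = np.zeros((4, 1))
    Z1 = W1 @ X + b1                   # b1 broadcasts across the m columns
    A1 = np.tanh(Z1)                   # any elementwise activation keeps the shape
    print(Z1.shape, A1.shape)          # (4, 10) (4, 10), i.e. (4, m)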


🧾 Summary Table

Q# | ✅ Correct Answer | Key Concept
1 | Z[l] | Cache stores the intermediate Z[l] for backprop
2 | size of hidden layers; number of layers; learning rate; number of iterations | Hyperparameters (architecture & training settings)
3 | Low-level features (edges) | Early layers learn generic low-level features
4 | False | Backprop can be vectorized across examples
5 | for i in range(1, len(layer_dims)): W shape (layer_dims[i], layer_dims[i-1]) | Proper parameter-initialization shapes
6 | L = 5 | Layer count = hidden layers + output layer (input not counted)
7 | True | Backprop reuses the cached Z[l] (through g[l]')
8 | False | Depth can increase representational power
9 | W[1] (4, 2); b[1] (4, 1); W[2] (1, 4); b[2] (1, 1) | Parameter shapes from W[l]: (n[l], n[l-1]), b[l]: (n[l], 1)
10 | (4, m) | Dimensions of Z[1]/A[1] for a 4-unit hidden layer with m examples