Quiz 2: Regression Models (Data Science Specialization) Answers 2025
-
Question 1
Which of the following is stored in the ‘cache’ during forward propagation for later use in backward propagation?
✅ Z[l]
❌ b[l]
❌ W[l]
Explanation: The cache stores intermediate values computed during forward propagation that backpropagation needs later; for layer l this is typically Z[l] (along with the layer's inputs). W[l] and b[l] live in the parameters, not the cache, so Z[l] is the item stored for layer l.
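For illustration, a minimal sketch of a forward step that returns such a cache (the function name linear_activation_forward and the ReLU choice are illustrative, not the quiz's exact code):

```python
import numpy as np

def linear_activation_forward(A_prev, W, b):
    """One forward step for layer l; returns A[l] and a cache for backprop."""
    Z = W @ A_prev + b        # Z[l] = W[l] A[l-1] + b[l]
    A = np.maximum(0, Z)      # A[l] = g(Z[l]), here with ReLU
    cache = (A_prev, Z)       # Z[l] (plus the layer input) is what backprop reuses
    return A, cache
```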
-
Question 2
Among the following, which ones are “hyperparameters”? (Check all that apply.)
✅ size of the hidden layers n[l]
❌ activation values a[l]
❌ bias vectors b[l]
✅ number of layers L
❌ weight matrices W[l]
✅ learning rate α
✅ number of iterations
Explanation: Hyperparameters are settings chosen before training: architecture choices such as the number of layers L and the hidden-layer sizes n[l], the learning rate α, and the number of training iterations. The activation values a[l] and the parameters W[l], b[l] are computed or learned during training, so they are not hyperparameters.
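As a minimal sketch (the variable names and values below are illustrative, not taken from the quiz), hyperparameters are fixed before training while W[l] and b[l] are produced by it:

```python
import numpy as np

# Hyperparameters: chosen before training and held fixed during it.
hyperparams = {
    "layer_dims": [2, 4, 1],    # number of layers L and units per layer n[l]
    "learning_rate": 0.01,      # alpha
    "num_iterations": 2500,
}

# Parameters: learned during training (initialized here, then updated by gradient descent).
parameters = {
    "W1": np.random.randn(4, 2) * 0.01,
    "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4) * 0.01,
    "b2": np.zeros((1, 1)),
}
```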
-
Question 3
Which of the following is more likely related to the early layers of a deep neural network?
✅ Low-level features (edges, simple patterns)
❌ High-level semantics (faces, objects)
❌ Final classification scores
❌ Task-specific complex features
Explanation: Early convolutional / network layers tend to learn generic, low-level features (edges, textures), while deeper layers learn higher-level, task-specific representations.
-
Question 4
We cannot use vectorization to calculate da[l] in backprop — we must use a for loop over all examples. True/False?
❌ True
✅ False
Explanation: False — backprop can (and should) be vectorized across the entire minibatch. No need for an explicit for-loop over examples; vectorized operations are used for efficiency.
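A hedged sketch of one vectorized backward step, processing all m examples at once with matrix operations (function and variable names are illustrative):

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Vectorized gradients for one layer over all m examples at once."""
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m                     # shape (n[l], n[l-1])
    db = np.sum(dZ, axis=1, keepdims=True) / m   # shape (n[l], 1)
    dA_prev = W.T @ dZ                           # shape (n[l-1], m), passed to layer l-1
    return dA_prev, dW, db
```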
-
Question 5
Given layer_dims = [n_x, 4, 3, 2, 1], which loop initializes the parameters correctly?
✅ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b'+str(i)] = np.random.randn(layer_dims[i], 1) * 0.01
❌ for i in range(1, len(layer_dims)/2): …
❌ for i in range(1, len(layer_dims)/2): … (wrong indices)
❌ for i in range(1, len(layer_dims)): parameter['W'+str(i)] = np.random.randn(layer_dims[i-1], layer_dims[i]) …
Explanation: The correct shapes are W[i] → (layer_dims[i], layer_dims[i-1]) and b[i] → (layer_dims[i], 1). Looping i over range(1, len(layer_dims)) covers every layer 1 through L, since index 0 of layer_dims holds the input size n_x. A runnable version is sketched below.
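A runnable version of the correct option (a sketch; the dictionary name parameter follows the quiz option, and the example layer_dims value is arbitrary):

```python
import numpy as np

def initialize_parameters(layer_dims):
    """W[i] gets shape (layer_dims[i], layer_dims[i-1]); b[i] gets shape (layer_dims[i], 1)."""
    parameter = {}
    for i in range(1, len(layer_dims)):
        parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
        parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01
    return parameter

params = initialize_parameters([5, 4, 3, 2, 1])   # n_x = 5 here, purely for illustration
print(params['W1'].shape)                         # (4, 5)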
-
Question 6
Consider the neural network pictured. How many layers does this network have?
❌ The number of layers L is 6
✅ The number of layers L is 5.
❌ The number of layers L is 4.
❌ The number of layers L is 2.
Explanation: By convention the input layer is not counted, so L equals the number of hidden layers plus the output layer. The network pictured has 4 hidden layers plus the output layer, giving L = 5.
-
Question 7
True/False: During forward propagation you compute A[l] = g[l](Z[l]). During backward propagation you calculate dA[l] from Z[l].
❌ False
✅ True
Explanation: True — backward propagation uses cached values like Z[l] (through g') when computing gradients w.r.t. activations and parameters.
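A minimal sketch of that dependency, assuming a ReLU activation (the helper name relu_backward is illustrative):

```python
import numpy as np

def relu_backward(dA, Z):
    """Turn dA[l] into dZ[l] using the cached Z[l]: dZ[l] = dA[l] * g'(Z[l])."""
    dZ = dA * (Z > 0)   # ReLU derivative is 1 where Z > 0, else 0
    return dZ
```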
-
Question 8
A shallow NN with one hidden layer and 6 units can compute any function that a NN with 2 hidden layers and 6 hidden units can compute. True/False?
❌ True
✅ False
Explanation: False. With the same budget of 6 hidden units, a network with 2 hidden layers can compute functions that a single 6-unit hidden layer cannot; depth adds representational power.
-
Question 9
Consider the 1-hidden-layer network (specific sizes). Which statements are True? (Check all that apply.)
✅ W[1] will have shape (4, 2)
❌ b[1] will have shape (2, 1)
✅ W[2] will have shape (1, 4)
❌ W[1] will have shape (2, 4)
✅ b[2] will have shape (1, 1)
✅ b[1] will have shape (4, 1)
❌ b[2] will have shape (4, 1)
❌ W[2] will have shape (4, 1)
Explanation: Shapes follow the rules W[l] → (n[l], n[l-1]) and b[l] → (n[l], 1). With n[0] = 2 inputs, n[1] = 4 hidden units, and n[2] = 1 output unit, that gives W[1]: (4, 2), b[1]: (4, 1), W[2]: (1, 4), b[2]: (1, 1).
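A quick sanity check of those shapes, reusing the initialization loop from Question 5 with layer sizes [2, 4, 1] taken from this question (a sketch, not the quiz's code):

```python
import numpy as np

layer_dims = [2, 4, 1]   # n[0] = 2 inputs, n[1] = 4 hidden units, n[2] = 1 output
parameter = {}
for i in range(1, len(layer_dims)):
    parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
    parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

assert parameter['W1'].shape == (4, 2)
assert parameter['b1'].shape == (4, 1)
assert parameter['W2'].shape == (1, 4)
assert parameter['b2'].shape == (1, 1)
```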
-
Question 10
What are the dimensions of Z[1] and A[1] for the 1-hidden-layer network?
❌ Z[1] and A[1] are (4, 1)
❌ Z[1] and A[1] are (2, m)
✅ Z[1] and A[1] are (4, m)
❌ Z[1] and A[1] are (2, 1)
Explanation: With 4 hidden units and m training examples stacked as the columns of X, Z[1] = W[1]X + b[1] has shape (4, m), and A[1] = g(Z[1]) keeps that shape.
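A minimal sketch confirming those dimensions for an arbitrary m = 10 examples (the parameter values are random placeholders):

```python
import numpy as np

m = 10                                  # any number of training examples
X = np.random.randn(2, m)               # n[0] = 2 input features per example
W1 = np.random.randn(4, 2) * 0.01
b1 = np.zeros((4, 1))

Z1 = W1 @ X + b1                        # broadcasting adds b1 to every column
A1 = np.tanh(Z1)                        # any elementwise activation keeps the shape
print(Z1.shape, A1.shape)               # (4, 10) (4, 10), i.e. (n[1], m)
```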
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | Z[l] | Cache stores intermediate Z[l] for backprop |
| 2 | size of hidden layers; number of layers; learning rate; iterations | Hyperparameters (architecture & training settings) |
| 3 | Low-level features (edges) | Early layers learn generic low-level features |
| 4 | False | Backprop can be vectorized across examples |
| 5 | for i in range(1,len(layer_dims)): W(shape=(layer_dims[i],layer_dims[i-1])) | Proper parameter initialization shapes |
| 6 | L = 5 | Count hidden + output layers (input layer is not counted) |
| 7 | True | dA[l] calculation uses cached Z[l] |
| 8 | False | Depth can increase representational power |
| 9 | W[1] (4,2); b[1] (4,1); W[2] (1,4); b[2] (1,1) | Correct parameter shapes using the n[l] rules |
| 10 | (4, m) | Z[1], A[1] have one column per training example: (n[1], m) |