Module 2 Graded Quiz Answers 2025: Basics of Deep Learning (Introduction to Deep Learning & Neural Networks with Keras, IBM AI Engineering Professional Certificate)
1. Question 1
Which algorithm updates weights & biases by minimizing cost iteratively?
- ❌ Logistic descent algorithm
- ❌ Vanishing gradient algorithm
- ✅ Gradient descent algorithm
- ❌ Activation function algorithm
Explanation:
Gradient descent adjusts weights by following the negative gradient of the loss.
2. Question 2
Correct weight update rule for gradient descent?
- ❌ w → w + η * ∂J/∂w
- ✅ w → w – η * ∂J/∂w
- ❌ w → w – η * b * ∂J/∂w
- ❌ w → w + b – η * ∂J/∂w
Explanation:
Weights move against the gradient direction to reduce cost.
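The update rule above can be sketched on a toy cost function. This is a minimal illustration, not the course's code: it assumes the quadratic cost J(w) = (w − 3)², whose gradient is 2(w − 3).

```python
# Minimal sketch of the gradient descent update rule w -> w - eta * dJ/dw,
# applied to the toy cost J(w) = (w - 3)^2 with gradient 2*(w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0      # initial weight
eta = 0.1    # learning rate (eta)
for _ in range(100):
    w = w - eta * grad(w)  # step against the gradient

print(round(w, 4))  # converges toward the minimum at w = 3
```

Each step moves w opposite to the gradient's sign, so the cost can only shrink (for a small enough learning rate).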
3. Question 3
Activation function = zero for negative inputs, x for positive inputs:
- ❌ Softmax
- ❌ Sigmoid
- ✅ ReLU function
- ❌ Tanh
- ❌ Linear
Explanation:
ReLU(x) = max(0, x).
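The definition ReLU(x) = max(0, x) is easy to check directly; a quick NumPy sketch:

```python
import numpy as np

# ReLU(x) = max(0, x): zero for negative inputs, identity for positive inputs.
def relu(x):
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = relu(x)
print(y)  # negatives become 0, positives pass through unchanged
```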
4. Question 4
Activation function returns negative for negative & positive for positive, centered at zero:
- ❌ Binary step
- ❌ Sigmoid
- ❌ ReLU
- ✅ Hyperbolic tangent (tanh)
Explanation:
tanh(x) outputs values in the open range (-1, 1), is centered at zero, and preserves the sign of its input.
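The zero-centering and sign-preserving behaviour can be verified with NumPy's built-in `np.tanh`:

```python
import numpy as np

# tanh squashes inputs into (-1, 1), is centered at zero, and
# maps negative inputs to negative outputs, positive to positive.
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = np.tanh(x)
print(y)  # all values strictly between -1 and 1, same signs as x
```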
5. Question 5
Where to use softmax in a 10-class classifier?
- ❌ Custom-defined layer
- ❌ Hidden layer
- ✅ Output layer
- ❌ Input layer
Explanation:
Softmax converts output logits into class probabilities.
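A small sketch of what softmax does at the output layer. The max-subtraction is a standard numerical-stability trick, not part of the mathematical definition:

```python
import numpy as np

# Softmax turns a vector of logits into probabilities that sum to 1.
def softmax(logits):
    z = logits - np.max(logits)  # stability trick: shift by the max
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # e.g. raw scores for 3 of 10 classes
p = softmax(logits)
print(p, p.sum())  # probabilities; the largest logit gets the largest share
```

In a 10-class Keras model this corresponds to giving the final `Dense(10)` layer a softmax activation.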
6. Question 6
Correct backpropagation sequence?
- ❌ a, c, b, d
- ❌ a, b, c, d
- ✅ c, a, b, d
- ❌ c, b, a, d
Explanation:
The correct sequence is:
1. Forward pass
2. Compute the loss
3. Backpropagate the gradients
4. Update the weights and repeat until convergence
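The four steps above can be sketched for a single linear neuron y = w·x with squared-error loss L = (y − t)². This is a toy illustration with hand-derived gradients, not the course's code:

```python
x, t = 2.0, 8.0   # input and target
w = 0.0           # initial weight
eta = 0.05        # learning rate

for _ in range(200):
    y = w * x                  # 1) forward pass
    L = (y - t) ** 2           # 2) compute the loss
    dL_dw = 2 * (y - t) * x    # 3) backpropagate the gradient
    w -= eta * dL_dw           # 4) update, repeat until convergence

print(round(w, 3))  # approaches 4.0, since 4.0 * 2.0 = 8.0
```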
7. Question 7
Why do the early layers of a deep network fail to learn effectively?
- ❌ Activation saturation
- ❌ Overfitting
- ✅ Exponential decay of gradients (vanishing gradients)
- ❌ Exploding gradients
Explanation:
Gradients shrink as they move backward → learning slows or stops.
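The exponential decay is easy to see numerically: backpropagation multiplies one activation derivative per layer, and sigmoid's derivative never exceeds 0.25. A toy 20-layer illustration (worst-case bound, not a real network):

```python
# Backprop multiplies one local derivative per layer. With sigmoid,
# sigmoid'(z) <= 0.25 everywhere, so across 20 layers the gradient
# factor shrinks like 0.25^20 -- effectively zero.
layers = 20
sigmoid_deriv_max = 0.25

gradient_factor = sigmoid_deriv_max ** layers
print(gradient_factor)  # a vanishingly small number (~1e-12)
```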
8. Question 8
Correct chain rule decomposition for weight update?
- ✅ Upstream gradient × local gradient of weighted sum wrt weight
- ❌ Only use activation derivatives
- ❌ Use current layer only
- ❌ Loss gradient × activation derivative × input
Explanation:
Chain rule:
∂L/∂w = (∂L/∂z) · (∂z/∂w), where z = Σ(wi · xi) + b is the weighted sum (pre-activation).
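The decomposition can be verified numerically against a finite-difference approximation. The sigmoid activation and squared-error loss here are illustrative choices, assumed only for the check:

```python
import numpy as np

# Numeric check of the chain rule dL/dw = dL/dz * dz/dw for one neuron:
# z = w*x + b, a = sigmoid(z), L = (a - t)^2.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x, t = 0.5, 0.1, 2.0, 1.0
z = w * x + b
a = sigmoid(z)

dL_da = 2 * (a - t)       # upstream gradient from the loss
da_dz = a * (1 - a)       # local gradient of the activation
dz_dw = x                 # local gradient of the weighted sum wrt w
analytic = dL_da * da_dz * dz_dw

# Finite-difference approximation for comparison
eps = 1e-6
L = lambda w_: (sigmoid(w_ * x + b) - t) ** 2
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
print(analytic, numeric)  # the two values agree closely
```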
9. Question 9
Why ReLU helps mitigate vanishing gradients?
- ✅ ReLU has a constant nonzero derivative (1) for positive inputs
- ❌ Requires fewer parameters
- ❌ Introduces sparsity
- ❌ Output bounded prevents explosion
Explanation:
The derivative of ReLU is exactly 1 for positive inputs (and 0 for negative inputs), so gradients flowing through active units are not shrunk layer after layer.
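A quick numerical check of ReLU's derivative on either side of zero:

```python
# ReLU'(z) = 1 for z > 0 and 0 for z < 0, checked via central differences.
def relu(z):
    return max(0.0, z)

eps = 1e-6
d_pos = (relu(2.0 + eps) - relu(2.0 - eps)) / (2 * eps)   # derivative at z = 2
d_neg = (relu(-2.0 + eps) - relu(-2.0 - eps)) / (2 * eps)  # derivative at z = -2
print(d_pos, d_neg)  # 1 on the positive side, 0 on the negative side
```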
10. Question 10
Correct neuron computation:
- ✅ output = f(Σ(wi × xi) + b)
- ❌ f(Σ(wi × xi)) + b
- ❌ Σ(wi × f(xi)) + b
- ❌ Σ(f(wi × xi)) + b
Explanation:
Neural computation = weighted sum + bias → activation function.
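The formula output = f(Σ(wi × xi) + b) maps directly to a dot product plus bias, then the activation. Sigmoid is an illustrative choice of f; any activation works the same way:

```python
import numpy as np

# Single-neuron computation: output = f( sum(wi * xi) + b ).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3, 0.8])  # weights (illustrative values)
x = np.array([1.0, 2.0, 3.0])   # inputs
b = 0.1                         # bias

z = np.dot(w, x) + b   # weighted sum plus bias: z = 2.4 here
output = sigmoid(z)    # activation is applied last, to the whole sum
print(round(float(output), 4))
```

Note the order of operations: the bias is added inside f, and f is applied once to the full weighted sum, which is exactly what distinguishes the correct option from the three wrong ones.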
🧾 Summary Table
| Q# | Correct Answer |
|---|---|
| 1 | Gradient descent algorithm |
| 2 | w → w – η ∂J/∂w |
| 3 | ReLU |
| 4 | Tanh |
| 5 | Output layer |
| 6 | c, a, b, d |
| 7 | Vanishing gradients (exponential decay) |
| 8 | Upstream gradient × local gradient |
| 9 | ReLU has constant derivative for positive inputs |
| 10 | output = f(Σ(wi·xi) + b) |