Module 2 Graded Quiz Answers 2025: Basics of Deep Learning (Introduction to Deep Learning & Neural Networks with Keras, IBM AI Engineering Professional Certificate)
1. Question 1
Which algorithm updates weights & biases by minimizing cost iteratively?
- ❌ Logistic descent algorithm
- ❌ Vanishing gradient algorithm
- ✅ Gradient descent algorithm
- ❌ Activation function algorithm
Explanation:
Gradient descent adjusts weights by following the negative gradient of the loss.
2. Question 2
Correct weight update rule for gradient descent?
- ❌ w → w + η * ∂J/∂w
- ✅ w → w – η * ∂J/∂w
- ❌ w → w – η * b * ∂J/∂w
- ❌ w → w + b – η * ∂J/∂w
Explanation:
Weights move against the gradient direction to reduce cost.
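The update rule above can be sketched on a toy cost function. This is a minimal illustration, not the course's code: it assumes the quadratic cost J(w) = (w − 3)², whose gradient is 2(w − 3).

```python
# Minimal sketch of the gradient descent update rule w -> w - eta * dJ/dw,
# applied to the toy cost J(w) = (w - 3)^2 with gradient 2*(w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0      # initial weight
eta = 0.1    # learning rate (eta)
for _ in range(100):
    w = w - eta * grad(w)  # step against the gradient

print(round(w, 4))  # converges toward the minimum at w = 3
```

Each step moves w opposite to the gradient's sign, so the cost can only shrink (for a small enough learning rate).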
3. Question 3
Activation function = zero for negative inputs, x for positive inputs:
- ❌ Softmax
- ❌ Sigmoid
- ✅ ReLU function
- ❌ Tanh
- ❌ Linear
Explanation:
ReLU(x) = max(0, x).
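The definition ReLU(x) = max(0, x) is easy to check directly; a quick NumPy sketch:

```python
import numpy as np

# ReLU(x) = max(0, x): zero for negative inputs, identity for positive inputs.
def relu(x):
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = relu(x)
print(y)  # negatives become 0, positives pass through unchanged
```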
4. Question 4
Activation function returns negative for negative & positive for positive, centered at zero:
- ❌ Binary step
- ❌ Sigmoid
- ❌ ReLU
- ✅ Hyperbolic tangent (tanh)
Explanation:
tanh(x) outputs values in the open range (-1, 1), is centered at zero, and preserves the sign of its input.
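The zero-centering and sign-preserving behaviour can be verified with NumPy's built-in `np.tanh`:

```python
import numpy as np

# tanh squashes inputs into (-1, 1), is centered at zero, and
# maps negative inputs to negative outputs, positive to positive.
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = np.tanh(x)
print(y)  # all values strictly between -1 and 1, same signs as x
```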
5. Question 5
Where to use softmax in a 10-class classifier?
- ❌ Custom-defined layer
- ❌ Hidden layer
- ✅ Output layer
- ❌ Input layer
Explanation:
Softmax converts output logits into class probabilities.
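A small sketch of what softmax does at the output layer. The max-subtraction is a standard numerical-stability trick, not part of the mathematical definition:

```python
import numpy as np

# Softmax turns a vector of logits into probabilities that sum to 1.
def softmax(logits):
    z = logits - np.max(logits)  # stability trick: shift by the max
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # e.g. raw scores for 3 of 10 classes
p = softmax(logits)
print(p, p.sum())  # probabilities; the largest logit gets the largest share
```

In a 10-class Keras model this corresponds to giving the final `Dense(10)` layer a softmax activation.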
6. Question 6
Correct backpropagation sequence?
- ❌ a, c, b, d
- ❌ a, b, c, d
- ✅ c, a, b, d
- ❌ c, b, a, d
Explanation:
The correct sequence is:
1. Forward pass
2. Compute the loss
3. Backpropagate the gradients
4. Update the weights and repeat until convergence
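The four steps above can be sketched for a single linear neuron y = w·x with squared-error loss L = (y − t)². This is a toy illustration with hand-derived gradients, not the course's code:

```python
x, t = 2.0, 8.0   # input and target
w = 0.0           # initial weight
eta = 0.05        # learning rate

for _ in range(200):
    y = w * x                  # 1) forward pass
    L = (y - t) ** 2           # 2) compute the loss
    dL_dw = 2 * (y - t) * x    # 3) backpropagate the gradient
    w -= eta * dL_dw           # 4) update, repeat until convergence

print(round(w, 3))  # approaches 4.0, since 4.0 * 2.0 = 8.0
```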
7. Question 7
Why do the early layers of a deep network fail to learn effectively?
- ❌ Activation saturation
- ❌ Overfitting
- ✅ Exponential decay of gradients (vanishing gradients)
- ❌ Exploding gradients
Explanation:
Gradients shrink as they move backward → learning slows or stops.
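The exponential decay is easy to see numerically: backpropagation multiplies one activation derivative per layer, and sigmoid's derivative never exceeds 0.25. A toy 20-layer illustration (worst-case bound, not a real network):

```python
# Backprop multiplies one local derivative per layer. With sigmoid,
# sigmoid'(z) <= 0.25 everywhere, so across 20 layers the gradient
# factor shrinks like 0.25^20 -- effectively zero.
layers = 20
sigmoid_deriv_max = 0.25

gradient_factor = sigmoid_deriv_max ** layers
print(gradient_factor)  # a vanishingly small number (~1e-12)
```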
8. Question 8
Correct chain rule decomposition for weight update?
- ✅ Upstream gradient × local gradient of weighted sum wrt weight
- ❌ Only use activation derivatives
- ❌ Use current layer only
- ❌ Loss gradient × activation derivative × input
Explanation:
Chain rule:
∂L/∂w = (∂L/∂z) · (∂z/∂w), where z = Σ(wi · xi) + b is the weighted sum (pre-activation).
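The decomposition can be verified numerically against a finite-difference approximation. The sigmoid activation and squared-error loss here are illustrative choices, assumed only for the check:

```python
import numpy as np

# Numeric check of the chain rule dL/dw = dL/dz * dz/dw for one neuron:
# z = w*x + b, a = sigmoid(z), L = (a - t)^2.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x, t = 0.5, 0.1, 2.0, 1.0
z = w * x + b
a = sigmoid(z)

dL_da = 2 * (a - t)       # upstream gradient from the loss
da_dz = a * (1 - a)       # local gradient of the activation
dz_dw = x                 # local gradient of the weighted sum wrt w
analytic = dL_da * da_dz * dz_dw

# Finite-difference approximation for comparison
eps = 1e-6
L = lambda w_: (sigmoid(w_ * x + b) - t) ** 2
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
print(analytic, numeric)  # the two values agree closely
```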
9. Question 9
Why ReLU helps mitigate vanishing gradients?
- ✅ ReLU has a constant nonzero derivative (1) for positive inputs
- ❌ Requires fewer parameters
- ❌ Introduces sparsity
- ❌ Output bounded prevents explosion
Explanation:
The derivative of ReLU is exactly 1 for positive inputs (and 0 for negative inputs), so gradients flowing through active units are not shrunk layer after layer.
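A quick numerical check of ReLU's derivative on either side of zero:

```python
# ReLU'(z) = 1 for z > 0 and 0 for z < 0, checked via central differences.
def relu(z):
    return max(0.0, z)

eps = 1e-6
d_pos = (relu(2.0 + eps) - relu(2.0 - eps)) / (2 * eps)   # derivative at z = 2
d_neg = (relu(-2.0 + eps) - relu(-2.0 - eps)) / (2 * eps)  # derivative at z = -2
print(d_pos, d_neg)  # 1 on the positive side, 0 on the negative side
```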
10. Question 10
Correct neuron computation:
- ✅ output = f(Σ(wi × xi) + b)
- ❌ f(Σ(wi × xi)) + b
- ❌ Σ(wi × f(xi)) + b
- ❌ Σ(f(wi × xi)) + b
Explanation:
Neural computation = weighted sum + bias → activation function.
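The formula output = f(Σ(wi × xi) + b) maps directly to a dot product plus bias, then the activation. Sigmoid is an illustrative choice of f; any activation works the same way:

```python
import numpy as np

# Single-neuron computation: output = f( sum(wi * xi) + b ).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3, 0.8])  # weights (illustrative values)
x = np.array([1.0, 2.0, 3.0])   # inputs
b = 0.1                         # bias

z = np.dot(w, x) + b   # weighted sum plus bias: z = 2.4 here
output = sigmoid(z)    # activation is applied last, to the whole sum
print(round(float(output), 4))
```

Note the order of operations: the bias is added inside f, and f is applied once to the full weighted sum, which is exactly what distinguishes the correct option from the three wrong ones.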
🧾 Summary Table
| Q# | Correct Answer |
|---|---|
| 1 | Gradient descent algorithm |
| 2 | w → w – η ∂J/∂w |
| 3 | ReLU |
| 4 | Tanh |
| 5 | Output layer |
| 6 | c, a, b, d |
| 7 | Vanishing gradients (exponential decay) |
| 8 | Upstream gradient × local gradient |
| 9 | ReLU has constant derivative for positive inputs |
| 10 | output = f(Σ(wi·xi) + b) |