
Module 2 Graded Quiz: Basics of Deep Learning | Introduction to Deep Learning & Neural Networks with Keras (IBM AI Engineering Professional Certificate) Answers 2025

1. Question 1

Which algorithm updates weights & biases by minimizing cost iteratively?

  • ❌ Logistic descent algorithm

  • ❌ Vanishing gradient algorithm

  • ✅ Gradient descent algorithm

  • ❌ Activation function algorithm

Explanation:
Gradient descent adjusts weights by following the negative gradient of the loss.


2. Question 2

Correct weight update rule for gradient descent?

  • ❌ w → w + η * ∂J/∂w

  • ✅ w → w – η * ∂J/∂w

  • ❌ w → w – η * b * ∂J/∂w

  • ❌ w → w + b – η * ∂J/∂w

Explanation:
Weights move against the gradient direction to reduce cost.
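The update rule above can be sketched in a few lines. This is a minimal illustration, not library code: the cost J(w) = (w – 3)² and the learning rate are hypothetical choices picked so the minimum is easy to verify.

```python
# Minimal gradient descent sketch for the update rule w -> w - eta * dJ/dw.
# The cost J(w) = (w - 3)**2 is a made-up example; its gradient is 2*(w - 3).

def gradient_descent(w, eta=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dJ/dw for J(w) = (w - 3)^2
        w = w - eta * grad   # step against the gradient direction
    return w

print(round(gradient_descent(w=0.0), 4))  # converges toward the minimum at w = 3
```

Note the minus sign: adding η·∂J/∂w instead (the first wrong option) would climb the cost surface rather than descend it.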


3. Question 3

Activation function = zero for negative inputs, x for positive inputs:

  • ❌ Softmax

  • ❌ Sigmoid

  • ✅ ReLU function

  • ❌ Tanh

  • ❌ Linear

Explanation:
ReLU(x) = max(0, x).
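The definition is short enough to write out directly; a plain-Python sketch:

```python
def relu(x):
    # ReLU(x) = max(0, x): zero for negative inputs, identity for positive
    return max(0.0, x)

print(relu(-2.0), relu(3.5))  # 0.0 3.5
```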


4. Question 4

Activation function returns negative for negative & positive for positive, centered at zero:

  • ❌ Binary step

  • ❌ Sigmoid

  • ❌ ReLU

  • ✅ Hyperbolic tangent (tanh)

Explanation:
tanh(x) is zero-centered and outputs values in the open interval (-1, 1).
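A quick check with the standard library shows the zero-centered, sign-preserving behavior the question describes:

```python
import math

# tanh is zero-centered: negative inputs map into (-1, 0), positive into (0, 1)
print(math.tanh(-2.0))  # negative output for negative input
print(math.tanh(0.0))   # 0.0 exactly at the center
print(math.tanh(2.0))   # positive output, symmetric with tanh(-2.0)
```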


5. Question 5

Where to use softmax in a 10-class classifier?

  • ❌ Custom-defined layer

  • ❌ Hidden layer

  • ✅ Output layer

  • ❌ Input layer

Explanation:
Softmax converts output logits into class probabilities.
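A dependency-free sketch of what the softmax output layer computes (the logits here are hypothetical values for illustration):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, then normalize exponentials
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # nonnegative probabilities summing to 1
```

In Keras a 10-class classifier typically ends with `Dense(10, activation='softmax')`, which applies exactly this transformation to the final layer's logits.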


6. Question 6

Correct backpropagation sequence?

  • ❌ a, c, b, d

  • ❌ a, b, c, d

  • ✅ c, a, b, d

  • ❌ c, b, a, d

Explanation:

  1. Forward pass

  2. Compute loss

  3. Backprop gradients

  4. Update until convergence
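The four steps above can be traced on a single linear neuron y = w·x with squared-error loss. The training example and learning rate are made up for illustration; the true mapping is y = 4x, so the loop should drive w toward 4.

```python
# One-neuron training loop illustrating the four backpropagation steps:
# forward pass -> compute loss -> backpropagate gradient -> update weight.

x, target = 2.0, 8.0   # hypothetical training example (true mapping: y = 4x)
w, eta = 0.0, 0.05

for _ in range(200):
    y = w * x                    # 1. forward pass
    loss = (y - target) ** 2     # 2. compute loss
    grad = 2 * (y - target) * x  # 3. backpropagate: dL/dw via the chain rule
    w -= eta * grad              # 4. update; repeat until convergence

print(round(w, 4))  # approaches 4.0
```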


7. Question 7

Why do deep layers fail to learn?

  • ❌ Activation saturation

  • ❌ Overfitting

  • ✅ Exponential decay of gradients (vanishing gradients)

  • ❌ Exploding gradients

Explanation:
Gradients shrink as they move backward → learning slows or stops.
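The exponential decay is easy to see numerically: each sigmoid layer multiplies the backpropagated gradient by the sigmoid's derivative, which is at most 0.25. A 10-layer sketch (best case, all pre-activations at zero):

```python
import math

def sigmoid_deriv(z):
    s = 1 / (1 + math.exp(-z))
    return s * (1 - s)  # peaks at 0.25 when z = 0

# Gradient backpropagated through 10 sigmoid layers shrinks as (0.25)^10 at best
grad = 1.0
for layer in range(10):
    grad *= sigmoid_deriv(0.0)

print(grad)  # 0.25**10, roughly 9.5e-7: early layers barely learn
```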


8. Question 8

Correct chain rule decomposition for weight update?

  • ✅ Upstream gradient × local gradient of weighted sum wrt weight

  • ❌ Only use activation derivatives

  • ❌ Use current layer only

  • ❌ Loss gradient × activation derivative × input

Explanation:
Chain rule:
dL/dw = dL/dz · dz/dw
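The decomposition can be verified numerically. With z = w·x + b and L = (z − t)², the upstream gradient is dL/dz = 2(z − t) and the local gradient is dz/dw = x; the values below are hypothetical, and a finite-difference estimate confirms the product:

```python
# Numeric check of dL/dw = dL/dz * dz/dw for z = w*x + b, L = (z - t)^2.
w, x, b, t = 0.5, 2.0, 0.1, 1.0  # hypothetical values

z = w * x + b             # weighted sum
dL_dz = 2 * (z - t)       # upstream gradient from the loss
dz_dw = x                 # local gradient of the weighted sum wrt the weight
analytic = dL_dz * dz_dw  # chain-rule product

# Compare against a central finite-difference estimate of dL/dw
eps = 1e-6
L = lambda w_: ((w_ * x + b) - t) ** 2
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)

print(round(analytic, 6), round(numeric, 6))  # the two estimates agree
```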


9. Question 9

Why ReLU helps mitigate vanishing gradients?

  • ✅ ReLU has a constant nonzero derivative (1) for positive inputs

  • ❌ Requires fewer parameters

  • ❌ Introduces sparsity

  • ❌ Output bounded prevents explosion

Explanation:
The derivative of ReLU is 1 for positive inputs, so gradients flowing through active units are not scaled down layer by layer.


10. Question 10

Correct neuron computation:

  • ✅ output = f(Σ(wi × xi) + b)

  • ❌ f(Σ(wi × xi)) + b

  • ❌ Σ(wi × f(xi)) + b

  • ❌ Σ(f(wi × xi)) + b

Explanation:
Neural computation = weighted sum + bias → activation function.
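A sketch of that computation, using sigmoid as the (arbitrary, illustrative) activation f and made-up inputs and weights:

```python
import math

def neuron(inputs, weights, bias):
    # output = f(Σ(wi * xi) + b), with sigmoid as the activation f
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # f applied AFTER summing and adding bias

out = neuron([0.5, -1.0], [2.0, 0.5], bias=0.1)
print(round(out, 4))  # sigmoid(0.6) ≈ 0.6457
```

The key point the wrong options miss: the bias is added inside the activation, and f wraps the whole weighted sum rather than individual terms.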


🧾 Summary Table

| Q# | Correct Answer |
|----|----------------|
| 1 | Gradient descent algorithm |
| 2 | w → w – η ∂J/∂w |
| 3 | ReLU |
| 4 | Tanh |
| 5 | Output layer |
| 6 | c, a, b, d |
| 7 | Vanishing gradients (exponential decay) |
| 8 | Upstream gradient × local gradient |
| 9 | ReLU has constant derivative for positive inputs |
| 10 | output = f(Σ(wi·xi) + b) |