Hyperparameter Tuning, Batch Normalization, Programming Frameworks | Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Deep Learning Specialization) Answers 2025
Question 1
Which of the following are true about hyperparameter search?
✅ Choosing random values for hyperparameters is convenient since we might not know which are most important.
❌ When using random values they must always be uniformly distributed.
❌ Choosing grid values is better when the number of hyperparameters is high.
❌ When sampling from a grid, the number of values per hyperparameter is larger than when using random values.
Explanation:
Random search is efficient when we don’t know which hyperparameters matter most.
Uniform sampling isn’t always ideal; a log scale is often used for parameters like the learning rate.
Grid search becomes inefficient when there are many hyperparameters, and with the same trial budget it explores fewer distinct values of each hyperparameter than random sampling does (see the sketch below).
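To make the last point concrete, here is a minimal sketch (NumPy only; the two hyperparameters, their ranges, and the trial budget are made up for illustration):

```python
import numpy as np

np.random.seed(0)
budget = 25  # same trial budget for both strategies

# Grid search: a 5 x 5 grid over two hyperparameters
lr_grid = np.linspace(0.001, 0.1, 5)
bs_grid = [32, 64, 128, 256, 512]
grid_trials = [(lr, bs) for lr in lr_grid for bs in bs_grid]

# Random search: 25 independent draws
rand_trials = [(10 ** (-3 * np.random.rand()),   # roughly log-uniform learning rate
                int(np.random.choice(bs_grid)))  # random mini-batch size
               for _ in range(budget)]

print(len({lr for lr, _ in grid_trials}))  # 5 distinct learning rates tried
print(len({lr for lr, _ in rand_trials}))  # 25 distinct learning rates tried
```

With the grid, 20 of the 25 trials reuse a learning rate that has already been tried; random sampling spends every trial on a new value of each hyperparameter.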
Question 2
In a project with limited computational resources, which three hyperparameters would you choose to tune?
❌ ε in Adam
✅ α (learning rate)
✅ mini-batch size
✅ β (momentum parameter)
❌ β₁, β₂ in Adam
Explanation:
The most sensitive hyperparameters are learning rate, mini-batch size, and momentum β.
Adam’s β₁, β₂ and ε usually work well with default values.
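For context, a minimal tf.keras sketch of this split (assuming TensorFlow is installed; the numeric values are placeholders, not recommendations):

```python
import tensorflow as tf

# Hyperparameters usually worth searching first (placeholder values)
learning_rate = 3e-4   # alpha
batch_size = 128       # mini-batch size, later passed to model.fit(..., batch_size=batch_size)
beta = 0.9             # momentum

sgd = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=beta)

# Adam's beta_1, beta_2 and epsilon are normally left at their tf.keras defaults
adam = tf.keras.optimizers.Adam(learning_rate=learning_rate)  # beta_1=0.9, beta_2=0.999, epsilon=1e-7
```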
Question 3
Even if enough computational power is available for tuning, it is always better to “babysit” one model (“Panda strategy”). True/False?
✅ False
❌ True
Explanation:
With enough compute it is usually better to train many models in parallel (the “Caviar” strategy) than to babysit a single model (the “Panda” strategy); babysitting is mainly a fallback when resources are scarce.
Question 4
Knowing α ∈ [0.00001, 1.0], which is the recommended way to sample α?
❌ `r = -4*np.random.rand(); α = 10**r`
❌ `r = np.random.rand(); α = 0.00001 + 0.99999*r`
✅ `r = -5*np.random.rand(); α = 10**r`
❌ `r = np.random.rand(); α = 10**r`
Explanation:
The learning rate is best sampled on a log scale because its range spans several orders of magnitude. With α ∈ [10⁻⁵, 1], sample r uniformly from [−5, 0] and set α = 10^r, so every decade of α gets equal probability (see the sketch below); `r = -4*np.random.rand()` would never produce values below 10⁻⁴.
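A minimal NumPy sketch of this sampling scheme (the sample count is arbitrary):

```python
import numpy as np

np.random.seed(1)
n = 10_000

r = -5 * np.random.rand(n)  # r is uniform in (-5, 0]
alpha = 10 ** r             # alpha is log-uniform over [1e-5, 1.0]

# Roughly equal probability mass in each decade:
print(np.mean((alpha >= 1e-5) & (alpha < 1e-4)))  # ~0.2
print(np.mean((alpha >= 1e-1) & (alpha < 1e0)))   # ~0.2
```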
Question 5
Finding good hyperparameters is time-consuming, so you should do it once at the start and never again. True/False?
✅ False
❌ True
Explanation:
Hyperparameters often need retuning when the data, the model architecture, or the problem itself changes.
Question 6
When using batch normalization, it’s OK to drop W[l] from forward propagation. True/False?
✅ False
❌ True
Explanation:
Batch Norm normalizes Z[l] only after it has been computed from Z[l] = W[l]A[l-1] + b[l], so W[l] is still needed and can’t be omitted.
It is the bias b[l] that becomes redundant: subtracting the batch mean cancels any constant offset, and β[l] takes over its role (see the sketch below).
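A minimal NumPy sketch of one batch-norm layer’s forward pass, showing that W[l] is still used while b[l] has no effect once the batch mean is subtracted (the layer sizes and values are made up):

```python
import numpy as np

np.random.seed(2)
A_prev = np.random.randn(4, 32)   # activations from layer l-1: shape (n[l-1], m)
W = 0.1 * np.random.randn(3, 4)   # W[l] is still required
b = np.random.randn(3, 1)         # b[l] is redundant under batch norm
gamma, beta, eps = np.ones((3, 1)), np.zeros((3, 1)), 1e-8

def bn_forward(Z):
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)
    return gamma * Z_norm + beta

Z_with_b = np.dot(W, A_prev) + b   # the batch mean absorbs b ...
Z_without_b = np.dot(W, A_prev)    # ... so dropping b changes nothing
print(np.allclose(bn_forward(Z_with_b), bn_forward(Z_without_b)))  # True
```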
Question 7
When using normalization, if σ is very small, normalization may fail due to division by zero. True/False?
✅ True
❌ False
Explanation:
When σ² ≈ 0 (for example, a feature that is nearly constant over the mini-batch), dividing by √σ² is a division by (almost) zero and becomes numerically unstable.
That’s why a small ε (epsilon) is added inside the square root, z_norm = (z − μ) / √(σ² + ε), to keep the denominator away from zero (see the sketch below).
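A small NumPy sketch of the failure mode and the fix (the numbers are arbitrary):

```python
import numpy as np

z = np.array([0.5, 0.5, 0.5, 0.5])  # a feature that is constant over the mini-batch
mu, var = z.mean(), z.var()         # var == 0.0
eps = 1e-8

unsafe = (z - mu) / np.sqrt(var)        # 0/0 -> nan (and a runtime warning)
safe = (z - mu) / np.sqrt(var + eps)    # well-defined: all zeros

print(unsafe)  # [nan nan nan nan]
print(safe)    # [0. 0. 0. 0.]
```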
Question 8
Which of the following are true about batch normalization?
❌ β[l] and γ[l] are hyperparameters tuned by random sampling.
✅ γ[l] and β[l] set the variance and mean of ẑ[l].
❌ z_norm = (z − μ) / σ² (wrong formula: the denominator should be √(σ² + ε), not σ²)
✅ When using batch norm, γ[l] and β[l] are learned (trainable) parameters.
Explanation:
γ[l] and β[l] control scaling and shifting after normalization and are learned by gradient descent, not manually tuned.
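A rough NumPy sketch of γ[l] and β[l] as trainable parameters: they are applied after normalization and updated by gradient descent like any other weight (the upstream gradient here is a random placeholder; the dγ and dβ expressions are the standard batch-norm gradients):

```python
import numpy as np

np.random.seed(3)
Z = np.random.randn(3, 16)   # pre-activations of one layer for one mini-batch
gamma = np.ones((3, 1))      # learned scale, initialized to 1
beta = np.zeros((3, 1))      # learned shift, initialized to 0
eps, lr = 1e-8, 0.1

# Forward: normalize, then scale and shift
mu = Z.mean(axis=1, keepdims=True)
var = Z.var(axis=1, keepdims=True)
Z_norm = (Z - mu) / np.sqrt(var + eps)
Z_tilde = gamma * Z_norm + beta   # per-unit mean beta, standard deviation roughly gamma

# Backward (given a gradient from the rest of the network) and one update step
dZ_tilde = np.random.randn(*Z_tilde.shape)  # placeholder upstream gradient
dgamma = np.sum(dZ_tilde * Z_norm, axis=1, keepdims=True)
dbeta = np.sum(dZ_tilde, axis=1, keepdims=True)
gamma -= lr * dgamma
beta -= lr * dbeta
```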
Question 9
At test time, we turn off Batch Norm to avoid random predictions. True/False?
✅ False
❌ True
Explanation:
At test time, Batch Norm uses running averages (mean & variance) computed during training — it’s not “turned off”.
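A small sketch of that bookkeeping: keep exponentially weighted averages of μ and σ² during training and reuse them, frozen, at test time (the momentum value 0.9 is a typical choice, not prescribed by the course):

```python
import numpy as np

np.random.seed(4)
running_mu, running_var = 0.0, 1.0
momentum, eps = 0.9, 1e-8

# Training: update the running statistics from each mini-batch
for _ in range(100):
    z_batch = 2.0 + 0.5 * np.random.randn(64)  # toy mini-batch of pre-activations
    running_mu = momentum * running_mu + (1 - momentum) * z_batch.mean()
    running_var = momentum * running_var + (1 - momentum) * z_batch.var()

# Test time: normalize even a single example with the stored statistics
z_test = np.array([2.3])
print((z_test - running_mu) / np.sqrt(running_var + eps))
```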
Question 10
Which statements about deep learning programming frameworks are true?
✅ They allow you to code deep learning algorithms with fewer lines of code.
✅ Good governance helps keep open-source frameworks fair and open long-term.
❌ They require cloud-based machines to run.
Explanation:
Frameworks (like TensorFlow, PyTorch, Keras) simplify coding.
They run locally or on cloud — not limited to cloud systems.
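To illustrate the “fewer lines of code” point, a small tf.keras model trains end to end in a handful of lines on a laptop CPU, no cloud machine involved (a generic sketch with toy data, not code from the course):

```python
import numpy as np
import tensorflow as tf

# Toy binary-classification data
X = np.random.randn(256, 20).astype("float32")
y = (X[:, 0] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.BatchNormalization(),  # gamma/beta handled as trainable parameters
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, batch_size=32, epochs=3, verbose=0)  # runs locally; no cloud required
```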
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | Random sampling is convenient; a grid does not explore more values per hyperparameter | Random search efficiency |
| 2 | α, mini-batch size, β (momentum) | Key tunable hyperparameters |
| 3 | False | Better to run multiple models (not babysit one) |
| 4 | `r = -5*np.random.rand(); α = 10**r` | Log-scale sampling for learning rate |
| 5 | False | Hyperparameters must be re-tuned as project evolves |
| 6 | False | W[l] can’t be dropped in batch norm |
| 7 | True | Small σ may cause division instability |
| 8 | γ, β learned; control variance/mean | Batch norm introduces trainable params |
| 9 | False | Batch norm stays active with stored running averages |
| 10 | Frameworks simplify DL, governance matters | Frameworks ease dev; cloud not required |