
Hyperparameter Tuning, Batch Normalization, and Programming Frameworks - Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Deep Learning Specialization) Quiz Answers, 2025

Question 1

Which of the following are true about hyperparameter search?

Choosing random values for hyperparameters is convenient since we might not know which are most important.
❌ When using random values they must always be uniformly distributed.
❌ Choosing grid values is better when number of hyperparameters is high.
When sampling from a grid, the number of distinct values explored per hyperparameter is smaller than when sampling at random.

Explanation:
Random search is efficient when we don’t know which hyperparameters matter most.
Uniform sampling isn’t always ideal — log scale is often used.
Grid search becomes inefficient when there are many hyperparameters: each one is tried at only a few fixed values, whereas random sampling explores a new value on every trial (see the sketch below).
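
To see the scale of the difference, here is a minimal Python/NumPy sketch (the 3×3 grid and the sampling ranges are made up for illustration, not taken from the quiz): with the same budget of nine trials, a grid tries only three distinct values of each hyperparameter, while random sampling tries nine.

import numpy as np

rng = np.random.default_rng(0)

# Grid: 3 learning rates x 3 momentum values -> 9 trials, only 3 distinct values each
grid_lr = [1e-3, 1e-2, 1e-1]
grid_beta = [0.90, 0.95, 0.99]
grid_trials = [(lr, b) for lr in grid_lr for b in grid_beta]

# Random: 9 trials, each drawing its own learning rate (log scale) and momentum
random_trials = [(10 ** (-3 * rng.random() - 1), 0.9 + 0.09 * rng.random())
                 for _ in range(9)]

print(len({lr for lr, _ in grid_trials}))    # 3 distinct learning rates tried
print(len({lr for lr, _ in random_trials}))  # 9 distinct learning rates tried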


Question 2

In a project with limited computational resources, which three hyperparameters would you choose to tune?

❌ ε in Adam
α (learning rate)
mini-batch size
β (momentum parameter)
❌ β₁, β₂ in Adam

Explanation:
The most sensitive hyperparameters are learning rate, mini-batch size, and momentum β.
Adam’s β₁, β₂ and ε usually work well with default values.
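
As a minimal sketch of what tuning only these three would look like (the sampling ranges below are assumptions for illustration, not from the quiz; sampling 1 − β on a log scale follows the course's advice for momentum):

import numpy as np

rng = np.random.default_rng(1)

alpha = 10 ** (-4 * rng.random())            # learning rate, log-uniform over ~[1e-4, 1]
batch_size = int(2 ** rng.integers(5, 10))   # mini-batch size: 32, 64, ..., 512
beta = 1 - 10 ** (-2 * rng.random() - 1)     # momentum, roughly 0.9 ... 0.999
# Adam's beta1, beta2 and epsilon stay at their defaults (0.9, 0.999, 1e-8)

print(alpha, batch_size, beta)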


Question 3

Even if enough computational power is available for tuning, it is always better to “babysit” one model (“Panda strategy”). True/False?

False
❌ True

Explanation:
With enough computational power, it is usually better to train many models in parallel (the “Caviar strategy”) than to babysit a single model (the “Panda strategy”); babysitting is mainly for when resources are scarce.


Question 4

Knowing α ∈ [0.00001, 1.0], which is the recommended way to sample α?

❌ r = -4*np.random.rand(); α = 10**r
❌ r = np.random.rand(); α = 0.00001 + r*0.99999
r = -5*np.random.rand(); α = 10**r
❌ r = np.random.rand(); α = 10**r

Explanation:
The learning rate is best sampled on a log scale because its range spans several orders of magnitude: drawing r uniformly from [-5, 0] and setting α = 10**r covers every decade of [0.00001, 1.0] equally.
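
In code, the recommended option samples the exponent uniformly and then exponentiates, so every decade between 10⁻⁵ and 1 is equally likely to be explored:

import numpy as np

r = -5 * np.random.rand()   # r is uniform over (-5, 0]
alpha = 10 ** r             # alpha is log-uniform over [1e-5, 1]
print(alpha)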


Question 5

Finding good hyperparameters is time-consuming, so you should do it once at the start and never again. True/False?

False
❌ True

Explanation:
Hyperparameters often need retuning when data, architecture, or problem changes.


Question 6

When using batch normalization, it’s OK to drop W[l] from forward propagation. True/False?

False
❌ True

Explanation:
Batch norm normalizes z[l] = W[l]A[l-1] + b[l] after it is computed, so W[l] is still needed.
It is the bias b[l] that becomes redundant, because subtracting the mean cancels any constant added to z[l].
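
A minimal NumPy sketch of this point (shapes and values are made up): W[l] still determines z[l] and therefore the normalized output, while any bias b[l] is cancelled by the mean subtraction.

import numpy as np

rng = np.random.default_rng(2)
A_prev = rng.standard_normal((4, 8))   # activations from layer l-1: 4 units, 8 examples
W = rng.standard_normal((3, 4))        # weights of layer l
b = rng.standard_normal((3, 1))        # bias of layer l

def normalize(Z, eps=1e-8):
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    return (Z - mu) / np.sqrt(var + eps)

Z_with_b = W @ A_prev + b
Z_without_b = W @ A_prev
print(np.allclose(normalize(Z_with_b), normalize(Z_without_b)))  # True: b is redundant, W is not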


Question 7

When using normalization, if σ is very small, normalization may fail due to division by zero. True/False?

True
❌ False

Explanation:
When σ² ≈ 0, the normalization divides by (nearly) zero, which blows up or produces NaNs.
That’s why a small ε (epsilon) is added inside the square root for numerical stability.
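
A tiny NumPy illustration (made-up numbers): with a constant feature the variance is exactly zero, the plain division produces NaNs (with a runtime warning), and adding ε keeps the result finite.

import numpy as np

z = np.array([3.0, 3.0, 3.0])   # constant feature -> variance is 0
sigma2 = z.var()
eps = 1e-8

unstable = (z - z.mean()) / np.sqrt(sigma2)        # 0/0 -> nan
stable = (z - z.mean()) / np.sqrt(sigma2 + eps)    # well-defined zeros
print(unstable, stable)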


Question 8

Which of the following are true about batch normalization?

❌ β[l] and γ[l] are hyperparameters tuned by random sampling.
γ[l] and β[l] set the variance and mean of ẑ[l].
❌ z_norm = (z − μ) / σ² (wrong formula: the denominator should be √(σ² + ε))
When using batch norm, γ[l] and β[l] are learned (trainable) parameters.

Explanation:
γ[l] and β[l] control scaling and shifting after normalization and are learned by gradient descent, not manually tuned.
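
A short NumPy sketch of the scaling and shifting step (the values of γ and β here are arbitrary; in a network they are updated by gradient descent just like W):

import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(1000) * 5 + 2                 # arbitrary pre-activations
z_norm = (z - z.mean()) / np.sqrt(z.var() + 1e-8)     # mean 0, variance 1

gamma, beta = 1.5, -0.3                               # learned parameters in practice
z_tilde = gamma * z_norm + beta

print(z_tilde.mean(), z_tilde.std())                  # approximately -0.3 and 1.5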


Question 9

At test time, we turn off Batch Norm to avoid random predictions. True/False?

False
❌ True

Explanation:
At test time, Batch Norm uses running averages (mean & variance) computed during training — it’s not “turned off”.
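
A minimal sketch of the idea (hypothetical training loop and momentum value): keep exponentially weighted estimates of μ and σ² during training, then reuse them at test time instead of computing batch statistics.

import numpy as np

rng = np.random.default_rng(4)
momentum = 0.9
running_mu, running_var = 0.0, 1.0

for _ in range(100):                         # stand-in for mini-batches seen during training
    batch = rng.standard_normal(64) * 2 + 5
    running_mu = momentum * running_mu + (1 - momentum) * batch.mean()
    running_var = momentum * running_var + (1 - momentum) * batch.var()

x_test = np.array([4.0, 5.5, 6.0])
x_norm = (x_test - running_mu) / np.sqrt(running_var + 1e-8)   # no batch statistics at test time
print(x_norm)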


Question 10

Which statements about deep learning programming frameworks are true?

They allow you to code deep learning algorithms with fewer lines of code.
Good governance helps keep open-source frameworks fair and open long-term.
❌ They require cloud-based machines to run.

Explanation:
Frameworks (like TensorFlow, PyTorch, Keras) simplify coding.
They run locally or on cloud — not limited to cloud systems.
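
As an example of the "fewer lines of code" point, here is a tiny TensorFlow/Keras sketch on toy random data (the architecture and data are made up; any of the frameworks named above would do), running entirely on a local machine:

import numpy as np
import tensorflow as tf

x = np.random.rand(256, 20).astype("float32")      # toy inputs
y = np.random.randint(0, 2, size=(256,))           # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)   # initialization, backprop, Adam: all handled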


🧾 Summary Table

Q#  | ✅ Correct Answer | Key Concept
1   | Random values are convenient; a grid explores fewer distinct values per hyperparameter | Random search efficiency
2   | α, mini-batch size, β (momentum) | Key tunable hyperparameters
3   | False | Better to train many models in parallel than babysit one
4   | r = -5*np.random.rand(); α = 10**r | Log-scale sampling for the learning rate
5   | False | Hyperparameters need re-tuning as the project evolves
6   | False | W[l] cannot be dropped in batch norm (b[l] can)
7   | True | Very small σ makes the division unstable; ε prevents it
8   | γ[l], β[l] are learned and set the variance/mean of ẑ[l] | Batch norm introduces trainable parameters
9   | False | Batch norm uses stored running averages at test time
10  | Frameworks need fewer lines of code; good governance matters | Frameworks ease development; cloud not required