
Practical Aspects of Deep Learning: Improving Deep Neural Networks (Hyperparameter Tuning, Regularization and Optimization), Deep Learning Specialization Answers (2025)

Question 1

If you have 10,000 examples, how would you split the train/dev/test set? Choose the best option.

❌ 98% train. 1% dev. 1% test.
60% train. 20% dev. 20% test.
❌ 33% train. 33% dev. 33% test.

Explanation:
A common, balanced split is 60/20/20 (or 70/15/15). 60/20/20 gives enough training data while keeping sizeable dev/test sets to evaluate generalization.
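A minimal NumPy sketch of such a split, assuming the data fits in memory; the function name and 60/20/20 ratios are illustrative, not from a specific library:

```python
import numpy as np

def split_dataset(X, Y, seed=0):
    """Shuffle, then split into 60% train, 20% dev, 20% test."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    idx = rng.permutation(m)          # shuffle before splitting
    n_train = int(0.6 * m)
    n_dev = int(0.2 * m)
    train = idx[:n_train]
    dev = idx[n_train:n_train + n_dev]
    test = idx[n_train + n_dev:]
    return (X[train], Y[train]), (X[dev], Y[dev]), (X[test], Y[test])

# With 10,000 examples this yields 6,000 / 2,000 / 2,000.
X = np.arange(10_000).reshape(-1, 1)
Y = np.arange(10_000)
train, dev, test = split_dataset(X, Y)
```

Shuffling before the split helps ensure dev and test come from the same distribution as train (see Question 2).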


Question 2

The dev and test set should:

❌ Be identical to each other (same (x,y) pairs)
Come from the same distribution
❌ Have the same number of examples
❌ Come from different distributions

Explanation:
Dev and test should both be drawn from the same underlying distribution (so performance on dev reflects expected test performance). They must not be identical examples.


Question 3

If your Neural Network model seems to have high variance, which of the following would be promising things to try?

❌ Increase the number of units in each hidden layer
Get more training data
❌ Get more test data
Add regularization
❌ Make the Neural Network deeper

Explanation:
High variance (overfitting) is helped by more training data and stronger regularization (dropout, L2, etc.). Increasing capacity (units/depth) typically worsens variance; more test data doesn’t address overfitting.


Question 4

Your classifier gets training error 0.1% and dev error 11%. Which statements are true? (Check all that apply.)

❌ The model is overfitting the development set.
The model is overfitting the training set.
❌ The model has a very high bias.
The model has a high variance.

Explanation:
Very low train error but much higher dev error ⇒ model fits training data too closely (overfits) and exhibits high variance, not high bias.


Question 5

In every case it is a good practice to use dropout when training a deep neural network because it can help to prevent overfitting. True/False?

False
❌ True

Explanation:
Dropout often helps, but not always — some architectures/training regimes or small datasets can suffer when using dropout. It’s a useful tool, but not universally required.


Question 6

True or False: In L2 regularization, the lambda hyperparameter directly influences the calculations used by the model to make predictions during testing.

❌ True
False

Explanation:
Lambda affects training (penalizes large weights) and thus indirectly affects the learned weights. During testing the model uses those learned weights; lambda itself does not directly change the prediction formula at test time.
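To make this concrete, here is a minimal sketch for a linear model with squared loss; the function name is illustrative. Note that λ appears in the training cost and gradient, but the prediction `X @ W` never uses it:

```python
import numpy as np

def l2_cost_and_grad(W, X, y, lam):
    """Squared loss plus L2 penalty (lam / (2m)) * ||W||^2, and its gradient."""
    m = X.shape[0]
    pred = X @ W                      # prediction: no lambda anywhere
    err = pred - y
    cost = (err @ err) / (2 * m) + (lam / (2 * m)) * np.sum(W ** 2)
    grad = (X.T @ err) / m + (lam / m) * W   # lambda enters only the training gradient
    return cost, grad
```

At test time only the learned `W` is used, so λ influences predictions only indirectly, through the weights it shaped during training.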


Question 7

Which of the following are true about dropout?

❌ In practice, it eliminates units of each layer with a probability of keep_prob.
It helps to reduce overfitting.
❌ It helps to reduce the bias of a model.
In practice, it eliminates units of each layer with a probability of 1 – keep_prob.

Explanation:
Dropout randomly removes units during training with probability (1 − keep_prob), which reduces overfitting. It typically increases bias slightly while reducing variance.
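A minimal sketch of inverted dropout, the variant taught in the course; the function name is illustrative. Each unit is zeroed with probability 1 − keep_prob, and survivors are scaled by 1/keep_prob so the expected activation is unchanged:

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Zero each unit with prob 1 - keep_prob; scale survivors by 1/keep_prob."""
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob
```

The 1/keep_prob rescaling is why no extra correction is needed at test time, when dropout is simply turned off.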


Question 8

Increasing keep_prob from 0.5 to 0.6 will likely cause the following: (Check two)

❌ Increasing the regularization effect
Reducing the regularization effect
❌ Causing the neural network to end up with a higher training set error
Causing the neural network to end up with a lower training set error

Explanation:
Higher keep_prob means fewer units dropped (less regularization). Less regularization usually lowers training error (model fits training data better).


Question 9

Which techniques are useful for reducing variance (overfitting)? (Check all that apply.)

❌ Gradient Checking
Data augmentation
Dropout
L2 regularization
❌ Exploding gradient
❌ Vanishing gradient
❌ Xavier initialization

Explanation:
Data augmentation, dropout, and L2 regularization are standard methods to reduce overfitting. Xavier initialization helps training stability rather than directly reducing variance. Gradient checking is a debugging tool, and exploding/vanishing gradients are training problems, not regularization techniques.
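As one example of data augmentation, horizontal flipping effectively doubles an image dataset; this is a minimal NumPy sketch with an illustrative function name, assuming images shaped (N, H, W):

```python
import numpy as np

def augment_flips(images, labels):
    """Double the dataset by appending horizontally flipped copies."""
    flipped = images[:, :, ::-1]      # reverse the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])
```

More training data (even synthetic variants of existing examples) is a direct remedy for high variance, as Question 3 noted.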


Question 10

Which expression correctly normalizes the input x?

❌ x = x / σ
❌ x = (1/m) Σ (x(i))²
❌ x = (1/m) Σ x(i)
x = (x − μ) / σ

Explanation:
Standard normalization (z-score) subtracts mean μ and divides by standard deviation σ: (x − μ)/σ.
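A minimal per-feature sketch (the function name and the small epsilon guard against zero variance are illustrative):

```python
import numpy as np

def normalize(X, eps=1e-8):
    """Z-score normalize each feature: subtract mean, divide by std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps), mu, sigma
```

The μ and σ computed on the training set should be reused to normalize dev and test inputs, so all three sets go through the same transformation.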


🧾 Summary Table

Q#  | ✅ Correct Answer                                                | Key Concept
1   | 60% train / 20% dev / 20% test                                  | Balanced train/dev/test split for 10k examples
2   | Come from the same distribution                                 | Dev/test must be drawn from the same data distribution
3   | Get more training data; add regularization                      | Remedies for high variance (overfitting)
4   | Overfitting the training set; high variance                     | Low train error + high dev error ⇒ overfit/high variance
5   | False                                                           | Dropout helps but isn't always appropriate
6   | False                                                           | λ affects training, not the prediction formula at test time
7   | Helps reduce overfitting; removes units with prob 1 − keep_prob | Correct behavior and effect of dropout
8   | Reduce regularization; lower training error                     | Increasing keep_prob reduces regularization strength
9   | Data augmentation; dropout; L2 regularization                   | Common variance-reduction techniques
10  | (x − μ) / σ                                                     | Standard (z-score) normalization