Practical Aspects of Deep Learning: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Deep Learning Specialization) Answers (2025)
Question 1
If you have 10,000 examples, how would you split the train/dev/test set? Choose the best option.
❌ 98% train. 1% dev. 1% test.
✅ 60% train. 20% dev. 20% test.
❌ 33% train. 33% dev. 33% test.
Explanation:
A common, balanced split is 60/20/20 (or 70/15/15). At 10,000 examples, 60/20/20 gives enough training data while keeping sizeable dev/test sets to evaluate generalization. A 98/1/1 split is appropriate only for much larger datasets, where 1% still amounts to thousands of examples.
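The split can be sketched in NumPy; a minimal example (the dataset, seed, and variable names are illustrative, not from the quiz):

```python
import numpy as np

# Hypothetical dataset of 10,000 examples with 20 features.
m = 10_000
X = np.random.randn(m, 20)
y = np.random.randint(0, 2, size=m)

# Shuffle once, then slice 60% / 20% / 20%.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(m)
train_end = int(0.6 * m)   # first 6,000 indices -> train
dev_end = int(0.8 * m)     # next 2,000 -> dev, remainder -> test

X_train, y_train = X[idx[:train_end]], y[idx[:train_end]]
X_dev, y_dev = X[idx[train_end:dev_end]], y[idx[train_end:dev_end]]
X_test, y_test = X[idx[dev_end:]], y[idx[dev_end:]]

print(len(X_train), len(X_dev), len(X_test))  # 6000 2000 2000
```

Shuffling before slicing matters: if the data is ordered (e.g. by class), contiguous slices would give train/dev/test sets with different distributions.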
Question 2
The dev and test set should:
❌ Be identical to each other (same (x,y) pairs)
✅ Come from the same distribution
❌ Have the same number of examples
❌ Come from different distributions
Explanation:
Dev and test should both be drawn from the same underlying distribution (so performance on dev reflects expected test performance). They must not be identical examples.
Question 3
If your neural network model seems to have high variance, which of the following would be promising things to try?
❌ Increase the number of units in each hidden layer
✅ Get more training data
❌ Get more test data
✅ Add regularization
❌ Make the Neural Network deeper
Explanation:
High variance (overfitting) is helped by more training data and stronger regularization (dropout, L2, etc.). Increasing capacity (units/depth) typically worsens variance; more test data doesn’t address overfitting.
Question 4
Your classifier gets training error 0.1% and dev error 11%. Which statements are true? (Check all that apply.)
❌ The model is overfitting the development set.
✅ The model is overfitting the training set.
❌ The model has a very high bias.
✅ The model has a high variance.
Explanation:
Very low train error but much higher dev error ⇒ model fits training data too closely (overfits) and exhibits high variance, not high bias.
Question 5
In every case it is a good practice to use dropout when training a deep neural network because it can help to prevent overfitting. True/False?
✅ False
❌ True
Explanation:
Dropout helps only when the model is overfitting. If the network is not overfitting, dropout just injects noise, slows training, and can hurt performance. It is a useful tool, but not one to apply in every case.
Question 6
True or False: In L2 regularization, the lambda hyperparameter directly influences the calculations used by the model to make predictions during testing.
❌ True
✅ False
Explanation:
Lambda affects training (penalizes large weights) and thus indirectly affects the learned weights. During testing the model uses those learned weights; lambda itself does not directly change the prediction formula at test time.
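The train/test distinction can be made concrete with a small NumPy sketch: lambda appears in the training cost, but the prediction function uses only the learned parameters (function names and shapes here are illustrative):

```python
import numpy as np

def l2_penalty(weights, lam, m):
    # L2 (Frobenius) penalty added to the *training* cost:
    # (lam / (2*m)) * sum of squared entries of every weight matrix.
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def predict(x, W, b):
    # Test-time prediction (sigmoid of a linear unit): only the
    # learned W and b appear -- lambda is nowhere in this function.
    return 1 / (1 + np.exp(-(W @ x + b)))
```

Lambda shapes which W the optimizer converges to, but once training is done, `predict` is the same formula regardless of the lambda that was used.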
Question 7
Which of the following are true about dropout?
❌ In practice, it eliminates units of each layer with a probability of keep_prob.
✅ It helps to reduce overfitting.
❌ It helps to reduce the bias of a model.
✅ In practice, it eliminates units of each layer with a probability of 1 − keep_prob.
Explanation:
Dropout randomly removes units during training with probability (1 − keep_prob), which reduces overfitting. It typically increases bias slightly while reducing variance.
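A minimal NumPy sketch of inverted dropout, the variant taught in the course (function name and array shapes are illustrative):

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng):
    # Each unit is kept with probability keep_prob,
    # i.e. dropped with probability 1 - keep_prob.
    mask = rng.random(A.shape) < keep_prob
    A = A * mask
    # Scale surviving activations by 1/keep_prob so the expected
    # activation is unchanged; this is why no rescaling is needed
    # at test time (dropout is simply turned off).
    return A / keep_prob

rng = np.random.default_rng(0)
A = np.ones((4, 5))
A_drop = inverted_dropout(A, keep_prob=0.8, rng=rng)
```

With keep_prob = 0.8, each activation in `A_drop` is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled).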
Question 8
Increasing keep_prob from 0.5 to 0.6 will likely cause the following: (Check two)
❌ Increasing the regularization effect
✅ Reducing the regularization effect
❌ Causing the neural network to end up with a higher training set error
✅ Causing the neural network to end up with a lower training set error
Explanation:
Higher keep_prob means fewer units dropped (less regularization). Less regularization usually lowers training error (model fits training data better).
Question 9
Which techniques are useful for reducing variance (overfitting)? (Check all that apply.)
❌ Gradient Checking
✅ Data augmentation
✅ Dropout
✅ L2 regularization
❌ Exploding gradient
❌ Vanishing gradient
❌ Xavier initialization
Explanation:
Data augmentation, dropout, and L2 are standard methods to reduce overfitting. Xavier initialization helps training stability, not variance reduction. Gradient checking verifies backprop correctness, and exploding/vanishing gradients are training problems, not techniques; none of these address overfitting.
Question 10
Which expression correctly normalizes the input x?
❌ x = x / σ
❌ x = (1/m) Σ (x(i))²
❌ x = (1/m) Σ x(i)
✅ x = (x − μ) / σ
Explanation:
Standard normalization (z-score) subtracts mean μ and divides by standard deviation σ: (x − μ)/σ.
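In NumPy this is a one-liner per feature column (the example matrix is illustrative):

```python
import numpy as np

# Toy data: 3 examples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Compute mean and std per feature from the training data.
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# z-score normalization: each feature now has mean 0 and std 1.
X_norm = (X - mu) / sigma
```

The same mu and sigma computed on the training set must be reused to normalize dev and test inputs, so all three sets pass through an identical transformation.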
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | 60% / 20% / 20% | Balanced train/dev/test split for 10k examples |
| 2 | Come from same distribution | Dev/test must be drawn from same data distribution |
| 3 | Get more data; Add regularization | Remedies for high variance (overfitting) |
| 4 | Overfitting training set; High variance | Low train error + high dev error ⇒ overfit/high variance |
| 5 | False | Dropout helps but isn’t always appropriate |
| 6 | False | λ affects training, not the prediction formula at test time |
| 7 | Helps reduce overfitting; removes units with prob 1−keep_prob | Correct behavior and effect of dropout |
| 8 | Reduce regularization; lower training error | Increasing keep_prob reduces regularization strength |
| 9 | Data augmentation; Dropout; L2 regularization | Common variance-reduction techniques |
| 10 | (x − μ) / σ | Standard (z-score) normalization |