Practical Aspects of Deep Learning: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Deep Learning Specialization) Answers (2025)
Question 1
If you have 10,000 examples, how would you split the train/dev/test set? Choose the best option.
❌ 98% train. 1% dev. 1% test.
✅ 60% train. 20% dev. 20% test.
❌ 33% train. 33% dev. 33% test.
Explanation:
A common, balanced split is 60/20/20 (or 70/15/15). At 10,000 examples, 60/20/20 gives enough training data while keeping sizeable dev/test sets to evaluate generalization. A 98/1/1 split is appropriate only for much larger datasets, where 1% still amounts to thousands of examples.
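The split can be sketched in NumPy; a minimal example (the dataset, seed, and variable names are illustrative, not from the quiz):

```python
import numpy as np

# Hypothetical dataset of 10,000 examples with 20 features.
m = 10_000
X = np.random.randn(m, 20)
y = np.random.randint(0, 2, size=m)

# Shuffle once, then slice 60% / 20% / 20%.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(m)
train_end = int(0.6 * m)   # first 6,000 indices -> train
dev_end = int(0.8 * m)     # next 2,000 -> dev, remainder -> test

X_train, y_train = X[idx[:train_end]], y[idx[:train_end]]
X_dev, y_dev = X[idx[train_end:dev_end]], y[idx[train_end:dev_end]]
X_test, y_test = X[idx[dev_end:]], y[idx[dev_end:]]

print(len(X_train), len(X_dev), len(X_test))  # 6000 2000 2000
```

Shuffling before slicing matters: if the data is ordered (e.g. by class), contiguous slices would give train/dev/test sets with different distributions.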
Question 2
The dev and test set should:
❌ Be identical to each other (same (x,y) pairs)
✅ Come from the same distribution
❌ Have the same number of examples
❌ Come from different distributions
Explanation:
Dev and test should both be drawn from the same underlying distribution (so performance on dev reflects expected test performance). They must not be identical examples.
Question 3
If your neural network model seems to have high variance, which of the following would be promising things to try?
❌ Increase the number of units in each hidden layer
✅ Get more training data
❌ Get more test data
✅ Add regularization
❌ Make the Neural Network deeper
Explanation:
High variance (overfitting) is helped by more training data and stronger regularization (dropout, L2, etc.). Increasing capacity (units/depth) typically worsens variance; more test data doesn’t address overfitting.
Question 4
Your classifier gets training error 0.1% and dev error 11%. Which statements are true? (Check all that apply.)
❌ The model is overfitting the development set.
✅ The model is overfitting the training set.
❌ The model has a very high bias.
✅ The model has a high variance.
Explanation:
Very low train error but much higher dev error ⇒ model fits training data too closely (overfits) and exhibits high variance, not high bias.
Question 5
In every case it is a good practice to use dropout when training a deep neural network because it can help to prevent overfitting. True/False?
✅ False
❌ True
Explanation:
Dropout helps only when the model is overfitting. If the network is not overfitting, dropout just injects noise, slows training, and can hurt performance. It is a useful tool, but not one to apply in every case.
Question 6
True or False: In L2 regularization, the lambda hyperparameter directly influences the calculations used by the model to make predictions during testing.
❌ True
✅ False
Explanation:
Lambda affects training (penalizes large weights) and thus indirectly affects the learned weights. During testing the model uses those learned weights; lambda itself does not directly change the prediction formula at test time.
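The train/test distinction can be made concrete with a small NumPy sketch: lambda appears in the training cost, but the prediction function uses only the learned parameters (function names and shapes here are illustrative):

```python
import numpy as np

def l2_penalty(weights, lam, m):
    # L2 (Frobenius) penalty added to the *training* cost:
    # (lam / (2*m)) * sum of squared entries of every weight matrix.
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def predict(x, W, b):
    # Test-time prediction (sigmoid of a linear unit): only the
    # learned W and b appear -- lambda is nowhere in this function.
    return 1 / (1 + np.exp(-(W @ x + b)))
```

Lambda shapes which W the optimizer converges to, but once training is done, `predict` is the same formula regardless of the lambda that was used.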
Question 7
Which of the following are true about dropout?
❌ In practice, it eliminates units of each layer with a probability of keep_prob.
✅ It helps to reduce overfitting.
❌ It helps to reduce the bias of a model.
✅ In practice, it eliminates units of each layer with a probability of 1 − keep_prob.
Explanation:
Dropout randomly removes units during training with probability (1 − keep_prob), which reduces overfitting. It typically increases bias slightly while reducing variance.
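A minimal NumPy sketch of inverted dropout, the variant taught in the course (function name and array shapes are illustrative):

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng):
    # Each unit is kept with probability keep_prob,
    # i.e. dropped with probability 1 - keep_prob.
    mask = rng.random(A.shape) < keep_prob
    A = A * mask
    # Scale surviving activations by 1/keep_prob so the expected
    # activation is unchanged; this is why no rescaling is needed
    # at test time (dropout is simply turned off).
    return A / keep_prob

rng = np.random.default_rng(0)
A = np.ones((4, 5))
A_drop = inverted_dropout(A, keep_prob=0.8, rng=rng)
```

With keep_prob = 0.8, each activation in `A_drop` is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled).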
Question 8
Increasing keep_prob from 0.5 to 0.6 will likely cause the following: (Check two)
❌ Increasing the regularization effect
✅ Reducing the regularization effect
❌ Causing the neural network to end up with a higher training set error
✅ Causing the neural network to end up with a lower training set error
Explanation:
Higher keep_prob means fewer units dropped (less regularization). Less regularization usually lowers training error (model fits training data better).
Question 9
Which techniques are useful for reducing variance (overfitting)? (Check all that apply.)
❌ Gradient Checking
✅ Data augmentation
✅ Dropout
✅ L2 regularization
❌ Exploding gradient
❌ Vanishing gradient
❌ Xavier initialization
Explanation:
Data augmentation, dropout, and L2 are standard methods to reduce overfitting. Xavier initialization helps training stability, not variance reduction. Gradient checking verifies backprop correctness, and exploding/vanishing gradients are training problems, not techniques; none of these address overfitting.
Question 10
Which expression correctly normalizes the input x?
❌ x = x / σ
❌ x = (1/m) Σ (x(i))²
❌ x = (1/m) Σ x(i)
✅ x = (x − μ) / σ
Explanation:
Standard normalization (z-score) subtracts mean μ and divides by standard deviation σ: (x − μ)/σ.
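In NumPy this is a one-liner per feature column (the example matrix is illustrative):

```python
import numpy as np

# Toy data: 3 examples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Compute mean and std per feature from the training data.
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# z-score normalization: each feature now has mean 0 and std 1.
X_norm = (X - mu) / sigma
```

The same mu and sigma computed on the training set must be reused to normalize dev and test inputs, so all three sets pass through an identical transformation.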
🧾 Summary Table
| Q# | ✅ Correct Answer | Key Concept |
|---|---|---|
| 1 | 60% / 20% / 20% | Balanced train/dev/test split for 10k examples |
| 2 | Come from same distribution | Dev/test must be drawn from same data distribution |
| 3 | Get more data; Add regularization | Remedies for high variance (overfitting) |
| 4 | Overfitting training set; High variance | Low train error + high dev error ⇒ overfit/high variance |
| 5 | False | Dropout helps but isn’t always appropriate |
| 6 | False | λ affects training, not the prediction formula at test time |
| 7 | Helps reduce overfitting; removes units with prob 1−keep_prob | Correct behavior and effect of dropout |
| 8 | Reduce regularization; lower training error | Increasing keep_prob reduces regularization strength |
| 9 | Data augmentation; Dropout; L2 regularization | Common variance-reduction techniques |
| 10 | (x − μ) / σ | Standard (z-score) normalization |