Autonomous Driving (Quiz Case Study): Structuring Machine Learning Projects (Deep Learning Specialization) Answers, 2025

Question 1

What is the first thing you do? (each step ≈ a few days)

  • ❌ Invest a few days in thinking on potential difficulties, and then some more days brainstorming about possible solutions, before training any model.

  • ❌ Spend some time searching the internet for the data most similar to the conditions you expect on production.

  • Train a basic model and do error analysis.

  • ❌ Spend a few days collecting more data using the front-facing camera of your car, to better understand how much data per unit time you can collect.

Explanation: Start with a simple baseline model and perform error analysis — this quickly reveals the most important problems (data issues, label noise, model bias, etc.). It gives concrete evidence to guide where to invest time next (more data, better labels, architecture). The other choices are reasonable activities but come after you’ve run a quick baseline and examined errors.


Question 2

True/False: For the output layer, a softmax activation would be a good choice because this is a multi-task learning problem.

  • ❌ True

  • False

Explanation: This is a multi-label / multi-task setting (an image can contain multiple signs/lights simultaneously). Softmax enforces a single exclusive choice across outputs and is inappropriate. Use independent sigmoid outputs (or per-task heads) and a loss that sums across present labels.
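To make the contrast concrete, here is a minimal NumPy sketch (illustrative only, not the course's code) of the independent-sigmoid, per-label cross-entropy loss such a multi-label output layer would use; the four-label layout and the example values are invented for the demonstration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_loss(logits, labels):
    """Independent binary cross-entropy per output unit, summed.

    Each unit gets its own sigmoid, so several labels (e.g. a stop
    sign AND a red light in the same image) can be active at once --
    unlike softmax, which forces one exclusive choice.
    """
    p = sigmoid(logits)
    eps = 1e-12  # numerical safety for log(0)
    return -np.sum(labels * np.log(p + eps)
                   + (1 - labels) * np.log(1 - p + eps))

# One image containing both a stop sign (index 0) and a red light (index 2):
logits = np.array([3.0, -2.0, 4.0, -1.0])
labels = np.array([1.0, 0.0, 1.0, 0.0])
loss = multi_label_loss(logits, labels)
```

In a deep-learning framework this corresponds to a sigmoid output layer with a binary cross-entropy loss per task head, rather than one softmax across all outputs.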


Question 3

Which 500 mistaken images should you manually examine, one image at a time?

  • ❌ 500 images of the train set, on which the algorithm made a mistake.

  • ❌ 500 images of the test set, on which the algorithm made a mistake.

  • 500 images of the training-dev set, on which the algorithm made a mistake.

  • ❌ 500 images of the dev set, on which the algorithm made a mistake.

Explanation: The training-dev set (examples drawn from the training distribution but not used for training) lets you diagnose how the model behaves on the training distribution without dev/test leakage. Examining train-set errors is less informative (they might simply be fixed by further training), and dev/test should stay reserved for evaluation, so manual per-example examination is best done on the training-dev set.


Question 4

True/False: In multi-task learning, if some examples have missing labels (e.g., [0 ? 1 1 ?]), the learning algorithm cannot use those examples.

  • ❌ True

  • False

Explanation: You can use examples with partially missing labels. During loss computation, simply ignore the missing labels (compute loss only on available labels). Such examples still provide learning signal for the tasks with labels provided.
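The "ignore missing labels in the loss" idea can be sketched in a few lines of NumPy (illustrative only; the NaN encoding of '?' is an assumption of this example):

```python
import numpy as np

def masked_bce(probs, labels):
    """Binary cross-entropy that skips missing labels.

    `labels` uses np.nan for '?' entries, e.g. [0, ?, 1, 1, ?];
    only positions with a real label contribute to the loss, so
    partially labeled examples still provide training signal.
    """
    mask = ~np.isnan(labels)
    y, p = labels[mask], probs[mask]
    eps = 1e-12
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

labels = np.array([0.0, np.nan, 1.0, 1.0, np.nan])  # [0 ? 1 1 ?]
probs  = np.array([0.1, 0.9, 0.8, 0.7, 0.2])
loss = masked_bce(probs, labels)  # uses only positions 0, 2, 3
```

Changing the model's predictions at the masked positions leaves the loss unchanged, which is exactly why such examples remain usable.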


Question 5

How should you split the dataset into train/dev/test sets?

  • ❌ Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split into 600k train / 200k dev / 200k test.

  • ❌ Choose the training set to be the 900,000 internet images + 20,000 car images; split remaining 80k equally for dev/test.

  • ❌ Mix all 100k + 900k; split into 980k train / 10k dev / 10k test.

  • Choose the training set to be the 900,000 internet images + 80,000 car images. The 20,000 remaining images will be split equally in dev and test sets.

Explanation: Dev and test should reflect the production distribution (car camera). Use a dev/test drawn from your car images to evaluate real-world performance. Keep a substantial number of car images out of training for dev/test (20k here) while using many internet images plus most car images in training to learn broadly. The chosen split ensures dev/test match the distribution you care about.
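The chosen split can be sketched as index bookkeeping (a toy sketch using the counts from the question; the shuffling seed and array layout are incidental):

```python
import numpy as np

rng = np.random.default_rng(0)

n_internet, n_car = 900_000, 100_000  # counts from the question
car_idx = rng.permutation(n_car)      # shuffle car images before splitting

# Train on all 900k internet images plus 80k car images; dev and test
# each get 10k held-out car images, so both match the production
# (front-facing camera) distribution.
train_car = car_idx[:80_000]
dev_car   = car_idx[80_000:90_000]
test_car  = car_idx[90_000:]
```

The key property is that dev and test contain only car-camera images, while the training set is free to mix in the internet data.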


Question 6

Given errors: Training 8.8%, Training-Dev 9.1%, Dev 14.3%, Test 14.8%. Human-level ≈ 0.5%. Which are true?

  • ❌ You have a large variance problem because your model is not generalizing well to data from the same training distribution.

  • You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.

  • You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set.

  • ❌ You have a large variance problem because your training error is quite higher than the human-level error.

Explanation: Training error (8.8%) much higher than human-level (0.5%) → avoidable bias. Training-dev (9.1%) ≪ dev (14.3%) → data mismatch between training (mixed internet+car) and dev (car-only). Variance relates to gap between training and training-dev; here that gap is small, so variance is not the dominant issue.
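The diagnosis above is just arithmetic on the error chain, which can be written out directly (numbers taken from the question):

```python
# Decompose the error chain into the three named gaps.
human, train, train_dev, dev = 0.5, 8.8, 9.1, 14.3  # percentages

avoidable_bias = train - human      # 8.3 -> large: a bias problem
variance       = train_dev - train  # 0.3 -> small: not a variance problem
data_mismatch  = dev - train_dev    # 5.2 -> large: distribution mismatch
```

Whichever gap dominates tells you where to invest: bias-reduction (bigger model, longer training), variance-reduction (regularization, more data), or closing the distribution gap (more production-like data, synthesis).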


Question 7

Given Training 2%, Training-Dev 2.3%, Dev 1.3%, Test 1.1%. Human-level ≈ 0.5%. A friend believes the car camera images (Dev/Test) have a lower Bayes error than the mixed training images. What do you think?

  • Your friend is likely correct.

  • ❌ There’s insufficient information to determine if your friend is correct or not.

  • ❌ Your friend is likely incorrect.

Explanation: Dev/Test (car images) have lower measured error (≈1.1–1.3%) than training/training-dev (≈2–2.3%). That suggests the car-camera distribution may indeed be an easier distribution (lower Bayes error) than the mixed internet+car training data. While not airtight proof, the observed pattern supports your friend’s claim.


Question 8

Dev errors breakdown (total dev error 15.3%): incorrectly labeled 4.1%, foggy pictures 8.0%, raindrops 2.2%, other 1.0%. Should the team’s highest priority be bringing more foggy pictures into training to address the 8.0%?

  • First start with the sources of error that are least costly to fix.

  • ❌ True because it is the largest category of errors. We should always prioritize the largest category.

  • ❌ False because it depends on how easy it is to add foggy data. If collecting foggy data is very hard/costly, it might not be worth it.

  • ❌ True because it is greater than the other error categories added together.

Explanation: You should prioritize fixes by cost-effectiveness — start with sources that are inexpensive to correct or have big impact per unit cost (e.g., fixing label errors can be cheap and high-impact). Although fog is the largest single category, if it’s expensive to collect realistic foggy images, it may not be the best first move.


Question 9

Dev error 15.3%; incorrectly labeled 4.1%; partially occluded 7.2%; etc. True/False: If you fix incorrectly labeled data you will reduce overall dev error to 11.2%?

  • False

  • ❌ True

Explanation: You cannot conclude that dev error will drop to exactly 11.2%. The 4.1% measures how many dev errors fall in the "incorrectly labeled" category, but correcting those labels does not guarantee the model's predictions then agree with the new labels: some relabeled examples may still be misclassified, and relabeling can also turn examples the model previously "got right" against a wrong label into new errors. So the exact subtraction 15.3% − 4.1% = 11.2% is not guaranteed.


Question 10

Using synthesized fog from 1,000 fog pictures added to clean images — which statement do you agree with?

  • ❌ Adding synthesized images that look like real foggy pictures taken from the front-facing camera won’t help because it will introduce avoidable bias.

  • ❌ As long as the synthesized fog looks realistic to the human eye, you can be confident it’s accurately capturing the distribution of real foggy images.

  • There is little risk of overfitting to the 1,000 pictures of fog as long as you are combining it with a much larger (much greater than 1,000) set of clean/non-foggy images.

Explanation: The first statement is incorrect — realistic augmentation can help reduce errors on foggy images. The second is overconfident: human perception isn’t a perfect guarantee that the synthesized distribution matches the real-world fog distribution. The third is the best practical position: if you augment widely and mix with large clean data, overfitting to those 1,000 templates is less likely (but still check performance on held-out real foggy data).
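A minimal sketch of the augmentation idea (purely illustrative; real pipelines use depth-aware fog models, and the alpha-blend, seed, and array shapes here are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_foggy(clean, fog_bank, alpha=0.6):
    """Blend a randomly chosen fog template over a clean image.

    Each clean image is paired with a random one of the (here 1,000)
    fog pictures, so the small fog bank is spread across a much larger
    set of clean images rather than memorized.
    """
    fog = fog_bank[rng.integers(len(fog_bank))]
    return (1 - alpha) * clean + alpha * fog

fog_bank = rng.random((1000, 8, 8, 3))  # stand-in for 1,000 fog pictures
clean    = rng.random((8, 8, 3))        # stand-in for a clean road image
foggy    = synthesize_foggy(clean, fog_bank)
```

Even so, the only real check is performance on held-out genuine foggy images, per the explanation above.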


Question 11

After correcting dev labels, must you also correct test labels so that dev and test have the same distribution?

  • ❌ False, the test set should be changed, but also the train set to keep same distribution between train/dev/test.

  • False, the test set shouldn’t be changed since we want to know how the model performs with uncorrected or original data.

  • ❌ True, we must keep dev and test with the same distribution. The labels in the training set should be fixed only in case of a systematic error.

Explanation: You should not alter the test set simply to match a modified dev set — the test set should remain an untouched held-out evaluation of performance on the original target distribution (unless you have identified and want to correct systematic labeling errors across all sets). So leave test alone to get an honest final evaluation.


Question 12

Client asks to add dog detection with a relatively small dog dataset. Which do you agree with most?

  • ❌ You will have to re-train the whole model now including the dogs’ data.

  • ❌ You should train a single new model for the dogs’ task, and leave the previous model as it is.

  • ❌ Using pre-trained weights can severely hinder the ability of the model to detect dogs since they have too many learned features.

  • You can use weights pre-trained on the original data, and fine-tune with the data now including the dogs.

Explanation: Fine-tuning a pre-trained model (transfer learning) is the efficient approach when you have a small dataset for a related task. It leverages learned features and typically yields faster convergence and better performance than training from scratch.
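A toy NumPy sketch of the fine-tuning setup (illustrative only; the layer sizes and the zero-initialized new head column are assumptions of the example, not the course's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were learned on the original signs/lights data.
W_backbone = rng.standard_normal((64, 16))  # shared feature extractor
W_head_old = rng.standard_normal((16, 4))   # original 4 output units

# Fine-tuning for the new task: keep the backbone, widen the head by one
# output unit for 'dog', then continue training on data that includes dogs.
w_dog = np.zeros((16, 1))                   # fresh column for the dog label
W_head_new = np.concatenate([W_head_old, w_dog], axis=1)

def forward(x):
    h = np.maximum(0, x @ W_backbone)       # ReLU features (reused as-is)
    return h @ W_head_new                   # 5 logits, last one = dog

logits = forward(rng.standard_normal(64))
```

In practice you would freeze (or use a small learning rate on) the backbone at first, since the dog dataset is relatively small.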


Question 13

A colleague has 30k examples per class and wants to classify road signs as speed-limit signs or not. He considered reusing your model with transfer learning, but noticed you use multi-task learning and concluded he therefore can't use it. True/False?

  • False

  • ❌ True

Explanation: Multi-task models can still be used for transfer learning; you can reuse shared layers or the feature extractor and fine-tune a single-task head for the colleague’s binary classification problem. Multi-task nature doesn’t prevent transfer.


Question 14

Which approach is a better example of an end-to-end approach?

  • ❌ Approach 2 (detect traffic light first, then determine color)

  • Approach 1 (input image → directly predict red/green presence)

Explanation: An end-to-end approach maps raw input directly to the desired output in one learned model (Approach 1). Approach 2 decomposes into separate steps (object detection + color classification), which is not strictly end-to-end.


Question 15

Approach A vs B: Approach A is more promising than B if you have a ________.

  • Large training set.

  • ❌ Multi-task learning problem.

  • ❌ Problem with a high Bayes error.

  • ❌ Large bias problem.

Explanation: End-to-end learning (Approach A) tends to outperform a staged pipeline when you have a large labeled training set because the model can learn the necessary intermediate representations automatically. If data is small or you have strong prior structure, a pipeline (Approach B) might be preferable.


🧾 Summary Table

| Q # | Correct Answer(s) | Key concept |
| --- | --- | --- |
| 1 | ✅ Train a basic model and do error analysis. | Start with a simple baseline + error analysis to prioritize work. |
| 2 | ✅ False | Multi-label/multi-task needs independent sigmoids, not softmax. |
| 3 | ✅ Training-dev mistaken images | Examine training-dev to diagnose training-distribution errors without leakage. |
| 4 | ✅ False | Use examples with partially missing labels by ignoring missing components in the loss. |
| 5 | ✅ Train = internet + 80k car; dev/test = 10k/10k car | Dev/test must match the production distribution (car camera). |
| 6 | ✅ Avoidable bias; ✅ Data mismatch | High training error → bias; train-dev vs dev gap → data mismatch. |
| 7 | ✅ Friend likely correct | Lower dev/test errors suggest car images may have lower Bayes error. |
| 8 | ✅ First start with least costly fixes | Prioritize fixes by cost-effectiveness, not only error frequency. |
| 9 | ✅ False | Fixing mislabeled data does not guarantee an exact arithmetic reduction. |
| 10 | ✅ Little risk of overfitting if combined with a much larger clean dataset | Realistic augmentation helps; usually safe when mixed well, but verify on real foggy data. |
| 11 | ✅ False (do not change test) | Leave test untouched to preserve an honest final evaluation. |
| 12 | ✅ Fine-tune pre-trained weights with dog data | Transfer learning/fine-tuning is best for small new-data tasks. |
| 13 | ✅ False | A multi-task model can be reused via transfer learning. |
| 14 | ✅ Approach 1 | End-to-end = direct input→output mapping. |
| 15 | ✅ Large training set | End-to-end needs lots of data to learn intermediate tasks implicitly. |