
Week 4 Quiz: Natural Language Processing in TensorFlow (DeepLearning.AI TensorFlow Developer Professional Certificate) Answers 2025

1. Question 1

What function creates one-hot encoded arrays of labels?

  • ✅ tf.keras.utils.to_categorical

  • ❌ tf.keras.utils.SequenceEnqueuer

  • ❌ tf.keras.utils.img_to_array

  • ❌ tf.keras.preprocessing.text.one_hot

Explanation:
to_categorical() converts integer labels into one-hot encoded vectors, which is required for multi-class classification.
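A minimal sketch of how this works (the labels and class count below are made up for illustration):

```python
import tensorflow as tf

# Hypothetical integer labels for a 4-class problem
labels = [0, 2, 1, 3]

# Each label becomes a row with a 1 in its class column and 0s elsewhere
one_hot = tf.keras.utils.to_categorical(labels, num_classes=4)
print(one_hot)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]]
```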


2. Question 2

What is the major drawback of word-based training compared to character-based training?

  • ❌ Character generation is more accurate

  • ✅ Because there are far more words in a typical corpus than characters, it is much more memory intensive

  • ❌ No drawback

  • ❌ Word-based is more accurate

Explanation:
Word vocabularies can reach tens of thousands of entries, which makes:
✔ memory usage heavier
✔ embedding tables larger
✔ output layers huge
✔ training slower

A character vocabulary usually contains only 40–100 symbols.
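A rough sketch of the size difference (the vocabulary sizes, embedding dimension, and LSTM width below are assumptions for illustration, not values from the course):

```python
import tensorflow as tf

vocab_size = 20_000   # hypothetical word-level vocabulary
char_size = 80        # hypothetical character-level vocabulary
embedding_dim = 100

# Word-level model: the embedding table and the softmax layer
# both scale with vocab_size, so the parameter count grows quickly.
word_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.LSTM(150),
    tf.keras.layers.Dense(vocab_size, activation='softmax'),
])

# Character-level model: same architecture, far fewer parameters,
# because the input/output dimension is only ~80 symbols.
char_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(char_size, embedding_dim),
    tf.keras.layers.LSTM(150),
    tf.keras.layers.Dense(char_size, activation='softmax'),
])

word_model.build(input_shape=(None, 10))
char_model.build(input_shape=(None, 10))
print(word_model.count_params(), char_model.count_params())
```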


3. Question 3

What are the critical steps in preparing your input sequences?

  • ✅ Generating subphrases using n_gram_sequences

  • ❌ Converting the seed text with texts_to_sequences

  • ❌ Splitting into training/testing (not part of core sequence prep)

  • ✅ Pre-padding the subphrase sequences

Correct Steps:

  1. Create n-gram subphrases

  2. Pre-pad sequences so they are equal length

Explanation:
These steps are essential for training a next-word (sequence prediction) model.
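For reference, a minimal sketch of this preparation, following the typical pattern from the course labs (the toy corpus lines below are just placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["in the town of athy one jeremy lanigan",
          "battered away til he hadnt a pound"]   # toy corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Step 1: generate n-gram subphrases from every line
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        input_sequences.append(token_list[:i + 1])

# Step 2: pre-pad so every subphrase has the same length
max_sequence_len = max(len(seq) for seq in input_sequences)
input_sequences = np.array(
    pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# The last token of each subphrase is the label; the rest is the input
xs, labels = input_sequences[:, :-1], input_sequences[:, -1]
ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)
```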


4. Question 4

Why does predicting more words cause gibberish?

  • ❌ Probability compounds

  • ❌ Matching fewer known phrases

  • ❌ Likelihood doesn’t change

  • ✅ Because you are more likely to hit words not in the training set

Explanation:
As predictions move further from the seed text, the model becomes less confident: the probability of picking uncommon or poorly learned words increases, and each small error feeds back into the next prediction, so the output drifts into gibberish.
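A hypothetical generation loop makes the feedback effect visible. This sketch assumes the `tokenizer` and `max_sequence_len` from the Question 3 example above, plus a trained Keras `model` that predicts the next word:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `tokenizer` and `max_sequence_len` from the Question 3 sketch
# and a trained `model` whose softmax output covers the vocabulary.
seed_text = "in the town of athy"
next_words = 20

for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    padded = pad_sequences([token_list], maxlen=max_sequence_len - 1,
                           padding='pre')
    probs = model.predict(padded, verbose=0)[0]
    predicted_index = int(np.argmax(probs))
    # Each predicted word is appended and fed back in, so an early
    # low-confidence pick skews every later prediction.
    seed_text += " " + tokenizer.index_word.get(predicted_index, "")

print(seed_text)
```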


5. Question 5

Do we use a sigmoid output layer with one neuron per word?

  • ❌ True

  • ✅ False

Explanation:
For next-word prediction:
✔ Use Softmax
✔ One neuron per word
✔ Softmax gives a proper probability distribution across all words.

Sigmoid is for binary classification, not multi-class.
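A sketch of the typical output layer for this kind of model (the vocabulary size, sequence length, and layer widths here are illustrative only):

```python
import tensorflow as tf

total_words = 2638          # hypothetical vocabulary size
embedding_dim = 100

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, embedding_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150)),
    # One neuron per word, with softmax so the outputs form a
    # probability distribution over the next word.
    tf.keras.layers.Dense(total_words, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```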


🧾 Summary Table

| Q# | Correct Answer | Key Concept |
|----|----------------|-------------|
| 1 | to_categorical | One-hot encoding |
| 2 | Word-based is more memory-intensive | Vocabulary size impact |
| 3 | n-gram subphrases + pre-padding | Preparing sequences |
| 4 | More chance of unseen words | Prediction drift |
| 5 | False (use softmax) | NLP multi-class output layer |