
Week 4 Quiz: Natural Language Processing in TensorFlow (DeepLearning.AI TensorFlow Developer Professional Certificate) Answers 2025

1. Question 1

What function creates one-hot encoded arrays of labels?

  • ✅ tf.keras.utils.to_categorical

  • ❌ tf.keras.utils.SequenceEnqueuer

  • ❌ tf.keras.utils.img_to_array

  • ❌ tf.keras.preprocessing.text.one_hot

Explanation:
to_categorical() converts integer labels into one-hot encoded vectors, which is required for multi-class classification.
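A minimal sketch of how this works (the labels and class count below are made up for illustration):

```python
import tensorflow as tf

# Hypothetical integer labels for a 4-class problem
labels = [0, 2, 1, 3]

# Each label becomes a row with a 1 in its class column and 0s elsewhere
one_hot = tf.keras.utils.to_categorical(labels, num_classes=4)
print(one_hot)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]]
```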


2. Question 2

What is the major drawback of word-based training compared to character-based training?

  • ❌ Character generation is more accurate

  • ✅ Because there are far more words in a typical corpus than characters, it is much more memory intensive

  • ❌ No drawback

  • ❌ Word-based is more accurate

Explanation:
Word vocabularies can reach tens of thousands of entries, which makes:
✔ memory usage heavier
✔ embedding tables larger
✔ output layers huge
✔ training slower

A character vocabulary usually contains only 40–100 symbols.
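A rough sketch of the size difference (the vocabulary sizes, embedding dimension, and LSTM width below are assumptions for illustration, not values from the course):

```python
import tensorflow as tf

vocab_size = 20_000   # hypothetical word-level vocabulary
char_size = 80        # hypothetical character-level vocabulary
embedding_dim = 100

# Word-level model: the embedding table and the softmax layer
# both scale with vocab_size, so the parameter count grows quickly.
word_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.LSTM(150),
    tf.keras.layers.Dense(vocab_size, activation='softmax'),
])

# Character-level model: same architecture, far fewer parameters,
# because the input/output dimension is only ~80 symbols.
char_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(char_size, embedding_dim),
    tf.keras.layers.LSTM(150),
    tf.keras.layers.Dense(char_size, activation='softmax'),
])

word_model.build(input_shape=(None, 10))
char_model.build(input_shape=(None, 10))
print(word_model.count_params(), char_model.count_params())
```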


3. Question 3

What are the critical steps in preparing your input sequences?

  • ✅ Generating subphrases using n_gram_sequences

  • ❌ Converting the seed text with texts_to_sequences

  • ❌ Splitting into training/testing (not part of core sequence prep)

  • ✅ Pre-padding the subphrase sequences

Correct Steps:

  1. Create n-gram subphrases

  2. Pre-pad sequences so they are equal length

Explanation:
These steps are essential for training a next-word (sequence prediction) model.
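For reference, a minimal sketch of this preparation, following the typical pattern from the course labs (the toy corpus lines below are just placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["in the town of athy one jeremy lanigan",
          "battered away til he hadnt a pound"]   # toy corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Step 1: generate n-gram subphrases from every line
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        input_sequences.append(token_list[:i + 1])

# Step 2: pre-pad so every subphrase has the same length
max_sequence_len = max(len(seq) for seq in input_sequences)
input_sequences = np.array(
    pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# The last token of each subphrase is the label; the rest is the input
xs, labels = input_sequences[:, :-1], input_sequences[:, -1]
ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)
```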


4. Question 4

Why does predicting more words cause gibberish?

  • ❌ Probability compounds

  • ❌ Matching fewer known phrases

  • ❌ Likelihood doesn’t change

  • ✅ Because you are more likely to hit words not in the training set

Explanation:
As predictions move further from the seed text, the model becomes less confident: the probability of picking uncommon or poorly learned words increases, and each small error feeds back into the next prediction, so the output drifts into gibberish.
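A hypothetical generation loop makes the feedback effect visible. This sketch assumes the `tokenizer` and `max_sequence_len` from the Question 3 example above, plus a trained Keras `model` that predicts the next word:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `tokenizer` and `max_sequence_len` from the Question 3 sketch
# and a trained `model` whose softmax output covers the vocabulary.
seed_text = "in the town of athy"
next_words = 20

for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    padded = pad_sequences([token_list], maxlen=max_sequence_len - 1,
                           padding='pre')
    probs = model.predict(padded, verbose=0)[0]
    predicted_index = int(np.argmax(probs))
    # Each predicted word is appended and fed back in, so an early
    # low-confidence pick skews every later prediction.
    seed_text += " " + tokenizer.index_word.get(predicted_index, "")

print(seed_text)
```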


5. Question 5

Do we use a sigmoid output layer with one neuron per word?

  • ❌ True

  • ✅ False

Explanation:
For next-word prediction:
✔ Use Softmax
✔ One neuron per word
✔ Softmax gives a proper probability distribution across all words.

Sigmoid is for binary classification, not multi-class.
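A sketch of the typical output layer for this kind of model (the vocabulary size, sequence length, and layer widths here are illustrative only):

```python
import tensorflow as tf

total_words = 2638          # hypothetical vocabulary size
embedding_dim = 100

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, embedding_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150)),
    # One neuron per word, with softmax so the outputs form a
    # probability distribution over the next word.
    tf.keras.layers.Dense(total_words, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```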


🧾 Summary Table

| Q# | Correct Answer | Key Concept |
|----|----------------|-------------|
| 1 | to_categorical | One-hot encoding |
| 2 | Word-based is more memory-intensive | Vocabulary size impact |
| 3 | n-gram subphrases + pre-padding | Preparing sequences |
| 4 | More chance of unseen words | Prediction drift |
| 5 | False (use softmax) | NLP multi-class output layer |