Graded Quiz: Word2Vec and Sequence-to-Sequence Models | Gen AI Foundational Models for NLP & Language Understanding (IBM AI Engineering Professional Certificate) | Answers 2025
1. CBOW context + target at position t = 2 (“she loves watching football”)
Sentence indexed (1-based, matching the question's t = 2):
t=1: she
t=2: loves
t=3: watching
t=4: football
Window size = 1 → context = immediate neighbors
❌ Context: loves, football; Target: watching
❌ Context: she, football; Target: loves, watching
✅ Context: she, watching; Target word: loves
❌ Context: loves, watching; Target: she, football
Explanation:
At position t = 2 (word = loves), the context words with window size 1 are the ones immediately before and after it: she and watching.
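A minimal sketch of how the context/target pairs fall out of a window of size 1 (plain Python; the sentence comes from the question, the helper name is ours):

```python
sentence = ["she", "loves", "watching", "football"]  # t = 1..4 (1-based)

def cbow_pairs(tokens, window=1):
    """Yield (context, target) pairs for a CBOW-style sliding window."""
    pairs = []
    for i, target in enumerate(tokens):
        # context = neighbors within `window` positions, excluding the target itself
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

for context, target in cbow_pairs(sentence):
    print(context, "->", target)
# ['loves'] -> she
# ['she', 'watching'] -> loves      <-- the pair asked about (t = 2)
# ['loves', 'football'] -> watching
# ['watching'] -> football
```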
2. Difference between CBOW and Skip-gram
❌ Differ only in loss
❌ CBOW predicts target using target
✅ CBOW predicts the target word from context; skip-gram predicts context words from the target word
❌ Skip-gram creates one-hot vectors
Explanation:
CBOW = context → target
Skip-gram = target → context
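To make the direction of prediction concrete, here is an illustrative sketch (not code from the course) that builds training examples both ways from the same window:

```python
tokens = ["she", "loves", "watching", "football"]
window = 1

cbow_examples = []      # (context words) -> target word
skipgram_examples = []  # target word -> (one context word at a time)

for i, target in enumerate(tokens):
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    cbow_examples.append((context, target))       # CBOW: predict target from context
    for c in context:
        skipgram_examples.append((target, c))     # Skip-gram: predict each context word from target

print(cbow_examples[1])        # (['she', 'watching'], 'loves')
print(skipgram_examples[1:3])  # [('loves', 'she'), ('loves', 'watching')]
```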
3. Which model uses time series and remembers past information?
❌ Feedforward neural network
❌ GANs
❌ Word2Vec
✅ Recurrent neural networks (RNNs)
Explanation:
RNNs are designed to handle sequential data and maintain memory through hidden states.
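A toy illustration (NumPy, with made-up dimensions and random weights) of the hidden state that lets an RNN "remember" earlier time steps:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4

# Randomly initialized weights, purely for illustration
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: the new hidden state depends on the input AND the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = [rng.normal(size=input_dim) for _ in range(5)]  # a length-5 "time series"
h = np.zeros(hidden_dim)  # initial memory
for x_t in sequence:
    h = rnn_step(x_t, h)  # h accumulates information from all earlier steps

print(h)  # the final hidden state summarizes the whole sequence
```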
4. Purpose of BOS token in decoder training
❌ Replacement for unknown words
❌ Generate entire output at once
✅ Signals the decoder to start generating the output sequence
❌ Terminates the sequence
Explanation:
The BOS token tells the decoder where to begin generation.
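A hedged sketch of how a BOS token is typically used when preparing decoder inputs for teacher forcing (the token strings and example sentence are illustrative, not from the course):

```python
BOS, EOS = "<bos>", "<eos>"

target_sentence = ["elle", "aime", "le", "football"]

# Teacher forcing: the decoder input is the target shifted right by one,
# starting from BOS, and the model is trained to predict the next token at each step.
decoder_inputs  = [BOS] + target_sentence   # <bos> elle aime le football
decoder_targets = target_sentence + [EOS]   # elle aime le football <eos>

for step, (inp, tgt) in enumerate(zip(decoder_inputs, decoder_targets), start=1):
    print(f"step {step}: given '{inp}' -> predict '{tgt}'")
# step 1: given '<bos>' -> predict 'elle'   <- BOS signals "start generating now"
```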
5. Which component generates each translated word sequentially?
❌ Last token of input
❌ Encoder embedding layer
❌ Softmax alone
✅ The decoder module with RNN cells
Explanation:
The decoder RNN processes hidden states step-by-step to produce each output token.
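As a sketch of that step-by-step generation, here is a minimal greedy-decoding loop in PyTorch; the module sizes, token ids, and function name are our own assumptions, and the weights are untrained, so it only shows the mechanics:

```python
import torch
import torch.nn as nn

# Illustrative sizes and modules; not the course's implementation.
vocab_size, emb_dim, hidden_dim = 10, 8, 16
BOS_ID, EOS_ID = 1, 2

embedding = nn.Embedding(vocab_size, emb_dim)
rnn_cell = nn.GRUCell(emb_dim, hidden_dim)
output_proj = nn.Linear(hidden_dim, vocab_size)

def greedy_decode(encoder_final_state, max_len=10):
    """Generate one token at a time, feeding each prediction back as the next input."""
    h = encoder_final_state               # context handed over by the encoder
    token = torch.tensor([BOS_ID])        # start from BOS
    outputs = []
    for _ in range(max_len):
        x = embedding(token)              # (1, emb_dim)
        h = rnn_cell(x, h)                # one decoder RNN step
        logits = output_proj(h)           # (1, vocab_size)
        token = logits.argmax(dim=-1)     # pick the most probable next word
        if token.item() == EOS_ID:
            break
        outputs.append(token.item())
    return outputs

print(greedy_decode(torch.zeros(1, hidden_dim)))  # a list of generated token ids
```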
6. How is perplexity computed?
✅ Exponential of average cross-entropy loss
❌ Multiplies all predicted probabilities
❌ Averages squared error
❌ Vocabulary size divided by predicted tokens
Explanation:
Perplexity = exp(cross-entropy) and measures how well a language model predicts text.
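A quick numeric sketch of that formula, using made-up per-token probabilities from a hypothetical language model:

```python
import math

# Probabilities the model assigned to the actual next token at each position (hypothetical)
predicted_probs = [0.40, 0.25, 0.10, 0.50]

# Cross-entropy = average negative log-probability of the correct tokens
cross_entropy = -sum(math.log(p) for p in predicted_probs) / len(predicted_probs)

perplexity = math.exp(cross_entropy)
print(round(cross_entropy, 3), round(perplexity, 3))  # roughly 1.325 and 3.761
# Lower perplexity = the model finds the text less "surprising"
```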
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | she, watching → loves | CBOW context/target |
| 2 | CBOW: context→target, SG: target→context | Word2Vec models |
| 3 | RNNs | Sequential memory |
| 4 | BOS starts decoder generation | Seq2Seq training |
| 5 | Decoder RNN | Translation step-by-step |
| 6 | exp(cross-entropy) | Perplexity metric |