
Graded Quiz: Fundamental Concepts of Transformer Architecture | Generative AI Language Modeling with Transformers (IBM AI Engineering Professional Certificate) Answers 2025

1. What does self-attention primarily allow a model to do?

❌ Eliminate unimportant words
❌ Generate paraphrases
❌ Identify parts of speech
✅ Represent each word using its surrounding context

Explanation:
Self-attention computes relationships between all tokens, enabling each word to encode meaning based on its context.
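To make this concrete, here is a minimal self-attention sketch in NumPy. It assumes identity Q/K/V projections (a real layer uses learned projection matrices), so the attention scores are just token-to-token similarities; the point is that every output row is a context-weighted mixture of all token embeddings.

```python
import numpy as np

def self_attention(X):
    """Minimal self-attention sketch (identity Q/K/V projections assumed):
    every token attends to every other token, so each output row encodes
    its surrounding context, not just the token itself."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X                              # context-weighted mixtures

# Three toy token embeddings
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
out = self_attention(X)
print(out.shape)  # (3, 2): one contextual vector per token
```

Each row of `out` blends all three input vectors, which is exactly why the word's representation now depends on its context.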


2. What parameter influences positional encoding values across embedding dimensions?

❌ Counts input tokens
❌ Tracks where each word appears
❌ Indicates phase offset
✅ Determines the frequency of sine and cosine waves

Explanation:
Sinusoidal positional encoding uses varying frequencies across dimensions, controlled by the formula’s denominator term.
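The denominator term in question is the 10000^(2i/d_model) factor from "Attention Is All You Need". A short sketch of the standard formula:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    The 10000**(2i/d_model) denominator sets a different wave
    frequency for each pair of embedding dimensions."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / 10000 ** (i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 16)
print(pe[0, :4])  # at position 0, sin terms are 0 and cos terms are 1
```

Low dimensions oscillate quickly with position while high dimensions change slowly, giving each position a unique fingerprint.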


3. How does attention ensure “chat” maps to “cat”?

❌ Randomly pick value vectors
❌ Multiply values and keys only
❌ Replace query with key vector
✅ Match the query vector with the transposed key matrix and retrieve the corresponding value

Explanation:
Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V → the Q·Kᵀ scores determine which value vector (the translation) is retrieved.
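A toy query-key-value lookup illustrates the retrieval. The vectors below are made up for illustration: one key stands in for the French word "chat", and the matching value stands in for the embedding of its translation "cat".

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical toy vectors for illustration only.
K = np.array([[1.0, 0.0],     # key for "chat"
              [0.0, 1.0]])    # key for "chien"
V = np.array([[9.0, 9.0],     # value: embedding of "cat"
              [1.0, 1.0]])    # value: embedding of "dog"
q = np.array([[5.0, 0.0]])    # query that closely matches "chat"'s key

scores = q @ K.T / np.sqrt(K.shape[-1])  # match query against transposed keys
out = softmax(scores) @ V                # softmax-weighted retrieval of values
print(out)  # close to the "cat" value row, since q matched "chat"'s key
```

Because the query aligns with the first key, the softmax puts most of its weight on the first value row, so the "cat" translation is what comes out.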


4. Role of multi-head attention in summarization?

✅ Apply multiple scaled dot-product attention operations in parallel on different representation subspaces
❌ Mask future tokens everywhere
❌ Apply a single attention mechanism
❌ Multiply vectors without scaling

Explanation:
Multi-head attention lets the model focus on multiple aspects of meaning simultaneously.
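A stripped-down sketch of the "parallel subspaces" idea: split the embedding into per-head slices, run scaled dot-product attention independently in each, and concatenate. (Real implementations also apply learned Q/K/V and output projections; identity projections are assumed here to keep the example short.)

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    """Sketch of multi-head attention with identity projections:
    each head runs scaled dot-product attention on its own slice
    (representation subspace) of the embedding, in parallel."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sub = X[:, h * d_head:(h + 1) * d_head]  # one representation subspace
        scores = sub @ sub.T / np.sqrt(d_head)   # scaled dot-product scores
        heads.append(softmax(scores) @ sub)
    return np.concatenate(heads, axis=-1)        # concat heads -> d_model

X = np.random.default_rng(0).normal(size=(4, 8))
out = multi_head_attention(X, num_heads=2)
print(out.shape)  # (4, 8)
```

Each head can learn to attend to a different aspect of meaning (syntax, coreference, topic), which is why multiple heads help in tasks like summarization.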


5. What is the correct second step after creating embeddings?

❌ Normalize sentence length
❌ Create token index mapping
❌ Extract word features
✅ Apply positional encoding to embeddings

Explanation:
Transformers have no inherent notion of order, so positional encoding must be added immediately after embeddings.
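The first two pipeline steps can be sketched as follows; the vocabulary size, dimensions, and token ids are arbitrary illustration values.

```python
import numpy as np

# Hypothetical values for illustration
rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 100, 8, 5

embedding_table = rng.normal(size=(vocab_size, d_model))
token_ids = np.array([4, 17, 2, 56, 9])

x = embedding_table[token_ids]   # step 1: look up token embeddings

# step 2: add sinusoidal positional encoding to the embeddings
pos = np.arange(seq_len)[:, None]
i = np.arange(0, d_model, 2)[None, :]
pe = np.zeros((seq_len, d_model))
pe[:, 0::2] = np.sin(pos / 10000 ** (i / d_model))
pe[:, 1::2] = np.cos(pos / 10000 ** (i / d_model))

x = x + pe                       # order information is now baked in
print(x.shape)  # (5, 8)
```

Only after this addition does the sequence carry order information into the attention layers, which are otherwise permutation-invariant.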


🧾 Summary Table

| Q# | Correct Answer | Key Concept |
|----|----------------|-------------|
| 1 | Represent each word using context | Purpose of self-attention |
| 2 | Frequency of sine/cosine waves | Positional encoding math |
| 3 | QKᵀ scores select the value | How attention selects outputs |
| 4 | Parallel attention heads | Role of multi-head attention |
| 5 | Apply positional encoding | Transformer pipeline step |