Graded Quiz: Transformers in Keras - Deep Learning with Keras and TensorFlow (IBM AI Engineering Professional Certificate) Answers 2025
1. Question 1
Primary purpose of multi-head self-attention:
- ✅ To process different parts of the input sequence in parallel
- ❌ Sequential processing
- ❌ Reduce training time
- ❌ Ensure equal output size
Explanation:
Multi-head attention lets the model learn different relationships in the input in parallel, with each head attending to a different aspect of the sequence.
2. Question 2
Purpose of feedforward layers in Transformers:
- ❌ Focus on sequence parts
- ❌ Weigh importance of words
- ✅ Transform the data after self-attention (non-linear transformation)
- ❌ Compute attention weights
Explanation:
Feedforward networks refine and project attention outputs.
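A minimal NumPy sketch of the position-wise feedforward network (sizes and random weights are assumptions for illustration): each token vector is independently pushed through a ReLU expansion and a projection back to the model dimension:

```python
import numpy as np

# Illustrative sizes (assumptions, not from the quiz)
d_model, d_ff = 4, 8
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

def feed_forward(x):
    """Position-wise FFN: a non-linear transform applied to every token."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2  # ReLU, then project back

tokens = rng.standard_normal((3, d_model))  # 3 token vectors
out = feed_forward(tokens)
print(out.shape)  # (3, 4)
```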
3. Question 3
How Transformers handle temporal dependencies:
- ❌ Convolutions
- ❌ Recurrent connections
- ✅ Positional encodings maintain order information
- ❌ Zero-mean normalization
Explanation:
Transformers have no recurrence, so order information is injected by adding positional encodings to the token embeddings.
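A sketch of the sinusoidal positional encoding from the original Transformer paper, in plain NumPy (even and odd dimensions get sine and cosine at different frequencies):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding as in 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims: sine
    pe[:, 1::2] = np.cos(angles)             # odd dims: cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=8)
print(pe.shape)  # (10, 8)
```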
4. Question 4
Loss function in:

```python
model.compile(optimizer='adam', loss='mse')
```

- ❌ MinMaxScaler
- ❌ MultiHeadAttention
- ❌ Adam
- ✅ Mean squared error
Explanation:
`mse` is the Keras string alias for Mean Squared Error; `adam` is the optimizer, not the loss.
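What the `mse` loss computes can be reproduced by hand in a couple of lines (toy values chosen for illustration):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

# Mean Squared Error: the average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.4166...
```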
5. Question 5
Role of softmax in attention:
- ❌ Dimensionality reduction
- ✅ Normalize attention scores to probabilities
- ❌ Provide non-linearity
- ❌ Compute dot products
Explanation:
Softmax turns raw attention scores into a probability distribution.
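A small NumPy sketch of that normalization (the max-subtraction is a standard trick for numerical stability, not something specific to the quiz):

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

raw_scores = np.array([2.0, 1.0, 0.1])
weights = softmax(raw_scores)
print(weights.sum())  # 1.0, a valid probability distribution
```

Higher raw scores receive proportionally more weight, but every weight stays in (0, 1) and the row sums to 1.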
6. Question 6
Mechanism used by transformers to convert speech to text:
- ❌ Layers
- ✅ Spectrograms
- ❌ Patches
- ❌ Images
Explanation:
Audio is converted into spectrograms before being fed to transformer models.
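A toy NumPy sketch of how a magnitude spectrogram is built from a waveform (frame length, hop size, and the test tone are illustrative assumptions; real speech pipelines typically also apply a mel filterbank):

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Frame a waveform, window each frame, and take FFT magnitudes."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

# 1 second of a 440 Hz tone sampled at 8 kHz (toy input)
t = np.arange(8000) / 8000.0
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (frames, frequency_bins)
```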
7. Question 7
Which method applies self-attention and combines heads?
- ❌ TransformerBlock class
- ❌ split_heads
- ❌ MultiHeadSelfAttention class
- ✅ call method
Explanation:
The call() method executes the forward pass, including computing and combining heads.
8. Question 8
Converts text to numerical format:
- ❌ lstm.model
- ❌ Sequential
- ✅ TextVectorization
- ❌ Vectorizer
Explanation:
TextVectorization converts raw text → integer sequences.
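Conceptually, the layer maps tokens to integer vocabulary indices. A toy pure-Python illustration of that idea (this is NOT the Keras implementation; the corpus and the `[OOV]` convention are assumptions for the example):

```python
# Toy illustration of text -> integer-sequence vectorization.
corpus = ["the cat sat", "the dog sat"]

# Build a vocabulary; index 0 is reserved for out-of-vocabulary tokens.
vocab = {"[OOV]": 0}
for sentence in corpus:
    for token in sentence.split():
        vocab.setdefault(token, len(vocab))

def vectorize(text):
    """Map each whitespace token to its vocabulary index (0 if unknown)."""
    return [vocab.get(token, 0) for token in text.split()]

print(vectorize("the cat ran"))  # unknown 'ran' maps to 0
```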
9. Question 9
RNN model using SimpleRNN + Dense:
-
✅
model = Sequential([
SimpleRNN(50, activation='relu', input_shape=(time_window, 1)),
Dense(1)
])
-
❌ Using lstm
-
❌ RNN without Dense
-
❌ RNN with Dense but wrong layer type
Explanation:
SimpleRNN is the basic RNN cell.
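The recurrence a SimpleRNN cell computes can be sketched in a few lines of NumPy (sizes and random weights are illustrative; Keras learns W, U, and b during training):

```python
import numpy as np

units, features = 5, 1
rng = np.random.default_rng(1)
W = rng.standard_normal((features, units)) * 0.1  # input weights
U = rng.standard_normal((units, units)) * 0.1     # recurrent weights
b = np.zeros(units)

def simple_rnn(sequence):
    """h_t = relu(x_t W + h_{t-1} U + b), one step per time step."""
    h = np.zeros(units)
    for x_t in sequence:
        h = np.maximum(0.0, x_t @ W + h @ U + b)
    return h  # final hidden state, fed to the Dense(1) head

h_final = simple_rnn(rng.standard_normal((10, features)))
print(h_final.shape)  # (5,)
```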
10. Question 10
Purpose of:

```python
def attention(self, query, key, value):
```

- ✅ Compute attention scores + weighted sum of values
- ❌ Apply self-attention and combine heads
- ❌ Define multi-head mechanism
- ❌ Split into multiple heads
Explanation:
Attention(Q, K, V) = softmax(QKᵀ / √d_k)·V: query-key dot products are scaled, normalized with softmax, and used as weights for a sum of the values.
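That formula fits in a few lines of NumPy (a standalone sketch with toy shapes, not the quiz's class method):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention: softmax(QKᵀ/√d_k)·V."""
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)  # attention scores
    weights = softmax(scores)              # normalized per query
    return weights @ value                 # weighted sum of values

rng = np.random.default_rng(2)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```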
🧾 Summary Table
| Q# | Correct Answer |
|---|---|
| 1 | Parallel processing via multi-head attention |
| 2 | Transform data after self-attention |
| 3 | Positional encoding |
| 4 | mean squared error |
| 5 | Normalize scores to probabilities |
| 6 | Spectrograms |
| 7 | call method |
| 8 | TextVectorization |
| 9 | SimpleRNN + Dense model |
| 10 | Compute attention scores & weighted sum |