
Graded Quiz: Transformers in Keras - Deep Learning with Keras and TensorFlow (IBM AI Engineering Professional Certificate) Answers 2025

1. Question 1

Primary purpose of multi-head self-attention:

  • To process different parts of the input sequence in parallel

  • ❌ Sequential processing

  • ❌ Reduce training time

  • ❌ Ensure equal output size

Explanation:
Multi-head attention lets the model learn different relationships in parallel.
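A minimal sketch of this using Keras's built-in layer (the batch size, sequence length, embedding size, and head count below are illustrative, not from the quiz):

```python
import tensorflow as tf

# Hypothetical toy input: 2 sequences, 4 tokens each, 8-dim embeddings.
x = tf.random.normal((2, 4, 8))

# Four heads attend to the same sequence in parallel; each head can
# learn a different relationship between tokens.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=8)

# Self-attention: query, key, and value are all the same tensor.
out = mha(query=x, key=x, value=x)
print(out.shape)  # (2, 4, 8) - same shape as the input
```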


2. Question 2

Purpose of feedforward layers in Transformers:

  • ❌ Focus on sequence parts

  • ❌ Weigh importance of words

  • Transform the data after self-attention (non-linear transformation)

  • ❌ Compute attention weights

Explanation:
Feedforward networks refine and project attention outputs.
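A sketch of the position-wise feedforward block that typically follows self-attention (`embed_dim` and `ff_dim` are illustrative sizes): expand the dimension, apply a non-linearity, then project back.

```python
import tensorflow as tf

embed_dim, ff_dim = 8, 32  # illustrative model and hidden sizes

# Position-wise feedforward block: non-linear transform after attention.
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(ff_dim, activation='relu'),
    tf.keras.layers.Dense(embed_dim),
])

x = tf.random.normal((2, 4, embed_dim))
y = ffn(x)
print(y.shape)  # (2, 4, 8) - projected back to the model dimension
```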


3. Question 3

How Transformers handle temporal dependencies:

  • ❌ Convolutions

  • ❌ Recurrent connections

  • Positional encodings maintain order information

  • ❌ Zero-mean normalization

Explanation:
Transformers have no recurrence → position is injected via encoding.
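The standard sinusoidal encoding can be sketched in a few lines (sizes are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]   # (seq_len, 1) token positions
    i = np.arange(d_model)[None, :]     # (1, d_model) dimension indices
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

pe = positional_encoding(10, 8)
print(pe.shape)  # (10, 8) - one encoding vector per position
```

These vectors are added to the token embeddings, so each position gets a unique, order-dependent signature.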


4. Question 4

Loss function in:

model.compile(optimizer='adam', loss='mse')
  • ❌ MinMaxScaler

  • ❌ MultiHeadAttention

  • ❌ Adam

  • mean squared error

Explanation:
'mse' is Keras's string alias for mean squared error: the average of the squared differences between predictions and targets.
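Worked in plain NumPy with made-up values:

```python
import numpy as np

# MSE averages the squared differences between targets and predictions.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```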


5. Question 5

Role of softmax in attention:

  • ❌ Dimensionality reduction

  • Normalize attention scores to probabilities

  • ❌ Provide non-linearity

  • ❌ Compute dot products

Explanation:
Softmax turns raw attention scores into a probability distribution.
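A small sketch with made-up scores, showing that the outputs are non-negative and sum to 1:

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, exponentiate, normalize.
    e = np.exp(scores - scores.max())
    return e / e.sum()

raw_scores = np.array([2.0, 1.0, 0.1])  # raw attention scores (logits)
weights = softmax(raw_scores)
print(weights.sum())  # 1.0 - a valid probability distribution
```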


6. Question 6

Representation used by transformers to convert speech to text:

  • ❌ Layers

  • Spectrograms

  • ❌ Patches

  • ❌ Images

Explanation:
Audio is converted into spectrograms before being fed to transformer models.
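One common way to build a spectrogram with TensorFlow is a short-time Fourier transform; the signal below is a synthetic sine wave used only for illustration.

```python
import tensorflow as tf

# One second of a fake 16 kHz audio signal (a 440 Hz sine wave).
t = tf.linspace(0.0, 1.0, 16000)
audio = tf.sin(2.0 * 3.141592653589793 * 440.0 * t)

# Short-time Fourier transform; the magnitude is a spectrogram
# (time frames x frequency bins), the input speech models consume.
stft = tf.signal.stft(audio, frame_length=255, frame_step=128)
spectrogram = tf.abs(stft)
print(spectrogram.shape)  # 2-D: (frames, frequency bins)
```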


7. Question 7

Which method applies self-attention and combines heads?

  • ❌ TransformerBlock class

  • ❌ split_heads

  • ❌ MultiHeadSelfAttention class

  • call method

Explanation:
The call() method executes the forward pass, including computing and combining heads.
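A hypothetical sketch of such a layer (the class, sizes, and helper names below are illustrative, not the course's exact code): `call()` splits the projections into heads, applies attention per head, then recombines.

```python
import tensorflow as tf

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    """Illustrative multi-head self-attention; sizes are toy values."""
    def __init__(self, embed_dim=8, num_heads=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.wq = tf.keras.layers.Dense(embed_dim)
        self.wk = tf.keras.layers.Dense(embed_dim)
        self.wv = tf.keras.layers.Dense(embed_dim)
        self.out = tf.keras.layers.Dense(embed_dim)

    def split_heads(self, x, batch):
        # (batch, seq, embed) -> (batch, heads, seq, head_dim)
        x = tf.reshape(x, (batch, -1, self.num_heads, self.head_dim))
        return tf.transpose(x, [0, 2, 1, 3])

    def call(self, x):
        batch = tf.shape(x)[0]
        q = self.split_heads(self.wq(x), batch)
        k = self.split_heads(self.wk(x), batch)
        v = self.split_heads(self.wv(x), batch)
        # Scaled dot-product attention per head.
        scores = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(
            tf.cast(self.head_dim, tf.float32))
        attn = tf.matmul(tf.nn.softmax(scores, axis=-1), v)
        # Recombine heads: (batch, heads, seq, head_dim) -> (batch, seq, embed)
        attn = tf.transpose(attn, [0, 2, 1, 3])
        attn = tf.reshape(attn, (batch, -1, self.num_heads * self.head_dim))
        return self.out(attn)

layer = MultiHeadSelfAttention()
y = layer(tf.random.normal((2, 4, 8)))
print(y.shape)  # (2, 4, 8)
```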


8. Question 8

Converts text to numerical format:

  • ❌ lstm.model

  • ❌ Sequential

  • TextVectorization

  • ❌ Vectorizer

Explanation:
TextVectorization converts raw text → integer sequences.
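A minimal usage sketch (the tiny corpus and sequence length are made up): `adapt()` learns the vocabulary, then calling the layer maps strings to padded integer sequences.

```python
import tensorflow as tf

# TextVectorization learns a vocabulary from raw strings and maps each
# token to an integer id (0 = padding, 1 = out-of-vocabulary).
vectorizer = tf.keras.layers.TextVectorization(output_sequence_length=5)
vectorizer.adapt(["the cat sat", "the dog ran"])

ids = vectorizer(["the cat ran"])
print(ids.numpy())  # three token ids, padded with zeros to length 5
```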


9. Question 9

RNN model using SimpleRNN + Dense:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(50, activation='relu', input_shape=(time_window, 1)),
    Dense(1)
])
  • ❌ Using lstm

  • ❌ RNN without Dense

  • ❌ RNN with Dense but wrong layer type

Explanation:
SimpleRNN is the basic RNN cell.


10. Question 10

Purpose of:

def attention(self, query, key, value)
  • Compute attention scores + weighted sum of values

  • ❌ Apply self-attention and combine heads

  • ❌ Define multi-head mechanism

  • ❌ Split into multiple heads

Explanation:
Attention(Q, K, V) = softmax(QKᵀ / √d_k)·V: compute the scores, normalize them, then take a weighted sum of the values.
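The formula above, sketched in plain NumPy (the vector sizes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)  # attention scores
    weights = softmax(scores)              # one distribution per query
    return weights @ value                 # weighted sum of the values

q = np.random.randn(4, 8)  # 4 query vectors, d_k = 8
k = np.random.randn(4, 8)
v = np.random.randn(4, 8)
out = attention(q, k, v)
print(out.shape)  # (4, 8) - one output vector per query
```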


🧾 Summary Table

| Q# | Correct Answer |
|----|----------------|
| 1 | Parallel processing via multi-head attention |
| 2 | Transform data after self-attention |
| 3 | Positional encoding |
| 4 | Mean squared error |
| 5 | Normalize scores to probabilities |
| 6 | Spectrograms |
| 7 | call method |
| 8 | TextVectorization |
| 9 | SimpleRNN + Dense model |
| 10 | Compute attention scores & weighted sum |