Week 3 Quiz: Natural Language Processing in TensorFlow (DeepLearning.AI TensorFlow Developer Professional Certificate) Answers 2025
1. Question 1
When stacking LSTMs, how do you feed the next LSTM in the sequence?
- ❌ Ensure same number of units
- ❌ Do nothing
- ❌ return_sequences = True on all
- ✅ return_sequences = True only on units that feed another LSTM
Explanation:
Only LSTM layers feeding into another LSTM must output a sequence → set return_sequences=True.
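A minimal Keras sketch of this idea (vocabulary, sequence length, and layer sizes are illustrative, not from the quiz):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),                     # padded token sequences (length chosen for illustration)
    tf.keras.layers.Embedding(10000, 16),             # hypothetical vocab and embedding sizes
    tf.keras.layers.LSTM(64, return_sequences=True),  # feeds another LSTM, so it must emit the full sequence
    tf.keras.layers.LSTM(32),                         # last LSTM: default return_sequences=False
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```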
2. Question 2
How does an LSTM capture long-range meaning?
- ❌ Load all words into cell state
- ❌ They don’t
- ✅ Earlier word values can be carried forward via the cell state
- ❌ Shuffle words randomly
Explanation:
LSTMs maintain memory through a cell state that flows across time steps.
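A small sketch that exposes the cell state directly (the tensor shapes are dummy values for illustration); with return_state=True, the Keras LSTM layer returns its final hidden state and cell state alongside the output:

```python
import tensorflow as tf

inputs = tf.random.normal((1, 10, 8))   # (batch, time steps, features) — dummy data
lstm = tf.keras.layers.LSTM(4, return_state=True)
output, hidden_state, cell_state = lstm(inputs)
# cell_state is the memory carried (and selectively updated) across time steps,
# which lets information from early tokens influence much later ones.
print(cell_state.shape)  # (1, 4)
```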
3. Question 3
Best way to avoid overfitting in NLP datasets?
- ❌ LSTMs
- ❌ GRUs
- ❌ Conv1D
- ✅ None of the above
Explanation:
Overfitting isn’t solved by architecture choice → use regularization, dropout, augmentation, more data.
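For example, dropout and weight regularization can be added around any of those layers; a sketch under illustrative sizes:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),
    tf.keras.layers.Embedding(10000, 16),                     # hypothetical sizes
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dropout(0.5),                             # dropout to reduce overfitting
    tf.keras.layers.Dense(16, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```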
4. Question 4
Which Keras layer allows LSTMs to read forward and backward?
- ❌ Bilateral
- ❌ Unilateral
- ❌ Bothdirection
- ✅ Bidirectional
Explanation:
Bidirectional() wraps an LSTM so it processes the sequence forward and backward.
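Usage is just a wrapper around the recurrent layer:

```python
import tensorflow as tf

# One copy of the LSTM reads the sequence left-to-right, the other right-to-left,
# and their outputs are concatenated.
layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))
```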
5. Question 5
Why does sequence matter for semantics?
- ❌ It doesn’t
- ❌ Because order dictates their impact on meaning
- ❌ Order doesn’t matter
- ✅ Because the order in which words appear dictates their meaning
Explanation:
Word order changes meaning entirely:
“dog bites man” ≠ “man bites dog”.
6. Question 6
How do RNNs help understand sequence meaning?
- ❌ They don’t
- ❌ They shuffle the words
- ❌ They look at the whole sentence at once
- ✅ They carry meaning from one cell to the next
Explanation:
RNNs pass hidden states between time steps → sequence-aware understanding.
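A tiny sketch with SimpleRNN (dummy shapes, chosen only for illustration); return_sequences=True exposes the hidden state at every time step:

```python
import tensorflow as tf

inputs = tf.random.normal((1, 10, 8))                      # (batch, time steps, features) — dummy data
rnn = tf.keras.layers.SimpleRNN(4, return_sequences=True)
hidden_states = rnn(inputs)
# Each step's hidden state is computed from the current input AND the previous
# step's hidden state, so meaning is carried from one cell to the next.
print(hidden_states.shape)  # (1, 10, 4)
```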
7. Question 7
Output shape of a bidirectional LSTM with 64 units?
- ❌ (128, 1)
- ❌ (128, None)
- ❌ (None, 64)
- ✅ (None, 128)
Explanation:
Bidirectional doubles the units:
64 forward + 64 backward = 128.
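You can confirm the shape with a quick model summary (vocab and embedding sizes are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),                             # sequence length chosen for illustration
    tf.keras.layers.Embedding(10000, 16),                     # hypothetical sizes
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
])
model.summary()  # Bidirectional output shape: (None, 128) = 64 forward + 64 backward
```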
8. Question 8
Sentence = 120 tokens → Conv1D with 128 filters, kernel size = 5.
What is the output shape?
- ❌ (None, 120, 124)
- ✅ (None, 116, 128)
- ❌ (None, 116, 124)
- ❌ (None, 120, 128)
Explanation:
Conv1D output length (no padding):
120 − 5 + 1 = 116
Filters = 128 channels.
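The same check for the Conv1D case (embedding sizes are illustrative; Conv1D's default padding is 'valid', i.e. no padding):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),                         # 120-token sequences
    tf.keras.layers.Embedding(10000, 16),                 # hypothetical sizes
    tf.keras.layers.Conv1D(128, 5, activation='relu'),    # 128 filters, kernel size 5, padding='valid'
])
model.summary()  # Conv1D output: (None, 116, 128) because 120 - 5 + 1 = 116
```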
🧾 Summary Table
| Q# | Correct Answer | Key Concept |
|---|---|---|
| 1 | return_sequences=True for stacked LSTMs | Stacking LSTMs |
| 2 | Cell state carries info | LSTM long-range dependencies |
| 3 | None of the above | Overfitting not solved by architecture |
| 4 | Bidirectional | Forward + backward context |
| 5 | Sequence dictates meaning | NLP semantics |
| 6 | Carry meaning between cells | RNN sequence modeling |
| 7 | (None, 128) | Bidirectional doubles units |
| 8 | (None, 116, 128) | Conv1D output shape |