
Graded Quiz: Introduction to Reinforcement Learning with Keras | Deep Learning with Keras and TensorFlow (IBM AI Engineering Professional Certificate), Answers 2025

1. Question 1

Primary objective of Q-learning:

  • To learn a policy that maximizes the cumulative reward over time

  • ❌ Minimize immediate reward

  • ❌ Ignore future rewards

  • ❌ Clustering

Explanation:
Q-learning is all about maximizing long-term cumulative reward.
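As a minimal sketch (reward sequence and γ chosen for illustration), the quantity Q-learning maximizes is the cumulative, possibly discounted, return over an episode:

```python
# Sketch: the return G that Q-learning tries to maximize is the cumulative
# (discounted) sum of rewards. The reward list and gamma are illustrative.
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over an episode's reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0]                       # example reward sequence
print(discounted_return(rewards, gamma=1.0))    # → 3.0 (undiscounted sum)
```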


2. Question 2

Role of Q-value function:

  • ❌ Terminal state probability

  • Expected utility of taking action a in state s

  • ❌ Record sequence of actions

  • ❌ Count steps

Explanation:
Q(s, a) predicts how good an action is in a given state.
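A tiny tabular sketch (states, actions, and values are made up for illustration) of how Q(s, a) scores each action in a state:

```python
# Sketch: a tabular Q-function as a dict mapping (state, action) to the
# expected utility of taking that action in that state. Values are illustrative.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}

def best_action(Q, state, actions):
    """Pick the action with the highest Q(s, a) in the given state."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

print(best_action(Q, "s0", ["left", "right"]))  # → right
```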


3. Question 3

Exploration rate (ε) refers to:

  • ❌ Speed of Q update

  • ❌ Reward discount

  • ❌ Reset frequency

  • Probability of selecting a random action

Explanation:
ε controls exploration vs exploitation.


4. Question 4

Why use a neural network for Q-values in large spaces?

  • ❌ Simplify action choice

  • ❌ No reward needed

  • ❌ Increase computation

  • Q-tables become impractical to store in large state spaces, so a neural network approximates Q(s, a)

Explanation:
Deep Q-learning approximates Q(s, a) when table-based methods fail.
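A minimal Keras sketch of such a function approximator: a network that maps a state vector to one Q-value per action. The state size (4) and action count (2) are illustrative, e.g. a CartPole-like environment; layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf

# Sketch: a small Keras network replacing an intractably large Q-table.
# Input: state vector; output: one Q-value per action.
num_states, num_actions = 4, 2  # illustrative sizes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(num_states,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_actions, activation="linear"),  # Q(s, a) per action
])
model.compile(optimizer="adam", loss="mse")

q_values = model.predict(np.zeros((1, num_states)), verbose=0)
print(q_values.shape)  # (1, 2): one Q-value per action
```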


5. Question 5

Balancing exploration and exploitation:

  • ❌ K-means

  • ❌ Backpropagation

  • ❌ SGD

  • Epsilon-greedy policy

Explanation:
ε-greedy selects random actions with probability ε.
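A minimal sketch of the ε-greedy rule (Q-values here are illustrative): explore with probability ε, otherwise exploit the current best action.

```python
import random

# Sketch of an epsilon-greedy policy: with probability epsilon pick a random
# action (explore), otherwise pick the highest-valued action (exploit).
def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # → 1 (pure exploitation)
```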


6. Question 6

Key DQN innovation:

  • ❌ Single network

  • ❌ Immediate rewards only

  • ❌ Continuous action

  • Experience replay + target networks

Explanation:
Replay breaks correlations; target network stabilizes training.


7. Question 7

Purpose of replay buffer:

  • ❌ Store Q-values

  • Store experiences for random sampling

  • ❌ Reset environment

  • ❌ Reduce learning rate

Explanation:
Random sampling avoids correlated updates.
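A minimal replay-buffer sketch (capacity and the stored transitions are illustrative): store (s, a, r, s', done) tuples and sample random minibatches to break the correlation between consecutive steps.

```python
import random
from collections import deque

# Sketch of a replay buffer: store transitions, sample random minibatches.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)  # uncorrelated minibatch

buf = ReplayBuffer(capacity=100)
for t in range(5):
    buf.add(t, 0, 1.0, t + 1, False)   # toy transitions
print(len(buf.buffer), len(buf.sample(3)))  # 5 3
```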


8. Question 8

Target network updates:

  • ❌ More frequent

  • ❌ Never updated

  • ❌ Same frequency

  • Less frequently than the primary network

Explanation:
Slow updates → stable targets.
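A minimal sketch of the update schedule (weights are plain numbers here purely for illustration; in Keras this copy would be `target_model.set_weights(model.get_weights())`):

```python
# Sketch: the target network is synced from the primary network only every
# `sync_every` steps, so the TD targets stay stable between syncs.
primary_weights = [0.0]
target_weights = list(primary_weights)
sync_every = 4  # illustrative sync interval

for step in range(1, 10):
    primary_weights[0] += 0.1          # primary "trains" every step
    if step % sync_every == 0:         # target updated less frequently
        target_weights = list(primary_weights)

print(round(primary_weights[0], 1), round(target_weights[0], 1))  # 0.9 0.8
```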


9. Question 9

Role of Bellman equation:

  • ❌ Calculate immediate reward

  • ❌ Initialize weights

  • Update Q-values using immediate + discounted future reward

  • ❌ Determine action count

Explanation:
Bellman equation defines Q-value updates.
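A minimal sketch of that update (states, rewards, α, and γ are illustrative): Q(s, a) ← Q(s, a) + α · (r + γ · max_a' Q(s', a') − Q(s, a)).

```python
# Sketch of the Q-learning (Bellman) update rule.
def bellman_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    td_target = r + gamma * best_next             # immediate + discounted future
    Q[s][a] += alpha * (td_target - Q[s][a])      # move Q(s, a) toward the target
    return Q[s][a]

Q = {"s0": {"a": 0.0}, "s1": {"a": 1.0}}
print(bellman_update(Q, "s0", "a", r=1.0, s_next="s1"))  # 0.5 * (1 + 0.9*1) = 0.95
```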


10. Question 10

Significance of discount factor γ:

  • Importance of future rewards

  • ❌ Learning rate

  • ❌ Normalize Q-values

  • ❌ Exploration rate

Explanation:
γ ∈ [0,1] controls how much future rewards matter.
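A minimal numeric sketch (reward sequence chosen for illustration) of how γ weights future rewards: a myopic agent (γ = 0) ignores the delayed reward, while γ near 1 values it almost fully.

```python
# Sketch: the same reward sequence valued under different discount factors.
def discounted_return(rewards, gamma):
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0.0, 0.0, 10.0]   # illustrative: the big reward arrives two steps later
print(discounted_return(rewards, gamma=0.0))   # 0.0  (future ignored)
print(discounted_return(rewards, gamma=0.9))   # ≈ 8.1 (future discounted)
```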


🧾 Summary Table

Q# Correct Answer
1 Maximize cumulative reward
2 Expected utility Q(s,a)
3 Probability of random action (ε)
4 Replace impractical Q-table
5 Epsilon-greedy
6 Replay buffer + target networks
7 Random sampling of experiences
8 Update less frequently
9 Update Q-values (Bellman equation)
10 Weight future rewards (γ)