rl-easy21

My solutions for David Silver's UCL course on Reinforcement Learning. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Discussion

What are the pros and cons of bootstrapping in Easy21?

TD-learning has smaller variance but might be biases. Comparing it to the unbiased estimation of the MC method using MSE, we can see that the estimated Q function is not far off and while it converged much faster.

Would you expect bootstrapping to help more in blackjack or Easy21 ?Why?

In blackjack we have an Ace which might make an episode a bit longer. Easy21 has red cards which also make the episodes longer. It is not straightforward, but I believe on average, Easy21 episodes will be longer (considering the chances of getting an Ace or a red card). As the episodes will be longer, bootstrapping which has less variance will help it more.

What are the pros and cons of function approximation in Easy21?

Faster convergence (as we have less parameters) but worse approximation.

How would you modify the function approximator suggested in this section

to get better results in Easy21? As the total return is {-1,0,1} we can add a tanh layer that will make sure the output is within these limits.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
figures		figures
README.md		README.md
environment.py		environment.py
linear_func_approximation.py		linear_func_approximation.py
mc_control.py		mc_control.py
td_learning.py		td_learning.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rl-easy21

Discussion

What are the pros and cons of bootstrapping in Easy21?

Would you expect bootstrapping to help more in blackjack or Easy21 ?Why?

What are the pros and cons of function approximation in Easy21?

How would you modify the function approximator suggested in this section

About

Releases

Packages

Languages

yonilev/rl-easy21

Folders and files

Latest commit

History

Repository files navigation

rl-easy21

Discussion

What are the pros and cons of bootstrapping in Easy21?

Would you expect bootstrapping to help more in blackjack or Easy21 ?Why?

What are the pros and cons of function approximation in Easy21?

How would you modify the function approximator suggested in this section

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages