Reinforcement Learning as applied to a simplified blackjack game: Easy21

tybens/rl-easy21

Reinforcement Learning on Easy21

Reinforcement learning is applied to Easy21, a simplified blackjack game. This is an assignment from David Silver's Reinforcement Learning course at UCL. The assignment can be found here.

Monte-Carlo Control

python3 monteCarlo.py

The agent played 1 million games (episodes) to obtain the following value function:

Visualized as a heatmap:

The optimal policy, obtained by selecting the action with the highest value in each state:
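
For readers who want to see the idea without opening `monteCarlo.py`, here is a minimal sketch of Monte-Carlo control on Easy21. It is not the code in this repository. It assumes the rules and schedules from the assignment: cards are 1 to 10, red with probability 1/3 (subtracted) and black otherwise (added), the dealer sticks on 17 or more, the step size is 1/N(s, a), and exploration is epsilon-greedy with epsilon = N0 / (N0 + N(s)) and N0 = 100. All function names (`draw`, `step`, `monte_carlo_control`, `greedy_action`) are hypothetical.

```python
import random
from collections import defaultdict

HIT, STICK = 0, 1

def draw(rng, black_only=False):
    """Draw a card: value 1-10 uniformly; red cards (prob 1/3) count negative."""
    value = rng.randint(1, 10)
    if black_only or rng.random() < 2 / 3:
        return value
    return -value

def step(state, action, rng):
    """Apply one action to state = (dealer, player); return (state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw(rng)
        if player < 1 or player > 21:
            return (dealer, player), -1, True  # player goes bust
        return (dealer, player), 0, False
    # STICK: the dealer hits until reaching 17 or more, or going bust
    while 1 <= dealer < 17:
        dealer += draw(rng)
    if dealer < 1 or dealer > 21:
        return (dealer, player), 1, True  # dealer goes bust
    reward = (player > dealer) - (player < dealer)
    return (dealer, player), reward, True

def monte_carlo_control(episodes, n0=100, seed=0):
    """Every-visit Monte-Carlo control; returns Q keyed by (state, action)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    n_sa = defaultdict(int)  # visits per state-action pair
    n_s = defaultdict(int)   # visits per state
    for _ in range(episodes):
        # Both dealer and player start with one black card
        state = (draw(rng, True), draw(rng, True))
        visited, done = [], False
        while not done:
            n_s[state] += 1
            epsilon = n0 / (n0 + n_s[state])
            if rng.random() < epsilon:
                action = rng.choice((HIT, STICK))
            else:
                action = max((HIT, STICK), key=lambda a: Q[(state, a)])
            visited.append((state, action))
            state, reward, done = step(state, action, rng)
        # Undiscounted, and only the final reward is non-zero,
        # so the return from every visited pair equals that final reward.
        for sa in visited:
            n_sa[sa] += 1
            Q[sa] += (reward - Q[sa]) / n_sa[sa]
    return Q

def greedy_action(Q, state):
    """Pick the action with the highest estimated value in a state."""
    return max((HIT, STICK), key=lambda a: Q[(state, a)])
```

Calling `monte_carlo_control(1_000_000)` would mirror the episode count used above, at the cost of a longer run; a smaller count is enough to try the sketch out.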

TD Learning

python3 temporalDifference.py

The mean squared error (MSE) of Q, the state-action value function, over the course of learning. For each lambda, 10,000 episodes were run and the resulting Q was measured against the Monte-Carlo state-action value function from 1 million episodes, saved in Q.dill:

Mean Squared Error after 1,000 episodes for different lambdas:
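
The error measure can be sketched as follows. This is an illustration under assumptions, not the repository's code: it assumes both value functions are stored as arrays of shape (10, 21, 2), i.e. dealer card by player sum by action, and the function name `mean_squared_error` is hypothetical. How `Q.dill` is actually laid out is not stated here, so loading it is left out.

```python
import numpy as np

def mean_squared_error(q, q_ref):
    """Average squared difference over all state-action pairs."""
    q = np.asarray(q, dtype=float)
    q_ref = np.asarray(q_ref, dtype=float)
    if q.shape != q_ref.shape:
        raise ValueError("value functions must have the same shape")
    return float(np.mean((q - q_ref) ** 2))

# Example with placeholder tables (assumed shape: dealer x player x action)
q_td = np.zeros((10, 21, 2))
q_mc = np.full((10, 21, 2), 0.5)
error = mean_squared_error(q_td, q_mc)
```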

The optimal policy as derived from 10,000 episodes of TD(lambda = 0.3):
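
A single learning step can be sketched like this, assuming the backward-view Sarsa(lambda) with accumulating eligibility traces that the assignment asks for, no discounting, and a step size of 1/N(s, a). The array layout (dealer index, player index, action) and the name `sarsa_lambda_step` are assumptions made for illustration, not taken from `temporalDifference.py`.

```python
import numpy as np

def sarsa_lambda_step(Q, E, N, s, a, reward, s_next, a_next, lam, done):
    """One backward-view Sarsa(lambda) update, in place, with gamma = 1.

    Q, E, N are arrays of the same shape: values, eligibility traces,
    and visit counts. s and s_next are index tuples; a and a_next are ints.
    """
    target = reward if done else reward + Q[s_next + (a_next,)]
    delta = target - Q[s + (a,)]   # TD error for this transition
    N[s + (a,)] += 1
    E[s + (a,)] += 1               # accumulating trace
    alpha = 1.0 / N[s + (a,)]      # step size for this time step
    Q += alpha * delta * E         # credit every recently visited pair
    E *= lam                       # decay traces (gamma = 1)
    return delta

# Two-step episode on empty tables: reward 1 arrives on the final step
Q = np.zeros((10, 21, 2))
E = np.zeros_like(Q)
N = np.zeros_like(Q)
sarsa_lambda_step(Q, E, N, (0, 0), 0, 0, (0, 1), 1, lam=0.5, done=False)
sarsa_lambda_step(Q, E, N, (0, 1), 1, 1, None, None, lam=0.5, done=True)
```

In the example, the final pair receives the full TD error, while the earlier pair receives it scaled by its decayed trace (0.5). The traces in `E` should be reset to zero at the start of each episode.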

Linear Function Approximation

python3 lfa.py

The lookup-table approach of the previous models is replaced by a coarse-coding function approximator. This reduces the 420 state-action combinations (10 dealer cards × 21 player sums × 2 actions) to 36 binary features.
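
The 36 features can be sketched as below, assuming the overlapping intervals given in the assignment: three for the dealer's card, six for the player's sum, and the two actions (3 × 6 × 2 = 36). The names `features` and `q_value` are hypothetical and need not match `lfa.py`.

```python
import numpy as np

# Overlapping intervals (inclusive), assumed from the assignment text
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
HIT, STICK = 0, 1

def features(dealer, player, action):
    """Binary feature vector of length 36 for a state-action pair."""
    phi = np.zeros((len(DEALER_INTERVALS), len(PLAYER_INTERVALS), 2))
    for i, (d_lo, d_hi) in enumerate(DEALER_INTERVALS):
        for j, (p_lo, p_hi) in enumerate(PLAYER_INTERVALS):
            if d_lo <= dealer <= d_hi and p_lo <= player <= p_hi:
                phi[i, j, action] = 1.0
    return phi.ravel()

def q_value(theta, dealer, player, action):
    """Linear approximation: Q(s, a) = phi(s, a) . theta."""
    return float(features(dealer, player, action) @ theta)

theta = np.zeros(36)  # one weight per feature
```

Because the intervals overlap, a single state can switch on several features at once: a dealer card of 4 falls in two dealer intervals, and a player sum of 4 falls in two player intervals, giving four active features.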
