Tabular TD control in MAZE environment using Q-Learning, SARSA, and Expected SARSA


Minimal implementation of tabular TD control using SARSA, Expected SARSA, and Q-Learning in a MAZE environment.

The TD control algorithms are based on the Bellman equations. SARSA and Expected SARSA use a sample-based version of the Bellman equation and learn q_pi, while Q-Learning uses the Bellman optimality equation and learns q*. The three learning algorithms are shown below with their targets highlighted.

(Figure: TD_control_algorithms)
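For concreteness, the sketch below shows how the three targets could be computed for a tabular Q under an epsilon-greedy behavior policy. It is a minimal illustration rather than the repository's exact code; the function name and signature are assumptions.

```python
# Minimal sketch (not the repository's exact code) of the three TD(0) targets,
# assuming a tabular Q of shape (n_states, n_actions) and an epsilon-greedy policy.
import numpy as np

def td_target(Q, r, s_next, a_next, method, gamma=0.99, epsilon=0.1):
    """Return the bootstrapped target used in Q[s, a] += alpha * (target - Q[s, a])."""
    if method == "sarsa":             # sample-based Bellman equation for q_pi
        return r + gamma * Q[s_next, a_next]
    if method == "expected_sarsa":    # expectation over the epsilon-greedy policy
        n_actions = Q.shape[1]
        probs = np.full(n_actions, epsilon / n_actions)
        probs[np.argmax(Q[s_next])] += 1.0 - epsilon
        return r + gamma * np.dot(probs, Q[s_next])
    if method == "q_learning":        # Bellman optimality equation for q*
        return r + gamma * np.max(Q[s_next])
    raise ValueError(f"unknown method: {method}")
```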


As for the environment, both the MAZE dimensions and its cell contents can be varied: standard 'Blocks' in black, 'Holes' in red with a nominally high penalty, and 'Food' in green with a nominally high reward. As an example, a 10x15 MAZE with B/H/F fractions of (0.3, 0.1, 0.1) is shown below. The cell placements can be randomized while keeping the fractions fixed. The example in the notebook uses a smaller maze to shorten training.

(Figure: Maze)
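A minimal sketch of how such a layout could be generated; the integer cell codes and the (rows, cols, fractions) parameterization are illustrative assumptions, not the repository's actual interface.

```python
# Sketch of a random maze layout with fixed Block/Hole/Food fractions.
import numpy as np

EMPTY, BLOCK, HOLE, FOOD = 0, 1, 2, 3

def random_maze(rows=10, cols=15, fractions=(0.3, 0.1, 0.1), seed=None):
    """Return a rows x cols grid with the given B/H/F fractions, randomly placed."""
    rng = np.random.default_rng(seed)
    n_cells = rows * cols
    counts = [int(f * n_cells) for f in fractions]            # B, H, F cell counts
    cells = np.full(n_cells, EMPTY, dtype=int)
    cells[:counts[0]] = BLOCK
    cells[counts[0]:counts[0] + counts[1]] = HOLE
    cells[counts[0] + counts[1]:sum(counts)] = FOOD
    rng.shuffle(cells)                                        # random placement, same fractions
    return cells.reshape(rows, cols)
```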


The MAZE solver offers an opportunity to compare the three TD methods across a vast design space spanned by the reward assignments, cell fractions, epsilon, gamma, and alpha. When the impact of 'Holes' and 'Food' is small, either because their fraction is low or their rewards are small in magnitude, the three methods behave comparably. Otherwise, training episodes get proportionally longer and the performance of the three methods tends to differ more.
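As a rough sketch of such a comparison: the environment and its gym-style reset()/step() interface below are assumptions, and td_target refers to the earlier sketch.

```python
# Sketch of a tabular TD control loop shared by SARSA, Expected SARSA, and Q-Learning.
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    return int(rng.integers(Q.shape[1])) if rng.random() < epsilon else int(np.argmax(Q[s]))

def train(env, method, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Run tabular TD control and return the learned Q and per-episode returns."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))   # assumed env attributes
    scores = []
    for _ in range(episodes):
        s, done, total = env.reset(), False, 0.0  # assumed gym-like API
        a = epsilon_greedy(Q, s, epsilon, rng)
        while not done:
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            # terminal transitions bootstrap to zero; otherwise use the method's target
            target = r if done else td_target(Q, r, s_next, a_next, method, gamma, epsilon)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a, total = s_next, a_next, total + r
        scores.append(total)
    return Q, scores
```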

To assist such investigations, the provided code offers several visualizations, e.g., average score vs. episodes:

(Figure: Scores_vs_Episodes)
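A curve of this kind can be produced with a few lines of matplotlib; the sketch below assumes a dict mapping method names to the per-episode returns collected by the training sketch above.

```python
# Sketch of a smoothed score-vs-episode plot for several methods.
import numpy as np
import matplotlib.pyplot as plt

def plot_scores(score_dict, window=10):
    """score_dict maps method name -> list of per-episode returns."""
    for name, scores in score_dict.items():
        smoothed = np.convolve(scores, np.ones(window) / window, mode="valid")
        plt.plot(smoothed, label=name)
    plt.xlabel("Episode")
    plt.ylabel(f"Average score (window={window})")
    plt.legend()
    plt.show()
```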


The agent's trajectory for a specific episode before and after training:

(Figure: Example_trajectory)


Animation of the agent's moves for a specific episode before and after training (more GIF examples in the 'images' folder):

(Animations: Expected SARSA_last_exp_episode_1, Expected SARSA_last_exp_episode_30)

