Minimal implementation of tabular TD control using SARSA, Expected SARSA, and Q-Learning in a MAZE environment.
The TD control algorithms are based on Bellman equations: SARSA and Expected SARSA use a sample-based form of the Bellman equation and learn q_pi, while Q-Learning uses the Bellman optimality equation and learns q*. The three algorithms are shown below, with their update targets highlighted.
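As a rough sketch of the difference (assuming a tabular Q array indexed by (state, action) and an epsilon-greedy behavior policy; the helper below is illustrative, not the repo's actual code), the three methods share the same update rule and differ only in how the next-state value inside the target is estimated:

```python
import numpy as np

def td_target(Q, reward, next_state, next_action, gamma, epsilon, method):
    """Compute the TD target; only the next-state value estimate differs per method."""
    n_actions = Q.shape[1]
    if method == "sarsa":
        # SARSA: bootstrap from the action actually taken next (on-policy sample)
        next_value = Q[next_state, next_action]
    elif method == "expected_sarsa":
        # Expected SARSA: expectation of Q under the epsilon-greedy policy
        probs = np.full(n_actions, epsilon / n_actions)
        probs[np.argmax(Q[next_state])] += 1.0 - epsilon
        next_value = np.dot(probs, Q[next_state])
    elif method == "q_learning":
        # Q-Learning: bootstrap from the greedy action (Bellman optimality)
        next_value = np.max(Q[next_state])
    else:
        raise ValueError(method)
    return reward + gamma * next_value

# The tabular update is then identical for all three methods:
# Q[state, action] += alpha * (td_target(...) - Q[state, action])
```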
The MAZE dimensions can be varied, as can its cell content: standard 'Blocks' in black, 'Holes' in red with a nominally large penalty, and 'Food' in green with a nominally large reward. As an example, a 10x15 MAZE with B/H/F fractions of (0.3, 0.1, 0.1) is shown below. The cell placement can be randomized while keeping the fractions fixed. The example in the notebook uses a smaller maze to shorten training.
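Such a grid could be generated roughly as follows (the cell labels and the `make_maze` helper are illustrative assumptions, not the repo's actual API):

```python
import numpy as np

def make_maze(n_rows=10, n_cols=15, fractions=(0.3, 0.1, 0.1), seed=None):
    """Randomly place Blocks ('B'), Holes ('H'), and Food ('F') on an empty grid
    according to the given fractions; remaining cells stay free ('.')."""
    rng = np.random.default_rng(seed)
    n_cells = n_rows * n_cols
    counts = [int(f * n_cells) for f in fractions]   # number of B, H, F cells
    grid = np.full(n_cells, '.', dtype='<U1')
    idx = rng.permutation(n_cells)                   # random cell placement
    start = 0
    for label, count in zip('BHF', counts):
        grid[idx[start:start + count]] = label
        start += count
    return grid.reshape(n_rows, n_cols)

print(make_maze(seed=0))
```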
The MAZE solver offers an opportunity to compare the three TD methods across a wide design space spanned by reward assignments, cell fractions, epsilon, gamma, and alpha. When the impact of 'Holes' and 'Food' is small, either because their fraction is low or their rewards are small in magnitude, the three methods behave comparably. Otherwise, training episodes become proportionally longer and the methods' performance tends to diverge more.
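A comparison across that design space might be scripted along these lines (a sketch assuming a hypothetical `MazeEnv`-style object exposing `n_states`, `n_actions`, `reset()`, and `step()`, plus the `td_target` sketch above; not the repo's actual interface):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def train(env, method, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Train one tabular TD agent and return per-episode scores for comparison."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    scores = []
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, epsilon, rng)
        done, total = False, 0.0
        while not done:
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # No bootstrapping from terminal states
            target = reward if done else td_target(
                Q, reward, next_state, next_action, gamma, epsilon, method)
            Q[state, action] += alpha * (target - Q[state, action])
            state, action, total = next_state, next_action, total + reward
        scores.append(total)
    return scores

# Compare the three methods on the same maze and hyperparameters:
# scores = {m: train(env, m) for m in ("sarsa", "expected_sarsa", "q_learning")}
```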
To assist such investigations, the provided code offers several visualizations, e.g., average score vs. episodes:
Agent's trajectory for a specific episode before and after training:
Animation of Agent's moves for a specific episode before and after training (more GIF examples in 'images' folder):
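The average-score curve could be reproduced with a simple moving-average plot like the sketch below (assuming the `scores` dictionary from the training sketch above; this is not the repo's own plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_average_score(scores, window=20):
    """Plot a moving average of per-episode scores for each method."""
    kernel = np.ones(window) / window
    for method, values in scores.items():
        smoothed = np.convolve(values, kernel, mode='valid')
        plt.plot(smoothed, label=method)
    plt.xlabel('Episode')
    plt.ylabel(f'Average score (window={window})')
    plt.legend()
    plt.show()
```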
References: