This project demonstrates a number of common reinforcement learning (RL) algorithms, applied to Sutton & Barto's cliff walking problem. The aim is to aid understanding of RL mechanisms in a comprehensible environment. For this purpose, the code is relatively integrated and hard-coded. I intermittently add new algorithms and refactor the code.
The project currently contains the following algorithms:
- Q-learning
- SARSA
- Deep Q-learning
- Discrete policy gradient
- Deep policy gradient
Neural network approaches are incorporated using TensorFlow.
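To give a feel for the two tabular methods listed above, here is a minimal sketch of the cliff walking grid together with the Q-learning and SARSA update rules. It is not taken from this repository: the function names (`step`, `epsilon_greedy`, `q_learning_update`, `sarsa_update`), the 4x12 layout, the rewards (-1 per move, -100 for the cliff) and the default `alpha` and `gamma` are assumptions based on the textbook description of the problem, and the project's own code may differ.

```python
import numpy as np

# Assumed textbook layout: 4x12 grid, start bottom-left, goal bottom-right,
# cliff cells in between on the bottom row.
N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Return (next_state, reward, done) for one move on the grid."""
    # Moves that would leave the grid keep the agent at the border.
    row = min(max(state[0] + ACTIONS[action][0], 0), N_ROWS - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), N_COLS - 1)
    if row == 3 and 0 < col < 11:
        # Assumption: falling off the cliff sends the agent back to the
        # start with a large penalty, and the episode continues.
        return START, -100, False
    return (row, col), -1, (row, col) == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[state]))


def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=1.0):
    """Off-policy update: bootstrap from the best action in s_next."""
    target = r + (0.0 if done else gamma * np.max(q[s_next]))
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + (0.0 if done else gamma * q[s_next][a_next])
    q[s][a] += alpha * (target - q[s][a])
```

With a Q-table created as `q = np.zeros((N_ROWS, N_COLS, len(ACTIONS)))`, the only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the next action chosen by the behaviour policy (here `epsilon_greedy`).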
My series of blog posts at Towards Data Science provides descriptions and interpretations of the implemented algorithms and their results:
- Q-learning and SARSA
- Monte Carlo learning
- Discrete Policy Gradient
- Deep Q-Learning
- Deep Policy Gradient