Value Iteration, Policy Iteration and Q-Learning in the Frozen Lake Gym Environment
The goal of this game is to go from the starting state (S) to the goal state (G) by walking only on frozen tiles (F) and avoiding holes (H). However, the ice is slippery, so you won't always move in the direction you intend (stochastic environment).
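As a quick illustration (a minimal sketch, not part of this repository), the environment can be created and explored with random actions. This assumes the FrozenLake-v0 environment id and the pre-0.26 gym API, consistent with the requirements listed below.

```python
import gym

# Create the 4x4 FrozenLake environment (slippery by default).
env = gym.make("FrozenLake-v0")

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()             # random action: 0=Left, 1=Down, 2=Right, 3=Up
    state, reward, done, info = env.step(action)   # slippery ice: the move may deviate from `action`
    env.render()                                   # print the grid with the agent's position
```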
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python 3.6.10
- gym >= 0.15.4
- numpy >= 1.16.2
- matplotlib >= 3.1.1
.
├── src # Python scripts
│ ├── value_iteration.py # VI algorithm
│ ├── policy_iteration.py # PI algorithm
│ ├── q_learning.py # Q-learning algorithm
│ └── utils.py # Utility functions
├── images # Results
└── README.md
There are three methods you can try, namely policy iteration, value iteration, and Q-learning, each with a corresponding script. For example, to try policy iteration, run (from the src directory):
python policy_iteration.py
The resulting plot shows the average success rate versus the number of episodes.
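For reference, the core idea behind value iteration can be sketched in a few lines using the transition model that FrozenLake exposes. This is only an illustrative sketch, not the actual code in src/value_iteration.py, and the discount factor and convergence threshold are assumed values.

```python
import gym
import numpy as np

# Minimal value-iteration sketch on FrozenLake-v0:
# sweep Bellman optimality backups until the value function converges,
# then extract the greedy policy.
env = gym.make("FrozenLake-v0").unwrapped
gamma, theta = 0.99, 1e-8           # discount factor and convergence threshold (assumed values)

V = np.zeros(env.nS)
while True:
    delta = 0.0
    for s in range(env.nS):
        # env.P[s][a] is a list of (prob, next_state, reward, done) tuples
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r, _ in env.P[s][a])
             for a in range(env.nA)]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Greedy policy with respect to the converged value function
policy = np.array([
    np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r, _ in env.P[s][a])
               for a in range(env.nA)])
    for s in range(env.nS)
])
```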
Average success rate of the value iteration algorithm over 50 episodes.
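For comparison, Q-learning is model-free: it learns from sampled transitions instead of the transition model. A minimal tabular sketch is shown below; the hyperparameters are illustrative and not necessarily those used in src/q_learning.py.

```python
import gym
import numpy as np

# Tabular Q-learning sketch for FrozenLake-v0 (illustrative hyperparameters).
env = gym.make("FrozenLake-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate (assumed values)

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s2, r, done, _ = env.step(a)
        # one-step Q-learning update toward the bootstrapped target
        target = r if done else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```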
- Arthur Hsieh - Initial work - arthur960304