This repository contains attempts to solve various challenges from https://gym.openai.com/
Use Python 3.6, because TensorFlow does not work with 3.7.
Run any of the main files (run_*.py).
For frozen lake there is no main file. Run it as follows:
python -m app.frozen_lake.frozen_lake_v0_keras
To avoid having to set up Keras and TensorFlow with their dependencies, the simulations can be run in Docker (without a GUI).
To enter a shell in a Docker image, perform the following steps:
- docker build . -t keras_image
- docker run -it keras_image bash
After this, the simulations can be launched as normal.
Write it
Several different policies are implemented in the policy package. Each has its own advantages.
Takes random actions all the time; it can be used for benchmarking.
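A minimal sketch of such a policy, assuming a gym-style action space; the class name and method signature below are illustrative, not the repository's actual API:

```python
class RandomPolicy:
    """Illustrative sketch: ignores the state and acts uniformly at random."""

    def __init__(self, action_space):
        self.action_space = action_space  # e.g. a gym.spaces.Discrete

    def select_action(self, state):
        # The state is ignored entirely, which makes this a useful baseline:
        # any learning policy should at least beat it.
        return self.action_space.sample()
```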
Write it!
Implements Q-learning and keeps a Q value for each possible state-action pair. These values get updated as the algorithm learns more about the problem. This policy works well when the number of states is discrete and not too big. When the states are continuous, or there are too many of them, use DQN or DDQN instead.
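The core of tabular Q-learning is a single update rule. A minimal sketch, where the hyperparameter values and function names are illustrative assumptions:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative)
GAMMA = 0.99  # discount factor (illustrative)

q_table = defaultdict(float)  # maps (state, action) -> Q value

def q_update(state, action, reward, next_state, actions):
    # Q-learning target: immediate reward plus the discounted value
    # of the best action available in the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```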
Implements a Deep Q-Network (DQN) algorithm. This is similar to the QTablePolicy, except that instead of storing a Q value for every possible state it approximates the values using a (deep) neural network. The network takes the state as input and generates the Q values as output.
In the case of the QTablePolicy the Q table is continuously updated as we gain more knowledge about the environment. In the DQN case the neural network is continuously updated to better approximate the true Q values.
- This agent uses Experience Replay to improve performance (see the sketch after this list).
- The weights of the model can be saved after training.
- A potential function can be used.
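A minimal Keras sketch of the idea, combining a small Q network with an experience replay buffer. The network shape, buffer size, and hyperparameters are assumptions for illustration, not the repository's actual code:

```python
import random
from collections import deque

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

GAMMA = 0.99     # discount factor (illustrative)
BATCH_SIZE = 32  # replay minibatch size (illustrative)

def build_q_network(state_size, num_actions):
    # Maps a state vector to one Q value per action.
    model = Sequential([
        Dense(24, activation="relu", input_shape=(state_size,)),
        Dense(24, activation="relu"),
        Dense(num_actions, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Stores (state, action, reward, next_state, done); states are
# assumed to be 1-D numpy arrays here.
replay_buffer = deque(maxlen=10000)

def train_step(model):
    if len(replay_buffer) < BATCH_SIZE:
        return
    # Sampling past transitions at random breaks the correlation between
    # consecutive experiences, which stabilises training.
    for state, action, reward, next_state, done in random.sample(replay_buffer, BATCH_SIZE):
        target = reward
        if not done:
            target += GAMMA * np.max(model.predict(next_state[None, :], verbose=0))
        q_values = model.predict(state[None, :], verbose=0)
        q_values[0][action] = target
        model.fit(state[None, :], q_values, epochs=1, verbose=0)

# After training, the learned weights can be persisted, e.g.:
# model.save_weights("dqn_weights.h5")
```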
Write it up!
Write it!
A potential function is a function that puts a value on a specific state of the environment. The value should represent how "good" it is to be in that state. Using a potential function helps when the environment is complicated and it is hard or unlikely that the agent will reach a rewarding state through random exploration. The potential function helps the algorithm learn which states are more likely to lead to a future reward.
The absolute value of a potential function is not important, but the difference in value between two consecutive states is, because the potential of one state is always compared to that of another.
When using Q-learning, this potential difference between states is used in combination with the reward that can be received. It is therefore important that the potential difference is not much bigger than the true rewards, so that it does not overshadow them; after all, the only purpose of the potential function is to guide the algorithm towards the rewards. At the same time, the values must be big enough to be relevant in comparison with the rewards.
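This idea is commonly implemented as potential-based reward shaping: the agent is trained on the original reward plus the discounted potential difference. A minimal sketch, where the potential function itself is a made-up grid-world example:

```python
GAMMA = 0.99  # discount factor (illustrative)

def potential(state):
    # Hypothetical example for a grid world with a goal at (7, 7):
    # the closer to the goal, the higher (less negative) the potential.
    # Replace with a problem-specific heuristic.
    x, y = state
    return -(abs(7 - x) + abs(7 - y))

def shaped_reward(reward, state, next_state):
    # Only the *difference* in potential between consecutive states matters,
    # so adding a constant to potential() changes nothing. Keep the scale of
    # the potential comparable to the true rewards so it guides, not dominates.
    return reward + GAMMA * potential(next_state) - potential(state)
```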