Implementation of Policy Gradient in Tensorflow2.0
- This code is based on the TensorFlow 1.x skeleton code offered in the CS294-112 course GitHub repository
- Implemented for both continuous and discrete action spaces, with a reward-to-go option
- Implemented Q-value normalization to reduce gradient variance
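The reward-to-go and normalization features listed above can be sketched roughly as follows. This is an illustrative NumPy sketch, not the script's actual functions (names like `reward_to_go` and `normalize` are assumptions):

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go for one trajectory:
    q_t = sum over t' >= t of gamma^(t'-t) * r_{t'}.
    Computed with a single backward pass."""
    q = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return q

def normalize(q, eps=1e-8):
    """Standardize Q estimates to zero mean / unit std.
    This lowers the variance of the gradient estimate
    without changing its expected direction."""
    return (q - q.mean()) / (q.std() + eps)
```

With `gamma=1.0`, the reward-to-go of `[1, 1, 1]` is `[3, 2, 1]`: each entry sums the rewards from that step to the end of the episode.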
- python3
- Tensorflow2.0
- Tensorflow_probability (you need the nightly build to use it with TF 2.0)
- numpy
- gym
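For reference, the surrogate objective a policy gradient implementation minimizes is `-mean(log pi(a|s) * Q)`. Below is a plain-NumPy sketch of that loss for the discrete-action case; the actual script computes log-probabilities with TensorFlow Probability distributions, so this is only an illustration:

```python
import numpy as np

def pg_surrogate_loss(logits, actions, q_values):
    """Policy gradient surrogate loss for discrete actions:
    L = -mean(log pi(a|s) * Q). Minimizing L performs
    gradient ascent on expected return."""
    # Numerically stable log-softmax over the action dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Pick the log-prob of the action actually taken in each state.
    chosen = log_probs[np.arange(len(actions)), actions]
    return -(chosen * q_values).mean()
```

Normalized Q-values (zero mean, unit std) are plugged in as `q_values` when the normalization option is enabled.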
Available arguments: --discount, --n_experiment, --n_iter, --seed, --batch, --learning_rate, --exp_name, --render
$ python PG_in_tf2.0.py CartPole-v0 --exp_name Test_disc --seed 1 --render
$ python PG_in_tf2.0.py Pendulum-v0 --exp_name Test_cont --seed 1
$ python plot.py Data/your_exp_name
$ python plot.py Data/your_exp_name -f
CartPole-v0 (100 iterations, 2 experiments)
Wonjun Son / Github