
PolicyGradient_in_tensorflow2.0

An implementation of the Policy Gradient algorithm in TensorFlow 2.0.

  • This code is based on the TensorFlow 1.x skeleton code provided in the CS294-112 course GitHub repository.
  • Supports both discrete and continuous action spaces, with a reward-to-go option.
  • Normalizes the estimated Q values to reduce gradient variance; a minimal sketch of the reward-to-go and normalization steps follows this list.
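The reward-to-go and normalization steps amount to roughly the following (a minimal NumPy sketch with illustrative names, not the repository's exact code):

import numpy as np

def reward_to_go(rewards, gamma=0.99):
    # Discounted reward-to-go: q_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    q = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return q

def normalize(q_values, eps=1e-8):
    # Rescale Q estimates to zero mean / unit std to reduce gradient variance.
    q_values = np.asarray(q_values, dtype=np.float32)
    return (q_values - q_values.mean()) / (q_values.std() + eps)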

Prerequisites

  • python3
  • TensorFlow 2.0
  • TensorFlow Probability (the nightly build is needed to use it with TF 2.0); a short usage sketch follows this list
  • numpy
  • gym
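
TensorFlow Probability supplies the action distributions that make the discrete/continuous distinction above straightforward; a minimal sketch (illustrative names, not the repository's code):

import tensorflow as tf
import tensorflow_probability as tfp

def action_distribution(net_output, discrete):
    # Discrete policies output logits over actions; continuous policies
    # output a mean (a fixed std is assumed here purely for brevity).
    if discrete:
        return tfp.distributions.Categorical(logits=net_output)
    return tfp.distributions.Normal(loc=net_output, scale=0.5 * tf.ones_like(net_output))

# Sampling an action and its log-probability for the policy gradient update:
#   dist = action_distribution(policy_net(observation), discrete=True)
#   action = dist.sample()
#   log_prob = dist.log_prob(action)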

Example Usage

Training

Available arguments: --discount, --n_experiment, --n_iter, --seed, --batch, --learning_rate, --exp_name, --render

$ python PG_in_tf2.0.py CartPole-v0 --exp_name Test_disc --seed 1 --render
$ python PG_in_tf2.0.py Pendulum-v0 --exp_name Test_cont --seed 1

Plotting

$ python plot.py Data/your_exp_name
$ python plot.py Data/your_exp_name -f

Example Result

CartPole-v0 learning curve: 100 iterations, 2 experiments.

Author

Wonjun Son / GitHub
