Implementation of Policy Gradient in Tensorflow2.0
- This code is based on the TensorFlow 1.x skeleton code offered in the CS294-112 course GitHub repository
- Implemented for both continuous and discrete action spaces, with a reward-to-go option
- Implemented Q-value normalization to reduce gradient variance
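The reward-to-go and normalization features listed above can be sketched roughly as follows. This is an illustrative NumPy sketch, not the script's actual functions (names like `reward_to_go` and `normalize` are assumptions):

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go for one trajectory:
    q_t = sum over t' >= t of gamma^(t'-t) * r_{t'}.
    Computed with a single backward pass."""
    q = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return q

def normalize(q, eps=1e-8):
    """Standardize Q estimates to zero mean / unit std.
    This lowers the variance of the gradient estimate
    without changing its expected direction."""
    return (q - q.mean()) / (q.std() + eps)
```

With `gamma=1.0`, the reward-to-go of `[1, 1, 1]` is `[3, 2, 1]`: each entry sums the rewards from that step to the end of the episode.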
- python3
- Tensorflow2.0
- Tensorflow_probability (you need the nightly build to use it with TF 2.0)
- numpy
- gym
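For reference, the surrogate objective a policy gradient implementation minimizes is `-mean(log pi(a|s) * Q)`. Below is a plain-NumPy sketch of that loss for the discrete-action case; the actual script computes log-probabilities with TensorFlow Probability distributions, so this is only an illustration:

```python
import numpy as np

def pg_surrogate_loss(logits, actions, q_values):
    """Policy gradient surrogate loss for discrete actions:
    L = -mean(log pi(a|s) * Q). Minimizing L performs
    gradient ascent on expected return."""
    # Numerically stable log-softmax over the action dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Pick the log-prob of the action actually taken in each state.
    chosen = log_probs[np.arange(len(actions)), actions]
    return -(chosen * q_values).mean()
```

Normalized Q-values (zero mean, unit std) are plugged in as `q_values` when the normalization option is enabled.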
Available arguments: --discount, --n_experiment, --n_iter, --seed, --batch, --learning_rate, --exp_name, --render
$ python PG_in_tf2.0.py CartPole-v0 --exp_name Test_disc --seed 1 --render
$ python PG_in_tf2.0.py Pendulum-v0 --exp_name Test_cont --seed 1
$ python plot.py Data/your_exp_name
$ python plot.py Data/your_exp_name -f
CartPole-v0 (100 iterations, 2 experiments)
Wonjun Son / Github