CartPole-v0 defines "solving" as getting an average reward of at least 195.0 over 100 consecutive trials.
Run `python play.py` to check.
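
The solving criterion can be sketched as a sliding-window check over per-episode rewards. This is a minimal illustration, not the code in `play.py`: the function name `is_solved` and its parameters are hypothetical, and it assumes you already have a list with one total reward per episode.

```python
from collections import deque


def is_solved(episode_rewards, threshold=195.0, window=100):
    """Return True if any `window` consecutive episodes average >= threshold.

    Hypothetical helper for illustration; `episode_rewards` is assumed to be
    a sequence with the total reward of each episode, in order.
    """
    recent = deque(maxlen=window)  # holds the last `window` episode rewards
    running_sum = 0.0
    for reward in episode_rewards:
        if len(recent) == window:
            # The oldest reward is about to be evicted by the append below.
            running_sum -= recent[0]
        recent.append(reward)
        running_sum += reward
        if len(recent) == window and running_sum / window >= threshold:
            return True
    return False
```

For example, `is_solved([200.0] * 100)` returns `True`, while a run of only 99 episodes returns `False` because the 100-trial window is never filled.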
- Python 3.6
- Pipenv
$ pipenv install
$ pipenv shell
$ pipenv run python train.py
$ pipenv run python play.py
$ tensorboard --logdir=logs
Currently I get an error: tensorflow/tensorflow#32384