GAC

This repo contains the code for the implementation of Distributional Policy Optimization: An Alternative Approach for Continuous Control (NeurIPS 2019). The theoretical framework is named DPO (Distributional Policy Optimization), whereas the Deep Learning approach to attaining it is named GAC (Generative Actor Critic).
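
As a rough paraphrase of the framework (our own summary for orientation, not notation or code taken from this repository): at each step DPO pushes the policy, under some distributional distance d, towards a target distribution supported on the actions that improve over the current value estimate. For the simplest (uniform) target this reads roughly as

\pi_{k+1}(\cdot \mid s) \;\leftarrow\; \arg\min_{\pi} \; d\big(\pi(\cdot \mid s),\, \mathcal{I}_k(\cdot \mid s)\big), \qquad \mathcal{I}_k(a \mid s) \;\propto\; \mathbb{1}\big[\, Q^{\pi_k}(s, a) \ge V^{\pi_k}(s) \,\big]

GAC realizes this with a generative, quantile-based actor, so the policy is not restricted to a fixed parametric family such as a unimodal Gaussian; the --target_policy flag in the example below presumably selects how that target weighs the improving actions.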

How to run

An example of how to run the code is provided below. The exact hyper-parameters for each domain are provided in the appendix of the paper.

python main.py --visualize --env-name Hopper-v2 --training_actor_samples 32 --noise normal --batch_size 128 --noise_scale 0.2 --print --num_steps 1000000 --target_policy exponential --train_frequency 2048 --replay_size 200000
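
If you want to launch the same configuration over several domains or seeds, a small wrapper such as the sketch below works; it simply shells out to main.py with the flags from the example above. The environment list and loop are illustrative only and not part of this repository, and each domain really uses its own hyper-parameters (see the paper's appendix).

import subprocess

# Illustrative sweep: reuse the Hopper-v2 flags from the example above for a few MuJoCo tasks.
# In practice, set the per-domain hyper-parameters from the paper's appendix.
flags = ("--training_actor_samples 32 --noise normal --batch_size 128 --noise_scale 0.2 "
         "--print --num_steps 1000000 --target_policy exponential --train_frequency 2048 "
         "--replay_size 200000").split()
for env_name in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2"]:
    subprocess.run(["python", "main.py", "--env-name", env_name] + flags, check=True)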

Visualizing

You may visualize the run by adding the flag --visualize and starting a visdom server as follows:

python3.6 -m visdom.server
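
By default the server listens on http://localhost:8097; open that address in a browser to see the plots. If you are unsure whether the server is reachable, the short check below uses the visdom Python client with its default host and port (adjust them if you changed the server settings):

import visdom

# Connects to the default visdom server (http://localhost:8097) and reports whether it is reachable.
vis = visdom.Visdom()
print("visdom reachable:", vis.check_connection())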

Requirements

The dependency list is not pinned here; from the usage above you will at least need OpenAI Gym with the MuJoCo environments (e.g. Hopper-v2) and visdom for the --visualize plots, under Python 3.6. Check the imports in main.py for the full list.

Performance

The graphs below are taken from the paper and compare the performance of our proposed method against various baselines. The best-performing variant of our method is the autoregressive network.

[performance graphs from the paper]
