A basic implementation of the A3C algorithm with continuous control.
- Command line argument to specify scenario file
- How to save output files (optionally save output file)
- Non-Zero Entropy
There is a significant data type mismatch: the ate3 env uses float64, while universe starter agent uses float32.
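One possible workaround for the float64/float32 mismatch is to cast every observation to float32 before it reaches the agent. The following is only a minimal sketch, assuming observations arrive as NumPy arrays (or something convertible to them); the helper name `to_float32` is hypothetical and is not part of the ate3 env or universe starter agent.

```python
import numpy as np


def to_float32(observation):
    """Return the observation as a float32 NumPy array.

    Hypothetical helper: casts float64 data coming from the environment
    down to the float32 precision the agent is assumed to expect.
    """
    return np.asarray(observation, dtype=np.float32)


# Example: a float64 observation such as the environment might return.
obs64 = np.array([0.1, -2.5, 3.0], dtype=np.float64)
obs32 = to_float32(obs64)
print(obs64.dtype, "->", obs32.dtype)
```

Casting down loses precision, so whether this is acceptable depends on how sensitive the simulator's values are.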
- Python 2.7 or 3.5
- Golang
- six (for py2/3 compatibility)
- TensorFlow 0.12
- tmux (the start script opens up a tmux session with multiple windows)
- htop (shown in one of the tmux windows)
- gym
- gym[atari]
- libjpeg-turbo (brew install libjpeg-turbo)
- universe
- opencv-python
- numpy
- scipy
conda create --name universe-starter-agent python=3.5
source activate universe-starter-agent
brew install tmux htop cmake golang libjpeg-turbo # On Linux use sudo apt-get install -y tmux htop cmake golang libjpeg-dev
pip install "gym[atari]"
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy
Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells:
source activate universe-starter-agent
python train.py --num-workers 2 --log-dir /tmp/ate3
The command above will train an agent on the ATE3 simulator.
It will start two workers that learn in parallel (--num-workers flag) and will write intermediate results to the given directory (--log-dir flag).
The code will launch the following processes:
- worker-0 - a process that runs policy gradient
- worker-1 - a process identical to worker-0, except that it uses different random noise from the environment
- ps - the parameter server, which synchronizes the parameters among the different workers
- tb - a tensorboard process for convenient display of the statistics of learning
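The process layout above scales with the --num-workers value. As a rough illustration, the sketch below lists the process names one would expect for a given worker count; `process_names` is a hypothetical helper written for this example, not a function from train.py.

```python
def process_names(num_workers):
    """List the processes expected for a given worker count.

    Hypothetical helper mirroring the list above: `num_workers` workers
    numbered from 0, one parameter server, and one TensorBoard process.
    """
    workers = ["worker-{}".format(i) for i in range(num_workers)]
    return workers + ["ps", "tb"]


# With --num-workers 2 this matches the four processes described above.
print(process_names(2))
```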
Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console.
Once in the tmux session, you can see all your windows with ctrl-b w.
To switch to window number 0, type: ctrl-b 0. Look up tmux documentation for more commands.
To access TensorBoard to see various monitoring metrics of the agent, open http://localhost:12345/ in a browser.
Other commands:
python worker.py --job-name ps --num-workers 1
- creates a single parameter server
python worker.py --job-name worker --num-workers 1
- creates a single worker to interact with the environment
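If you want to start these processes from a script rather than by hand, the commands can be assembled as argument lists. This is a minimal sketch that uses only the flags shown above; the helper name `worker_command` is hypothetical, and the lists are printed rather than executed.

```python
import shlex


def worker_command(job_name, num_workers):
    """Build the argv list for worker.py using only the flags shown above."""
    return [
        "python", "worker.py",
        "--job-name", job_name,
        "--num-workers", str(num_workers),
    ]


ps_cmd = worker_command("ps", 1)          # single parameter server
worker_cmd = worker_command("worker", 1)  # single worker

# Print the commands in shell form for inspection.
for cmd in (ps_cmd, worker_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list could then be passed to `subprocess.Popen` to launch the corresponding process.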