RL library for researchers. Hackability, readability, and great experiment management are this library's top priorities.

CleanRL (Clean Implementation of RL Algorithms)

This project is a work in progress, currently at the 0.1 release; expect breaking changes.

This repository provides clean and minimal implementations of reinforcement learning algorithms, designed for easy experimental research. The highlights of this repo are:

  • Most algorithms are self-contained in single files, with a common dependency file common.py that handles different gym spaces.

  • Easy logging of training processes using TensorBoard, plus integration with wandb.com to log experiments in the cloud (see the logging sketch after this list). Check out https://cleanrl.costa.sh.

  • Hackable code that you can debug directly in Python's interactive shell (especially if you use the Spyder editor from Anaconda :) ).

  • Convenient use of command-line arguments for hyperparameter tuning.

  • Benchmarked on many types of games. See https://cleanrl.costa.sh.
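
To give a concrete picture of the logging style, here is a minimal sketch of the TensorBoard pattern such a self-contained script might use. It assumes PyTorch's SummaryWriter; the run directory and placeholder metrics are illustrative, not CleanRL's exact code.

from torch.utils.tensorboard import SummaryWriter

# illustrative run directory; the scripts write under `runs/` (see Get started)
writer = SummaryWriter("runs/CartPole-v0__a2c__seed1")

for global_step in range(50000):
    # ... step the environment and update the agent here ...
    episode_reward, policy_loss = 0.0, 0.0  # placeholders for the real metrics
    writer.add_scalar("charts/episode_reward", episode_reward, global_step)
    writer.add_scalar("losses/policy_loss", policy_loss, global_step)

writer.close()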

[Screenshot: wandb.png]

Get started

To run experiments locally, give the following a try:

$ git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
$ pip install -e .
$ cd cleanrl
$ python a2c.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000
# open another terminal and enter `cd cleanrl/cleanrl`
$ tensorboard --logdir runs
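
Each script exposes its hyperparameters as command-line flags. As a rough sketch of the argparse pattern behind the invocation above (the flag names match, but the defaults and help strings here are illustrative):

import argparse

# illustrative argparse setup mirroring the flags used above
parser = argparse.ArgumentParser(description="A2C agent")
parser.add_argument("--seed", type=int, default=1,
                    help="seed of the experiment")
parser.add_argument("--gym-id", type=str, default="CartPole-v0",
                    help="the id of the gym environment")
parser.add_argument("--total-timesteps", type=int, default=50000,
                    help="total timesteps of the experiment")
args = parser.parse_args()
print(f"training on {args.gym_id} for {args.total_timesteps} steps")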

[Demo: demo.gif]

To use the wandb integration, sign up for an account at https://wandb.com and copy your API key. Then run:

$ cd cleanrl
$ pip install wandb
$ wandb login ${WANDB_API_KEY}
$ python a2c.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000 \
    --prod-mode True \
    --wandb-project-name cleanrltest 
# Then go to https://app.wandb.ai/${WANDB_USERNAME}/cleanrltest/

Check out the demo site at https://app.wandb.ai/costa-huang/cleanrltest.
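
On the Python side, the wandb integration can be as small as one call. A hedged sketch, assuming the standard wandb client; the flag parsing is a hypothetical mirror of the invocation above, and the actual wiring in a2c.py may differ:

import argparse
import wandb
from distutils.util import strtobool

parser = argparse.ArgumentParser()
# hypothetical flags mirroring the invocation above
parser.add_argument("--prod-mode", type=lambda x: bool(strtobool(x)), default=False,
                    help="whether to log the experiment to wandb")
parser.add_argument("--wandb-project-name", type=str, default="cleanrltest",
                    help="the wandb project to log to")
args = parser.parse_args()

if args.prod_mode:
    wandb.init(
        project=args.wandb_project_name,  # e.g. "cleanrltest"
        config=vars(args),                # record every hyperparameter
        sync_tensorboard=True,            # mirror TensorBoard logs to the cloud
    )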

[Demo: demo2.gif]

User's Guide for Researchers (please read this if you are considering using CleanRL)

CleanRL focuses on the early and mid stages of RL research, where one tries to understand ideas and run hacky experiments on the algorithms. If your goal does not include modifying the internals of RL algorithms, a library like stable-baselines, ray, or catalyst may be better suited for your use case, since those are built to be highly optimized, concurrent, and fast.

CleanRL, however, is built to provide a simplified and streamlined approach to conducting RL experiments. Let's give an example. Say you are interested in implementing GAE (Generalized Advantage Estimation) to see if it improves A2C's performance on CartPole-v0. The workflow roughly looks like this:

  1. Copy cleanrl/cleanrl/a2c.py to cleanrl/cleanrl/experiments/a2c_gae.py
  2. Implement the GAE technique (a sketch of the estimator is given after this list). This should be relatively simple because you don't have to navigate dozens of files to find some function named compute_advantages()
  3. Run python cleanrl/cleanrl/experiments/a2c_gae.py in the terminal, or use an interactive shell like Spyder. The latter lets you stop the program at any time and execute arbitrary code, so you can program on the fly.
  4. Open another terminal, type tensorboard --logdir cleanrl/cleanrl/experiments/runs, and check out episode_rewards, losses/policy_loss, etc. If something doesn't look right, go back to step 2 and continue.
  5. If the technique works, you will want to see whether it also works with other games, such as Taxi-v3, or with different parameters. Execute
    $ wandb login ${WANDB_API_KEY}
    $ for seed in {1..2}
        do
            (sleep 0.3 && nohup python a2c_gae.py \
            --seed $seed \
            --gym-id CartPole-v0 \
            --total-timesteps 30000 \
            --wandb-project-name myRLproject \
            --prod-mode True
            ) >& /dev/null &
        done
    $ for seed in {1..2}
        do
            # same experiment with a different env (Taxi-v3) and
            # a lower discount factor (--gamma 0.8); comments cannot
            # follow a backslash line continuation, so they live here
            (sleep 0.3 && nohup python a2c_gae.py \
            --seed $seed \
            --gym-id Taxi-v3 \
            --total-timesteps 30000 \
            --gamma 0.8 \
            --wandb-project-name myRLproject \
            --prod-mode True
            ) >& /dev/null &
        done
    
    Then you can monitor the performance and keep track of all the parameters used in your experiments.
  6. Continue this process.
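
For step 2 above, here is a minimal sketch of the GAE estimator itself, assuming the copied script has collected arrays of rewards, values, and dones for a rollout (the function and variable names are illustrative, not CleanRL's):

import numpy as np

def compute_gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation (Schulman et al., 2015):
    # A_t = sum_{l >= 0} (gamma * lam)^l * delta_{t+l},
    # where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), computed backwards.
    advantages = np.zeros_like(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]  # stop bootstrapping at episode boundaries
        next_val = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_val * mask - values[t]
        last_gae = delta + gamma * lam * mask * last_gae
        advantages[t] = last_gae
    return advantages  # critic targets are then advantages + values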

The pipeline described above should give you an idea of how to use CleanRL for your research.

Feature TODOs:

References

I have been heavily inspired by many repos and blog posts. Below is an incomplete list of them.

The following helped me a lot with handling continuous action spaces:
