CleanRL (Clean Implementation of RL Algorithms)

CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:

  • 📜 Single-file implementation
    • Every detail about an algorithm variant is put into a single standalone file.
    • For example, our ppo_atari.py only has 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation to read for folks who do not wish to read an entire modular library.
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Capturing Videos of Gameplay
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with docker and AWS
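
As an illustration of the "Local Reproducibility via Seeding" item above, here is a minimal sketch of what seeding buys you. It uses only Python's standard `random` module; the helper names `seed_everything` and `fake_rollout` are hypothetical and are not part of CleanRL. A real training script would typically seed NumPy and the deep learning framework in the same way.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the standard-library RNG (hypothetical helper for illustration)."""
    random.seed(seed)
    # A real training script would usually also seed NumPy and the
    # deep learning framework here, using the same seed value.


def fake_rollout(seed: int, num_steps: int = 5) -> list:
    """Stand-in for a training run: draws pseudo-random 'rewards'."""
    seed_everything(seed)
    return [random.random() for _ in range(num_steps)]


# Two runs with the same seed produce identical sequences.
run_a = fake_rollout(seed=1)
run_b = fake_rollout(seed=1)
# A different seed produces a different sequence.
run_c = fake_rollout(seed=2)

print(run_a == run_b)  # same seed, same results
print(run_a == run_c)  # different seed, different results
```

This is why the commands in this README pass an explicit `--seed`: rerunning with the same value should follow the same sequence of random draws on the same machine and software stack.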

You can read more about CleanRL in our technical paper and documentation.

Good luck have fun 🚀

⚠️ NOTE: CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience and you don't have to do a lot of subclassing, as is sometimes the case in modular DRL libraries).

Get started

Prerequisites:

  • Python 3.8-3.9 (not yet 3.10)
  • Poetry

To run experiments locally, give the following a try:

git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
poetry install

# alternatively, you could use `poetry shell` and do
# `python cleanrl/ppo.py`
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs

To use experiment tracking with wandb, run

wandb login # only required for the first time
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 \
    --track \
    --wandb-project-name cleanrltest
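
The flags used in the commands above (`--seed`, `--env-id`, `--total-timesteps`, `--track`, `--wandb-project-name`) are ordinary command-line options. The following is a minimal sketch of how such flags could be parsed with Python's standard `argparse` module. It is not CleanRL's actual argument parser, and the default values shown are placeholders chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the README commands (illustrative sketch only)."""
    parser = argparse.ArgumentParser()
    # The default values below are assumptions, not CleanRL's real defaults.
    parser.add_argument("--seed", type=int, default=1,
                        help="seed of the experiment")
    parser.add_argument("--env-id", type=str, default="CartPole-v1",
                        help="id of the environment")
    parser.add_argument("--total-timesteps", type=int, default=50000,
                        help="total timesteps of the experiment")
    parser.add_argument("--track", action="store_true",
                        help="if set, track the experiment with Weights and Biases")
    parser.add_argument("--wandb-project-name", type=str, default="my-project",
                        help="the wandb project name")
    return parser.parse_args(argv)


# Equivalent to the wandb command above, passed as a list instead of via the shell.
args = parse_args([
    "--seed", "1",
    "--env-id", "CartPole-v0",
    "--total-timesteps", "50000",
    "--track",
    "--wandb-project-name", "cleanrltest",
])
print(args.env_id, args.total_timesteps, args.track, args.wandb_project_name)
```

Note that `argparse` converts dashes in flag names to underscores, so `--env-id` is read back as `args.env_id`, and a `store_true` flag such as `--track` is `False` unless it appears on the command line.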

To run training scripts in other games:

poetry shell

# classic control
python cleanrl/dqn.py --env-id CartPole-v1
python cleanrl/ppo.py --env-id CartPole-v1
python cleanrl/c51.py --env-id CartPole-v1

# atari
poetry install -E atari
python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4

# NEW: 3-4x side-effect-free speedup with envpool's Atari (only available on Linux)
poetry install -E envpool
python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
# Learn Pong-v5 in ~5-10 mins
# Side effects such as lower sample efficiency might occur
poetry run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

# pybullet
poetry install -E pybullet
python cleanrl/td3_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/ddpg_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/sac_continuous_action.py --env-id MinitaurBulletDuckEnv-v0

# procgen
poetry install -E procgen
python cleanrl/ppo_procgen.py --env-id starpilot
python cleanrl/ppg_procgen.py --env-id starpilot

# ppo + lstm
python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_memory_env_lstm.py
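
To make the hyperparameters in the Pong envpool command above more concrete, here is a small worked calculation. It assumes the usual PPO bookkeeping, in which the rollout batch is `num_envs * num_steps` transitions and is split into `num_minibatches` equal parts for each update epoch. The helper `ppo_batch_sizes` is hypothetical and written only for this example.

```python
def ppo_batch_sizes(num_envs: int, num_steps: int, num_minibatches: int):
    """Return (batch_size, minibatch_size) under the usual PPO convention."""
    batch_size = num_envs * num_steps  # transitions collected per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    return batch_size, minibatch_size


# Values taken from the command: --num-envs=16 --num-steps=128
# --num-minibatches=8 --update-epochs=3
num_envs, num_steps, num_minibatches, update_epochs = 16, 128, 8, 3

batch_size, minibatch_size = ppo_batch_sizes(num_envs, num_steps, num_minibatches)
gradient_steps_per_rollout = update_epochs * num_minibatches

print(batch_size)                  # 2048
print(minibatch_size)              # 256
print(gradient_steps_per_rollout)  # 24
```

Under these assumptions, each rollout collects 2048 transitions and the optimizer takes 24 gradient steps on minibatches of 256 before new data is gathered.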

You may also use a prebuilt development environment hosted in Gitpod:

Open in Gitpod

Algorithms Implemented

Overview

| Algorithm | Variants Implemented |
|---|---|
| Proximal Policy Optimization (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py, docs |
| | ppo_atari_envpool.py, docs |
| | ppo_procgen.py, docs |
| | ppo_atari_multigpu.py, docs |
| | ppo_pettingzoo_ma_atari.py, docs |
| Deep Q-Learning (DQN) | dqn.py, docs |
| | dqn_atari.py, docs |
| Categorical DQN (C51) | c51.py, docs |
| | c51_atari.py, docs |
| Soft Actor-Critic (SAC) | sac_continuous_action.py, docs |
| Deep Deterministic Policy Gradient (DDPG) | ddpg_continuous_action.py, docs |
| Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3_continuous_action.py, docs |
| Phasic Policy Gradient (PPG) | ppg_procgen.py, docs |

Open RL Benchmark

CleanRL has a subproject called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks.

Support and get involved

We have a Discord Community for support. Feel free to ask questions. Posts in GitHub Issues and PRs are also welcome. Our past video recordings are available on YouTube.

Citing CleanRL

If you use CleanRL in your work, please cite our technical paper:

@article{huang2021cleanrl,
    title={CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms}, 
    author={Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga},
    year={2021},
    eprint={2111.08819},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}