Proximal Policy Optimization (PPO) written in C++.
Collect simulated experience with multi-threaded batch environments implemented in C++ and train policy + value function with PyTorch C++ (LibTorch).
Train humanoid locomotion behavior in ~10 minutes (reward ...).
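For orientation, below is a minimal sketch of the PPO clipped surrogate objective written against the LibTorch C++ API. The function name, tensor names, and default clip coefficient are illustrative assumptions, not code taken from this repository.

```cpp
#include <torch/torch.h>

// Sketch: PPO clipped surrogate objective (illustrative, not ppo.cpp's code).
// log_probs, old_log_probs, advantages: 1-D tensors over a minibatch.
torch::Tensor ppo_policy_loss(const torch::Tensor& log_probs,
                              const torch::Tensor& old_log_probs,
                              const torch::Tensor& advantages,
                              double clip_coef = 0.2) {
  // Probability ratio between the new and old policies.
  torch::Tensor ratio = torch::exp(log_probs - old_log_probs);
  // Unclipped and clipped surrogate objectives.
  torch::Tensor surr1 = ratio * advantages;
  torch::Tensor surr2 =
      torch::clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages;
  // PPO maximizes the minimum of the two; negate the mean to form a loss.
  return -torch::min(surr1, surr2).mean();
}
```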
From the build/ directory, run:
./run --env humanoid --train --visualize --checkpoint humanoid --device {cpu|cuda|mps}
The saved policy can be visualized:
./run --env humanoid --load humanoid_{x}_{y} --visualize --device {cpu|cuda|mps}
Visualize the pretrained policy (requires an Apple ARM CPU):
./run --env humanoid --load pretrained/humanoid_apple_arm --visualize --device cpu
ppo.cpp should work on Ubuntu and macOS.
Dependencies: abseil, libtorch, mujoco
Operating system specific dependencies:
macOS:
Install Xcode.
Install ninja:
brew install ninja
Ubuntu:
sudo apt-get update && sudo apt-get install cmake libgl1-mesa-dev libxinerama-dev libxcursor-dev libxrandr-dev libxi-dev ninja-build clang-12
git clone https://github.com/thowell/ppo.cpp
LibTorch (i.e., PyTorch C++) should be installed automatically by CMake. Manual installation is also possible (first perform steps 1 and 2 below to create a build/ directory):
Install LibTorch 2.3.1: download and extract to ppo.cpp/build.
If you encounter malicious software warnings for Torch on macOS: System Settings -> Security & Privacy -> Allow.
You might also need to:
brew install libomp
install_name_tool -add_rpath /opt/homebrew/opt/libomp/lib PATH_TO/ppo.cpp/libtorch/lib/libtorch_cpu.dylib
Install LibTorch 2.3.1 with CUDA 12.1: download and extract to ppo.cpp/build.
- Change directory:
cd ppo.cpp
- Create and change to build directory:
mkdir build
cd build
- Configure:
macOS:
cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja
Ubuntu:
cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja -DCMAKE_C_COMPILER:STRING=clang-12 -DCMAKE_CXX_COMPILER:STRING=clang++-12
- Build:
cmake --build . --config=Release
VSCode and two of its extensions (CMake Tools and C/C++) can simplify the build process.
- Open the cloned ppo.cpp directory.
- Configure the project with CMake (a pop-up should appear in VSCode).
- Set the compiler to clang-12.
- Build and run the ppo target in "release" mode (VSCode defaults to "debug").
Setup:
- --env: humanoid
- --train: train policy and value function with PPO
- --checkpoint: filename in checkpoint/ to save policy
- --load: provide string in checkpoint/ directory to load policy from checkpoint
- --visualize: visualize policy
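Since abseil is a dependency, the command-line options are plausibly defined with Abseil Flags. The sketch below mirrors the setup flags above, but the exact declarations in ppo.cpp may differ.

```cpp
#include <string>

#include "absl/flags/flag.h"
#include "absl/flags/parse.h"

// Illustrative Abseil flag definitions mirroring the options above
// (not necessarily how ppo.cpp declares them).
ABSL_FLAG(std::string, env, "humanoid", "environment name");
ABSL_FLAG(bool, train, false, "train policy and value function with PPO");
ABSL_FLAG(std::string, checkpoint, "", "filename in checkpoint/ to save policy");
ABSL_FLAG(std::string, load, "", "checkpoint in checkpoint/ to load policy from");
ABSL_FLAG(bool, visualize, false, "visualize policy");

int main(int argc, char** argv) {
  absl::ParseCommandLine(argc, argv);
  const std::string env_name = absl::GetFlag(FLAGS_env);
  // ... dispatch to training / visualization based on the parsed flags ...
  return 0;
}
```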
Hardware settings:
- --num_threads: number of threads/workers for collecting simulation experience [default: 20]
- --device: learning device [default: cpu; options: cpu, cuda, mps]
- --device_sim: simulation device [default: cpu; options: cpu, cuda, mps]
- --device_type: data type for device [default: float]
- --device_sim_type: data type for device_sim [default: double]
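As an illustration of how the --device and --device_type strings might map onto LibTorch types, here is a hedged sketch; the helper names are hypothetical and not taken from ppo.cpp.

```cpp
#include <stdexcept>
#include <string>

#include <torch/torch.h>

// Hypothetical helpers mapping flag strings to LibTorch types
// (illustrative only, not ppo.cpp's actual flag handling).
torch::Device parse_device(const std::string& name) {
  if (name == "cpu" || name == "cuda" || name == "mps") {
    return torch::Device(name);  // torch::Device accepts a device string
  }
  throw std::invalid_argument("unknown device: " + name);
}

torch::Dtype parse_dtype(const std::string& name) {
  if (name == "float") return torch::kFloat32;
  if (name == "double") return torch::kFloat64;
  throw std::invalid_argument("unknown dtype: " + name);
}
```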
PPO settings:
- --num_envs: number of parallel learning environments for collecting simulation experience
- --num_steps: number of environment steps for each environment used for learning
- --minibatch_size: size of minibatch
- --learning_rate: initial learning rate for policy and value function optimizer
- --max_env_steps: total number of environment steps to collect
- --anneal_lr: flag to anneal learning rate
- --kl_threshold: maximum KL divergence between old and new policies
- --gamma: discount factor for rewards
- --gae_lambda: factor for Generalized Advantage Estimation
- --update_epochs: number of iterations the complete batch of experience is used to improve the policy and value function
- --norm_adv: flag for normalizing advantages
- --clip_coef: value for PPO clip parameter
- --clip_vloss: flag for clipping value function loss
- --ent_coef: weight for entropy loss
- --vf_coef: weight for value function loss
- --max_grad_norm: maximum value for global L2-norm of parameter gradients
- --optimizer_eps: epsilon value for Adam optimizer
- --normalize_observation: normalize observations with running statistics
- --normalize_reward: normalize rewards with running statistics
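For context on --gamma and --gae_lambda, the following is a sketch of Generalized Advantage Estimation over a single environment's trajectory, written against the LibTorch API; the function signature and tensor layout are illustrative assumptions rather than the repository's implementation.

```cpp
#include <torch/torch.h>

// Sketch: Generalized Advantage Estimation for one environment.
// rewards, values, dones have shape [num_steps]; next_value is V(s_T).
torch::Tensor compute_gae(const torch::Tensor& rewards,
                          const torch::Tensor& values,
                          const torch::Tensor& dones,
                          torch::Tensor next_value,
                          double gamma, double gae_lambda) {
  int64_t num_steps = rewards.size(0);
  torch::Tensor advantages = torch::zeros_like(rewards);
  torch::Tensor last_gae = torch::zeros({}, rewards.options());
  for (int64_t t = num_steps - 1; t >= 0; --t) {
    // Mask out bootstrapping across episode boundaries.
    torch::Tensor not_done = 1.0 - dones[t];
    // One-step TD error.
    torch::Tensor delta =
        rewards[t] + gamma * next_value * not_done - values[t];
    // Exponentially weighted sum of TD errors (lambda-return advantage).
    last_gae = delta + gamma * gae_lambda * not_done * last_gae;
    advantages[t] = last_gae;
    next_value = values[t];
  }
  return advantages;
}
```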
Evaluation settings:
- --num_eval_envs: number of environments for evaluating policy performance
- --max_eval_steps: number of simulation steps (per environment) for evaluating policy performance
- --num_iter_per_eval: number of iterations per policy evaluation
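To illustrate what these evaluation settings control, here is a rough sketch of an evaluation rollout that accumulates reward with a deterministic policy; Env and Policy are placeholder interfaces, not classes from ppo.cpp.

```cpp
#include <torch/torch.h>

// Sketch: evaluate a policy by summing reward over a fixed number of steps.
// Env and Policy are placeholder interfaces, not types from ppo.cpp.
template <typename Env, typename Policy>
double evaluate(Env& env, Policy& policy, int max_eval_steps) {
  torch::NoGradGuard no_grad;  // no gradients needed during evaluation
  double total_reward = 0.0;
  torch::Tensor observation = env.reset();
  for (int step = 0; step < max_eval_steps; ++step) {
    // Act with the policy mean (deterministic) rather than sampling.
    torch::Tensor action = policy.mean_action(observation);
    auto [next_observation, reward, done] = env.step(action);
    total_reward += reward;
    // Reset at episode boundaries so evaluation can span multiple episodes.
    observation = done ? env.reset() : next_observation;
  }
  return total_reward;
}
```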
This repository was developed to:
- understand the Proximal Policy Optimization algorithm
- understand the details of Gym environments, including autoresets, normalization, batch environments, etc.
- understand the normal distribution neural network policy formulation for continuous control environments
- gain experience with PyTorch C++ API
- experiment with code-generation tools that are useful for improving development time, including ChatGPT and Claude
- gain a better understanding of where performance bottlenecks exist for PPO
- gain a better understanding of how MuJoCo models can be modified to improve steps/time
- gain more experience using CMake
MuJoCo models use resources from IsaacGym environments, MuJoCo Menagerie, the MJX tutorial, and dm_control.
The PPO implementation is based on cleanrl's ppo_continuous_action.py.