PPO balancer

The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the MPC balancer and PID balancer, it balances Upkie with straight legs. Training uses the UpkieGroundVelocity gym environment and the PPO implementation from Stable Baselines3.

An overview of the training pipeline is given in this video: Sim-to-real RL pipeline for Upkie wheeled bipeds.

Installation

On your machine

Install pixi.

On your Upkie

The PPO balancer uses pixi-pack to pack a standalone Python environment that runs policies on your Upkie. First, create environment.tar on your machine and upload it to the robot with:

make pack_pixi_env
make upload

Then, unpack the remote environment:

$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_pixi_env

Usage

To run the deployed policy on your Upkie:

make run_agent

Before that, to test the policy on your machine:

pixi run agent

Both commands assume the spine is already up and running, for instance after running ./start_simulation.sh on your machine, or after starting a pi3hat spine on the robot.

Training a new policy

First, check that training progresses one rollout at a time:

pixi run show_training

Once this works, train for real with more environments and no GUI:

pixi run train <nb_envs>

Adjust the number nb_envs of parallel environments based on the time/fps series. The series is reported to the command line (or to TensorBoard if you configure UPKIE_TRAINING_PATH as detailed below). Increase or decrease the number of environments until you find the sweet spot that maximizes FPS on your machine.
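The search for the sweet spot can be written down as a small helper. The sketch below is not part of the repository: it assumes you have noted the time/fps value reported for a few values of nb_envs, and the numbers in `measurements` are made-up placeholders, not benchmark results.

```python
def best_nb_envs(fps_by_nb_envs: dict) -> int:
    """Return the number of environments with the highest measured FPS."""
    if not fps_by_nb_envs:
        raise ValueError("no FPS measurements given")
    return max(fps_by_nb_envs, key=fps_by_nb_envs.get)


# Hypothetical time/fps readings, one per value of nb_envs (placeholders only)
measurements = {1: 900.0, 2: 1700.0, 4: 2900.0, 8: 3100.0, 16: 2600.0}
print(best_nb_envs(measurements))  # prints 8
```

With these placeholder readings, FPS drops again at 16 environments, so 8 would be the value to pass to pixi run train.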

TensorBoard

The repository comes with a training directory that will store logs each time a new policy is learned. Set the UPKIE_TRAINING_PATH environment variable to enable this:

export UPKIE_TRAINING_PATH="${HOME}/src/ppo_balancer/training"

Trainings are grouped automatically by day. To start TensorBoard on today's trainings:

pixi run tensorboard
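To look at the logs of another day, it helps to know where they are stored. The sketch below assumes each day gets a sub-directory of UPKIE_TRAINING_PATH named after its ISO date, which is inferred from the example path training/2023-11-15/final.zip in this README rather than from the training code. The helper name `training_dir_for` is hypothetical.

```python
import os
from datetime import date
from pathlib import Path
from typing import Optional


def training_dir_for(day: date, root: Optional[str] = None) -> Path:
    """Return the assumed log directory for a given day.

    Assumption: one sub-directory per day, named YYYY-MM-DD.
    """
    if root is None:
        # Raises KeyError if the environment variable is not set
        root = os.environ["UPKIE_TRAINING_PATH"]
    return Path(root) / day.isoformat()


print(training_dir_for(date(2023, 11, 15), root="training"))
```

The resulting directory can then be passed to TensorBoard's standard --logdir option.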

Advanced usage

To run a policy saved to a custom path, use for instance:

python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
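If you often run the most recent policy, the path can be looked up instead of typed. This is a minimal sketch with a hypothetical helper, `latest_policy`, which assumes that day directories are named YYYY-MM-DD (so they sort chronologically as text) and that the file to run is called final.zip, as in the example above.

```python
from pathlib import Path
from typing import Optional


def latest_policy(training_root: str, filename: str = "final.zip") -> Optional[Path]:
    """Return the policy file from the most recent day directory, if any.

    Assumption: every sub-directory that contains `filename` is a day
    directory named YYYY-MM-DD, so sorting paths sorts by date.
    """
    candidates = sorted(Path(training_root).glob(f"*/{filename}"))
    return candidates[-1] if candidates else None
```

The returned path can be given to the --policy argument shown above. The function returns None when no matching file exists.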
