Train robot RL without a GPU simulation backend.
UniLab uses CPU simulation + shared-memory runtime + GPU learning instead of coupling simulation and learning inside one GPU-resident pipeline.
┌───────────────────┐                            ┌─────────────────────────┐
│ CPU Physics Sim   │   Unified Shared Memory    │ GPU Policy Training     │
│ MuJoCo/Motrix     │ ─────────────────────────▶ │ PPO / SAC / TD3         │
│ Multithread Step  │     SharedReplayBuffer     │ CUDA / MPS / ROCm / XPU │
└───────────────────┘                            └─────────────────────────┘
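The hand-off in the diagram can be illustrated with a minimal sketch built only on Python's standard `multiprocessing.shared_memory` module. This is not UniLab's `SharedReplayBuffer`; the record layout, `CAPACITY`, and all function names here are hypothetical, chosen to show how a simulator-side writer and a learner-side reader might share one memory block by name.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: one transition = (obs, action, reward) as three float32 values.
RECORD = struct.Struct("<3f")
CAPACITY = 4  # number of transitions the ring buffer can hold


def create_buffer():
    """Allocate a shared-memory block large enough for CAPACITY records."""
    return shared_memory.SharedMemory(create=True, size=RECORD.size * CAPACITY)


def write_transition(shm, index, obs, action, reward):
    """Simulator side: write one transition into its ring-buffer slot."""
    slot = index % CAPACITY
    RECORD.pack_into(shm.buf, slot * RECORD.size, obs, action, reward)


def read_batch(name, count):
    """Learner side: attach to the block by name and copy out `count` records."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return [RECORD.unpack_from(shm.buf, i * RECORD.size) for i in range(count)]
    finally:
        shm.close()


def demo():
    """Write three transitions, then read them back through a second handle."""
    shm = create_buffer()
    try:
        for step in range(3):
            write_transition(shm, step, float(step), 0.5, 1.0)
        return read_batch(shm.name, 3)
    finally:
        shm.close()
        shm.unlink()
```

In a real pipeline the writer and reader would live in separate processes and the learner would copy each batch to the accelerator; the sketch keeps both sides in one process so it stays easy to run.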
Start with the Quick Demo below to run the primary training command from this repository.
Conda and pip users should still follow the repository uv workflow for now; see install for the current boundaries.
# 0. If uv is not installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# 1. Clone the repository
git clone https://github.com/unilabsim/UniLab.git
cd UniLab
# 2. Install dependencies
# Choose exactly one command for your platform; do not run all three.
# Linux CUDA or macOS
make setup-motrix
# Without shell completion setup: uv sync --extra motrix
# If `make` is not installed: uv sync --extra motrix && uv run --no-sync unilab-complete install
# Linux AMD / ROCm
# make sync-rocm
# Linux Intel Arc / iGPU
# make sync-xpu
# 3. Run a first PPO training job
uv run train --algo ppo --task go2_joystick_flat --sim motrix
This is the first-level training entrypoint. It routes to the registered go2_joystick_flat/motrix task owner config and keeps backend selection in the CLI flags.
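Step 2 asks you to run exactly one install command. As a rough sketch of that choice, the mapping below pairs each platform with the single command listed for it. The platform keys and the `install_command` helper are hypothetical names for illustration; only the `make` targets come from the steps above.

```python
# Hypothetical platform keys; the commands are the ones listed in step 2.
INSTALL_COMMANDS = {
    "linux-cuda": "make setup-motrix",
    "macos": "make setup-motrix",
    "linux-rocm": "make sync-rocm",
    "linux-xpu": "make sync-xpu",
}


def install_command(platform_key):
    """Return the single setup command for one platform key."""
    try:
        return INSTALL_COMMANDS[platform_key]
    except KeyError:
        known = ", ".join(sorted(INSTALL_COMMANDS))
        raise ValueError(
            f"unknown platform {platform_key!r}; expected one of: {known}"
        ) from None
```

Taking the platform as an explicit argument, rather than trying to detect it, mirrors the instruction to choose one command deliberately.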
For evaluation and demo playback:
uv run eval --algo ppo --task go2_joystick_flat --sim motrix --load-run -1
# Headless Motrix video export for Linux/server runs
uv run eval --algo ppo --task go2_joystick_flat --sim motrix --load-run -1 --render-mode record
# Demo playback from a local trained checkpoint
uv run demo
On macOS / MacBook, the UniLab CLI routes Motrix interactive playback through mxpython when needed. Motrix defaults to interactive playback; use --render-mode record for headless video export or --render-mode none to skip playback. Detailed script-level commands are documented under docs/sphinx/source/zh_CN/user_guide/.
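For scripted evaluation, the render-mode choices described above can be captured in a small helper. This is a sketch, assuming only the flags shown in this README; `build_eval_command` and `EXPLICIT_RENDER_MODES` are hypothetical names, and omitting `--render-mode` is taken to leave Motrix on its default interactive playback.

```python
import shlex

# Render modes named in this README. Leaving the flag off keeps the default
# (interactive playback for Motrix).
EXPLICIT_RENDER_MODES = ("record", "none")


def build_eval_command(algo, task, sim, load_run=-1, render_mode=None):
    """Build the argv list for `uv run eval` from explicit selections."""
    argv = [
        "uv", "run", "eval",
        "--algo", algo,
        "--task", task,
        "--sim", sim,
        "--load-run", str(load_run),
    ]
    if render_mode is not None:
        if render_mode not in EXPLICIT_RENDER_MODES:
            raise ValueError(f"unsupported render mode: {render_mode!r}")
        argv += ["--render-mode", render_mode]
    return argv


# Command line for a headless video export on a Linux server.
headless = shlex.join(
    build_eval_command("ppo", "go2_joystick_flat", "motrix", render_mode="record")
)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root.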
Prefer a guided, step-by-step experience? Open the notebooks in Jupyter:
- Demo Notebook: local checkpoint playback via uv run demo
- PPO Training Walkthrough: end-to-end guide from config preview to training to playback, with explanations for beginners
Notebooks require a local environment (no Colab support) because MuJoCo needs local compute.
These are example repository runs for documented commands and hardware setups. They are useful as concrete entrypoints and reported timings, but they are not yet a formal benchmark manifest.
uv run train --algo sac --task g1_walk_flat --sim motrix
uv run train --algo sac --task g1_sac_wbt --sim mujoco
uv run train --algo ppo --task sharpa_inhand --sim mujoco --profile hora
More training commands, script-level entrypoints, resume flow, and W&B details are in 03 Training Guide.
Use uv run train for training, uv run eval for checkpoint playback, and uv run demo for the local demo preset. These commands are the first-level training interface and keep algorithm, task, and backend selection explicit.
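Because algorithm, task, and backend are all explicit flags, the documented training runs are easy to generate from a script. The sketch below assumes only the flags that appear in this README; `build_train_command`, `DOCUMENTED_RUNS`, and `documented_command_lines` are hypothetical names, not part of the UniLab CLI.

```python
import shlex


def build_train_command(algo, task, sim, profile=None):
    """Build the argv list for `uv run train` from explicit selections."""
    argv = ["uv", "run", "train", "--algo", algo, "--task", task, "--sim", sim]
    if profile is not None:
        argv += ["--profile", profile]
    return argv


# The training runs shown in this README, expressed as keyword arguments.
DOCUMENTED_RUNS = [
    {"algo": "ppo", "task": "go2_joystick_flat", "sim": "motrix"},
    {"algo": "sac", "task": "g1_walk_flat", "sim": "motrix"},
    {"algo": "sac", "task": "g1_sac_wbt", "sim": "mujoco"},
    {"algo": "ppo", "task": "sharpa_inhand", "sim": "mujoco", "profile": "hora"},
]


def documented_command_lines():
    """Render each documented run as a shell-ready command line."""
    return [shlex.join(build_train_command(**run)) for run in DOCUMENTED_RUNS]
```

Each argv list can be handed to `subprocess.run(argv, check=True)` from the repository root to launch the corresponding job.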
See 03 Training Guide for the algorithm matrix, log directory layout, Hydra overrides, script-level entrypoints, and demo flags.
Use docs/README.md as the documentation index. High-signal entrypoints:
- Getting Started: installation, Docker runtime, dependency setup, and first-run commands
- Training Guide: training, playback, resume flow, Hydra overrides, and W&B
- Simulation Backends: generated MuJoCo / Motrix support matrix
- Development Standard: contracts, layering, and validation boundaries
- ADR Index: accepted architecture decisions
