This repository contains the training code for GTP and baseline offline reinforcement learning methods on D4RL and OGBench tasks. The main maintained entry points live in trainers/, with reproducibility scripts under scripts/.
agents/: policy, critic, trajectory, diffusion, consistency, and flow model implementations.
trainers/: command-line training entry points for GTP and baseline methods.
utils/: dataset samplers, logging, PyTorch helpers, and shared utilities.
scripts/: organized experiment launchers for examples, D4RL, OGBench, and legacy runs.
download_data.py: D4RL dataset downloader and pickle converter.
Generated datasets, logs, WandB files, and training results are ignored by git.
The recommended setup uses uv:
uv sync

If you have already activated a virtual environment, use:
uv sync --active

D4RL is sensitive to dependency versions. In particular, this repo pins gym==0.23.1 and cython<3 for compatibility with d4rl and mujoco-py.
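As a complement to the pins above, the following is a minimal sketch of a version check, assuming the packages are installed under the distribution names gym and cython. The helper name check_pins is hypothetical and not part of this repository.

```python
from importlib import metadata


def check_pins(get_version=metadata.version):
    """Return a list of problems; an empty list means the pins look satisfied.

    `get_version` is injectable so the check can be exercised without
    touching the real environment. This helper is illustrative only.
    """
    problems = []

    # The README states that gym is pinned to exactly 0.23.1.
    try:
        gym_version = get_version("gym")
        if gym_version != "0.23.1":
            problems.append(f"gym is {gym_version}, expected 0.23.1")
    except metadata.PackageNotFoundError:
        problems.append("gym is not installed")

    # The README states cython<3, so only the major version matters here.
    try:
        cython_version = get_version("cython")
        if int(cython_version.split(".")[0]) >= 3:
            problems.append(f"cython is {cython_version}, expected <3")
    except metadata.PackageNotFoundError:
        problems.append("cython is not installed")

    return problems
```

Calling check_pins() inside the project environment (for example through uv run python) returns an empty list when both pins hold.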
After installation, a quick import check is:
uv run python -c "import gym, mujoco_py, d4rl; print(gym.__version__)"

For headless MuJoCo machines, set:
export MUJOCO_GL=egl

Download and convert the D4RL datasets with:
uv run python download_data.py

Raw D4RL .hdf5 files are cached by D4RL under ~/.d4rl/datasets. Converted pickle files are written to dataset/ in this repository.
OGBench experiments use the OGBench data directory passed through --dataset_dir, which defaults to ~/.ogbench/data.
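To confirm that data landed where the text above says it should, a small sketch like the following can count files in each location. The helper names and the file patterns are assumptions: only the .hdf5 extension of the raw D4RL cache is stated here, so the other directories are matched with a catch-all pattern.

```python
from pathlib import Path


def count_files(directory, pattern="*"):
    """Count regular files under `directory` matching `pattern` (recursive)."""
    root = Path(directory).expanduser()
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob(pattern) if path.is_file())


# Paths come from this README; the "*" patterns are assumptions because
# the exact file extensions are not documented here.
LOCATIONS = {
    "raw D4RL cache": ("~/.d4rl/datasets", "*.hdf5"),
    "converted pickles": ("dataset", "*"),
    "OGBench data": ("~/.ogbench/data", "*"),
}


def summarize(locations=LOCATIONS):
    """Map each label to the number of matching files found."""
    return {
        label: count_files(path, pattern)
        for label, (path, pattern) in locations.items()
    }
```

Run summarize() from the repository root so that the relative dataset/ path resolves correctly. A count of zero for a location suggests the corresponding download or conversion step has not run yet.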
Run GTP directly on a D4RL task:
uv run python -m trainers.gtp \
--env_name walker2d-medium-expert-v2 \
--exp gtp_demo \
--seed 0 \
--save_best_model \
--lr_decay

Run Flow Q-Learning directly:
uv run python -m trainers.flow_ql \
--env_name walker2d-medium-expert-v2 \
--exp flowql_demo \
--seed 0

OGBench state-based cube example:
uv run python -m trainers.gtp \
--env_name cube-single-play-singletask-task1-v0 \
--dataset_backend ogbench \
--exp gtp_cube_single_task1 \
--seed 0

OGBench visual cube example:
uv run python -m trainers.gtp \
--env_name visual-cube-single-play-singletask-task1-v0 \
--dataset_backend ogbench \
--dataset_dir ~/.ogbench/data \
--exp gtp_visual_cube_single_task1 \
--seed 0

Additional maintained entry points include:
uv run python -m trainers.diffusion_ql
uv run python -m trainers.consistency_ql
uv run python -m trainers.offline_rl

See scripts/README.md for the organized launchers. Common examples:
bash scripts/examples/train_gtp_kitchen.sh
MAX_PARALLEL=5 bash scripts/d4rl/run_gtp_sweep.sh
bash scripts/ogbench/run_cube_single_paper.sh
TRAINER=flow_ql bash scripts/ogbench/run_visual_cube_single.sh

Most scripts accept environment-variable overrides such as DEVICE, SEED_LIST, TASK_IDS, NUM_EPOCHS, DATASET_DIR, and MAX_PARALLEL. Extra trainer flags can be passed at the end of the script command.
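When launching these scripts from Python rather than a shell, the override pattern described above can be sketched as follows. The helper build_launch is hypothetical, the DEVICE value format is an assumption, and whether a given script forwards trailing flags should be checked in scripts/README.md.

```python
import os


def build_launch(script, overrides=None, extra_flags=()):
    """Build the command and environment for one launcher script.

    Overrides are layered on top of the current environment, and extra
    trainer flags are appended after the script path.
    """
    env = dict(os.environ)
    env.update({key: str(value) for key, value in (overrides or {}).items()})
    command = ["bash", script, *extra_flags]
    return command, env


command, env = build_launch(
    "scripts/d4rl/run_gtp_sweep.sh",
    overrides={"MAX_PARALLEL": 5, "DEVICE": "cuda:0"},  # DEVICE format is assumed
    extra_flags=["--lr_decay"],
)

# To actually launch the run:
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Keeping the command and environment as plain data makes it easy to print or log exactly what will be executed before starting a long sweep.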
Training uses WandB through utils/wandb_logger.py. If WANDB_API_KEY or WANDB_MODE=online is set, runs sync online. Otherwise, the logger defaults to offline mode.
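The mode-selection rule just described can be written as a small function. This is a sketch of the stated behavior only, not the code in utils/wandb_logger.py, and the name resolve_wandb_mode is hypothetical. In particular, it follows the sentence literally, so an API key takes precedence even if WANDB_MODE is set to offline; the actual logger may handle that case differently.

```python
def resolve_wandb_mode(env):
    """Return "online" or "offline" following the rule stated in this README.

    `env` is any mapping of environment variables, e.g. os.environ.
    """
    # Online if an API key is present or online mode is requested explicitly.
    if env.get("WANDB_API_KEY") or env.get("WANDB_MODE") == "online":
        return "online"
    # Otherwise fall back to offline, the documented default.
    return "offline"
```

For example, resolve_wandb_mode({}) yields "offline", while resolve_wandb_mode({"WANDB_MODE": "online"}) yields "online".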
Useful overrides:
export WANDB_PROJECT=gtp
export WANDB_ENTITY=<your-entity>
export WANDB_MODE=offline

If you find this repository helpful in your research, please consider citing our paper:
@inproceedings{feng2026offline,
title={Offline Reinforcement Learning with Generative Trajectory Policies},
author={Feng, Xinsong and Tang, Leshu and Wang, Chenan and Chen, Haipeng},
booktitle={International Conference on Machine Learning (ICML)},
year={2026},
}