wmd3i/gtp

GTP

This repository contains the training code for GTP and baseline offline reinforcement learning methods on D4RL and OGBench tasks. The main maintained entry points live in trainers/, with reproducibility scripts under scripts/.

Repository Layout

  • agents/: policy, critic, trajectory, diffusion, consistency, and flow model implementations.
  • trainers/: command-line training entry points for GTP and baseline methods.
  • utils/: dataset samplers, logging, PyTorch helpers, and shared utilities.
  • scripts/: organized experiment launchers for examples, D4RL, OGBench, and legacy runs.
  • download_data.py: D4RL dataset downloader and pickle converter.

Generated datasets, logs, WandB files, and training results are ignored by git.

Environment Setup

The recommended setup uses uv:

uv sync

If you have already activated a virtual environment, use:

uv sync --active

D4RL is sensitive to dependency versions. In particular, this repo pins gym==0.23.1 and cython<3 for compatibility with d4rl and mujoco-py.

After installation, run a quick import check:

uv run python -c "import gym, mujoco_py, d4rl; print(gym.__version__)"
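The import check confirms that the packages load, but not that the pinned versions were resolved. A minimal sketch of a pin check follows, assuming only the two pins stated above (gym==0.23.1 and cython<3). The helper names `parse_version`, `check_pins`, and `installed_versions` are hypothetical and not part of this repository.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.29.36' into (0, 29, 36), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_pins(installed):
    """Return a list of problems; an empty list means both pins hold."""
    problems = []
    gym_version = installed.get("gym")
    if gym_version != "0.23.1":
        problems.append(f"gym is {gym_version}, expected 0.23.1")
    cython_version = installed.get("cython")
    if cython_version is None or parse_version(cython_version)[:1] >= (3,):
        problems.append(f"cython is {cython_version}, expected <3")
    return problems


def installed_versions(names=("gym", "cython")):
    """Look up installed distribution versions; missing packages map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for problem in check_pins(installed_versions()):
        print(problem)
```

Saved to a file of your choosing, it can be run with `uv run python <file>`. It prints nothing when both pins hold.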

For headless MuJoCo machines, set:

export MUJOCO_GL=egl

Data

Download and convert the D4RL datasets with:

uv run python download_data.py

Raw D4RL .hdf5 files are cached by D4RL under ~/.d4rl/datasets. Converted pickle files are written to dataset/ in this repository.

OGBench experiments use the OGBench data directory passed through --dataset_dir, which defaults to ~/.ogbench/data.
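To see which datasets are already in place, the three locations above can be inspected from Python. This is a sketch only: the file names and the pickle suffix written by download_data.py are not stated here, so the helper (a hypothetical `list_converted`) assumes `.pkl` or `.pickle` and lists whatever it finds.

```python
from pathlib import Path

# Locations named in this README.
D4RL_CACHE = Path("~/.d4rl/datasets").expanduser()
OGBENCH_DEFAULT = Path("~/.ogbench/data").expanduser()


def list_converted(repo_root, suffixes=(".pkl", ".pickle")):
    """List converted pickle files under <repo_root>/dataset.

    The suffixes are an assumption; adjust them to match what
    download_data.py actually writes.
    """
    dataset_dir = Path(repo_root) / "dataset"
    if not dataset_dir.is_dir():
        return []
    return sorted(p.name for p in dataset_dir.iterdir() if p.suffix in suffixes)


if __name__ == "__main__":
    print("raw D4RL cache present:", D4RL_CACHE.is_dir())
    print("OGBench data dir present:", OGBENCH_DEFAULT.is_dir())
    print("converted files:", list_converted("."))
```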

Training

Run GTP directly on a D4RL task:

uv run python -m trainers.gtp \
  --env_name walker2d-medium-expert-v2 \
  --exp gtp_demo \
  --seed 0 \
  --save_best_model \
  --lr_decay

Run Flow Q-Learning directly:

uv run python -m trainers.flow_ql \
  --env_name walker2d-medium-expert-v2 \
  --exp flowql_demo \
  --seed 0

OGBench state-based cube example:

uv run python -m trainers.gtp \
  --env_name cube-single-play-singletask-task1-v0 \
  --dataset_backend ogbench \
  --exp gtp_cube_single_task1 \
  --seed 0

OGBench visual cube example:

uv run python -m trainers.gtp \
  --env_name visual-cube-single-play-singletask-task1-v0 \
  --dataset_backend ogbench \
  --dataset_dir ~/.ogbench/data \
  --exp gtp_visual_cube_single_task1 \
  --seed 0

Additional maintained entry points include:

uv run python -m trainers.diffusion_ql
uv run python -m trainers.consistency_ql
uv run python -m trainers.offline_rl
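When launching several seeds by hand, it can help to assemble the commands programmatically. The sketch below uses only the flags shown in the examples above. `build_command` is a hypothetical helper, not something the repository provides, and whether every trainer accepts every flag is an assumption to check against each trainer's `--help`.

```python
import shlex


def build_command(trainer, env_name, exp, seed,
                  dataset_backend=None, dataset_dir=None, extra_flags=()):
    """Build the argv list for one `uv run python -m trainers.<trainer>` call."""
    cmd = [
        "uv", "run", "python", "-m", f"trainers.{trainer}",
        "--env_name", env_name,
        "--exp", exp,
        "--seed", str(seed),
    ]
    if dataset_backend is not None:
        cmd += ["--dataset_backend", dataset_backend]
    if dataset_dir is not None:
        cmd += ["--dataset_dir", str(dataset_dir)]
    cmd += list(extra_flags)
    return cmd


if __name__ == "__main__":
    # Print one GTP command per seed; the experiment-name suffix is illustrative.
    for seed in range(3):
        cmd = build_command(
            "gtp", "walker2d-medium-expert-v2", f"gtp_demo_s{seed}", seed,
            extra_flags=("--save_best_model", "--lr_decay"),
        )
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.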

Experiment Scripts

See scripts/README.md for the organized launchers. Common examples:

bash scripts/examples/train_gtp_kitchen.sh
MAX_PARALLEL=5 bash scripts/d4rl/run_gtp_sweep.sh
bash scripts/ogbench/run_cube_single_paper.sh
TRAINER=flow_ql bash scripts/ogbench/run_visual_cube_single.sh

Most scripts accept environment-variable overrides such as DEVICE, SEED_LIST, TASK_IDS, NUM_EPOCHS, DATASET_DIR, and MAX_PARALLEL. Extra trainer flags can be passed at the end of the script command.
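The same override mechanism can be driven from Python by copying the current environment and adding the variables before invoking a script. This is a sketch under stated assumptions: `prepare_launch` and `run_launch` are hypothetical helpers, and the values reuse only what the examples above show (`MAX_PARALLEL=5`, `TRAINER=flow_ql`). The expected format of list-valued variables such as SEED_LIST is not documented here, so it is left out.

```python
import os
import subprocess


def prepare_launch(script, overrides, extra_flags=()):
    """Return (argv, env) for running a launcher script with overrides."""
    env = dict(os.environ)
    # Environment values must be strings.
    env.update({key: str(value) for key, value in overrides.items()})
    # Extra trainer flags go at the end of the script command.
    argv = ["bash", script, *extra_flags]
    return argv, env


def run_launch(script, overrides, extra_flags=()):
    """Run the script and raise if it exits with a non-zero status."""
    argv, env = prepare_launch(script, overrides, extra_flags)
    subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    argv, env = prepare_launch(
        "scripts/d4rl/run_gtp_sweep.sh", {"MAX_PARALLEL": 5}, ["--lr_decay"]
    )
    print(argv, env["MAX_PARALLEL"])
```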

Logging

Training uses WandB through utils/wandb_logger.py. If WANDB_API_KEY is set or WANDB_MODE is online, runs sync online. Otherwise, the logger defaults to offline mode.

Useful overrides:

export WANDB_PROJECT=gtp
export WANDB_ENTITY=<your-entity>
export WANDB_MODE=offline
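The mode selection described in this section can be sketched as a small function. This is not the code in utils/wandb_logger.py: `resolve_wandb_mode` is a hypothetical name, and the sketch assumes an explicitly set WANDB_MODE takes precedence over the presence of WANDB_API_KEY, which the description above does not confirm.

```python
import os


def resolve_wandb_mode(env=None):
    """Pick a WandB mode from environment variables.

    Assumption: an explicit WANDB_MODE wins; otherwise an API key means
    online, and everything else falls back to offline.
    """
    env = os.environ if env is None else env
    explicit = env.get("WANDB_MODE")
    if explicit:
        return explicit
    if env.get("WANDB_API_KEY"):
        return "online"
    return "offline"


if __name__ == "__main__":
    print(resolve_wandb_mode())
```

Passing a plain dict instead of `os.environ` makes it easy to check how a given combination of variables would resolve before starting a long run.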

Citation

If you find this repository helpful in your research, please consider citing our paper:

@inproceedings{feng2026offline,
  title={Offline Reinforcement Learning with Generative Trajectory Policies},
  author={Feng, Xinsong and Tang, Leshu and Wang, Chenan and Chen, Haipeng},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
}

About

Official implementation: Offline Reinforcement Learning with Generative Trajectory Policies (ICML 2026)
