
How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Jason Yecheng Ma, Jason Yan, Dinesh Jayaraman, Osbert Bastani

University of Pennsylvania

This is a PyTorch implementation of our paper How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression; this code can be used to reproduce Sections 5.1 and 5.2 of the paper.

Here is a teaser video comparing GoFAR against state-of-the-art offline GCRL algorithms on a real robot!

Setup

Requirements

  • MuJoCo 2.0.0

Setup Instructions

  1. Create the conda environment and activate it:
    conda env create -f environment.yml
    conda activate gofar
    pip install --upgrade numpy
    pip install torch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0 gym==0.17.3

  2. (Optional) Install the ROBEL environment for the D'Claw experiment.
  3. Download the offline dataset here and place /offline_data in the project root directory.
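
As an optional sanity check before launching training, a minimal snippet along the lines of the one below can confirm that MuJoCo, Gym, and PyTorch import correctly. This is only a sketch and not part of the repository; FetchReach-v1 is Gym's standard Fetch reaching task and is used here purely as a smoke test.

```python
# Optional smoke test (illustrative, not part of this repo).
import gym         # gym==0.17.3 ships the Fetch robotics tasks
import mujoco_py   # fails here if the MuJoCo 2.0.0 binaries are not installed
import torch       # expect 1.10.0 per the pip install above

print("torch:", torch.__version__)
env = gym.make("FetchReach-v1")   # goal-conditioned task with dict observations
obs = env.reset()
print("observation keys:", list(obs.keys()))  # observation / achieved_goal / desired_goal
env.close()
```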

Experiments

We provide commands for reproducing the main GCRL results (Table 1), the ablations (Figure 3), and the stochastic offline GCRL experiment (Figure 4).

  1. The main results (Table 1) can be reproduced by the following command:

mpirun -np 1 python train.py --env $ENV --method $METHOD

Flags and Parameters | Description
--env $ENV | offline GCRL tasks: FetchReach, FetchPush, FetchPick, FetchSlide, HandReach, DClawTurn
--method $METHOD | offline GCRL algorithms: gofar, gcsl, wgcsl, actionablemodel, ddpg
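
For instance, to train GoFAR on the FetchPush task:

mpirun -np 1 python train.py --env FetchPush --method gofar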
  2. To run the ablations (Figure 3), adjust the relevant command-line arguments. For example, to disable HER:
mpirun -np 1 python train.py --env $ENV --method $METHOD --relabel False

Note that gofar defaults to not using HER, so this command is only relevant to the baselines. The relevant flags are listed here:

Flags and Parameters | Description
--relabel | whether hindsight experience replay is enabled: True, False
--relabel_percent | fraction of minibatch transitions that have relabeled goals: 0.0, 0.2, 0.5, 1.0 are the values tried in the paper; other fractions may also be used
--f | choice of f-divergence for GoFAR: kl, chi
--reward_type | choice of reward function for GoFAR: disc, binary
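
For intuition, the sketch below shows the standard HER-style relabeling and binary goal-reaching reward that --relabel, --relabel_percent, and --reward_type binary refer to. It is a rough sketch assuming the usual "future" relabeling scheme; the field names and the 0.05 success threshold are illustrative, not the repository's actual code.

```python
import numpy as np

def relabel_minibatch(batch, relabel_percent=0.5):
    """Sketch of HER-style goal relabeling (illustrative field names).

    `batch` is assumed to hold per-transition arrays: `goals` (commanded goals),
    `future_ag` (an achieved goal sampled later in the same trajectory), and
    `next_ag` (the achieved goal right after the transition).
    """
    n = len(batch["goals"])
    # Replace the commanded goal for a --relabel_percent fraction of transitions.
    relabel_mask = np.random.uniform(size=n) < relabel_percent
    goals = np.where(relabel_mask[:, None], batch["future_ag"], batch["goals"])

    # Binary sparse reward (cf. --reward_type binary): success if the achieved
    # goal lands within a small threshold of the (possibly relabeled) goal.
    dist = np.linalg.norm(batch["next_ag"] - goals, axis=-1)
    rewards = (dist < 0.05).astype(np.float32)
    return goals, rewards
```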
  3. The following command runs the stochastic environment experiment (Figure 4):
mpirun -np 1 python train.py --env FetchReach --method $METHOD --noise True --noise-eps $NOISE_EPS

where $NOISE_EPS can be chosen from 0.5, 1.0, 1.5.
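
For example, to evaluate GoFAR under the largest noise level:

mpirun -np 1 python train.py --env FetchReach --method gofar --noise True --noise-eps 1.5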

Acknowledgements

We borrowed some code from the following repositories:
