Provably Good Batch Reinforcement Learning Without Great Exploration

Code for the Pessimistic Q-learning (PQL) algorithm from our paper, Provably Good Batch Reinforcement Learning Without Great Exploration. PQL is a deep batch reinforcement learning algorithm based on the pessimistic Q iteration (PQI) and pessimistic policy iteration (PPI) algorithms, which come with provable guarantees in the paper. Please see the paper for more details.
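To give a rough sense of the pessimistic idea, here is a minimal tabular sketch of a Q-iteration update that only backs up values through state-action pairs with enough data support. It is an illustration under stated assumptions, not the deep implementation in this repository: the function name, the arguments `P`, `R`, `mu_hat`, and `b`, the `>=` threshold rule, and the use of 0 as the value of unsupported actions (which is pessimistic only for non-negative rewards) are all choices made for this example.

```python
import numpy as np

def pessimistic_q_iteration(P, R, mu_hat, b, gamma=0.99, iters=200):
    """Tabular sketch of a pessimistic Bellman backup (illustrative only).

    P:      (S, A, S) transition probabilities
    R:      (S, A) rewards, assumed non-negative
    mu_hat: (S, A) estimated density of the batch data
    b:      density threshold (hypothetical parameter name)
    """
    # Filter: 1 where the data supports (s, a), 0 elsewhere.
    zeta = (mu_hat >= b).astype(float)
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        # Next-state value maximizes over supported actions only;
        # unsupported actions contribute a value of 0.
        V = np.max(zeta * Q, axis=1)   # shape (S,)
        Q = R + gamma * (P @ V)        # (S, A, S) @ (S,) -> (S, A)
    return Q, zeta
```

In this sketch, an action that the batch never covers can still receive a Q-value, but that value is never propagated into the targets of other state-action pairs.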

This code is built on top of the BCQ and BEAR implementations. The method is tested on three MuJoCo continuous control tasks from the D4RL benchmark, so MuJoCo and D4RL must be installed before running this code.

To run the PQL experiments from the paper, first load the dataset from D4RL and convert it to the format our data loader expects:

python load_dataset.py
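For orientation, the conversion step might look roughly like the sketch below. It is not the contents of load_dataset.py: the function name `to_transition_arrays` and the output keys (`state`, `action`, `next_state`, `reward`, `not_done`, following the BCQ-style replay buffer layout) are assumptions, and the input is assumed to be a dictionary in the layout returned by `d4rl.qlearning_dataset`.

```python
import numpy as np

def to_transition_arrays(dataset):
    """Convert a D4RL-style transition dict into replay-buffer arrays.

    `dataset` is assumed to have the keys returned by
    d4rl.qlearning_dataset(env): observations, actions,
    next_observations, rewards, terminals.
    """
    n = len(dataset["observations"])
    return {
        "state": np.asarray(dataset["observations"], dtype=np.float32),
        "action": np.asarray(dataset["actions"], dtype=np.float32),
        "next_state": np.asarray(dataset["next_observations"], dtype=np.float32),
        # Store rewards and done flags as column vectors of shape (n, 1).
        "reward": np.asarray(dataset["rewards"], dtype=np.float32).reshape(n, 1),
        # not_done = 1 - terminal, so it can multiply the bootstrap term.
        "not_done": 1.0 - np.asarray(dataset["terminals"], dtype=np.float32).reshape(n, 1),
    }

# With MuJoCo and D4RL installed, the input would typically come from
# something like (environment ID is an assumption):
#   import gym, d4rl
#   dataset = d4rl.qlearning_dataset(gym.make("hopper-medium-v0"))
```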

Then run training for each task:

python train.py --env=Hopper-v2 --dataset=d4rl-hopper-medium-v0
python train.py --env=HalfCheetah-v2 --dataset=d4rl-halfcheetah-medium-v0 --ql_noise=0.0 
python train.py --env=Walker2d-v2 --dataset=d4rl-walker2d-medium-v0 --ql_noise=0.0
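If you prefer to launch the three runs from one place, a small wrapper like the following could be used. This is a convenience sketch, not part of the repository: `EXPERIMENTS`, `build_command`, and `run_all` are hypothetical names, and only the flags shown in the commands above are passed through.

```python
import subprocess
import sys

# (env, dataset, extra flags), copied from the commands listed above.
EXPERIMENTS = [
    ("Hopper-v2", "d4rl-hopper-medium-v0", []),
    ("HalfCheetah-v2", "d4rl-halfcheetah-medium-v0", ["--ql_noise=0.0"]),
    ("Walker2d-v2", "d4rl-walker2d-medium-v0", ["--ql_noise=0.0"]),
]

def build_command(env, dataset, extra):
    """Build the argument list for one train.py invocation."""
    return [sys.executable, "train.py", f"--env={env}", f"--dataset={dataset}", *extra]

def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    commands = [build_command(*exp) for exp in EXPERIMENTS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first run that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` runs them one after another from the repository root.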

Citation

If you use this code in your research, please cite our paper:

@misc{liu2020provably,
      title={Provably Good Batch Reinforcement Learning Without Great Exploration}, 
      author={Yao Liu and Adith Swaminathan and Alekh Agarwal and Emma Brunskill},
      year={2020},
      eprint={2007.08202},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
