Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

This repository provides the implementation of the experiments described in the paper:

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

Requirements

pip install -e . -r requirements.txt

Experiments

The experiments are organized as Jupyter notebooks. For example usage, launch:

jupyter-lab two_arms.ipynb

To cite this paper:

@article{guo2021exploringdemonstrator,
    title={Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits},
    author={Wenshuo Guo and Kumar Krishna Agrawal and Aditya Grover and Vidya Muthukumar and Ashwin Pananjady},
    journal={arXiv preprint arXiv:2106.14866},
    year={2021}
}

Acknowledgements

The notebook for the error landscape on the battery dataset builds significantly on https://github.com/chueh-ermon/battery-fast-charging-optimization
