This repository provides the implementation of the experiments described in the paper:
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady
To install the package in editable mode along with its dependencies:
pip install -e . -r requirements.txt
Experiments are organized in notebooks. For example usage, open:
jupyter-lab two_arms.ipynb
To cite this paper:
@article{guo2021exploringdemonstrator,
title={Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits},
author={Wenshuo Guo and Kumar Krishna Agrawal and Aditya Grover and Vidya Muthukumar and Ashwin Pananjady},
journal={arXiv preprint arXiv:2106.14866},
year={2021}
}
The notebook for the error landscape on the battery dataset builds significantly on https://github.com/chueh-ermon/battery-fast-charging-optimization