The `bandit.py` module includes several simple multi-armed bandit environments.
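To make the environment interface concrete, here is a minimal sketch of a Bernoulli multi-armed bandit. The `BernoulliBandit` class name, its constructor arguments, and its `pull` method are illustrative assumptions, not the actual `bandit.py` API:

```python
# Hypothetical sketch of a Bernoulli MAB environment; the real
# classes in `bandit.py` may expose a different interface.
import numpy as np

class BernoulliBandit:
    """K-armed bandit where arm i pays 1 with probability p[i], else 0."""

    def __init__(self, payout_probs, seed=None):
        self.payout_probs = np.asarray(payout_probs, dtype=float)
        self.n_arms = len(self.payout_probs)
        self.rng = np.random.default_rng(seed)

    def pull(self, arm_id):
        # Draw a Bernoulli(p[arm_id]) reward: 1 on success, 0 otherwise
        return int(self.rng.random() < self.payout_probs[arm_id])

    @property
    def best_arm(self):
        # Index of the arm with the highest expected payout
        return int(np.argmax(self.payout_probs))
```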
The `policies.py` module implements a number of standard multi-armed bandit policies.
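For a flavor of what such a policy looks like, here is a hedged sketch of epsilon-greedy: with probability epsilon the agent explores a uniformly random arm, otherwise it exploits the arm with the highest empirical mean reward. The `EpsilonGreedy` class and its `select_arm`/`update` interface are assumptions for illustration, not the `policies.py` API:

```python
# Illustrative epsilon-greedy policy; not the actual `policies.py` interface.
import numpy as np

class EpsilonGreedy:
    """Explore a random arm with probability epsilon; otherwise exploit
    the arm with the highest empirical mean reward."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = np.zeros(n_arms)      # number of pulls per arm
        self.reward_sums = np.zeros(n_arms) # cumulative reward per arm
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.counts)))  # explore
        means = self.reward_sums / np.maximum(self.counts, 1)
        return int(np.argmax(means))  # exploit

    def update(self, arm_id, reward):
        self.counts[arm_id] += 1
        self.reward_sums[arm_id] += reward
```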
- **Bandits**
    - MAB: Bernoulli, Multinomial, and Gaussian payout distributions
    - Contextual MAB: Linear contextual bandits
- **Policies**
    - Epsilon-greedy
    - UCB1 (Auer, Cesa-Bianchi, & Fischer, 2002)
    - Conjugate Thompson sampler for Bernoulli bandits (Thompson, 1933; Chapelle & Li, 2010); a sketch follows below
    - LinUCB (Li, Chu, Langford, & Schapire, 2010)
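As referenced in the list above, here is a sketch of the conjugate (Beta-Bernoulli) Thompson sampler: each arm keeps a Beta posterior over its payout probability, and the policy pulls the arm whose posterior sample is largest. The class and method names are again hypothetical, not the `policies.py` API:

```python
# Sketch of Beta-Bernoulli Thompson sampling; names are illustrative.
import numpy as np

class BernoulliThompson:
    """Maintains a Beta(alpha, beta) posterior over each arm's payout
    probability and pulls the arm whose posterior sample is largest."""

    def __init__(self, n_arms, seed=None):
        self.alpha = np.ones(n_arms)  # 1 + number of observed successes
        self.beta = np.ones(n_arms)   # 1 + number of observed failures
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        samples = self.rng.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def update(self, arm_id, reward):
        self.alpha[arm_id] += reward
        self.beta[arm_id] += 1 - reward
```

Paired with the `BernoulliBandit` sketch above, a simulation loop simply alternates selecting an arm, observing its reward, and updating the posterior:

```python
bandit = BernoulliBandit([0.2, 0.5, 0.7], seed=0)
policy = BernoulliThompson(bandit.n_arms, seed=1)
for _ in range(1000):
    arm = policy.select_arm()             # sample posteriors, pick the max
    policy.update(arm, bandit.pull(arm))  # observe reward, update the Beta
```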