Available algorithms

(no marker): thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see the GitHub issues.

| Algorithms | Category | Reference | Status |
|---|---|---|---|
| Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
| Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | |
| Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | |
| Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | |
| ADIDAS | Opt. | Gemp et al. '22 | ~ |
| Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | |
| Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | |
| CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | |
| Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | |
| External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
| Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
| Mean-field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
| Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
| Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
| Q-learning | Tabular | Sutton & Barto '18 | |
| SARSA | Tabular | Sutton & Barto '18 | |
| Policy Iteration | Tabular | Sutton & Barto '18 | |
| Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
| Value Iteration | Tabular | Sutton & Barto '18 | |
| Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | |
| Deep Q-networks (DQN) | RL | Mnih et al. '15 | |
| Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
| AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | |
| AlphaZero (Python/TF) | MARL | Silver et al. '18 | |
| Deep CFR | MARL | Brown et al. '18 | |
| Exploitability Descent (ED) | MARL | Lockhart et al. '19 | |
| (Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | |
| Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | |
| Neural Replicator Dynamics (NeuRD) | MARL | Hennes, Morrill, Omidshafiei, et al. '19 | X |
| Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
| Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | |
| Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
| Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | |
| Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
| α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | |
| Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | |
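
As a rough illustration of what the "Replicator / Evolutionary Dynamics" entry refers to, the sketch below integrates the single-population replicator equation dx_i/dt = x_i (f_i(x) - f̄(x)) with a forward Euler step. It is a standalone toy in plain Python, not this library's implementation; the function names (`replicator_step`, `simulate`), the step size, and the two payoff matrices are assumptions chosen only for the example.

```python
def replicator_step(x, payoff, dt=0.01):
    """One forward Euler step of single-population replicator dynamics.

    x: list of strategy proportions that sum to 1.
    payoff: square matrix; payoff[i][j] is the payoff to strategy i against j.
    (Hypothetical helper for illustration only.)
    """
    n = len(x)
    # Expected payoff (fitness) of each pure strategy against the population.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Population-average fitness.
    avg = sum(x[i] * fitness[i] for i in range(n))
    # dx_i/dt = x_i * (fitness_i - avg)
    return [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]


def simulate(x0, payoff, steps=5000, dt=0.01):
    """Run `steps` Euler steps starting from the population state x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Rock-paper-scissors: the uniform mix is a rest point of the dynamics.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# Prisoner's dilemma (row-player payoffs): strategy 1 (defect) strictly dominates.
PD = [[3, 0], [5, 1]]

rps_final = simulate([1 / 3, 1 / 3, 1 / 3], RPS)
pd_final = simulate([0.9, 0.1], PD)
```

Starting from the uniform mix, the rock-paper-scissors population stays put, while the prisoner's dilemma population drifts almost entirely to the dominant strategy. A forward Euler step is used only for brevity; a smaller `dt` or a higher-order integrator would track the continuous-time trajectory more closely.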