ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret #30
francelico started this conversation in Presented Papers
Link: https://arxiv.org/abs/2206.04122 (new version soon)
Authors: Stephen McAleer (speaker), Gabriele Farina, Marc Lanctot, Tuomas Sandholm
Abstract: Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR). In this paper we propose an unbiased model-free method that does not require any importance sampling. Our method, ESCHER, is principled and is guaranteed to converge to an approximate Nash equilibrium with high probability in the tabular case. We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome sampling MCCFR and tabular DREAM with an oracle value function. We then show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
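The abstract's core claim is that the importance-sampling correction inherited from outcome-sampling MCCFR blows up the variance of DREAM's regret targets, while a learned history value function needs no such correction. A toy sketch of that variance gap (illustrative only, not the paper's algorithm; the policies, payoffs, and function names below are made up for this example):

```python
import random

random.seed(0)

# Estimate E_pi[f(a)] for a target policy pi from actions sampled by a
# behavior policy b. The importance-sampled estimator f(a) * pi(a)/b(a)
# is unbiased (as in outcome-sampling MCCFR), but its variance grows
# with the ratio pi(a)/b(a); an oracle value function, as ESCHER's
# history value function approximates, would return the value directly
# with zero sampling variance.

pi = [0.5, 0.5]   # target policy over two actions
b  = [0.9, 0.1]   # behavior (sampling) policy, rarely picks action 1
f  = [0.0, 1.0]   # payoff of each action

true_value = sum(p * v for p, v in zip(pi, f))  # = 0.5

def is_estimates(n):
    """Per-sample importance-sampled estimates f(a) * pi(a)/b(a)."""
    out = []
    for _ in range(n):
        a = 0 if random.random() < b[0] else 1
        out.append(f[a] * pi[a] / b[a])
    return out

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

samples = is_estimates(100_000)
mean = sum(samples) / len(samples)

print(f"true value      : {true_value:.3f}")
print(f"IS mean estimate: {mean:.3f}")   # close to 0.5 (unbiased)
print(f"IS variance     : {variance(samples):.3f}")  # large: ~pi/b ratio of 5
```

Here each sample is either 0 or 5, so the estimator is unbiased but individual targets are wildly far from the true value 0.5; this is the high-variance regret target the paper argues a history value function eliminates.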
Notes: Scheduled for 20/10/2022
Extra material: