A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play based reinforcement learning based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player, turn-based, adversarial game and any deep learning framework of your choice. A sample implementation has been provided for the game of Othello in PyTorch.
To use a game of your choice, subclass the classes in `Game.py` and `NeuralNet.py` and implement their functions. Example implementations for Othello can be found in `othello/OthelloGame.py` and `othello/NNet.py`.
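To give a feel for what subclassing the game interface involves, here is a hedged sketch of a tic-tac-toe game in that style. The method names follow the pattern used by this codebase, but `Game.py` is the authoritative source for the exact signatures and return conventions; treat everything below as illustrative.

```python
import numpy as np

class TicTacToeGame:
    """Illustrative 3x3 tic-tac-toe in the style of a Game.py subclass.
    Players are +1 and -1; a board cell holds 0 when empty."""

    def getInitBoard(self):
        # empty 3x3 board
        return np.zeros((3, 3), dtype=int)

    def getBoardSize(self):
        return (3, 3)

    def getActionSize(self):
        # one action per square
        return 9

    def getNextState(self, board, player, action):
        # place the current player's mark; the other player moves next
        b = board.copy()
        b[action // 3, action % 3] = player
        return b, -player

    def getValidMoves(self, board, player):
        # binary vector over actions: 1 where the square is empty
        return (board.reshape(-1) == 0).astype(int)

    def getGameEnded(self, board, player):
        # +1 if `player` won, -1 if they lost, small value for a draw,
        # 0 if the game is still in progress
        lines = list(board) + list(board.T) + [
            board.diagonal(), np.fliplr(board).diagonal()]
        for line in lines:
            s = int(np.sum(line))
            if abs(s) == 3:
                return 1 if s == 3 * player else -1
        if not (board == 0).any():
            return 1e-4  # draw
        return 0

    def getCanonicalForm(self, board, player):
        # board as seen from the current player's perspective
        return player * board
```

The `getCanonicalForm` trick (multiplying by the player) is what lets a single network evaluate positions for both sides; your own game may need a different canonicalization.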
`Coach.py` contains the core training loop, and `MCTS.py` performs the Monte Carlo Tree Search. The parameters for self-play can be specified in `main.py`. Additional neural network parameters are in `othello/NNet.py` (CUDA flag, batch size, epochs, learning rate, etc.).
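The split is roughly: self-play and search settings in `main.py`, network settings in `othello/NNet.py`. A hedged sketch of what such a configuration might look like, with the numbers taken from the 6x6 Othello run described below; the actual key names and defaults live in those two files:

```python
# Illustrative hyperparameters only; check main.py and othello/NNet.py
# for the real names and values.
args = {
    # self-play / search (main.py)
    'numIters': 80,       # training iterations
    'numEps': 100,        # self-play episodes per iteration
    'numMCTSSims': 25,    # MCTS simulations per turn
    # neural network (othello/NNet.py)
    'cuda': True,         # use GPU if available
    'batch_size': 64,
    'epochs': 10,
    'lr': 0.001,
}
```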
To start training a model for Othello:

```bash
python main.py
```
We trained a model for 6x6 Othello (~80 iterations, 100 episodes per iteration, and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model can be found in `pretrained_models/`. You can play a game against it using `pit.py`. Below is the performance of the model against a random and a greedy baseline as a function of the number of iterations.
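Pitting two agents amounts to an arena loop: alternate the two players' moves on the same game until it ends, then score the result. The sketch below shows the idea using hypothetical names and a toy game; it is not the actual code in `pit.py`, which should be consulted for the real driver.

```python
def play_game(game, player1, player2):
    """Illustrative arena loop in the spirit of pit.py.
    player1/player2 are callables mapping a canonical board to an action.
    Returns the result from player1's perspective (+1 win, -1 loss)."""
    players = {1: player1, -1: player2}
    board, cur = game.getInitBoard(), 1
    while game.getGameEnded(board, cur) == 0:
        canonical = game.getCanonicalForm(board, cur)
        action = players[cur](canonical)
        assert game.getValidMoves(canonical, 1)[action] == 1, "illegal move"
        board, cur = game.getNextState(board, cur, action)
    # getGameEnded reports from cur's perspective; flip to player1's
    return cur * game.getGameEnded(board, cur)

class SubtractionGame:
    """Toy game for the demo: a pile of 5 stones, take 1 or 2 per turn,
    whoever takes the last stone wins. Implements the same interface shape."""
    def getInitBoard(self):
        return 5
    def getCanonicalForm(self, board, player):
        return board  # the pile looks the same to both players
    def getValidMoves(self, board, player):
        return [1 if take <= board else 0 for take in (1, 2)]
    def getNextState(self, board, player, action):
        return board - (action + 1), -player
    def getGameEnded(self, board, player):
        # pile empty: the previous player took the last stone and won
        return -1 if board == 0 else 0
```

For example, an optimal first player (take `pile % 3` stones when that is nonzero) beats a player who always takes one stone, so `play_game(SubtractionGame(), lambda pile: (pile % 3 or 1) - 1, lambda pile: 0)` returns `+1`.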
A concise description of our algorithm can be found here.
Thanks to pytorch-classification and progress.