Skip to content

A clean and simple general purpose implementation of a self-play algorithm based on AlphaGo Zero

Notifications You must be signed in to change notification settings

tonyxia2016/alpha-zero-general

 
 

Repository files navigation

Alpha Zero General (any game, any framework!)

A simplified, highly flexible, commented and (hopefully) easy to understand implementation of self-play based reinforcement learning based on the AlphaGo Zero paper (Silver et al). It is designed to be easy to adopt for any two-player turn-based adversarial game and any deep learning framework of your choice. A sample implementation has been provided for the game of Othello in PyTorch.

To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/NNet.py.

Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for the self-play can be specified in main.py. Additional neural network parameters are in othello/NNet.py (cuda flag, batch size, epochs, learning rate etc.).

To start training a model for Othello:

python main.py

Experiments

We trained a model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model can be found in pretrained_models/. You can play a game against it using pit.py. Below is the performance of the model against a random and a greedy baseline with the number of iterations. alt tag

A concise description of our algorithm can be found here.

Contributors and Credits

Thanks to pytorch-classification and progress.

About

A clean and simple general purpose implementation of a self-play algorithm based on AlphaGo Zero

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%