Shogi reinforcement learning by AlphaGo Zero methods.
- This repo simply translates @Zeta36's chess implementation to shogi.
- As mentioned by @Zeta36, self-play is very slow. Moreover, shogi needs much more computing resources.
- This repo may only be useful for learning reinforcement learning. There might be many bugs in it, since I am a beginner in reinforcement learning; I would be happy if you report them as issues.
- White is "Sente" (first move) in this repo.
- Python 3.6.3
- tensorflow-gpu: 1.3.0
- Keras: 2.0.8
```
python src/shogi_zero/run.py sl
```
I put many kif files in `/scripts/kif/` so that many people can start learning easily, although it makes this repo heavy.
This AlphaGo Zero implementation consists of three workers: `self`, `opt` and `eval`.

- `self` is Self-Play: it generates training data by self-play using BestModel.
- `opt` is Trainer: it trains the model and generates next-generation models.
- `eval` is Evaluator: it evaluates whether the next-generation model is better than BestModel. If better, it replaces BestModel.
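The interplay of the three workers can be sketched as a loop. This is an illustrative sketch only; the function names (`self_play`, `train`, `evaluate`) are hypothetical stand-ins, not this repo's actual API.

```python
# Hypothetical sketch of the self -> opt -> eval cycle.
# `self_play`, `train` and `evaluate` are illustrative callables,
# not functions from this repo.
def training_cycle(best_model, self_play, train, evaluate, generations=3):
    """Run self-play, training and evaluation, promoting winning models."""
    for _ in range(generations):
        games = self_play(best_model)            # `self` worker: generate data
        candidate = train(best_model, games)     # `opt` worker: next-generation model
        if evaluate(candidate, best_model):      # `eval` worker: candidate better?
            best_model = candidate               # promote to BestModel
    return best_model
```

In the repo the three workers run as separate processes communicating through the `data/` directories rather than as one loop, but the data flow is the same.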
- `data/model/model_best_*`: BestModel.
- `data/model/next_generation/*`: next-generation models.
- `data/play_data/play_*.json`: generated training data.
- `logs/main.log`: log file.
- `/scripts/kif/`: kif files for supervised learning.
If you want to train the model from the beginning, delete the above directories.
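Deleting the generated data can be scripted. A hedged sketch using the paths listed above (the helper name `reset_training_data` is hypothetical, and it deliberately leaves `/scripts/kif/` alone since those files are inputs, not outputs):

```python
import shutil
from pathlib import Path

def reset_training_data(root="."):
    """Remove generated models, play data and logs so training restarts fresh."""
    root = Path(root)
    for target in ("data/model/next_generation", "data/play_data", "logs"):
        path = root / target
        if path.is_dir():
            shutil.rmtree(path)
    # BestModel files live directly under data/model/ as model_best_*
    model_dir = root / "data" / "model"
    if model_dir.is_dir():
        for f in model_dir.glob("model_best_*"):
            f.unlink()
```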
```
pip install -r requirements.txt
```
If you want to use GPU, follow these instructions to install with pip3.
Make sure Keras is using TensorFlow as its backend and that you have Python 3.6.3+. Depending on your environment, you may have to run python3/pip3 instead of python/pip.
To train the model, execute Self-Play, Trainer and Evaluator.

Note: Make sure you are running the scripts from the top-level directory of this repo, i.e. `python src/shogi_zero/run.py opt`, not `python run.py opt`.
```
python src/shogi_zero/run.py self
```
When executed, Self-Play will start using BestModel. If BestModel does not exist, a new random model will be created and become BestModel.
- `--new`: create a new BestModel
- `--type mini`: use mini config for testing (see `src/shogi_zero/configs/mini.py`)
```
python src/shogi_zero/run.py opt
```
When executed, training will start. The base model is loaded from the latest saved next-generation model. If none exists, BestModel is used. The trained model is saved every epoch.
- `--type mini`: use mini config for testing (see `src/shogi_zero/configs/mini.py`)
- `--total-step`: specify the total number of steps (mini-batches). The total step count affects the learning rate of training.
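Tying the learning rate to the cumulative step count typically means a stepwise decay, as in AlphaZero-style training. A minimal sketch; the thresholds and rates below are assumptions for illustration, not values read from this repo's configs:

```python
# Illustrative stepwise learning-rate schedule keyed on total steps.
# The breakpoints (100k, 500k) and rates are assumed, not this repo's values.
def learning_rate(total_steps):
    """Return a smaller learning rate as the cumulative mini-batch count grows."""
    if total_steps < 100_000:
        return 1e-2
    if total_steps < 500_000:
        return 1e-3
    return 1e-4
```

This is why passing `--total-step` matters when resuming: it tells the trainer how far along the schedule it already is.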
```
python src/shogi_zero/run.py eval
```
When executed, evaluation will start. It evaluates BestModel and the latest next-generation model by playing about 200 games. If the next-generation model wins, it becomes BestModel.
- `--type mini`: use mini config for testing (see `src/shogi_zero/configs/mini.py`)
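The promotion decision can be summarized as a win-rate check over the evaluation games. A hedged sketch; the 0.55 threshold is an assumption borrowed from the AlphaGo Zero paper, and the actual value used here lives in the config files:

```python
# Illustrative promotion rule for the `eval` worker.
# `results` is one boolean per evaluation game: True if the candidate won.
# The 0.55 threshold is an assumption, not read from this repo's configs.
def should_replace_best(results, threshold=0.55):
    """Promote the candidate only if its win rate clears the threshold."""
    if not results:
        return False
    win_rate = sum(results) / len(results)
    return win_rate > threshold
```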
```
python src/shogi_zero/run.py sl
```