board-game-ml

An exploration of machine learning techniques (mostly reinforcement learning) applied to simple games.

How to run

There's a very simple CLI used to run the games. make run is a thin alias around pipenv run python src, which runs the CLI defined in src/__main__.py. The CLI then calls the main() function in the corresponding game's run file. For example, make run ARGS="ttt" runs the main() function in src/games/tictactoe/run.py.
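The dispatch itself isn't spelled out here, but a minimal sketch of what such a CLI could look like follows; the argparse setup and the import path are assumptions for illustration, not the repo's actual code.

```python
# Hypothetical sketch of a dispatcher like src/__main__.py might contain.
# Only the "ttt" argument is documented above; other games would follow the
# same pattern of mapping a short name to its run module's main().
import argparse

from games.tictactoe import run as tictactoe_run  # assumed import path


def cli() -> None:
    parser = argparse.ArgumentParser(description="Run a board game trainer.")
    parser.add_argument("game", choices=["ttt"], help="which game to run")
    args = parser.parse_args()

    if args.game == "ttt":
        tictactoe_run.main()


if __name__ == "__main__":
    cli()
```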

Some policy files are saved along with the repo; check each game's section for its corresponding policy files.

What's been done

ML methods

The following two methods currently use plain Python, without any fancy libraries or frameworks. It's inefficient, but it forces me to consider the details.

  • Simple Q-learning, which attempts to track the whole Q-table (as opposed to deep Q-learning, which uses a neural network to estimate the best possible action given a state). It works well when rewards can be calculated per action; see the sketch after this list.
  • Monte Carlo learning, which works for episodic rewards, where the value of each state/action is primarily determined after the game is complete. It stores every state it has explored, along with each state's associated value.
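A minimal sketch of the tabular Q-learning update described above (generic code, not the repo's learner):

```python
# Minimal tabular Q-learning update (illustrative, not the repo's implementation).
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def q_update(
    q: Dict[Tuple[State, Action], float],
    state: State,
    action: Action,
    reward: float,
    next_state: State,
    next_actions: List[Action],
    alpha: float = 0.1,   # learning rate
    gamma: float = 0.9,   # discount factor
) -> None:
    # Best estimated value achievable from the next state.
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state, action), 0.0)
    # Standard temporal-difference update toward reward + discounted future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```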

The next two methods introduce neural networks, making them a form of deep reinforcement learning.

  • Alpha Zero, which combines a neural network with Monte Carlo tree search so that training is more effective. It only needs to store the network weights (and self-play training examples, if preferred), so it's much more memory efficient than storing every explored state.
  • Deep Q-learning, which is similar to Q-learning but replaces the Q-table with a neural network, drastically reducing memory usage (see the sketch after this list).
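A rough sketch of how deep Q-learning targets can be computed, with a generic predict callable standing in for whatever network the learner actually uses (this is illustrative, not the repo's code):

```python
# Sketch of computing deep Q-learning targets with a generic network function.
from typing import Callable

import numpy as np

# Maps a batch of states to estimated Q-values per action: (batch, actions).
QNetwork = Callable[[np.ndarray], np.ndarray]


def dqn_targets(
    predict: QNetwork,
    states: np.ndarray,
    actions: np.ndarray,      # int indices of the actions that were taken
    rewards: np.ndarray,
    next_states: np.ndarray,
    done: np.ndarray,         # 1.0 for terminal transitions, else 0.0
    gamma: float = 0.99,
) -> np.ndarray:
    # Start from the network's current estimates so only the taken action's
    # target changes.
    targets = predict(states).copy()
    next_q = predict(next_states).max(axis=1)
    # Terminal transitions get only the immediate reward.
    targets[np.arange(len(actions)), actions] = rewards + gamma * next_q * (1.0 - done)
    return targets
```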

Bayesian Optimization

I use Bayesian optimization to tune my network hyperparameters. It's not the most efficient approach, since:

  1. The parameter space can be pretty large, and I have limited resources and time to search the whole space. And all of that time is spent before the network is even trained.
  2. Generally I need an objective set of training data to ensure that the objective function is accurate (minimizing the loss on a suboptimal training set is pointless). This is particularly tough for the Alpha Zero case, where the network generates its own training data.

However, Bayesian optimization does seem promising for training Digit Party, as I have an objective set of training data from running naive Q-learning.
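As a rough sketch of what such a tuning loop can look like, here's a hypothetical objective wired into scikit-optimize's gp_minimize; the library choice, the search space, and the evaluate_network objective are assumptions for illustration, not necessarily what this repo does:

```python
# Hypothetical Bayesian optimization loop using scikit-optimize (skopt).
from skopt import gp_minimize
from skopt.space import Integer, Real


def evaluate_network(params) -> float:
    # Stand-in objective: train/evaluate a network with these hyperparameters
    # on the fixed training data and return a loss to minimize.
    learning_rate, hidden_size = params
    raise NotImplementedError


search_space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(16, 256, name="hidden_size"),
]

result = gp_minimize(evaluate_network, search_space, n_calls=25, random_state=0)
print(result.x, result.fun)  # best hyperparameters found and their loss
```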

Games

  • A random walk game, which scores points at the right bound and loses points at the left bound. This is trained with Q-learning and a Monte Carlo method.
  • Tic-tac-toe, which can be successfully trained using the Monte Carlo learning method and with Alpha Zero.
  • Digit Party. Though naive Q-learning should be able to learn this game, it's intractable because the number of states is just too high; deep Q-learning should work better.
  • Ultimate tic-tac-toe, which works okay with the simple Monte Carlo method but even better with Alpha Zero.

TODO

Games

Consider implementing the following games and training them with an appropriate method:

Tech features/infra

  • refactor the simple Q and Monte Carlo learners to use composition rather than inheritance (been lazy about this)
  • clean up functions in the Ultimate/TTT game classes; function names, ordering, and structuring could all be better
  • logging with a verbose option
  • expanding and optimizing the CLI
  • organize types

References

Technical Considerations

Typing

For bigger projects like this, I'm not a fan of Python being dynamically typed. Tracking down errors at runtime is frustrating, and sometimes they don't even surface because of how flexible Python's syntax can be, which turns them into obscure bugs. Plus, defining types lets me think clearly about the objects I'm handling, as well as function pre- and post-conditions.

However, typing in Python can only go so far. For example, it's tough to enforce the shape and element type of an NDArray, so when we try to modify a nested value in the array, mypy treats it as Any and can't check subsequent uses of that value. This is especially apparent when implementing game logic that uses numpy arrays extensively. There are some libraries that mitigate this to an extent, but none are perfect. I can dream about a beautifully statically typed landscape like Haskell's, but that likely won't happen in Python.
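A small illustration of the limitation (not code from the repo):

```python
import numpy as np
from numpy.typing import NDArray

# The element type can be annotated, but the shape cannot.
Board = NDArray[np.int_]


def apply_move(board: Board, row: int, col: int, player: int) -> Board:
    # Indexing an NDArray is typed so loosely that mypy sees the result as Any,
    # so anything derived from `cell` goes unchecked from here on.
    cell = board[row, col]
    board[row, col] = player
    return board
```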