Simple pure Python Q-learning algorithm implementation
This repository is just a sample, written for my presentation at an ML Meetup. Feel free to take ideas from it, but it is not intended to run in a production environment.
It is a kind of reinforcement learning where an agent interacts with an environment and learns to take actions that earn the best result. Optionally, some curiosity behaviour may be implemented to give the agent a chance to explore for a better strategy.
Q-learning is an algorithm (essentially a function) that builds a policy telling the agent which action to take in each environment state. Because the agent can't get a score (and update its policy) immediately after each action (move), it has to "remember" all moves and states within a round and then spread the final score back across them with a discount.
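As a rough illustration, here is a minimal sketch of such an end-of-round update. The names (GAMMA, ALPHA, q_table, update_round) are hypothetical and not necessarily what this repository uses:

```python
GAMMA = 0.9   # discount factor: earlier moves get a smaller share of the score
ALPHA = 0.5   # learning rate

q_table = {}  # (state, action) -> learned score

def update_round(history, score):
    """history: list of (state, action) pairs from the finished round."""
    for state, action in reversed(history):
        old = q_table.get((state, action), 0.0)
        # Nudge the stored score towards the (discounted) round outcome.
        q_table[(state, action)] = old + ALPHA * (score - old)
        score *= GAMMA  # discount before crediting the preceding move
```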
TicTacToe is a pure Python implementation of the Q-learning algorithm, used just as a proof of concept.
Because the number of unique board combinations is small, it stores all seen "positions" in a Python dictionary.
The key is a feature vector built from the 3x3 square board and packed into a single integer.
The value is the discounted score associated with that board state.
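For illustration, one way such a key could be packed is a base-3 encoding of the nine cells. This is a hypothetical sketch (pack and scores are made-up names), not necessarily the exact encoding used here:

```python
def pack(board):
    """board: flat list of 9 cell values, row by row (0 = empty, 1 = X, 2 = O)."""
    key = 0
    for cell in board:
        key = key * 3 + cell  # the whole board becomes one base-3 number
    return key

scores = {}                  # packed position -> discounted score
scores[pack([0] * 9)] = 0.0  # the empty board
```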
Some curiosity is implemented, so the algorithm can "improve" its strategy during learning.
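One common way to implement this kind of curiosity is epsilon-greedy move selection. A minimal sketch, assuming a q_table dict of (state, move) scores like the one above:

```python
import random

EPSILON = 0.1  # fraction of exploratory ("curious") moves

def choose_move(state, legal_moves, q_table):
    # Occasionally play a random move to explore unseen positions;
    # otherwise play the best-scored known move (the greedy choice).
    if random.random() < EPSILON:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda m: q_table.get((state, m), 0.0))
```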
To test the algorithm, a simple game.py script is included in this project.
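Presumably it can be launched straight from the repository root (check the script itself for any arguments):

```sh
python game.py
```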
Installation is as simple as setting up a new uWSGI process, provided you know exactly what you are doing. Check ttt.ini for more info.
Warning: change the user/group of the running process. Running as root may leave your system vulnerable.
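For orientation, here is a hypothetical sketch of what such a ttt.ini could look like. The file shipped with the repository is the authoritative version, and the wsgi-file entry point below is a guess:

```ini
[uwsgi]
# listen where nginx proxies to (matches the example below)
http = 127.0.0.1:9092
# entry point is a guess, check the repository
wsgi-file = server.py
master = true
processes = 1
# do not run as root (see the warning above)
uid = www-data
gid = www-data
```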
- Clone the repository
- Point the nginx /tictactoe location to the root of the cloned repository
- Configure the /tictactoe/ai location as a proxy_pass to http://127.0.0.1:9092
- Make the data directory writable for the script user
- Run run.sh in a tmux session or any other way you like
- Open http://[yourdomain|localhost]/tictactoe in a browser
Note: Google login may not work from your domain until you change the google-signin-client_id meta tag in index.html.
Nginx configuration example
location /tictactoe
{
    root /var/www;
    index index.html;
}
location /tictactoe/ai
{
    error_log /var/log/nginx/tictactoe.error.log error;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://127.0.0.1:9092;
}
We played with this algorithm for a couple of hours. It was very impressive to see how it learns and turns from a random mover into a real player. After about 1000 rounds the algorithm plays well enough, but still loses many games.
Now it plays much better. You can try it yourself: Play DEMO
MIT license. You may use it for any purpose, without warranty.