# Rummikub AI

This project implements a learning agent for Rummikub using **MaskablePPO**, a variant of the **PPO** algorithm that restricts the policy to legal actions through an action mask. The environment is implemented in **Python**, and the game engine in **C++** for performance. The program can train a model for multiple players, with a configurable tile range (`--blocks_range`) and starting hand size (`--blocks_start`). A trained model can also be tested in games against a human or a computer player.
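
To make the idea of action masking concrete, here is a minimal sketch of a masked softmax in plain Python. It is only an illustration of the concept, not the code used by this project or by the MaskablePPO implementation; the function name `masked_softmax`, the logits, and the mask values are all hypothetical.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over `logits`, restricted to entries where `mask` is True."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Illegal actions get -inf, so exp(-inf) == 0.0 and their probability is zero.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in masked]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.5, 3.0]       # hypothetical policy outputs for 4 actions
mask = [True, False, True, False]   # hypothetical legality mask from the engine
probs = masked_softmax(logits, mask)
print(probs)
```

The agent can then only sample actions whose mask entry is `True`, which is why the engine has to compute the full set of legal moves before every decision.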

This notebook shows:
- how to train a new model
- how to continue training an existing model
- how to test the model
- conclusions and problems

First, we install the dependencies, compile the program, and import the necessary modules.

In [None]:
!pip install -r requirements.txt
!pip install -e .

import time

In this presentation, I train the model on a reduced version of Rummikub: tile values 1 to 8 and a starting hand of 9 tiles.

We start by calling the program with the appropriate flags; it trains a new model and saves it.

In [2]:
!python train.py --mode train --players 2 --blocks_range 8 --blocks_start 9 --total_games 10 --num_envs 4 --n_steps 128 --save_path models/demo

=== Training for around 10 games on cpu ===
Using cpu device
-----------------------------
| time/              |      |
|    fps             | 0    |
|    iterations      | 1    |
|    time_elapsed    | 3501 |
|    total_timesteps | 512  |
-----------------------------
Model saved to models/demo.zip


Once the model is trained, we can check its performance by letting it play one rendered game.

In [3]:
!python train.py --mode test --players 2 --blocks_range 8 --blocks_start 9 --total_games 1 --model_path models/demo --render 1

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Loaded model from models/demo
=== Test 1 game on cpu ===

------ Round 1 ------

--- Player 0 ---
Player 0 hand: [R2, B7, K2, K4, B1, B7, K7, R7, R5]
Player 1 hand: [Y5, Y7, B5, K6, B8, K5, B2, B8, K7]
Table: []

--- Player 0 ---
Player 0 hand: [B1, R2, K2, K4, R5, B7]
Player 1 hand: [Y5, Y7, B5, K6, B8, K5, B2, B8, K7]
Table: [[R7, B7, K7]]

--- Player 1 ---
Player 0 hand: [B1, R2, K2, K4, R5, B7]
Player 1 hand: [Y5, Y7, B5, K6, B8, K5, B2, B8, K7]
Table: [[R7, B7, K7]]

--- Player 1 ---
Player 0 hand: [B1, R2, K2, K4, R5, B7]
Player 1 hand: [B2, B5, Y5, Y7, B8, B8]
Table: [[K5, K6, K7], [R7, B7, K7]]

--- Player 1 ---
Player 0 hand: [B1, R2, K2, K4, R5, B7]
Player 1 hand: [B2, B5, Y5, B8, B8]
Table: [[K5, K6, K7], [R7, B7, Y7, K7]]

------ Round 2 ------

--- Player 0 ---
Player 0 hand: [B1, R2, K2, K4, R5, B7]
Player 1 hand: [B2, B5, Y5, B8, B8]
Table: [[K5, K6, K7], [R7
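
The sets on the table in this output are either groups (the same number in different colours, such as `[R7, B7, K7]`) or runs (consecutive numbers in one colour, such as `[K5, K6, K7]`). The following sketch checks both rules for tiles written in this notation. It is a simplified illustration, not the C++ engine's logic: it ignores jokers, and it assumes the first character of a tile is its colour letter.

```python
def parse_tile(tile: str) -> tuple[str, int]:
    """Split a tile such as 'K7' into (colour letter, number)."""
    return tile[0], int(tile[1:])

def is_group(tiles: list[str]) -> bool:
    """3 or 4 tiles with the same number and pairwise different colours."""
    colours = [parse_tile(t)[0] for t in tiles]
    numbers = {parse_tile(t)[1] for t in tiles}
    return 3 <= len(tiles) <= 4 and len(numbers) == 1 and len(set(colours)) == len(tiles)

def is_run(tiles: list[str]) -> bool:
    """3 or more tiles of one colour with consecutive numbers."""
    colours = {parse_tile(t)[0] for t in tiles}
    numbers = sorted(parse_tile(t)[1] for t in tiles)
    consecutive = all(b - a == 1 for a, b in zip(numbers, numbers[1:]))
    return len(tiles) >= 3 and len(colours) == 1 and consecutive

def is_valid_set(tiles: list[str]) -> bool:
    """A set on the table must be either a group or a run."""
    return is_group(tiles) or is_run(tiles)

table = [["K5", "K6", "K7"], ["R7", "B7", "Y7", "K7"]]
print([is_valid_set(s) for s in table])  # [True, True]
```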

Now we can change some parameters (for example, the reward system) and continue training the saved model with them.

In [4]:
!python train.py --mode train --players 2 --blocks_range 8 --blocks_start 9 --total_games 20 --num_envs 2 --n_steps 64 --model_path models/demo --save_path models/demo

Loaded model from models/demo
=== Training for around 20 games on cpu ===
----------------------------
| time/              |     |
|    fps             | 7   |
|    iterations      | 1   |
|    time_elapsed    | 32  |
|    total_timesteps | 256 |
----------------------------
Model saved to models/demo.zip


With a trained model, we can now play against it. The game cannot be shown in this Jupyter notebook, because the cell output is read-only and cannot accept input.

In [None]:
!python train.py --mode play --players 2 --blocks_range 8 --blocks_start 9 --model_path models/demo

Finally, we can run a timed performance test of the resulting model.

In [6]:
start = time.time()
!python train.py --mode test --players 2 --blocks_range 8 --blocks_start 9 --total_games 10 --model_path models/demo --render 0
end = time.time()
print(f"Total time for 10 games: {end - start:.2f} seconds")
print(f"Average time per game: {(end - start)/10:.4f} seconds")

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Loaded model from models/demo
=== Test 10 games on cpu ===

===== 10 games finished =====
Total rounds: 10
Average game time: 3.9810000000000003
Player 0 - Total wins: 4, Total moves: 341, Total tiles placed: 185
Player 1 - Total wins: 6, Total moves: 330, Total tiles placed: 188
Total time for 10 games: 42.53 seconds
Average time per game: 4.2528 seconds


As we can see, the model plays quickly: a full game of this reduced variant takes about 4 seconds on a CPU, which at roughly 67 moves per game is well under a tenth of a second per move. The problems begin with larger tile ranges, because the number of possible combinations grows extremely fast, even faster than exponentially.

To calculate the action mask, the engine must find all possible moves in a given position. Several optimizations speed this up: the engine considers only arrangements that have a chance of being valid, and it evaluates different possibilities in parallel across threads. Even so, the process remains very demanding. Training a model to play with the full pool of 13 tile values plus jokers therefore requires a very powerful computer and a significant amount of time.
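
To give a rough sense of scale, the sketch below counts the distinct joker-free sets (runs and groups) that exist for a given tile range. This is only a back-of-the-envelope illustration under simplifying assumptions (4 colours, no jokers, duplicate tiles ignored); the function `count_distinct_sets` is hypothetical and is not how the engine enumerates moves.

```python
from math import comb

def count_distinct_sets(max_number: int, colours: int = 4) -> int:
    """Count distinct joker-free runs and groups for tiles numbered 1..max_number."""
    # Runs: one colour, consecutive numbers, length >= 3.
    runs_per_colour = sum(max_number - length + 1 for length in range(3, max_number + 1))
    # Groups: one number, 3 or 4 different colours.
    groups_per_number = sum(comb(colours, k) for k in range(3, colours + 1))
    return colours * runs_per_colour + max_number * groups_per_number

for n in (8, 13):
    print(n, count_distinct_sets(n))
# 8 124
# 13 329
```

The number of candidate sets itself grows moderately, but a table arrangement is a combination of such sets, so the engine searches among subsets of these candidates rather than the candidates themselves. That search space is what becomes expensive as the range increases.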