### Variational AutoEncoder Chess Position Generator

##### Inspiration
* Recently, I have been reading up on generative models, and one that caught my eye was the variational autoencoder (VAE).
* It allows you to generate new data that is similar to your training data.
* At the same time, I am interested in chess and have enjoyed solving chess puzzles for quite a while.
* However, the premise of a chess puzzle is that the player knows there exists an optimal move or sequence of moves that gives them an advantage.
* This helps the player improve their tactics and pattern recognition, but in an actual game we usually do not know whether such a solution exists.
* This motivates the idea of an anti-puzzle: the position provided may have an optimal solution, or the "solution" may simply be to play a move that maintains the status quo.
* A VAE can be trained on a set of legal chess positions and then used to output new ones.
* Since the VAE has no notion of whether a position contains a winning solution, it is well suited to creating anti-puzzle positions.
* Furthermore, chess is a constrained game with clear rules, so we can check whether a position generated by the VAE is legal.
* For this model, the goal is to simply generate new (legal) chess positions.
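As a rough illustration of the legality check mentioned above, here is a minimal sketch that tests a few necessary (but far from sufficient) conditions on a board encoded as an 8 x 8 x 12 one-hot array. It assumes the same plane order (`KQRBNPkqrbnp`) and orientation (row 0 is rank 8) used in the encoding below; the function name `basic_sanity_check` is hypothetical. A full legality check, for example whether the side not to move is in check, is better left to a chess library such as python-chess (`Board.is_valid()`).

```python
import numpy as np

# Plane order, assumed to match PIECE_TO_IDX in this notebook.
PIECE_ORDER = "KQRBNPkqrbnp"


def basic_sanity_check(board):
    """Return a list of problems found in an (8, 8, 12) one-hot board.

    An empty list only means none of these necessary conditions failed;
    it does not prove the position is legal.
    """
    board = np.asarray(board)
    if board.shape != (8, 8, 12):
        return [f"unexpected shape {board.shape}"]
    problems = []
    # Each square may hold at most one piece.
    if (board.sum(axis=-1) > 1).any():
        problems.append("square with more than one piece")
    # Number of squares occupied by each piece type.
    counts = board.sum(axis=(0, 1))
    # Exactly one king per side.
    if counts[PIECE_ORDER.index("K")] != 1 or counts[PIECE_ORDER.index("k")] != 1:
        problems.append("each side needs exactly one king")
    # At most eight pawns per side.
    if counts[PIECE_ORDER.index("P")] > 8 or counts[PIECE_ORDER.index("p")] > 8:
        problems.append("more than 8 pawns for one side")
    # Pawns can never stand on the first or eighth rank (rows 7 and 0).
    pawn_planes = [PIECE_ORDER.index("P"), PIECE_ORDER.index("p")]
    if board[[0, 7]][:, :, pawn_planes].any():
        problems.append("pawn on the first or eighth rank")
    return problems


# An empty board fails only the king check.
print(basic_sanity_check(np.zeros((8, 8, 12), dtype=int)))  # ['each side needs exactly one king']
```

Returning a list of named problems rather than a single boolean makes it easier to see which rules the VAE most often breaks.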

In [21]:
import math
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from keras import backend as K
import matplotlib.pyplot as plt
import keras
from scipy.stats import norm
from keras import layers, models, metrics, losses, optimizers
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.model_selection import KFold, train_test_split

##### Data Collection

* The easiest way to obtain chess positions is from my own games.
* I exported the move data from some of the games I have played online on Lichess, which comes as a .pgn file.
* This file contains the move order of each exported game, from which the sequence of positions can be reconstructed.
* For this, I used the python-chess library, which replays the PGN move lists and produces the FEN string for each position.
* Once we have the FEN positions, we can derive the input data we wish to feed into the model.
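A minimal sketch of this step with python-chess (installed with `pip install chess`), assuming the exported .pgn file has been read into a string. `pgn_to_fens` is a hypothetical helper name; whether the original pipeline also kept the starting position or removed duplicate positions is not stated, so this version simply records the position after every mainline move.

```python
import io

import chess.pgn  # python-chess


def pgn_to_fens(pgn_text):
    """Replay every game in a PGN string and return the FEN after each move."""
    fens = []
    handle = io.StringIO(pgn_text)
    while True:
        game = chess.pgn.read_game(handle)
        if game is None:  # read_game returns None once the input is exhausted
            break
        board = game.board()  # starting position of this game
        for move in game.mainline_moves():
            board.push(move)
            fens.append(board.fen())
    return fens
```

Writing the result with one FEN per line (for example `"\n".join(fens)`) gives the format that `fen-data.txt` is read in later in this notebook.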

##### Data Representation
* Although the PGN export doesn't give the chess positions directly, we can manipulate it into a form that works for the VAE.
* The current idea is to use an 8 x 8 x 12 tensor: each of the 12 piece types (K, Q, R, B, N, P for White and k, q, r, b, n, p for Black) gets its own 8 x 8 board that marks where that piece type stands.
* A 1 marks that the piece is on a given square and a 0 that it is not. Because every game I exported starts from the standard position, each position along the game can be reconstructed by replaying the moves and then encoded this way.
* Conveniently, this also makes a good dataset for generating anti-puzzles: the positions come from the ordinary sequence of moves in real games, and not all of them have an optimal solution.
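To make the layout concrete, here is a small sketch that places pieces, given by square name, into the 8 x 8 x 12 array. It follows the conventions of `generate_matrix_from_fen` below: row 0 is rank 8 (FEN lists rank 8 first), column 0 is the a-file, and planes are in `KQRBNPkqrbnp` order. The helpers `square_to_index` and `encode_pieces` are hypothetical and only illustrate the indexing.

```python
import numpy as np

PIECE_ORDER = "KQRBNPkqrbnp"  # white pieces uppercase, black pieces lowercase


def square_to_index(square):
    """Map a square such as 'e4' to (row, col): row 0 is rank 8, col 0 is the a-file."""
    file, rank = square[0], int(square[1])
    return 8 - rank, ord(file) - ord("a")


def encode_pieces(pieces):
    """Build an (8, 8, 12) one-hot array from a {square: piece_letter} mapping."""
    board = np.zeros((8, 8, 12), dtype=np.int8)
    for square, piece in pieces.items():
        row, col = square_to_index(square)
        board[row, col, PIECE_ORDER.index(piece)] = 1
    return board


# White king on e1, white pawn on e4, black king on e8.
board = encode_pieces({"e1": "K", "e4": "P", "e8": "k"})
print(board.sum())  # 3: one 1 per piece
print(np.argwhere(board[:, :, PIECE_ORDER.index("P")]))  # [[4 4]]: the e4 pawn
```

Most of the tensor is zeros: even the starting position sets only 32 of its 768 entries.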

In [22]:
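# __vsc_ipynb_file__ is only defined when the notebook runs inside VS Code;
# fall back to the current working directory elsewhere (e.g. classic Jupyter)
try:
    DIR = os.path.dirname(__vsc_ipynb_file__)
except NameError:
    DIR = os.getcwd()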
fen_data_path = os.path.join(DIR, "data", "fen-data.txt")

In [23]:
PIECE_TO_IDX = {c: i for i, c in enumerate("KQRBNPkqrbnp")}

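def generate_matrix_from_fen(fen_string):
    # initialise an 8 x 8 board where every square holds a 12-element one-hot vector
    board = [[[0 for _ in range(12)] for _ in range(8)] for _ in range(8)]

    # only the first FEN field (piece placement) is used; ranks are separated by "/"
    # and listed from rank 8 down to rank 1, so row 0 corresponds to rank 8.
    # split() with no argument also strips the trailing newline read from the file
    board_string = fen_string.split()[0].split("/")
    row, col = 0, 0
    for board_row in board_string:
        for row_item in board_row:
            if row_item.isnumeric():
                # a digit n means n consecutive empty squares
                col += int(row_item)
            else:
                # a letter is a piece: set its plane to 1 on this square
                board[row][col][PIECE_TO_IDX[row_item]] = 1
                col += 1
        row += 1
        col = 0
    return board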
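Generated boards will eventually need to go back to FEN so they can be inspected or checked for legality. Below is a minimal sketch of the inverse of `generate_matrix_from_fen`, under two assumptions: the board has already been discretised to 0/1 (a VAE decoder typically outputs continuous values, so some thresholding or argmax step is needed first), and only the piece-placement field is recovered, because the 8 x 8 x 12 encoding stores no side to move, castling rights, or en passant square. `matrix_to_fen_placement` is a hypothetical name; if a square has several planes set, it keeps the first one.

```python
import numpy as np

PIECE_ORDER = "KQRBNPkqrbnp"  # same plane order as PIECE_TO_IDX


def matrix_to_fen_placement(board):
    """Convert an (8, 8, 12) 0/1 board into the piece-placement field of a FEN."""
    board = np.asarray(board)
    ranks = []
    for row in board:  # row 0 is rank 8, matching generate_matrix_from_fen
        fen_row, empty = "", 0
        for square in row:  # 12-element vector for one square
            if square.any():
                if empty:  # flush the run of empty squares before this piece
                    fen_row += str(empty)
                    empty = 0
                fen_row += PIECE_ORDER[int(np.argmax(square))]
            else:
                empty += 1
        if empty:
            fen_row += str(empty)
        ranks.append(fen_row)
    return "/".join(ranks)
```

For any position in the dataset, `matrix_to_fen_placement(generate_matrix_from_fen(fen))` should reproduce `fen.split()[0]`, which makes a quick round-trip check of the encoding. A full FEN for a generated board would need the remaining fields filled in by assumption, for example appending `" w - - 0 1"`.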

In [24]:
data = []
with open(fen_data_path) as file:
    for line in file:
        data.append(np.array(generate_matrix_from_fen(line)))
data = np.array(data)
print(data.shape)

(18175, 8, 8, 12)
