# Building a simple chess engine with reinforcement learning
# Outline
- [Step 1: Importing Essential Libraries](#1)

<a name="1"></a>
## Step 1: Importing Essential Libraries

Before building our chess engine, we need to set up the necessary tools. This step involves importing various Python libraries that will help with numerical operations, reinforcement learning, deep learning, chess logic, and data handling.
Code Implementation:


In [None]:
import numpy as np
import gym
from gym import spaces
from collections import deque, namedtuple
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.losses import MSE
from tensorflow.keras.optimizers import Adam
import chess
import random

### Explanation of Imported Libraries:
- `NumPy` (numpy) – Provides efficient array handling, used here to represent chess positions numerically.
- `Gym` (gym) – Provides the base `Env` class and the conventions for reinforcement learning environments.
- `Spaces` (gym.spaces) – Defines the action and observation spaces that the agent works with.
- `Collections` (collections.deque, collections.namedtuple) – Used for structured data storage, such as move history.
- `TensorFlow` (tensorflow) – Enables deep learning capabilities for position evaluation and move selection.
- `Keras` (tensorflow.keras) – Simplifies neural network construction using layers like Dense and Input.
- `Loss Functions` (tensorflow.keras.losses) – MSE (Mean Squared Error) quantifies the difference between predicted and actual outcomes.
- `Optimizers` (tensorflow.keras.optimizers) – The Adam optimizer adjusts model weights during training.
- `Python-Chess` (chess) – A chess-specific library for board representation, legal moves, and game management.
- `Random` (random) – Introduces randomness, used here for random move selection.
  
### Why Is This Step Important?
Setting up the correct libraries ensures that our chess engine has the right tools for computation, game logic, and AI-based learning. These libraries form the backbone of the project, enabling us to process moves efficiently, train models, and interact with the chessboard.

## Step 2: Creating the Chess Environment
Now that we've set up our imports, we need to design a custom chess environment using OpenAI Gym. This environment will allow reinforcement learning agents to interact with the chessboard, make moves, and receive rewards.


In [None]:
class ChessEnv(gym.Env):
    def __init__(self):
        super(ChessEnv, self).__init__()
        self.board = chess.Board()
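        # Action space: 4672 is used as a fixed upper bound on the number of moves.
        # step() treats the action as an index into the current legal-move list,
        # so only indices below the number of legal moves are valid.
        self.action_space = spaces.Discrete(4672)

        # Observation space: an 8x8x12 board representation (binary encoding of pieces)
        self.observation_space = spaces.Box(low=0, high=1, shape=(8, 8, 12), dtype=np.int8)
        # Defaults so that step() also works if reset() has not been called yet
        self.agent_color = chess.WHITE
        self.move_count = 0
        self.prev_eval = 0.0
        self.reward = 0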
        
    def reset(self, agent_color=chess.WHITE):
        self.agent_color = agent_color
        self.board.reset()
        self.move_count = 0
        self.prev_eval = 0.0  # Initial evaluation
        self.reward = 0
        return self._get_obs()

    def step(self, action):
        move = self._action_to_move(action)
        self.board.push(move)
        self.move_count += 1
        done = self.board.is_game_over()
        # _evaluate_board updates and returns the cumulative reward
        reward = self._evaluate_board(done)
        return self._get_obs(), reward, done, {}

    def _get_obs(self):
        """ Converts the board state to an observation (8x8x12 array) """
        obs = np.zeros((8, 8, 12), dtype=np.int8)

        # Encode each piece: row i is the rank (0 = rank 1), column j is the file (0 = a-file)
        for i in range(8):
            for j in range(8):
                piece = self.board.piece_at(i * 8 + j)
                if piece:
                    piece_type = piece.piece_type
                    color = piece.color
                    # Channels 0-5: White pawn..king, channels 6-11: Black pawn..king
                    channel = (piece_type - 1) if color else (piece_type + 5)
                    obs[i, j, channel] = 1

        return obs

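    def _action_to_move(self, action):
        """ Converts the action (an index into the legal-move list) into a chess.Move """
        legal_moves = list(self.board.legal_moves)
        if not 0 <= action < len(legal_moves):
            raise ValueError(f"action {action} is out of range for {len(legal_moves)} legal moves")
        return legal_moves[action]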

    def _evaluate_board(self, done):
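        reward = self._get_material_score()  # White's perspective
        # Move penalty that grows exponentially with the number of half-moves played.
        # board.turn is the side to move next (chess.WHITE is True, chess.BLACK is False),
        # so the penalty is subtracted after a White move and added after a Black move.
        penalty = 0.01 * (1.01 ** self.move_count)
        reward -= penalty * ((-1) ** self.board.turn)
        # Accumulate the change in evaluation since the previous step
        self.reward += reward - self.prev_eval
        self.prev_eval = reward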
        # Add endgame reward
        if done:
            result = self.board.result()
            if result == "1-0":
                self.reward += 100 if self.agent_color == chess.WHITE else -100
            elif result == "0-1":
                self.reward += -100 if self.agent_color == chess.WHITE else 100
            else:
                self.reward = 0  # Draw
            
        return self.reward



    def render(self, mode='human'):
        """ Renders the current board state (in a human-readable format) """
        print(self.board)

    def _get_material_score(self):
        """Calculates material balance from White's perspective"""
        piece_values = {
            chess.PAWN: 1,
            chess.KNIGHT: 3,
            chess.BISHOP: 3,
            chess.ROOK: 5,
            chess.QUEEN: 9
        }
    
        score = 0
        for square in chess.SQUARES:
            piece = self.board.piece_at(square)
            if piece:
                value = piece_values.get(piece.piece_type, 0)
                score += value if piece.color == chess.WHITE else -value
    
        return score

### Explanation of Key Components
- `Gym Environment` (ChessEnv): Defines interaction with the chess game, including move execution and state tracking.
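- `Action Space` (spaces.Discrete(4672)): A fixed-size space of 4672 actions, chosen as an upper bound. `_action_to_move` interprets the action as an index into the current list of legal moves, so only indices below the number of legal moves are valid.
- `Observation Space` (spaces.Box): Encodes board positions in an 8x8x12 array.
- `Reward Mechanism`: Combines material balance, a small move-count penalty, and a bonus or penalty of 100 for a decisive result.
- `Game Reset` (reset()): Initializes a fresh game state for training.
- `Move Execution` (step()): Allows the agent to make moves and updates the board.
- `Rendering` (render()): Prints a text rendering of the board.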
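
The action space has 4672 entries, but the list of legal moves is much shorter (20 moves in the starting position), so most action values cannot be mapped to a move directly. A minimal sketch of one possible workaround is to fold the index back into range with a modulo. `fold_action` is a hypothetical helper, not part of the original environment, and the legal moves are stood in for by UCI strings so the example runs without python-chess.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

def fold_action(action: int, legal_moves: Sequence[T]) -> T:
    """Map any non-negative action index onto the current legal-move list.

    Hypothetical helper: the modulo fold is one simple choice, not the
    original author's method.
    """
    if not legal_moves:
        raise ValueError("no legal moves: the game is already over")
    if action < 0:
        raise ValueError("action must be non-negative")
    return legal_moves[action % len(legal_moves)]

# Stand-in for list(board.legal_moves): a few opening moves as UCI strings.
legal = ["e2e4", "d2d4", "g1f3", "c2c4"]

print(fold_action(1, legal))     # in range, used as-is
print(fold_action(4671, legal))  # out of range, folded back into the list
```

The fold keeps every action valid, at the cost of the same action value meaning different moves in different positions. Masking illegal actions in the agent's output is a common alternative.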
### Why Is This Step Important?
This environment sets up a structured way to train a reinforcement learning agent by defining game mechanics, move selection, and reward evaluation. It's crucial for allowing AI to learn strategies through gameplay.
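
To make the 8x8x12 observation easier to read, the sketch below spells out which channel each piece lands in and how a square index maps to array coordinates. It restates the rule used in `_get_obs` in plain Python. The helper names `piece_channel` and `square_to_row_col` are illustrative assumptions, and the piece-type numbers follow python-chess (pawn = 1 through king = 6).

```python
# Piece-type numbers as used by python-chess: PAWN=1 ... KING=6.
PIECE_NAMES = {1: "pawn", 2: "knight", 3: "bishop", 4: "rook", 5: "queen", 6: "king"}

def piece_channel(piece_type: int, is_white: bool) -> int:
    """Same rule as in _get_obs: White uses channels 0-5, Black 6-11."""
    return piece_type - 1 if is_white else piece_type + 5

def square_to_row_col(square: int) -> tuple:
    """Square index 0..63 (a1 = 0, h8 = 63) -> (rank row, file column)."""
    return divmod(square, 8)

# Print the full channel layout.
for is_white in (True, False):
    side = "white" if is_white else "black"
    for piece_type, name in PIECE_NAMES.items():
        print(piece_channel(piece_type, is_white), side, name)

print(square_to_row_col(4))   # e1, White's king square at the start
print(square_to_row_col(60))  # e8, Black's king square at the start
```

Row 0 of the array is rank 1, so the observation is vertically flipped relative to the board that `render()` prints, which shows rank 8 at the top.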


## Step 3: Testing the Chess Environment and Reward System
Before training our AI, we need to verify that:
- The environment correctly applies legal moves.
- Rewards are calculated properly based on board evaluation.
- The game progresses logically between White and Black.
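
To judge whether rewards are "calculated properly", it helps to know what numbers to expect. The sketch below reproduces the first part of `_evaluate_board` (material plus the signed move penalty) as a standalone function. `shaped_eval` is a hypothetical name used only for illustration, and the terminal bonus is left out.

```python
def shaped_eval(material: float, move_count: int, white_to_move: bool) -> float:
    """White-perspective evaluation as computed at the start of _evaluate_board."""
    penalty = 0.01 * (1.01 ** move_count)
    # (-1) ** True == -1 and (-1) ** False == 1, matching (-1) ** board.turn.
    return material - penalty * ((-1) ** white_to_move)

# After White's first move it is Black's turn: the evaluation drops slightly.
print(round(shaped_eval(0, 1, white_to_move=False), 6))
# After Black's reply it is White's turn: the evaluation rises slightly.
print(round(shaped_eval(0, 2, white_to_move=True), 6))

# The penalty itself grows with the number of half-moves played.
for n in (1, 50, 100, 200):
    print(n, round(0.01 * 1.01 ** n, 4))
```

As long as no piece is captured, the reward should therefore oscillate around zero with a slowly growing amplitude (about 0.027 after 100 half-moves). A jump of roughly 1, 3, 5, or 9 indicates a capture.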


In [None]:
x = ChessEnv()
done = False
reward = 0
x.reset(chess.WHITE)
i = 0  # number of half-moves played
while not done:
    m = reward  # reward before this move
    if x.board.turn:
        print("white's turn")
    else:
        print("black's turn")
    # Pick a random index into the current legal-move list
    legal_moves = list(x.board.legal_moves)
    action = random.randint(0, len(legal_moves) - 1)
    obs, reward, done, _ = x.step(action)
    print("Reward: ", round(reward, 6))
    print("difference: ", round(reward - m, 6))
    #print(x.board)
    print("===")
    i += 1
print("number of turns:", i / 2)

### What This Code Does
- Initializes the chess environment (ChessEnv()).
- Runs a loop until the game ends (while `done` is False).
- Selects and applies a random legal move on each turn.
- Prints the turn information (White or Black).
- Prints the cumulative reward and its change since the previous move, so the reward logic can be checked by eye.
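
Printed rewards are easier to trust with an independent cross-check. The sketch below recomputes the material balance directly from a FEN string, using the same piece values as `_get_material_score`. `material_from_fen` is a hypothetical helper written for this check, not part of the environment.

```python
# Same piece values as _get_material_score; the king counts as 0.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_from_fen(fen: str) -> int:
    """Material balance from White's perspective, read from a FEN string."""
    placement = fen.split()[0]  # first FEN field: piece placement
    score = 0
    for ch in placement:
        if ch.isalpha():  # digits and '/' describe empty squares and rank breaks
            value = PIECE_VALUES[ch.lower()]
            # Uppercase letters are White pieces, lowercase are Black pieces.
            score += value if ch.isupper() else -value
    return score

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(start))  # 0: both sides have equal material

# Same position with Black's queen removed.
no_black_queen = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(no_black_queen))  # 9
```

Inside the test loop, comparing `material_from_fen(x.board.fen())` with `x._get_material_score()` after each move should give the same value every time.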
### Why Is This Step Important?
This test ensures that:
- The environment correctly recognizes legal moves.
- The agent receives meaningful rewards based on board evaluation.
- The game transitions properly between turns.
- The reward system provides useful feedback before integrating AI training.


In [None]:
x.board