<a href="https://colab.research.google.com/github/talhavawda/tic-tac-toe-monte-carlo/blob/master/TicTacToe_MonteCarlo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tic Tac Toe using Monte Carlo Tree Search

 ## COMP703 (ARTIFICIAL INTELLIGENCE) 2021 ASSIGNMENT 1
 
 Team: A1GroupWC

 Team Members:
 -  Azhar Mohamed&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;(218006491)
 - Yashlin Naidu &ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;(216019492)
 - Ricardo Pillay &ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;(218009114)
 - Ahmad Jawaad Shah&ensp;&ensp;&ensp;(218029400)
 - Talha Vawda &ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;(218023210)

<br>

## Acknowledgements and References
- AIMA
- COMP703 Lectures and Notes
- https://www.python.org/dev/peps/pep-0008/ (PEP 8 -- Style Guide for Python Code)
- Python: check if list is empty (https://stackoverflow.com/questions/53513/how-do-i-check-if-a-list-is-empty)



# Introduction and Background
Our variant of Tic Tac Toe aims to make the game more interesting. The original game involves lining up 3 of the same type of token (X or O) in a row, column or diagonal on a 3 by 3 grid in order to win. Our variant uses 3 types of tokens: "X", "Y" and "S". The X and Y tokens are the player tokens (one for each of the 2 players; the second player's token is named O in the code below), while the S tokens are universal tokens that are placed randomly on the board before the game commences. Either player may use an S token as one of their own in order to win the game faster. Measures have been taken to prevent the S tokens from giving an unfair advantage to one player: no two S tokens may be placed in the same row or column, and no two S tokens may be within 2 squares of each other along a diagonal. Our variant allows the game to be played on the following square grid sizes: 4 by 4, 5 by 5, 6 by 6 and 7 by 7. On a 4 by 4 grid you are only required to line up 3 tokens in a row, column or diagonal to win. Similarly, on a 5 by 5 or 6 by 6 grid you must line up 4 tokens, and on a 7 by 7 grid you must line up 5 tokens. The tokens one lines up to win may consist of only 1 type of player token (X or Y), or of 1 type of player token together with an S token.
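
The winning rule described above can be sketched as follows. This is only a minimal illustration, not the notebook's implementation: the names `WIN_LENGTH` and `is_winning_line` are hypothetical, tokens are written as plain strings, and the requirement that a line contains at least one of the player's own tokens is an assumption.

```python
# Hypothetical names for illustration; tokens are plain strings for readability.
WIN_LENGTH = {4: 3, 5: 4, 6: 4, 7: 5}  # board size n -> tokens needed in a line


def is_winning_line(cells, player, n):
    """Return True if `cells` (consecutive squares on one line) wins for `player` on an n x n board."""
    k = WIN_LENGTH[n]
    if len(cells) != k:
        return False
    # Every square must hold the player's token or the neutral S token.
    # Assumption: at least one square must hold the player's own token.
    return all(c in (player, "S") for c in cells) and player in cells


print(is_winning_line(["X", "S", "X"], "X", 4))       # True: S counts as X's token
print(is_winning_line(["X", "Y", "X"], "X", 4))       # False: blocked by Y
print(is_winning_line(["Y", "Y", "S", "Y"], "Y", 5))  # True: 4 in a line on a 5 by 5 grid
```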


# Imports and constants
This cell contains all the libraries that are needed to run this game. Furthermore, constants are initialised here and are used throughout the notebook.

Please run all cells in order (on your first run) so that classes that are used later on are defined before they are used.

In [68]:
"""
    Imports
"""
import numpy as np
import math
import copy
import random
 
 
"""
    Constants
"""
# 'C' is used in the UCT formula
# Theoretically, C should be set to sqrt(2) but play around with different values to see which works best
C = 2

# Working with tuples
listTuples = list()
listTuples.append((0,1))
listTuples.append((0,2))
listTuples.append((0,3))
print(listTuples[0])
print(listTuples[0][0], "\t", listTuples[0][1])
print(listTuples[1][0], "\t", listTuples[1][1])
print(listTuples[2][0], "\t", listTuples[2][1])

if listTuples:
    print("not empty")
else:
    print("empty")

(0, 1)
0 	 1
0 	 2
0 	 3
not empty


# State Class

Represents a State of the Tic Tac Toe game
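
A state pairs a board with the player who moves next, and the player to move can be recovered from the board alone. The sketch below illustrates that idea under stated assumptions: it uses the same integer encoding as the `TicTacToeGame` constants (X = 1, O = -1, empty = 0, S = 2) and assumes X always plays first; `infer_turn` is a hypothetical helper name.

```python
X, O, EMPTY, S = 1, -1, 0, 2  # assumed to match the TicTacToeGame constants


def infer_turn(board):
    """Return the player to move: O if X has played more often than O, else X."""
    num_x = sum(row.count(X) for row in board)
    num_o = sum(row.count(O) for row in board)
    return O if num_x > num_o else X


board = [
    [X, EMPTY, S, EMPTY],
    [EMPTY, O, EMPTY, EMPTY],
    [EMPTY, EMPTY, X, EMPTY],
    [EMPTY, EMPTY, EMPTY, EMPTY],
]
print(infer_turn(board))  # -1, i.e. O: X has played twice and O once
```

Neutral S tokens are ignored by the count, so they do not affect whose turn it is.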


In [69]:
class State:
    """
        A Tic Tac Toe state for the Monte Carlo Tree Search Algorithm
        A State of the game is a specific board scenario, along with the player whose turn it is to play next
    """


    def __init__(self, board=None, turn=None, move=None):
        """
            Constructor for the initial creation of a Tic Tac Toe State.
            :param move: the move (a 2-tuple indicating the square) that was
            performed on the previous state to reach this state by the player -turn
        """

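        if board is not None:
            # Use the board of an existing game state
            self.board = board
        else:
            # Create the board for the initial game state
            # NOTE: initial_game_board() is an instance method, so this call raises a
            # TypeError without a TicTacToeGame instance. Prefer building states with
            # TicTacToeGame.create_game_state(), which always passes a board
            self.board = TicTacToeGame.initial_game_board()
            self.turn = TicTacToeGame.X # this variable represents the player whose turn it is to play on this current state (current player) | X plays first on an initial board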

        # DEPRECATED
        #self.is_terminal_state = False
        #self.winner = TIC_TAC_TOE_GAME.IN_PROGRESS 
        #self.check_game_over() # This function updates self.is_terminal_state and self.winner 



        if turn is not None:
            self.turn = turn # The player whose turn it is to play on this current state, either TicTacToeGame.X or TicTacToeGame.O
        else:
            # Determine whose turn it is by counting the number of X's and O's on the board:
            # if X has played more often than O then it is O's turn, otherwise it is X's turn
            num_x = (np.array(self.board) == TicTacToeGame.X).sum()
            num_o = (np.array(self.board) == TicTacToeGame.O).sum()
            if num_x > num_o:
                self.turn = TicTacToeGame.O
            else:
                self.turn = TicTacToeGame.X


        self.move = move

        
    def toggle_turn(self, turn):
        """
            Return the opponent of the given player

            ALT:
            if turn == TicTacToeGame.X:
                return TicTacToeGame.O
            else:
                return TicTacToeGame.X
        """

        return -turn # since TicTacToeGame.X == -TicTacToeGame.O and vice versa
        

# TicTacToeGame class

In [74]:
class TicTacToeGame:
    """ 
        A class that represents our variant of the Tic Tac Toe game
        The purpose is to initialise a Tic Tac Toe game by creating a nxn square board
        and placing the neutral tokens on it, and to hold the n, k, and neutral_tokens values


        The board is implemented as an nxn array, where an index pair represents
        a square on the board and each element is one of 2, 1, -1, 0 [Deprecated: 'S', 'X', 'O', ""],
        representing a neutral token, the X token, the O token, and an empty square (where
        no token has been played yet) respectively

        We are having a specific game class as we are demonstrating various examples of Tic Tac Toe game scenarios
        where different games could have a different size board, so we're going to need different game instances
        representing different games with their boards to pass into the MCTS algorithm

        An instance of this class (representing a game scenario) will be passed into the Monte Carlo Tree Search (MCTS) algorithm
    """

    # Constants of this Tic Tac Toe Game class
    X = 1  # Player/Token X
    EMPTY = 0  # An empty square (no tokens placed on it yet)
    O = -1  # Player/Token O
    S = 2  # Neutral Token
    IN_PROGRESS = 3  # Game in progress
    DRAW = 4  # Game over and game ended in draw

    # MCTS uses Max and Min players
    MAX = X # Player Max is the X token (as X plays first)
    MIN = O


    def __init__(self, n):
        """
            :param n: Dimensions of the board - Since the board is a square board, the board will be an nxn board

            k is the number of tokens needed in a row to win
            neutral_tokens is the number of neutral tokens ('S') that the board will contain
        """
        self.n = n
        
        self.k, self.neutral_tokens = self.get_k_and_nt(n)


       

    def get_k_and_nt(self,n):

        k = 0
        neutral_tokens = 0

        if n == 4:
            k = 3
            neutral_tokens = 1
        elif n in [5, 6]:
            k = 4
            neutral_tokens = 2
        elif n == 7:
            k = 5
            neutral_tokens = 3
        else:
            pass  # We've already covered all cases and validation of n will be done when getting input from the user
        
        return k, neutral_tokens



    def check_in_diagonal(self, board, row, col):
        """
            Check whether a neutral token lies within 2 squares of (row, col)
            along either diagonal

            :return: True if so else False 
        """
        temp_board = [[0 for i in range(self.n + 4)] for j in range(self.n + 4)]

        for i in range(self.n):
            for j in range(self.n):
                temp_board[i + 2][j + 2] = board[i][j]

        if (temp_board[row - 2 + 2][col - 2 + 2] == self.S or temp_board[row - 1 + 2][col - 1 + 2] == self.S or
                temp_board[row + 2 + 2][col + 2 + 2] == self.S or temp_board[row + 1 + 2][col + 1 + 2] == self.S or
                temp_board[row + 2 + 2][col - 2 + 2] == self.S or temp_board[row + 1 + 2][col - 1 + 2] == self.S or
                temp_board[row - 2 + 2][col + 2 + 2] == self.S or temp_board[row - 1 + 2][col + 1 + 2] == self.S):
            return True

        else:
            return False


    def print_2D_array(self,array):
        for i in range(len(array)):
            output = ''
            for j in range(len(array[i])):
                output = output + (str(array[i][j])) + ' '
            print(output)



    def initial_game_board(self):
        """
            Generate an initial board for the game with the neutral tokens (randomly) placed
             on it, and return it

            :return: An initial board of the game (the initial game state) as a 2D array
        """

        # No neutral token may share a row or column with another neutral token,
        # or lie within 2 squares of another neutral token along a diagonal

        board = [[self.EMPTY for i in range(self.n)] for j in range(self.n)]  # initialise a nxn board using Python List Comprehension

        neutral_rows = []
        neutral_cols = []

        for m in range(self.neutral_tokens):
            # Bug fix: draw the first candidate into rand_row/rand_col (the names used below),
            # otherwise the first neutral token is always placed on square (0, 0)
            rand_row = random.randint(0, self.n - 1)  # For randint(a, b) both bounds are inclusive i.e. range is [a, b]
            rand_col = random.randint(0, self.n - 1)

            # While the coordinates of this square is not valid for a neutral token, generate new coordinates
            # The condition of the while loop also checks if the current square position already has a neutral token placed on it
            while (self.check_in_diagonal(board, rand_row, rand_col) or (rand_row in neutral_rows) or (rand_col in neutral_cols)):
                rand_row = random.randint(0, self.n - 1)
                rand_col = random.randint(0, self.n - 1)

            board[rand_row][rand_col] =  self.S
            neutral_rows.append(rand_row)
            neutral_cols.append(rand_col)

        '''
        token = 0

        #Azhar's code
        check = False # flag for checking neutral tokens
        while not check:
            neutral_squares = list() # list of neutral token co-ordinates

            while token < self.neutral_tokens:
                randRow = random.randint(0, self.n-1) # For randint(a, b) both bounds are inclusive i.e. range is [a, b]
                randCol = random.randint(0, self.n-1)

                if board[randRow][randCol] != self.S: # ALT: board[randRow][randCol] == self.EMPTY
                    # Do check here if square is a valid place for a neutral token, and if it is then do the code below
                    board[randRow][randCol] = self.S
                    neutral_squares.append((randRow,randCol))    # (,) is a tuple                
                    token = token + 1

            #Azhar's code
            """
            if abs(neutral_squares[0][0]-neutral_values[0][1])!=abs(neutral_values[1][0]-neutral_values[1][1])!= # Check Diagonals abs(X,Y)
            abs(neutral_values[2][0]-neutral_values[2][1]) or neutral_values[0][0] != neutral_values[1][0] != # Check row
            != neutral_values[2][0] or neutral_values[0][1] != neutral_values[1][1] != neutral_values[2][1]:  # Check col
                check = True
            else:
                token = 0
            """
            '''
        
        return board



    def create_game_state(self, neutral_moves=None, players_moves=None):
        """
            Create a game situation based on the actions given

            :neutral_moves: a list of the squares on the board that the neutral tokens are placed on - each element
                is a tuple of size 2 indicating the position of the square. The length of this list must be equal to self.neutral_tokens.
                If None then call self.initial_game_board() to randomly place the neutral tokens
            :players_moves: moves that have already been played by both players - a list of tuples. Each tuple is of size 3.
                For each tuple, the first element is the player who played the move (either self.X or self.O), the second element is the x coordinate of the square,
                and the third element is the y coordinate. Ensure the coordinates are valid board positions
            :return: a State of the game with its board populated with the specified actions/moves
        """

        turn = self.X # Default - X plays first


        if neutral_moves == None:
            board = self.initial_game_board()
        else:
            board = [[self.EMPTY for i in range(self.n)] for j in range(self.n)]  # initialise a nxn board using Python List Comprehension

            # Add neutral moves from neutral_moves
            for i in range(len(neutral_moves)):
                board[neutral_moves[i][0]][neutral_moves[i][1]] = self.S  # neutral_moves[i][0] is x-coordinate of that tuple and neutral_moves[i][1] is y-coordinate of that tuple

        # Add the moves of the players
        last_move = None
        if players_moves != None:
            for move in players_moves: # Each move is a 3-tuple   
                player, x_coord, y_coord = move
                board[x_coord][y_coord] = player
                last_move = move

            """code to determine the player whose turn it is to play next on this current state"""
            num_x = (np.array(board) == self.X).sum()
            num_y = (np.array(board) == self.O).sum()

            if num_x > num_y:
                turn = self.O
            else:
                turn = self.X

        return State(board, turn, move=last_move)


    def moves(self, state):
        """
            Determine all the actions/moves (x,y pairs) that can be played on the current board state
            and return them

            A move is an empty square that the current player can place their token on
            (It doesn't matter who the current player is)

            :return: a list of tuples, with each tuple being the position/coordinates of a square on the board
        """

        board = state.board

        # A list of tuples
        moves = list()

        for i in range(len(board)):
            for j in range(len(board[i])):
                # Check if the cell is empty
                if board[i][j] == self.EMPTY:
                    # Create a tuple holding a possible i, j move
                    move_tuple = (i, j)
                    # Add the tuple to the list
                    moves.append(move_tuple)

        return moves


    def play_move(self, state, move):
        """
            Play the specified action/move on this current state (resulting in a child state)
            :return: a new State with the move having been played on it
        """
        # Assumption: move is a tuple
        # Play the move on this state
        board = state.board
        child_board = copy.deepcopy(board)
        child_board[move[0]][move[1]] = state.turn

        # Return a new state - updated board - updated turn
        return State(board=child_board, turn=-state.turn, move=move)  # turn=-state.turn since TicTacToeGame.X == -TicTacToeGame.O and vice versa


    def get_child_states(self, state):
        """
            Generate and return all child states of this state with the current player being state.turn

            Use this method for expansion

            :return: a list of all child States of this state
        """

        child_moves = self.moves(state)
        board = state.board
        child_states = list()

        for move in child_moves:  # move is a tuple representing the position/coordinates of a square on the board
            child_board = copy.deepcopy(board)
            i = move[0]
            j = move[1]
            child_board[i][j] = state.turn
            child_states.append(State(board=child_board, turn=-state.turn, move=move))  # toggle/switch the current player

        """
        Deprecated
        for i in range(len(board)):
            for j in range(len(board[i])):
                if (board[i][j] == TicTacToeGame.EMPTY):
                    child_board = copy.deepcopy(board)
                    child_board[i][j] = state.turn
                    child_states.append(State(board=child_board, turn=-state.turn)) # toggle/switch the current player
        """

        return child_states


    def get_empty_squares(self, state):
        """
            :return: a list of tuples, with each tuple representing the coordinates of an empty square on the board
        """

        board = state.board
        empty_squares = list()

        for i in range(len(board)):
            for j in range(len(board[i])):
                if state.board[i][j] == self.EMPTY:
                    empty_square = (i, j)  # (i, j) is a tuple
                    empty_squares.append(empty_square)

        return empty_squares


    def display_state(self, state):
        """
            Display the board state
        """

        board = state.board

        for i in range(len(board)):
            line = ""
            for j in range(len(board[i])):
                character = board[i][j]
                if character == self.EMPTY:
                    character = '_'
                elif character == self.X:
                    character = 'X'
                elif character == self.O:
                    character = 'O'
                else:
                    character = 'S'
                line += "|" + str(character)
            line += "|"
            print(line)


    def check_game_over(self, state):
        """
            Determines whether the game is over for this state and returns
            whether the game is over, the winner (or lack of one),
            and the utility (score/value) of this (terminal) state

            The game is over if either X won, O won, or the game is a draw
            If the game is over, then this state is a terminal state

            Utility value is 1 if X won, -1 if O won, 0.5 if there is a draw, 
            or 0 if this state is not a terminal state (the game is currently in progress)

            :return: a (is_terminal_state, winner, utility) tuple 
        """

        # Default - game is not over - currently in progress 
        is_terminal_state = False
        winner = self.IN_PROGRESS 
        utility = 0


        if self.check_k_win(state):
            # If either player has won
            is_terminal_state = True
            winner = -state.turn # The winner is the player who made the move to reach this current state
            utility = winner      # TicTacToeGame.X is represented by 1 and TicTacToeGame.O is represented by -1 so the corresponding value for the winner will be returned

        elif not self.moves(state): 
            # Alternate elif condition: (np.array(state.board) == self.EMPTY).sum() == 0
            """
                If the game is a draw:
                If no more moves can be made of the board (if the list returned
                by moves() is empty) i.e. all squares have been filled with tokens
            """
            is_terminal_state = True
            winner = self.DRAW
            utility = 0.5
        else:
            pass # Default values will remain


        return is_terminal_state, winner, utility


    
    def check_k_win(self,state):
        stateBoard = state.board


        m = int(self.k / 2)
        tempMatrix = [[0 for i in range(self.n + (m * 2))] for j in range(self.n + (m * 2))]
        for i in range(self.n):
            for j in range(self.n):
                tempMatrix[i + m][j + m] = stateBoard[i][j]
        
        for i in range(len(state.board)):
            for j in range(len(state.board[i])):
                if (self.check_vertical_k_win(tempMatrix,state.board, i, j, -state.turn, m) or self.check_horizontal_k_win(tempMatrix,state.board, i, j, -state.turn, m) 
                or self.check_negative_diagonal_k_win(tempMatrix,state.board, i, j, -state.turn, m) or self.check_positive_diagonal_k_win(tempMatrix,state.board, i, j, -state.turn, m)):
                    return True
        return False

   
    def check_vertical_k_win(self,tempMatrix ,matrix, row, col, token, n):
        count = 0
        countS = 0
        sArray = []
        sArray.append(token)

        rows = self.n
        cols = self.n
        k = self.k

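        # Only scan windows of k squares that fit on the board; this also keeps
        # the indices inside tempMatrix for every board size (including 7x7)
        if row + k <= rows: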
            for l in range(k):
                if ((tempMatrix[row + n + l][col + n] == token) or (tempMatrix[row + n + l][col + n] == TicTacToeGame.S)):
                    count = count + 1
                elif (token == TicTacToeGame.S):
                    if (len(sArray) == 1):
                        sArray.append(tempMatrix[row + n + l][col + n])
                    if (tempMatrix[row + n + l][col + n] in sArray):
                        countS = countS + 1
                    else:
                        countS = 0

        if (count == k or countS == k):
                return True
        return False


    def check_horizontal_k_win(self,tempMatrix ,matrix, row, col, token,n):
        count = 0
        countS = 0
        sArray = []
        sArray.append(token)

        rows = self.n
        cols = self.n
        k = self.k

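        # Only scan windows of k squares that fit on the board; this also keeps
        # the indices inside tempMatrix for every board size (including 7x7)
        if col + k <= cols: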
            for l in range(k):
                if ((tempMatrix[row + n][col + n + l] == token) or (tempMatrix[row + n][col + n + l] == TicTacToeGame.S)):
                    count = count + 1
                elif (token == TicTacToeGame.S):
                    if (len(sArray) == 1):
                        sArray.append(tempMatrix[row + n][col + n + l])
                    if (tempMatrix[row + n][col + n + l] in sArray):
                        countS = countS + 1
                    else:
                        countS = 0

        if (count == k or countS == k):
                    return True
        return False


    def check_negative_diagonal_k_win(self,tempMatrix ,matrix, row, col, token, n):
        count = 0
        countS = 0
        sArray = []
        sArray.append(token)

        rows = self.n
        cols = self.n
        k = self.k

        # Only scan windows of k squares (going down and to the right) that fit on the board
        if row + k <= rows and col + k <= cols:
            for l in range(k):
                if ((tempMatrix[row + n + l][col + n + l] == token) or (tempMatrix[row + n + l][col + n + l] == TicTacToeGame.S)):
                    count = count + 1
                elif (token == TicTacToeGame.S):
                    if (len(sArray) == 1):
                        sArray.append(tempMatrix[row + n + l][col + n + l])
                    if (tempMatrix[row + n + l][col + n + l] in sArray):
                        countS = countS + 1
                    else:
                        countS = 0

        if (count == k or countS == k):
                    return True
        return False


    def check_positive_diagonal_k_win(self,tempMatrix ,matrix, row, col, token, n):
        count = 0
        countS = 0
        sArray = []
        sArray.append(token)

        rows = self.n
        cols = self.n
        k = self.k

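        # This diagonal goes up and to the right, so the window fits on the board only if
        # there are k - 1 rows above the start square and k - 1 columns to its right
        if row - (k - 1) >= 0 and col + k <= cols: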
            for l in range(k):
                #print(tempMatrix[row + n - l][col + n + l])
                if ((tempMatrix[row + n - l][col + n + l] == token) or (tempMatrix[row + n - l][col + n + l] == TicTacToeGame.S)):
                    count = count + 1
                elif (token == TicTacToeGame.S):
                    if (len(sArray) == 1):
                        sArray.append(tempMatrix[row + n - l][col + n + l])
                    if (tempMatrix[row + n - l][col + n + l] in sArray):
                        countS = countS + 1
                    else:
                        countS = 0

        if (count == k or countS == k):
                    return True
        #print(count)
        #print(countS)
        return False





# Node Class

[The root node of the tree is the current game state on which we want to decide the best move to make]


Node class for adversarial game: 

Option 1 (this is what the lecture notes describe): keep 2 separate win counters, one for player 1 (Max) and one for player 2 (Min). The counter used to calculate the UCT value is that of the player whose turn it is to play (in state 0, the initial state of the algorithm, which is the current state of the game). If the terminal value is positive, increment Player 1's score; if it is negative, increment Player 2's score by the absolute value of the terminal value. (If there is a draw, we can also increment both players' scores by half the value of a win: if win = 1 then draw = 0.5 for both players.)
[AIMA mentions keeping a vector of values per node for multiplayer games]

Option 2: keep 1 score value, with a higher value indicating preference for Max and a lower value indicating preference for Min (thus the value can go negative). The algorithm then selects the node with either the best or the worst UCT value, depending on whose turn it is.

Option 3 (See: https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Principle_of_operation): keep 1 score/value counter for each node, where that score belongs to the player that the node represents. When backpropagating, we update every alternate ancestor node with the score (where the alternation starts depends on whether the leaf node that was rolled out resulted in a win or a loss for the player that the leaf node represents). (So when selecting a child node, the current node has to choose the child with the minimum value, as the values of the children are those of the opposite player.)
[This option is similar to Option 1]


Going with Option 1

Modelling: use MaxV and MinV for the scores of the 2 players in the Node representation. Max is the X token (the player who plays the first move of the game) and Min is the O token
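
The two-counter bookkeeping of Option 1 can be sketched as below. This is a simplified illustration under stated assumptions rather than the notebook's `Node` class: `SketchNode` and its field names are hypothetical, a win is worth 1 and a draw is worth 0.5 to both players, and the `MAX`, `MIN` and `DRAW` values are assumed to mirror the game constants.

```python
from dataclasses import dataclass
from typing import Optional

MAX, MIN, DRAW = 1, -1, 4  # assumed encoding: X is Max, O is Min, 4 marks a draw


@dataclass
class SketchNode:
    """Hypothetical stand-in for a search tree node."""
    parent: Optional["SketchNode"] = None
    max_v: float = 0.0  # accumulated score for Max (X) through this node
    min_v: float = 0.0  # accumulated score for Min (O) through this node
    n: int = 0          # number of rollouts through this node


def backpropagate(node, winner):
    """Walk from `node` up to the root, updating the visit count and both scores."""
    while node is not None:
        node.n += 1
        if winner == DRAW:
            node.max_v += 0.5
            node.min_v += 0.5
        elif winner == MAX:
            node.max_v += 1
        else:  # winner == MIN
            node.min_v += 1
        node = node.parent


root = SketchNode()
child = SketchNode(parent=root)
backpropagate(child, MAX)   # one rollout won by Max
backpropagate(child, DRAW)  # one rollout that ended in a draw
print(root.n, root.max_v, root.min_v)  # 2 1.5 0.5
```

Both counters only ever increase, so each one can be read directly as "wins for that player" when computing a UCT value.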


calculate_uct_value() explanation (also see its docstring):

In select_best_child(), for the current node, we calculate the UCT value of each of its child nodes and select the child that has the highest UCT value. That value needs to be in terms of the current node, i.e. the player whose turn it is to move in the current node's state, because we want to select the best move for this player to play.

Say it is X's turn to play at the current node. We want to select the child node (in all of which it is O's turn to play) that is the best move for X, so we want the child with the most wins for X in the rollouts.

In all of the child nodes it is O's turn to play, so when calculating the UCT value of a child node we use the player of the parent node, which is X. The reverse applies if it is O's turn to play at the current node.
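
As a rough sketch of the calculation discussed here, the UCT value of a child is v/n + C * sqrt(ln(N)/n), where v is the win count of the parent's player through that child, n is the child's visit count and N is the parent's visit count. The function name `uct_value`, the sample numbers and the default value of `c` below are assumptions made for illustration only.

```python
import math


def uct_value(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCT value of a child node, seen from the parent's player.

    `wins` must be the win count of the player to move at the parent.
    """
    if visits == 0:
        # Unvisited children get top priority for exploration
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


# Children of a node where X is to move: label -> (wins for X, visits)
children = {"a": (6, 10), "b": (3, 4), "c": (0, 0)}
parent_visits = 14
best = max(children, key=lambda label: uct_value(*children[label], parent_visits))
print(best)  # c, because it has not been visited yet
```

If it were O's turn at the parent, the same function would be called with O's win counts instead.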


In [76]:
class Node:
    """
        A node for the Monte Carlo Tree Search algorithm, that is based on 
        a Tic Tac Toe State
     """
 
 
    def __init__(self, state: State, parent=None):
        """
            Constructor for the Node class

            Parameters:
            :param state (State): An instance of the state of the game
            :param parent (Node): The parent node of the current node
        """

        # The number of wins for player Max through this node
        self.Max_V = 0

        # The number of wins for player Min through this node
        self.Min_V = 0

        # The number of times the node (or its children) has been rolled out
        self.n = 0

        # The parent Node
        self.parent = parent

        # The Game State
        self.state = state

        # Children Nodes
        self.children = list()
    

    def calculate_uct_value(self):
        """
            Calculate and return the UCT value of this node based on the player whose turn it is 
            in the state of the parent node of this node

            We are using the player of the parent node's state instead of the player
            of this node's state as this node's state is a result of the player
            of the parent node's state making a move, and the UCT value will be used
            to determine the quality of the move from the parent nodes perspective.

            But if the current node is the root node of the tree (it has no parent)
            then we'll use the player of this root node's state. Though this function won't
            be called on the root node of a tree as it called on children nodes of a node. 
            But this case is placed here for validation purposes so that there's no error
            if parent=None
        
            :return: uct: Float
        """

       
        if self.n == 0:  # Check for 0 division
            # Return positive infinity (This node must be set to a high priority for Exploration)
            return np.inf
        else:
            v = 0

            if self.parent != None:
                if self.parent.state.turn == TicTacToeGame.X:
                    v = self.Max_V # Player Max is X
                else:
                    v = self.Min_V # Player Min is O
            else: # if root node of the tree - it doesn't have a parent node
                if self.state.turn == TicTacToeGame.X:
                    v = self.Max_V
                else:
                    v = self.Min_V 

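            # Return uct value for this node (based on its parent node's state's current player)
            # The root node has no parent, so fall back to its own visit count to avoid an AttributeError
            parent_n = self.parent.n if self.parent is not None else self.n
            return (v/self.n) + C * math.sqrt(np.log(parent_n)/self.n)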



    def select_best_child(self):
        """
            Determine the best child node of this node

            :return: best_child: Node
        """
        max_uct = -np.inf  # a UCT value can be exactly 0, so start below every possible value
        best_child = None
        for child in self.children:
            uct_value = child.calculate_uct_value()
            if uct_value > max_uct:
                max_uct = uct_value
                best_child = child
        return best_child


    def to_string(self):
        output = "Node Summary\nNumber of wins for Max through this node: {}\nNumber of wins for Min through this node: {}\nNumber of visits: {}\nUCT value = {}".format(self.Max_V, self.Min_V, self.n, self.calculate_uct_value())
        return output

# Monte Carlo Tree Search

AIMA: When the iterations terminate, the move with the highest number of Playouts/Rollouts is returned. You might think that it would be better to return the node with the highest average utility, but the idea is that a node with 65/100 wins is better than one with 2/3 wins, because the latter has a lot of uncertainty. In any event, the UCB1 formula ensures that the node with the most playouts is almost always the node with the highest win percentage, because the selection process favors win percentage more and more as the number of playouts goes up.
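
The selection rule in the quote above can be illustrated with the two nodes it mentions. This is a minimal sketch, with `most_visited_move` and the move coordinates being hypothetical names and values chosen for the example.

```python
def most_visited_move(children):
    """Return the move whose child node has the most rollouts.

    `children` maps a move (row, col) to a (wins, visits) pair.
    """
    return max(children, key=lambda move: children[move][1])


# The two nodes from the quote: 65/100 wins versus 2/3 wins
children = {(0, 1): (65, 100), (2, 2): (2, 3)}
for move, (wins, visits) in children.items():
    print(move, "win rate:", round(wins / visits, 2), "visits:", visits)
print("chosen move:", most_visited_move(children))  # chosen move: (0, 1)
```

The second move has the higher win rate (about 0.67 against 0.65), but it rests on only 3 rollouts, so the move with 100 rollouts is returned.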


At the end, number of rollouts on the root node = number of iterations,
as for each iteration, the result propagates back to the root node


In [90]:
def monte_carlo_tree_search(game, state, iterations=100):
    """
        Conduct the MCTS from the given state (the state that is passed in)
        Return the best move to make on the state passed in (the current state of the game)
        along with its resulting state
    """

    # Child/Sub Functions

    def add_child_nodes(node):
        """
            Create child nodes of this node and add them to the node as its children
        """

        state = node.state
        child_states = game.get_child_states(state)
        node.children = list() # initialise to an empty list
        
        if child_states: # if the child_states list is not empty
            for child_state in child_states:
                node.children.append(Node(child_state, parent=node)) # add child state as a child node to this current node


    def random_action(state):
        """
            Randomly select an action/move from all the possible moves that
            can be made on this state and return it

            :returns: random_move: A tuple (i, j) to place a move
        """

        possible_moves = game.moves(state)
        random_move = random.choice(possible_moves) # pick one of the possible moves uniformly at random
        return random_move


    def simulate(state, move):
        """
            Apply the given move on the state

            :returns: new_state: the State that results from making the move
        """

        new_state = game.play_move(state, move)
        return new_state


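    def rollout(state):
        """
            Play random moves from the given state until a terminal state is reached

            :returns: winner, utility: the values returned by check_game_over() for the terminal state
        """
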
        while True:
            
            is_terminal_state, winner, utility = game.check_game_over(state)
            
            if is_terminal_state:
                return winner, utility

            action = random_action(state)
            state = simulate(state, action)


    def backpropagate(node, winner, result):
        """
            Propagate the rollout result from the given node up to the root node,
            updating the visit count and utility totals of every node on the path

            :param node: the node that rollout was done on and to start backpropagating
                the result from
            :param winner: the winner value returned by check_game_over()
            :param result: the utility value returned by check_game_over()
        """

        # Walk up the tree; the root node (parent is None) is the last node updated
        while node is not None:

            node.n += 1

            if winner == game.DRAW:
                node.Max_V += result
                node.Min_V += result
            elif winner == game.MAX:
                node.Max_V += result
            else: # if winner == game.MIN
                node.Min_V += result

            node = node.parent



    # MCTS - still working on it
    root_node = Node(state)
    count = 0
  
    while count < iterations:
        print("Iteration:", count+1)
        current = root_node # reset current for this new iteration

        """
            While current is not a leaf node (has children nodes), get the best child of current.
            Stop when current is a leaf node (doesn't have any children)

            current.children will evaluate to true if the children list is
            not empty (current has children nodes), and will evaluate to false
            if the children list is empty (current doesn't have any children
            nodes -> current is a leaf node)

        """
        while current.children: # Tree Traversal: while current is not a leaf node
            current = current.select_best_child()

        # current is now a leaf node (doesn't have any children)
        is_terminal_state, winner, utility = game.check_game_over(current.state)
        
        # If the state of the current node is a terminal state, its utility has already been calculated above.
        # If current.n == 0, rollout() would return that same utility anyway,
        # but if current.n != 0 we would try to expand a node that has no child states
        # and then access its (non-existent) first child node
        if not is_terminal_state: # if the state of the current (leaf) node is not a terminal state 
            if current.n == 0:
                winner, utility = rollout(current.state)
            else:
                add_child_nodes(current)
                #current = current.select_best_child()
                current = current.children[0] # First new child node
                winner, utility = rollout(current.state)

        backpropagate(current, winner, utility)
        count = count + 1

    # Return best action/move to make
    # AIMA: When the iterations terminate, the move with the highest number of Playouts/Rollouts is returned
    max_rollouts = 0
    best_move = None
    resultant_state = None

    rollouts_children_string = "Num rollouts on children of root:\t"
    for child in root_node.children:
        rollouts = child.n
        rollouts_children_string += str(rollouts) + "\t"
        if rollouts > max_rollouts:
            max_rollouts = rollouts
            best_move = child.state.move
            #print(best_move.state.board)
            resultant_state = child.state
    
    print(rollouts_children_string)
    return best_move, resultant_state, max_rollouts







# Examples




In [64]:
# For ease of access by the examples below
X = TicTacToeGame.X
O = TicTacToeGame.O

num_iterations = 100


## Example 1: 4x4 Ex 1


In [91]:
game = TicTacToeGame(4)
current_state = game.create_game_state([(0,0)],[(X,3,0),(O,3,3),(X,1,2)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

|S|_|_|_|
|_|_|X|_|
|_|_|_|_|
|X|_|_|O|
Iteration: 1
Iteration: 2
...
Iteration: 69
(remaining output truncated)

## Example 2: 4x4 Ex 2

In [None]:

# Moves are given as (player, row, col) tuples, as in Example 1, alternating X and O (assuming X moves first)

game = TicTacToeGame(4)
current_state = game.create_game_state([(1,1)],[(X,3,0),(O,3,2),(X,0,3),(O,3,3)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

## Example 3: 5x5 Ex 1

In [None]:
game = TicTacToeGame(5)
current_state = game.create_game_state([(3,0),(1,4)],[(X,2,2),(O,3,3)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

## Example 4: 6x6 Ex 1

In [None]:
game = TicTacToeGame(6)
current_state = game.create_game_state([(1,1),(4,4)],[(X,0,0),(O,2,2),(X,3,3),(O,5,5)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

## Example 5: 7x7 Ex 1

In [None]:
game = TicTacToeGame(7)
current_state = game.create_game_state([(6,1),(2,2),(3,5)],[(X,6,2),(O,1,3),(X,3,4)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

## Example 6: 7x7 Ex 2

In [None]:
game = TicTacToeGame(7)
current_state = game.create_game_state([(6,1),(2,2),(4,5)],[(X,3,1),(O,2,0),(X,1,2),(O,1,3),(X,3,4)])
game.display_state(current_state)

best_move, resultant_state, max_rollouts = monte_carlo_tree_search(game, current_state, num_iterations)
print("Best move to make: ", best_move)
print("\nResultant State:")
game.display_state(resultant_state)
print("\nNumber of rollouts on this state (out of a total of", num_iterations, "rollouts):", max_rollouts)

print("")

User input (we don't need this)



In [48]:

# Dimension of the board from user
try:
  size = int(input("Enter the size of the square board (4,5,6,7): "))

  if size < 4 or size > 7:
    size = 4 # Default if invalid value entered
except ValueError:
  size = 4 # Default if invalid (non-int) input

# Create game instance
ttt_game = TicTacToeGame(size)


"""
Create an initial state
No X's or O's are present
"""
initial_state = ttt_game.create_game_state()

next_state = ttt_game.get_child_states(initial_state)
new_state = next_state[5]
move = (2,1) # a move is a tuple (i, j), as returned by random_action()
next_state = ttt_game.play_move(new_state,move)

ttt_game.display_state(next_state)


best_move, resultant_state, max_rollouts = monte_carlo_tree_search(ttt_game, next_state, 100)




"""
print("Board Size: {}x{}\nNumber of tokens needed to win: {}\nNumber of Neutral Tokens available: {}".format(TIC_TAC_TOE_GAME.n, TIC_TAC_TOE_GAME.n, TIC_TAC_TOE_GAME.k, TIC_TAC_TOE_GAME.neutral_tokens))
"""
