# Nondeterministic Actions: Solving Tic-Tac-Toe with And-Or-Tree Search

## Introduction 
 
Multiplayer games can be approached in several ways:
1. __Nondeterministic actions:__ The opponent is treated as part of an environment with nondeterministic actions. The nondeterminism comes from not knowing which moves the opponent will make.
2. __Optimal decisions:__ Minimax search (search the complete game tree) and alpha-beta pruning.
3. __Heuristic alpha-beta tree search:__ Cut off the tree search and use a heuristic to estimate the state value.
4. __Monte Carlo tree search:__ Simulate playouts to estimate the state value.

Here we will implement search for Tic-Tac-Toe (see [rules](https://en.wikipedia.org/wiki/Tic-tac-toe)).

We will implement
* __And-Or-Tree search.__

Each action consists of the move by the player (Or levels in the tree) and all possible (i.e., nondeterministic) responses by the opponent (And levels). The action therefore results in a set of possible states called a __belief state.__

We will search for a __conditional plan__ using And-Or-Tree search. 

## State Space and Search Tree Size

Each state is a possible board. Each of the 9 squares can have 3 values (empty, x, and o), but some boards are unreachable (e.g., boards where both players have three in a row, or where the numbers of x's and o's are inconsistent with x moving first). The number of states in the state space graph is therefore less than:

In [1]:
3**9

19683
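The bound $3^9$ counts every assignment of three values to nine squares. As a sketch that is not part of the original notebook, the reachable boards can be enumerated directly with a DFS over the state space graph: start from the empty board, let x move first, and stop expanding a board once it is won or full. The helper names `LINES`, `game_over`, and `reachable_boards` are hypothetical. The commonly cited number of legal tic-tac-toe positions, including the empty board, is 5,478.

```python
# Sketch: enumerate all boards reachable from the empty board (x moves first).
# Boards are tuples so they can be stored in a set (hypothetical helper names).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def game_over(board):
    """True if some player has three in a row or the board is full."""
    won = any(board[i] != ' ' and board[i] == board[j] == board[k] for i, j, k in LINES)
    return won or ' ' not in board

def reachable_boards():
    """Return the set of all boards that can occur in legal play."""
    seen = set()

    def dfs(board, player):
        if board in seen:          # the same board can be reached by different move orders
            return
        seen.add(board)
        if game_over(board):       # no moves after a win or on a full board
            return
        other = 'o' if player == 'x' else 'x'
        for sq, mark in enumerate(board):
            if mark == ' ':
                dfs(board[:sq] + (player,) + board[sq + 1:], other)

    dfs((' ',) * 9, 'x')
    return seen

boards = reachable_boards()
print(len(boards), "reachable boards out of", 3**9)
```

Because the number of marks on a board determines whose turn it is, skipping a board that was already visited never loses a successor.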

A search tree can be superimposed on the state space graph. Note that a state can appear in several branches of the tree, which results in more nodes than states. We collapse the 

* The complete search tree has a maximal depth $m=9$
* The max branching factor $b=9$ (for first move).

DFS has a time complexity of $O(b^m)$ and a space complexity of $O(bm)$.
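Because each move fills one square, the branching factor shrinks from 9 to 1 along every path, so the $O(b^m)$ bound is loose for tic-tac-toe. Ignoring games that end early with a win, depth $d$ of the tree has $9!/(9-d)!$ nodes, and the total number of nodes $N$ is bounded by:

$$
N \;\le\; \sum_{d=0}^{9} \frac{9!}{(9-d)!} = 1 + 9 + 72 + 504 + 3{,}024 + 15{,}120 + 60{,}480 + 181{,}440 + 362{,}880 + 362{,}880 = 986{,}410 \;\ll\; b^m = 9^9 = 387{,}420{,}489
$$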

The number of terminal nodes of the complete search tree is less than:

In [2]:
import math

math.factorial(9)

362880

There are $9!$ move sequences that fill the complete board. Some sequences are cut short by a win, so there are fewer terminal nodes.
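The effect of early wins can be measured by walking the complete game tree. The sketch below (a hypothetical `count_games` helper, not part of the original notebook) makes moves in place on a single list, checks only the lines through the square just played, and tallies each leaf by its outcome. The commonly cited totals are 255,168 complete games: 131,184 won by x, 77,904 won by o, and 46,080 draws.

```python
# Sketch: count the leaves (complete games) of the full tic-tac-toe game tree.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals
# A move on square sq can only complete lines that pass through sq.
LINES_THROUGH = [[line for line in LINES if sq in line] for sq in range(9)]

def count_games():
    """Return {'x': x wins, 'o': o wins, 'd': draws} over all complete games."""
    board = [' '] * 9
    tally = {'x': 0, 'o': 0, 'd': 0}

    def dfs(player, moves_made):
        other = 'o' if player == 'x' else 'x'
        for sq in range(9):
            if board[sq] != ' ':
                continue
            board[sq] = player                                  # make the move
            if any(board[i] == board[j] == board[k] for i, j, k in LINES_THROUGH[sq]):
                tally[player] += 1                              # leaf: player just won
            elif moves_made + 1 == 9:
                tally['d'] += 1                                 # leaf: full board, no winner
            else:
                dfs(other, moves_made + 1)                      # game continues
            board[sq] = ' '                                     # undo the move

    dfs('x', 0)
    return tally

tally = count_games()
print(tally, sum(tally.values()))
```

This walks every node of the tree (no memoization), so it can take a few seconds in pure Python.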

__Note:__ This size makes this a very small problem that can easily be solved by searching the complete game tree. Most games and real problems are too large and can only be searched partially.

## The board

I represent the board as a list of length 9 in row-major order (index 0 is the top-left square, index 8 the bottom-right square). The values are `' '`, `'x'`, and `'o'`.
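Actions and plans refer to squares by their list index, so a small helper that maps an index to its (row, column) position can make the output easier to read. `square_to_cell` is a hypothetical name that the notebook does not use elsewhere.

```python
def square_to_cell(i):
    """Map a board index 0..8 to (row, column) in row-major order."""
    return divmod(i, 3)

# Print the index of every square in its board position.
for row in range(3):
    print(' '.join(str(3 * row + col) for col in range(3)))
# 0 1 2
# 3 4 5
# 6 7 8
```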

In [3]:
def empty_board():
    return [' '] * 9

board = empty_board()
display(board)

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

Some helper functions.

In [4]:
import numpy as np

def show_board(board):
    """display the board"""
    b = np.array(board).reshape((3,3))
    print(b)

board = empty_board()
show_board(board)    

print()
print("Add some x's")
board[0] = 'x'; board[3] = 'x'; board[6] = 'x';  
show_board(board)

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

Add some x's
[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]


In [5]:
def check_win(board):
    """check the board and return one of x, o, d (draw), or n (for next move)"""

    board = np.array(board).reshape((3,3))
   
    diagonals = np.array([[board[i][i] for i in range(len(board))], 
                          [board[i][len(board)-i-1] for i in range(len(board))]])
    
    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if len(set(row)) == 1 and row[0] != ' ':
                return row[0]

    # check for draw
    if(np.sum(board == ' ') < 1):
        return 'd'
    
    return 'n'

show_board(board)
print('Win? ' + check_win(board))

print()
show_board(empty_board())
print('Win? ' + check_win(empty_board()))

[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]
Win? x

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]
Win? n
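`check_win` builds the rows, columns, and diagonals with NumPy. An equivalent sketch without NumPy lists the eight winning lines explicitly as index triples. `WIN_LINES` and `check_win_lines` are hypothetical names, and the return codes mirror `check_win`.

```python
# The eight winning lines of the 3x3 board as index triples into the length-9 list.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def check_win_lines(board):
    """Same codes as check_win: 'x' or 'o' (winner), 'd' (draw), or 'n' (next move)."""
    for i, j, k in WIN_LINES:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return 'd' if ' ' not in board else 'n'

print(check_win_lines(list('x  x  x  ')))   # x  (left column)
print(check_win_lines([' '] * 9))           # n  (empty board)
print(check_win_lines(list('xoxxoooxx')))   # d  (full board, no line)
```

As in `check_win`, wins are checked before the full-board test, so a winning move that fills the last square counts as a win, not a draw.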


In [6]:
def actions(board):
    """return possible actions as a vector of indices"""
    return np.where(np.array(board) == ' ')[0].tolist()

show_board(board)
actions(board)

[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]


[1, 2, 4, 5, 7, 8]

In [7]:
def result(state, player, action):
    """Add move to the board."""
    
    state = state.copy()
    state[action] = player
  
    return state

show_board(empty_board())

print()
print("State for placing an x at position 4:")
show_board(result(empty_board(), 'x', 4))

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

State for placing an x at position 4:
[[' ' ' ' ' ']
 [' ' 'x' ' ']
 [' ' ' ' ' ']]


## Recursive DFS Algorithm for And-Or-Tree Search

See AIMA page 125. 

Modifications to the algorithm in the textbook:
* Modified the `results()` function to return, for a state $\times$ action combination, a belief state that reflects all possible responses by the opponent.
* Removed the path since it is not used.
* No cycle checking (not needed: every move adds a mark, so a board can never repeat along a path).
* Goal: 
    - End the search (prune the subtree) when the player loses. 
    - Also check for a loss in the "and" phase.
    - A draw can be treated as a goal state or as a loss. Since DFS returns the first solution it finds, that solution might be a draw even though another plan could guarantee a win. Treating a draw as a loss prunes the "draw" solutions and leaves only the "wins," if any.
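For comparison with the modifications above, here is a sketch of the unmodified textbook structure: a goal test that returns an empty plan, path-based cycle checking, and a plan that maps each possible outcome state to a sub-plan. It runs on a made-up toy problem (integer states, goal 4, actions `wait`, `jump`, and `step`); the toy problem and all function names are assumptions for illustration only.

```python
def generic_and_or_search(initial, is_goal, actions, results):
    """Textbook-style AND-OR search: returns a conditional plan or None (failure).

    A plan is [] at a goal, or [action, {outcome_state: sub_plan, ...}].
    """

    def or_search(state, path):
        if is_goal(state):
            return []                          # goal reached: empty plan
        if state in path:
            return None                        # cycle on the current path: failure
        for action in actions(state):
            plan = and_search(results(state, action), [state] + path)
            if plan is not None:
                return [action, plan]          # first action that works for all outcomes
        return None

    def and_search(states, path):
        plans = {}
        for s in states:
            plan = or_search(s, path)
            if plan is None:
                return None                    # one outcome fails, so the action fails
            plans[s] = plan
        return plans

    return or_search(initial, [])


# Hypothetical toy problem: integer states, goal 4, states above 4 are dead ends.
GOAL = 4

def toy_actions(s):
    return [] if s > GOAL else ['wait', 'jump', 'step']

def toy_results(s, action):
    if action == 'wait':
        return [s]                             # stays put (creates a cycle)
    if action == 'jump':
        return [s + 2, s + 3]                  # nondeterministic: lands 2 or 3 ahead
    return [s + 1]                             # 'step' is deterministic

plan = generic_and_or_search(0, lambda s: s == GOAL, toy_actions, toy_results)
print(plan)
# ['jump', {2: ['step', {3: ['step', {4: []}]}], 3: ['step', {4: []}]}]
```

From state 0, `wait` is rejected by the cycle check, and `jump` is accepted because both of its outcomes (2 and 3) have a plan that avoids the dead ends 5 and 6.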


In [8]:
def results(state, action, player = 'x'):
    """produce the belief state after the provided action for player. 
       The belief state is the set of boards with the action and all possible reactions by the opponent."""
    
    if player == 'x': other = 'o'
    else: other = 'x'
    
    state = state.copy()
    
    # player's move
    state[action] = player
    
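    # game over after the player's move (win or full board): the opponent cannot reply
    if check_win(state) != 'n':
        return [state]
    
    # opponent reacts: one successor board per empty square
    r = list()
    o_actions = actions(state)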
    
    for o_a in o_actions:
        s = state.copy()
        s[o_a] = other
        r.append(s)    
    
    return r

show_board(empty_board())

print()
print("Belief state for placing an x at position 4 of an empty board:")
results(empty_board(), 4)

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

Belief state for placing an x at position 4 of an empty board:


[['o', ' ', ' ', ' ', 'x', ' ', ' ', ' ', ' '],
 [' ', 'o', ' ', ' ', 'x', ' ', ' ', ' ', ' '],
 [' ', ' ', 'o', ' ', 'x', ' ', ' ', ' ', ' '],
 [' ', ' ', ' ', 'o', 'x', ' ', ' ', ' ', ' '],
 [' ', ' ', ' ', ' ', 'x', 'o', ' ', ' ', ' '],
 [' ', ' ', ' ', ' ', 'x', ' ', 'o', ' ', ' '],
 [' ', ' ', ' ', ' ', 'x', ' ', ' ', 'o', ' '],
 [' ', ' ', ' ', ' ', 'x', ' ', ' ', ' ', 'o']]

In [9]:
def is_terminal(state, player = 'x', draw_is_win = True):
    """Return 'win' (or 'draw' if draw_is_win) for goal states, None for a loss (or a draw if not draw_is_win), and False for non-terminal states."""
    if player == 'x': other = 'o'
    else: other = 'x'
    
    goal = check_win(state)        
    if goal == player: return 'win' 
    if goal == 'd': 
        if draw_is_win: return 'draw' 
        else: return None 
    if goal == other: return None  # loss is failure
    return False # continue

print(is_terminal(['x'] * 9))
print(is_terminal(['o'] * 9))
print(is_terminal(empty_board()))

win
None
False


In [10]:
# define global variables for debugging
DEBUG = 1
COUNT = 0 # used to report the number of searched nodes

def and_or_search(board, player = 'x', draw_is_win = True):
    """Start the search. If draw_is_win is True, a draw also counts as a goal state."""
    global DEBUG, COUNT
    COUNT = 0
    
    plan = or_search(board, player, draw_is_win)
    
    if DEBUG >= 1: 
        print(f"Number of nodes searched: {COUNT}")  
    
    return plan
 

def or_search(state, player, draw_is_win):
    """Or step of the search: the player makes a move. 
    We try all possible actions and return a conditional 
    plan for the first action that only has goal states as leaf nodes. 
    If none can be found, then failure (None) is returned."""
    global DEBUG, COUNT
    COUNT += 1
    
    # goal check
    g = is_terminal(state, player, draw_is_win)
    if g != False: 
        return(g)
     
    # Note: no cycles for this problem! This also means we do not need to maintain the path.    
    #if is_cycle(path) return None  

    # check all possible actions
    for action in actions(state):
        plan = and_search(results(state, action, player), player, draw_is_win)
        if plan is not None: 
            return [action, plan]
    
    # failure
    return None


def and_search(states, player, draw_is_win):
    """And step of the search: Represents all opponent's possible moves. 
    Follow all possible states (call the or step). 
    Return a conditional plan only if all paths lead to a goal state."""
    global DEBUG, COUNT
    COUNT += 1
    
    # return plans only if no state fails
    plans = []
    for s in states:    
        # terminal check for each outcome (the player's move plus the opponent's reply)
        g = is_terminal(s, player, draw_is_win)
        if g != False: 
            return(g)
      
        plan = or_search(s, player, draw_is_win)
        
        if plan is None: 
            return None    # found a state that fails!
        plans.append(['if', s, 'then', plan])
        
    return plans
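Because the same board can be reached through different move orders, the DFS above re-solves subtrees for boards it has already seen on another branch. One common remedy, which is not part of the textbook algorithm or this notebook, is a transposition table that caches the result for each board. The sketch below uses `functools.lru_cache` on tuple boards and only answers whether a plan exists (`True`/`False`) instead of returning the plan. `can_force`, `winner`, `place`, and `WIN_LINES` are hypothetical names.

```python
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """'x' or 'o' for three in a row, 'd' for a full board, 'n' otherwise."""
    for i, j, k in WIN_LINES:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return 'd' if ' ' not in board else 'n'

def place(board, square, mark):
    """Return a new tuple board with mark placed on square."""
    return board[:square] + (mark,) + board[square + 1:]

@lru_cache(maxsize=None)          # transposition table: one entry per (board, player, draw_is_win)
def can_force(board, player='x', draw_is_win=True):
    """True if `player`, who is to move on the non-terminal tuple `board`,
    has a conditional plan whose leaves are all goal states."""
    other = 'o' if player == 'x' else 'x'
    goals = {player, 'd'} if draw_is_win else {player}

    for a in [i for i, c in enumerate(board) if c == ' ']:        # Or node: try each move
        mine = place(board, a, player)
        w = winner(mine)
        if w != 'n':                       # the player's move ended the game
            if w in goals:
                return True
            continue                       # a draw that counts as a loss
        ok = True
        for r in [i for i, c in enumerate(mine) if c == ' ']:      # And node: every reply
            theirs = place(mine, r, other)
            w = winner(theirs)
            if w == 'n':
                ok = can_force(theirs, player, draw_is_win)
            else:
                ok = w in goals            # opponent won (fail) or board filled (draw)
            if not ok:
                break                      # one failing reply rules out move a
        if ok:
            return True
    return False

empty = (' ',) * 9
print(can_force(empty, 'x', True))    # True: x can guarantee at least a draw
print(can_force(empty, 'x', False))   # False: x cannot guarantee a win
```

Returning a boolean keeps the cache small; recovering the actual plan would require storing the winning move per board as well.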

## Some Tests

In [11]:
# x is about to win

board = empty_board() 
board[0] = 'x'
board[1] = 'o'
board[3] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)

print()
print("Win or draw:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = True))

Board:
[['x' 'o' ' ']
 ['o' 'x' ' ']
 [' ' ' ' ' ']]

Win or draw:
Number of nodes searched: 22


[2,
 [['if', ['x', 'o', 'x', 'o', 'x', 'o', ' ', ' ', ' '], 'then', [6, 'win']],
  ['if',
   ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', ' '],
   'then',
   [5,
    [['if', ['x', 'o', 'x', 'o', 'x', 'x', 'o', 'o', ' '], 'then', [8, 'win']],
     ['if',
      ['x', 'o', 'x', 'o', 'x', 'x', 'o', ' ', 'o'],
      'then',
      [7, 'draw']]]]],
  ['if',
   ['x', 'o', 'x', 'o', 'x', ' ', ' ', 'o', ' '],
   'then',
   [5,
    [['if', ['x', 'o', 'x', 'o', 'x', 'x', 'o', 'o', ' '], 'then', [8, 'win']],
     ['if',
      ['x', 'o', 'x', 'o', 'x', 'x', ' ', 'o', 'o'],
      'then',
      [6, 'win']]]]],
  ['if',
   ['x', 'o', 'x', 'o', 'x', ' ', ' ', ' ', 'o'],
   'then',
   [5,
    [['if',
      ['x', 'o', 'x', 'o', 'x', 'x', 'o', ' ', 'o'],
      'then',
      [7, 'draw']],
     ['if',
      ['x', 'o', 'x', 'o', 'x', 'x', ' ', 'o', 'o'],
      'then',
      [6, 'win']]]]]]]

12.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [12]:
print("Win only:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = False))

Win only:
Number of nodes searched: 27


[2,
 [['if', ['x', 'o', 'x', 'o', 'x', 'o', ' ', ' ', ' '], 'then', [6, 'win']],
  ['if', ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', ' '], 'then', [8, 'win']],
  ['if',
   ['x', 'o', 'x', 'o', 'x', ' ', ' ', 'o', ' '],
   'then',
   [5,
    [['if', ['x', 'o', 'x', 'o', 'x', 'x', 'o', 'o', ' '], 'then', [8, 'win']],
     ['if',
      ['x', 'o', 'x', 'o', 'x', 'x', ' ', 'o', 'o'],
      'then',
      [6, 'win']]]]],
  ['if', ['x', 'o', 'x', 'o', 'x', ' ', ' ', ' ', 'o'], 'then', [6, 'win']]]]

9.29 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [13]:
# x can draw if it chooses 7.

board = empty_board() 
board[0] = 'x'
board[1] = 'o'
board[2] = 'x'
#board[3] = 'o'
board[4] = 'o'

print("Board:")
show_board(board)

print()
print("Win or draw:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = True))

Board:
[['x' 'o' 'x']
 [' ' 'o' ' ']
 [' ' ' ' ' ']]

Win or draw:
Number of nodes searched: 56


[7,
 [['if',
   ['x', 'o', 'x', 'o', 'o', ' ', ' ', 'x', ' '],
   'then',
   [5,
    [['if', ['x', 'o', 'x', 'o', 'o', 'x', 'o', 'x', ' '], 'then', [8, 'win']],
     ['if',
      ['x', 'o', 'x', 'o', 'o', 'x', ' ', 'x', 'o'],
      'then',
      [6, 'draw']]]]],
  ['if',
   ['x', 'o', 'x', ' ', 'o', 'o', ' ', 'x', ' '],
   'then',
   [3,
    [['if',
      ['x', 'o', 'x', 'x', 'o', 'o', 'o', 'x', ' '],
      'then',
      [8, 'draw']],
     ['if',
      ['x', 'o', 'x', 'x', 'o', 'o', ' ', 'x', 'o'],
      'then',
      [6, 'win']]]]],
  ['if',
   ['x', 'o', 'x', ' ', 'o', ' ', 'o', 'x', ' '],
   'then',
   [3,
    [['if',
      ['x', 'o', 'x', 'x', 'o', 'o', 'o', 'x', ' '],
      'then',
      [8, 'draw']],
     ['if',
      ['x', 'o', 'x', 'x', 'o', ' ', 'o', 'x', 'o'],
      'then',
      [5, 'draw']]]]],
  ['if',
   ['x', 'o', 'x', ' ', 'o', ' ', ' ', 'x', 'o'],
   'then',
   [3,
    [['if', ['x', 'o', 'x', 'x', 'o', 'o', ' ', 'x', 'o'], 'then', [6, 'win']],
     ['if',
      ['x', '

16.6 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [14]:
print("Win only:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = False))

Win only:
Number of nodes searched: 52


None

11.8 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [15]:
# o is about to win

board = empty_board() 
board[0] = 'o'
board[1] = 'o'
board[3] = 'o'
board[4] = 'x'
board[8] = 'x'

print("Board:")
show_board(board)

print()
print("Win or draw:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = True))

Board:
[['o' 'o' ' ']
 ['o' 'x' ' ']
 [' ' ' ' 'x']]

Win or draw:
Number of nodes searched: 7


None

3.72 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [16]:
# Player o
print("Win only:")
%timeit -n1 -r1 display(and_or_search(board, player = 'o', draw_is_win = False))

Win only:
Number of nodes searched: 2


[2, 'win']

2.82 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [17]:
# o went first

board = empty_board() 
board[4] = 'o'


print("Board:")
show_board(board)

print()
print("Win or draw:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = True))

Board:
[[' ' ' ' ' ']
 [' ' 'o' ' ']
 [' ' ' ' ' ']]

Win or draw:
Number of nodes searched: 370


[0,
 [['if',
   ['x', 'o', ' ', ' ', 'o', ' ', ' ', ' ', ' '],
   'then',
   [7,
    [['if',
      ['x', 'o', 'o', ' ', 'o', ' ', ' ', 'x', ' '],
      'then',
      [6,
       [['if',
         ['x', 'o', 'o', 'o', 'o', ' ', 'x', 'x', ' '],
         'then',
         [5, 'draw']],
        ['if',
         ['x', 'o', 'o', ' ', 'o', 'o', 'x', 'x', ' '],
         'then',
         [3, 'win']],
        ['if',
         ['x', 'o', 'o', ' ', 'o', ' ', 'x', 'x', 'o'],
         'then',
         [3, 'win']]]]],
     ['if',
      ['x', 'o', ' ', 'o', 'o', ' ', ' ', 'x', ' '],
      'then',
      [5,
       [['if',
         ['x', 'o', 'o', 'o', 'o', 'x', ' ', 'x', ' '],
         'then',
         [6, 'draw']],
        ['if',
         ['x', 'o', ' ', 'o', 'o', 'x', 'o', 'x', ' '],
         'then',
         [2, 'draw']],
        ['if',
         ['x', 'o', ' ', 'o', 'o', 'x', ' ', 'x', 'o'],
         'then',
         [2, 'draw']]]]],
     ['if',
      ['x', 'o', ' ', ' ', 'o', 'o', ' ', 'x', ' '],
      

107 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [18]:
print("Win only:")
%timeit -n1 -r1 display(and_or_search(board, player = 'x', draw_is_win = False))

Win only:
Number of nodes searched: 720


None

73.9 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [19]:
# Empty board: only a draw can be guaranteed

board = empty_board() 

print("Board:")
show_board(board)

print()
print("Win or draw:")
%timeit -n 1 -r 1 display(and_or_search(board, player = 'x', draw_is_win = True))

Board:
[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

Win or draw:
Number of nodes searched: 833


[0,
 [['if',
   ['x', 'o', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
   'then',
   [2,
    [['if',
      ['x', 'o', 'x', 'o', ' ', ' ', ' ', ' ', ' '],
      'then',
      [4,
       [['if',
         ['x', 'o', 'x', 'o', 'x', 'o', ' ', ' ', ' '],
         'then',
         [6, 'win']],
        ['if',
         ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', ' '],
         'then',
         [5,
          [['if',
            ['x', 'o', 'x', 'o', 'x', 'x', 'o', 'o', ' '],
            'then',
            [8, 'win']],
           ['if',
            ['x', 'o', 'x', 'o', 'x', 'x', 'o', ' ', 'o'],
            'then',
            [7, 'draw']]]]],
        ['if',
         ['x', 'o', 'x', 'o', 'x', ' ', ' ', 'o', ' '],
         'then',
         [5,
          [['if',
            ['x', 'o', 'x', 'o', 'x', 'x', 'o', 'o', ' '],
            'then',
            [8, 'win']],
           ['if',
            ['x', 'o', 'x', 'o', 'x', 'x', ' ', 'o', 'o'],
            'then',
            [6, 'win']]]]],
        ['if',
        

205 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [20]:
print("Win only:")
%timeit -n 1 -r 1 display(and_or_search(board, player = 'x', draw_is_win = False))

Win only:
Number of nodes searched: 13023


None

1.03 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


## Experiments


### Baseline: Randomized Player

A completely randomized player agent should be a weak baseline.

In [21]:
def random_player(board, player = None):
    """Simple player that chooses a random empty square. player is unused."""
    return np.random.choice(actions(board))

show_board(board)
%timeit -n1 -r1 random_player(board)

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]
72.4 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### The Environment

Implement the environment that calls the agents. The percept is the board, and the action is a move (the index of the square to mark).

In [22]:
def switch_player(player, x, o):
    if player == 'x':
        return 'o', o
    else:
        return 'x', x

def play(x, o, N = 100):
    """Let two agents play each other N times. x starts. x and o are agent functions that 
    get the board (the percept) and their player symbol and return their next action."""
    results = {'x': 0, 'o': 0, 'd': 0}
    
    for i in range(N):
        board = empty_board()
        player, fun = 'x', x
        
        while True:
            a = fun(board, player)
            board = result(board, player, a)
            
            win = check_win(board)   # returns 'n' if the game is not over
            if win != 'n':
                results[win] += 1
                break
            
            player, fun = switch_player(player, x, o)   
    
    return results

### Random vs. Random

In [23]:
# timeit: n ... how many times to execute the statement, r ... how many times to repeat the timer (default 5)

%timeit -n 1 -r 1 display(play(random_player, random_player, N = 100))

{'x': 63, 'o': 25, 'd': 12}

86.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


_Note:_ In this run of 100 games, x won 63 and o won 25, so the first player (x) appears to have an advantage.

### And-Or Tree Search vs. Random

In [24]:
DEBUG = 0

def and_or_player(board, player = 'x'):
    plan = and_or_search(board, player)

    # if there is no plan (no guaranteed win or draw), make a random move
    if plan is None: 
        return np.random.choice(actions(board))
    else:
        return plan[0]



print("and-or-search vs. random:")
%timeit -n 1 -r 1 display(play(and_or_player, random_player))

print()
print("random vs. and-or-search")
%timeit -n 1 -r 1 display(play(random_player, and_or_player))

and-or-search vs. random:


{'x': 95, 'o': 0, 'd': 5}

7.88 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

random vs. and-or-search


{'x': 0, 'o': 69, 'd': 31}

11.4 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### And-Or Tree Search vs. And-Or Tree Search

In [27]:
# No randomness -> run only once

%timeit -n 1 -r 1 display(play(and_or_player, and_or_player, N = 1))

{'x': 0, 'o': 0, 'd': 1}

179 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


_Note:_ And-Or search produces a complete conditional plan and a better implementation would not have to rerun the algorithm for each move. 
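A sketch of that idea: compute the plan once, then at each turn look up the branch whose `'if'` board matches the board observed after the opponent's reply. `next_move` and `follow` are hypothetical helpers. They assume the game is not already over and that the plan has the nested format printed by `and_or_search` in the tests (`[action, [['if', board, 'then', subplan], ...]]`, where the part after the action may instead be `'win'` or `'draw'`).

```python
def next_move(plan):
    """The action to take now: the first element of an [action, branches] plan."""
    return plan[0]

def follow(plan, observed_board):
    """Return the sub-plan for the board seen after the opponent's reply,
    or None if the game is over or the board is not covered by the plan."""
    branches = plan[1]
    if isinstance(branches, str):           # 'win' or 'draw': nothing left to do
        return None
    for _if, board, _then, subplan in branches:
        if list(board) == list(observed_board):
            return subplan
    return None

# Usage with the "Win only" plan printed in In [12] (abbreviated to two branches):
plan = [2, [['if', ['x', 'o', 'x', 'o', 'x', 'o', ' ', ' ', ' '], 'then', [6, 'win']],
            ['if', ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', ' '], 'then', [8, 'win']]]]

print(next_move(plan))                      # 2
sub = follow(plan, ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', ' '])
print(next_move(sub))                       # 8
print(follow(sub, ['x', 'o', 'x', 'o', 'x', ' ', 'o', ' ', 'x']))   # None (game won)
```

An agent built this way would only call `and_or_search` again if the observed board is not covered by the stored plan.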