In [None]:
from IPython.display import HTML
with open('../style.css') as f:
    css = f.read()
HTML(css)

# Alpha-Beta Pruning with Intelligent Memoization

In order to have some variation in our game, we use random numbers to choose between optimal moves.
In order to have reproducible results, we use a *seed* for the random number generator.
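
As a small illustration of why seeding gives reproducible results, the following sketch draws the same sequence of random choices twice after resetting the seed. The list of candidate moves `'abc'` is a placeholder that does not occur in the game itself.

```python
import random

# Seeding the generator fixes the sequence of pseudo-random choices.
random.seed(0)
first = [random.choice('abc') for _ in range(5)]

# Re-seeding with the same value reproduces exactly the same sequence.
random.seed(0)
second = [random.choice('abc') for _ in range(5)]

print(first == second)  # True
```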

In [None]:
import random
random.seed(0)

The global variable `gCache` is used as a cache for the function `evaluate` defined later.  Instead of just storing the values for a given `State`, the cache stores pairs of the form 
* `('=', v)`, 
* `('≤', v)`, or
* `('≥', v)`.

The first component of these pairs is a *flag* that specifies whether the stored value `v` is exact (`'='`), only an upper bound (`'≤'`), or only a lower bound (`'≥'`).  Concretely, provided `gCache[State]` is defined and `evaluate(State)` computes the *value* of a given `State`, the following invariants are satisfied:
* $\texttt{gCache[State]} = (\texttt{'='}, v) \rightarrow \texttt{evaluate(State)} = v$.
* $\texttt{gCache[State]} = (\texttt{'≤'}, v) \rightarrow \texttt{evaluate(State)} \leq v$.
* $\texttt{gCache[State]} = (\texttt{'≥'}, v) \rightarrow \texttt{evaluate(State)} \geq v$.
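
The invariants can be read as a predicate on cache entries. The following is a minimal sketch; the helper name `consistent` is hypothetical and is not part of the notebook.

```python
def consistent(entry, true_value):
    """Return True if the cache entry (flag, v) agrees with true_value."""
    flag, v = entry
    if flag == '=':
        return true_value == v   # v is exact
    if flag == '≤':
        return true_value <= v   # v is an upper bound
    if flag == '≥':
        return true_value >= v   # v is a lower bound
    raise ValueError(f'unknown flag: {flag!r}')

print(consistent(('=', 0), 0))   # True
print(consistent(('≤', 0), -1))  # True
print(consistent(('≥', 1), 0))   # False
```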

In [None]:
gCache = {}

The function `evaluate` takes four arguments:
- `State` is the current state of the game,
- `f`     is either the function `maxValue` or the function `minValue`.

   These functions are defined later. If in `State` it is the first player's turn, then `f` is equal to `maxValue`, else `f` is equal to `minValue`.
- `alpha` is a lower bound for the value of `State`,
- `beta`  is an upper bound for the value of `State`.

The function `evaluate` satisfies the following specification:
- $\alpha \leq \texttt{value}(s) \leq \beta \;\rightarrow\;\texttt{evaluate}(s, f, \alpha, \beta) = \texttt{value}(s)$
- $\texttt{value}(s) < \alpha \;\rightarrow\; \texttt{evaluate}(s, f, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s) \;\rightarrow\; \beta \leq \texttt{evaluate}(s, f, \alpha, \beta)$

Here, the expression `value(s)` returns the *value* of the given state `s` if both players play their optimal game.  This value is an element from the set $\{-1, 0, 1\}$.  
- If the first player can force a win, the return value is `1`.
- If the first player can at best force a draw, the return value is `0`.
- If the second player can force a win, the return value is `-1`.
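
The specification only pins down the result when the true value lies inside the window $[\alpha, \beta]$; outside the window, any result on the correct side of the bound is acceptable. The sketch below encodes the three clauses as a checker. The name `satisfies_spec` is an assumption made for illustration.

```python
def satisfies_spec(result, value, alpha, beta):
    """Check a result against the three clauses of the specification."""
    if alpha <= value <= beta:
        return result == value   # inside the window: the result is exact
    if value < alpha:
        return result <= alpha   # below the window: any value <= alpha
    return beta <= result        # above the window: any value >= beta

# True value 1, window [-1, 0]: both 0 and 1 are acceptable, -1 is not.
print(satisfies_spec(0, 1, -1, 0))   # True
print(satisfies_spec(1, 1, -1, 0))   # True
print(satisfies_spec(-1, 1, -1, 0))  # False
```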

For reasons of efficiency, the function `evaluate` is *memoized* using the global variable `gCache`.
If `gCache[State]` is defined, then the computation of `evaluate(State, f, alpha, beta)`
proceeds according to the following case distinction.  For brevity, the formulas below omit the argument `f` on their left-hand sides.
1. If the stored value $v$ is exact, we can return this value:

   $$\texttt{gCache[State]} = (\texttt{'='}, v) \rightarrow \texttt{evaluate}(\texttt{State}, \alpha, \beta) = v.$$
2. If the stored value $v$ is an upper bound and this upper bound is less than or equal to $\alpha$, then we know that
   the true value of `State` is less than or equal to $\alpha$ and hence we can also return the value $v$:

   $$\texttt{gCache[State]} = (\texttt{'≤'}, v) \wedge v \leq \alpha \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = v.$$
3. If the stored value $v$ is an upper bound and this upper bound is greater than $\alpha$ but less than $\beta$, then we know that the true value is less than or equal to $v$ and hence we shrink the interval $[\alpha, \beta]$
   to the interval $[\alpha, v]$:

   $$\texttt{gCache[State]} = (\texttt{'≤'}, v) \wedge \alpha < v < \beta \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = f(\texttt{State}, \alpha, v).$$
     
   In this case, `gCache` is updated.  

4. If the stored value $v$ is an upper bound and this upper bound is greater than or equal to $\beta$, then the stored value is of no help:
   
   $$\texttt{gCache[State]} = (\texttt{'≤'}, v) \wedge \beta \leq v \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = f(\texttt{State}, \alpha, \beta).$$

   In this case,  `gCache` needs to be updated.

5. If the stored value $v$ is a lower bound and this lower bound is greater than or equal to $\beta$, then we 
   know that the true value is greater than or equal to $\beta$ and hence we can return the value $v$: 

   $$\texttt{gCache[State]} = (\texttt{'≥'}, v) \wedge \beta \leq v \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = v.$$
     
6. If the stored value $v$ is a lower bound and this lower bound is less than $\beta$ but greater than $\alpha$,
   then we know that the true value is greater than or equal to $v$ and hence we shrink the interval $[\alpha,\beta]$
   to the interval $[v, \beta]$: 

   $$\texttt{gCache[State]} = (\texttt{'≥'}, v) \wedge \alpha < v < \beta \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = f(\texttt{State}, v, \beta).$$
     
   In this case, `gCache` is updated.    
7. If the stored value $v$ is a lower bound and this lower bound is less than or equal to $\alpha$, then the stored value is of no help:
   
   $$\texttt{gCache[State]} = (\texttt{'≥'}, v) \wedge v \leq \alpha \rightarrow 
     \texttt{evaluate}(\texttt{State}, \alpha, \beta) = f(\texttt{State}, \alpha, \beta).$$

   In this case, `gCache` needs to be updated.
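
The seven cases above can be summarized as a pure function that decides, for a given cache entry and window, whether the stored value can be returned directly or which window has to be searched. This is only a sketch of the case distinction; the name `plan` and the tags `'return'` and `'search'` are hypothetical.

```python
def plan(entry, alpha, beta):
    """Map a cache entry and a window to ('return', v) or ('search', a, b)."""
    flag, v = entry
    if flag == '=':
        return ('return', v)             # case 1: exact value
    if flag == '≤':
        if v <= alpha:
            return ('return', v)         # case 2: upper bound below window
        if v < beta:
            return ('search', alpha, v)  # case 3: shrink window from above
        return ('search', alpha, beta)   # case 4: bound gives no information
    if flag == '≥':
        if beta <= v:
            return ('return', v)         # case 5: lower bound above window
        if alpha < v:
            return ('search', v, beta)   # case 6: shrink window from below
        return ('search', alpha, beta)   # case 7: bound gives no information
    raise ValueError(f'unknown flag: {flag!r}')

print(plan(('≤', 0), -1, 1))   # ('search', -1, 0)
print(plan(('≥', 1), -1, 1))   # ('return', 1)
print(plan(('≥', -1), -1, 1))  # ('search', -1, 1)
```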

In [None]:
def evaluate(State, f, alpha=-1, beta=1):
    global gCache
    if State in gCache:
        flag, v = gCache[State]
        if flag == '=':
            return v
        if flag == '≤':
            if v <= alpha:
                return v
            elif alpha < v < beta:
                w = f(State, alpha, v)
                store_cache(State, alpha, v, w)
                return w
            else: 
                w = f(State, alpha, beta)
                store_cache(State, alpha, beta, w)
                return w
        if flag == '≥':
            if beta <= v:
                return v
            elif alpha < v < beta:
                w = f(State, v, beta)
                store_cache(State, v, beta, w)
                return w
            else:
                w = f(State, alpha, beta)
                store_cache(State, alpha, beta, w)
                return w
    else: # no value stored in gCache for State
        v = f(State, alpha, beta)
        store_cache(State, alpha, beta, v)
        return v

The function `store_cache` is called with four arguments:
* `State` is a state of the game,
* `alpha` is a number,
* `beta`  is a number, and
* `v`     is a number such that:
  $$\texttt{evaluate(State, f, alpha, beta)} = v$$
  
The function stores the value `v` in the dictionary `gCache` under the key `State`.
It also stores a flag that is either `'≤'`, `'='`, or `'≥'`.  The value that is stored 
satisfies the following conditions:
* If `gCache[State] = ('≤', v)`, then `value(State) ≤ v`. 
* If `gCache[State] = ('=', v)`, then `value(State) = v`. 
* If `gCache[State] = ('≥', v)`, then `value(State) ≥ v`. 
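
Which flag is stored depends on where the result lies relative to the search window. The following sketch isolates that decision; the helper name `classify` is an assumption. It shows that the same result `0` is stored as exact, as an upper bound, or as a lower bound, depending on the window that was searched.

```python
def classify(alpha, beta, v):
    """Return the flag that describes v relative to the window [alpha, beta]."""
    if v <= alpha:
        return '≤'   # the search failed low: v is only an upper bound
    if v < beta:
        return '='   # alpha < v < beta: v is the exact value
    return '≥'       # the search failed high: v is only a lower bound

print(classify(-1, 1, 0))  # =
print(classify(0, 1, 0))   # ≤
print(classify(-1, 0, 0))  # ≥
```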

In [None]:
def store_cache(State, alpha, beta, v):
    global gCache
    if   v <= alpha:
        gCache[State] = ('≤', v)
    elif v <  beta: # alpha < v
        gCache[State] = ('=', v)
    else: # beta <= v
        gCache[State] = ('≥', v)

The function `maxValue` is called with three arguments:
* `State` is a state of the game,
* `alpha` is a number, and
* `beta`  is a number.

The function `maxValue` is only called if in the given `State` it is the turn of the maximizing player.

The function `maxValue` satisfies the following specification:
- $\alpha \leq \texttt{value}(s) \leq \beta \;\rightarrow\;\texttt{maxValue}(s, \alpha, \beta) = \texttt{value}(s)$
- $\texttt{value}(s) < \alpha \;\rightarrow\; \texttt{maxValue}(s, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s) \;\rightarrow\; \beta \leq \texttt{maxValue}(s, \alpha, \beta)$

In [None]:
def maxValue(State, alpha, beta):
    if finished(State):
        return utility(State)
    if alpha >= beta:
        return alpha
    v = alpha
    for ns in next_states(State, gPlayers[0]):
        v = max(v, evaluate(ns, minValue, v, beta))
        if v >= beta:
            return v
    return v

The function `minValue` is called with three arguments:
* `State` is a state of the game,
* `alpha` is a number, and
* `beta`  is a number.

The function `minValue` is only called if in the given `State` it is the turn of the minimizing player.

The function `minValue` satisfies the following specification:
- $\alpha \leq \texttt{value}(s) \leq \beta \;\rightarrow\;\texttt{minValue}(s, \alpha, \beta) = \texttt{value}(s)$
- $\texttt{value}(s) < \alpha \;\rightarrow\; \texttt{minValue}(s, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s) \;\rightarrow\; \beta \leq \texttt{minValue}(s, \alpha, \beta)$
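
To see the specifications of `maxValue` and `minValue` at work without a concrete game, the following sketch runs the same recursion on a toy game tree given as nested lists with integer leaves. The names `max_value` and `min_value` and the two trees are assumptions made for illustration, and no cache is used.

```python
def max_value(node, alpha, beta):
    if isinstance(node, int):        # a leaf carries its utility
        return node
    if alpha >= beta:
        return alpha
    v = alpha
    for child in node:
        v = max(v, min_value(child, v, beta))
        if v >= beta:                # beta cutoff
            return v
    return v

def min_value(node, alpha, beta):
    if isinstance(node, int):
        return node
    if beta <= alpha:
        return beta
    v = beta
    for child in node:
        v = min(v, max_value(child, alpha, v))
        if v <= alpha:               # alpha cutoff
            return v
    return v

tree  = [[0, 1], [-1, 1], [0, 0]]    # true value  0
tree2 = [[-1, 1], [-1, 0]]           # true value -1

print(max_value(tree, -1, 1))   # 0: value inside the window, result is exact
print(max_value(tree2, -1, 1))  # -1
print(max_value(tree2, 0, 1))   # 0: value below alpha, result is only <= alpha
```

In the last call the true value `-1` lies below the window `[0, 1]`, so the result `0` is merely an upper bound. This is precisely the situation in which the cache has to record the flag `'≤'`.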

In [None]:
def minValue(State, alpha, beta):
    if finished(State):
        return utility(State)
    if beta <= alpha:
        return beta
    v = beta
    for ns in next_states(State, gPlayers[1]):
        v = min(v, evaluate(ns, maxValue, alpha, v))
        if v <= alpha:
            return v
    return v

In [None]:
%%capture
%run Tic-Tac-Toe-Bitboard.ipynb

$\alpha$-$\beta$ pruning with intelligent memoization takes 17 ms to compute the value of the state `gStart` for Tic-Tac-Toe. 
Hence, there is no big difference between intelligent memoization and naive memoization in this example.  However, if we implement
a non-trivial game like Connect Four, the situation changes.

In [None]:
%%time
v = evaluate(gStart, maxValue, -1, 1)
v

We check how many different states are stored in `gCache`.  Without alpha-beta pruning, we had to inspect $5478$ different states, but now there are only
$2474$ different states in the cache.
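
Besides the total number of entries, it can be instructive to count how many cached entries are exact and how many are only bounds. The following sketch does this for a small hand-made dictionary; the helper `flag_histogram` and the contents of `sample_cache` are hypothetical and do not reflect the real contents of `gCache`.

```python
from collections import Counter

def flag_histogram(cache):
    """Count how often each flag occurs in a cache of (flag, value) pairs."""
    return Counter(flag for flag, _ in cache.values())

# Hand-made example entries, not taken from an actual run.
sample_cache = {'s1': ('=', 0), 's2': ('≤', 0), 's3': ('≥', 1), 's4': ('=', 1)}

hist = flag_histogram(sample_cache)
print(hist['='], hist['≤'], hist['≥'])  # 2 1 1
```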

In [None]:
len(gCache)

## Playing the Game

The function `best_move` takes one argument:
- `State` is the current state of the game.  It is assumed that it is the first player's turn in `State`, i.e. the turn of `gPlayers[0]`.

The function `best_move` returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state.  The state $s$ is a state that is reached from `State` if the first player makes one of her optimal moves.  In order to have some variation in the game, the function randomly chooses one of the optimal moves.
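
The random choice among equally good moves can be sketched in isolation. In the following example the successor states and their values are given as a plain dictionary; the helper `pick_optimal` and the state names are hypothetical.

```python
import random

def pick_optimal(values):
    """Return (best value, randomly chosen state attaining that value)."""
    best = max(values.values())
    candidates = [s for s, v in values.items() if v == best]
    return best, random.choice(candidates)

# Two successor states share the best value 0, so either may be chosen.
best, state = pick_optimal({'a': 0, 'b': -1, 'c': 0})
print(best)                  # 0
print(state in {'a', 'c'})   # True
```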

In [None]:
def best_move(State):
    NS        = next_states(State, gPlayers[0])
    bestValue = evaluate(State, maxValue, -1, 1)
    BestMoves = [s for s in NS if evaluate(s, minValue, -1, 1) == bestValue]
    BestState = random.choice(BestMoves)
    return bestValue, BestState

The next import is needed because the function `IPython.display.clear_output` is used to clear the output of a cell.

In [None]:
import IPython.display 

The function `play_game` plays on the given `canvas`.  The game played is specified indirectly by specifying the following:
- `gStart` is a global variable defining the start state of the game.
- `next_states` is a function such that $\texttt{next_states}(s, p)$ computes the set of all possible states that can be reached from state $s$ if player $p$ is next to move.
- `finished` is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$.
- `utility` is a function such that $\texttt{utility}(s)$ returns either `-1`, `0`, or `1` in the *terminal state* $s$.  As used by `maxValue` and `minValue`, the result is given from the perspective of the first player.  We have that
  - $\texttt{utility}(s)= -1$ iff the game is lost for the first player in state $s$, 
  - $\texttt{utility}(s)=  0$ iff the game is drawn, and 
  - $\texttt{utility}(s)=  1$ iff the game is won for the first player in state $s$.
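
To make this interface concrete, the following sketch specifies a tiny game in the same style: players alternately take one or two stones from a pile, and whoever takes the last stone wins. Everything here is an assumption made for illustration, in particular the state representation `(stones, last_mover)`, the player names, and the one-argument form of `utility` used by `maxValue` and `minValue`. The names carry a `nim_` prefix so that they do not shadow the definitions loaded from the Tic-Tac-Toe notebook.

```python
nim_players = ['X', 'O']   # hypothetical player names
nim_start   = (3, None)    # three stones on the pile, nobody has moved yet

def nim_next_states(state, player):
    """All states reachable if player takes one or two stones."""
    stones, _ = state
    return {(stones - k, player) for k in (1, 2) if k <= stones}

def nim_finished(state):
    """The game is over once the pile is empty."""
    return state[0] == 0

def nim_utility(state):
    """Value of a terminal state from the first player's perspective."""
    return 1 if state[1] == nim_players[0] else -1

print(sorted(nim_next_states(nim_start, 'X')))  # [(1, 'X'), (2, 'X')]
print(nim_finished((0, 'O')))                   # True
print(nim_utility((0, 'O')))                    # -1
```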

In [None]:
def play_game(canvas):
    State = gStart
    while True:
        val, State = best_move(State)
        draw(State, canvas, f'For me, the game has the value {val}.')
        if finished(State):
            final_msg(State)
            break
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            break

Let's draw the board.

In [None]:
canvas = create_canvas()
draw(gStart, canvas, f'Current value of game for "X": {v}')

Now it's time to play.  In the input window that will pop up later, enter your move in the format "`row,col`"  with no space between row and column.  Both `row` and `col` should be integers from the set `{0,1,2}`.  
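
The expected input format can be illustrated by a small parser. This is only a sketch: the function `parse_move` is hypothetical and is not the `get_move` function loaded from the Tic-Tac-Toe notebook.

```python
def parse_move(text):
    """Parse a move of the form 'row,col' with row and col in {0,1,2}."""
    row_text, col_text = text.split(',')
    row, col = int(row_text), int(col_text)
    if row not in (0, 1, 2) or col not in (0, 1, 2):
        raise ValueError(f'row and col must be in {{0,1,2}}, got {text!r}')
    return row, col

print(parse_move('1,2'))  # (1, 2)
```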

In [None]:
play_game(canvas)