In [None]:
from IPython.display import HTML
with open('../style.css') as f:
    css = f.read()
HTML(css)

# Utilities

The global variable `gCache` is used as a cache for the function `evaluate` defined later.  Instead of just storing the values for a given `State`, the cache stores pairs of the form 
* `('=', v)`, 
* `('≤', v)`, or
* `('≥', v)`.

The first component of these pairs is a *flag* that specifies whether the stored value `v` is exact or only an upper or a lower bound.  The cache is keyed by pairs of the form `(State, limit)`, where `limit` is the search depth.  Concretely, provided `gCache[(State, limit)]` is defined and `value(State, limit)` denotes the *value* of `State` from the perspective of the maximizing player when the search depth is `limit`, the following invariants are satisfied:
* $\texttt{gCache[(State, limit)]} = (\texttt{'='}, v) \rightarrow \texttt{value(State, limit)} = v$.
* $\texttt{gCache[(State, limit)]} = (\texttt{'≤'}, v) \rightarrow \texttt{value(State, limit)} \leq v$.
* $\texttt{gCache[(State, limit)]} = (\texttt{'≥'}, v) \rightarrow \texttt{value(State, limit)} \geq v$.
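
One way to read these invariants is that every cache entry confines the true value to an interval.  The following sketch makes this explicit.  The helper `bounds` is hypothetical and not part of this notebook, and it assumes that all values lie in the interval $[-1, 1]$, which matches the default window used by `evaluate`.

```python
def bounds(entry, lowest=-1, highest=1):
    """Return the interval (lo, hi) that a cache entry allows for the true value.

    `entry` is a pair (flag, v) as stored in the cache.  The limits `lowest`
    and `highest` are assumptions of this sketch.
    """
    flag, v = entry
    if flag == '=':
        return (v, v)          # exact value
    if flag == '≤':
        return (lowest, v)     # v is an upper bound
    if flag == '≥':
        return (v, highest)    # v is a lower bound
    raise ValueError(f'unknown flag: {flag!r}')

print(bounds(('=', 0.5)))   # (0.5, 0.5)
print(bounds(('≤', 0)))     # (-1, 0)
print(bounds(('≥', 0)))     # (0, 1)
```

An entry is only directly usable for a search window $[\alpha, \beta]$ if this interval already decides the outcome relative to the window; otherwise the state has to be searched again.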

In [None]:
gCache = {}

In order to have some variation in our game, we use random numbers to choose between optimal moves.

In [None]:
import random
random.seed(0)

# Alpha-Beta Pruning with Progressive Deepening, Move Ordering, and Memoization

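The function `pd_evaluate` takes three arguments:
- `State` is the current state of the game,
- `time_limit` is the time in seconds that is available for the search,
- `f`     is either the function `maxValue` or the function `minValue`.

The function `pd_evaluate` uses *progressive deepening* to compute the value of `State`.  The given `State` is evaluated for a depth of $0$, $1$, $2$, $\cdots$ until either the value is decisive, i.e. `-1` or `1`, or more than `time_limit` seconds have elapsed.  Since the time is only checked after a depth has been searched completely, the search can take longer than `time_limit`.  The function returns the pair `(value, limit)`, where `limit` is the depth that has been reached.  The values calculated for a depth of $l$ are stored in `gCache` and used to sort the states when `State` is next evaluated for a depth of $l+1$.  This is beneficial for *alpha-beta pruning* because alpha-beta pruning can cut off more branches from the search tree if we start by evaluating the best moves first.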
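
The effect of move ordering can be seen on a toy example.  The sketch below is not the implementation used in this notebook: it runs a plain alpha-beta search on a small hand-made tree given as nested lists (inner lists are positions, integers are leaf values) and counts how many leaves are evaluated.  The names `alpha_beta` and `count_leaves` as well as the tree are made up for illustration.

```python
def alpha_beta(node, maximizing, alpha, beta, counter):
    """Alpha-beta search on a tree of nested lists; integers are leaves."""
    if isinstance(node, int):
        counter[0] += 1            # count every leaf that is actually evaluated
        return node
    if maximizing:
        value = alpha
        for child in node:
            value = max(value, alpha_beta(child, False, value, beta, counter))
            if value >= beta:      # beta cutoff
                break
        return value
    value = beta
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, value, counter))
        if value <= alpha:         # alpha cutoff
            break
    return value

def count_leaves(tree):
    """Return the value of the tree and the number of evaluated leaves."""
    counter = [0]
    value = alpha_beta(tree, True, -10, 10, counter)
    return value, counter[0]

best_first  = [[3, 5], [2, 9], [1, 7]]   # best move for the maximizer comes first
worst_first = [[1, 7], [2, 9], [3, 5]]   # same moves, best move comes last

print(count_leaves(best_first))    # (3, 4)
print(count_leaves(worst_first))   # (3, 6)
```

Both orderings yield the same value, but when the best move is tried first only 4 of the 6 leaves have to be evaluated.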

The module `time` is needed to measure how much time the search has used so far.

In [None]:
import time

In [None]:
def pd_evaluate(State, time_limit, f):
    start = time.time()
    limit = 0
    while True:
        value = evaluate(State, limit, f)
        stop = time.time()
        if value in [-1, 1] or stop - start > time_limit:
            print(f'searched to depth {limit}, using {round(stop - start, 3)} seconds')
            return value, limit
        limit += 1

The function `evaluate` takes five arguments:
- `State` is the current state of the game,
- `limit` determines the lookahead.  To be more precise, it is the number of *half-moves* that are investigated to compute the value.  If `limit` is 0 and the game has not ended, the game is evaluated via the function `heuristic`. This function is supposed to be defined in the notebook defining the game.
- `f` is either the function `maxValue` or the function `minValue`.  

   `f = maxValue` if it's the maximizing player's turn in `State`.  Otherwise,
   `f = minValue`.
- `alpha` and `beta` are the parameters from *alpha-beta pruning*.

The function `evaluate` returns the *value* that the given `State` has if both players play their optimal game. 
- If the maximizing player can force a win, the return value is `1`.
- If the maximizing player can at best force a draw, the return value is `0`.
- If the maximizing player might lose even when playing optimally, the return value is `-1`.

If the search reaches the depth `limit` before the game is decided, the value is instead estimated by the function `heuristic`.

For reasons of efficiency, the function `evaluate` is *memoized* using the global variable `gCache`.   This works in the same way as described in the notebook `Alpha-Beta-Pruning-Memoization.ipynb`.

In [None]:
def evaluate(State, limit, f, alpha=-1, beta=1):
    global gCache
    if (State, limit) in gCache:
        flag, v = gCache[(State, limit)]    
        if flag == '=':
            return v
        if flag == '≤':
            if v <= alpha:
                return v
            elif alpha < v < beta:
                w = f(State, limit, alpha, v)
                store_cache(State, limit, alpha, v, w)
                return w
            else: # beta <= v:
                w = f(State, limit, alpha, beta)
                store_cache(State, limit, alpha, beta, w)
                return w
        if flag == '≥':
            if beta <= v:
                return v
            elif alpha < v < beta:
                w = f(State, limit, v, beta)
                store_cache(State, limit, v, beta, w)
                return w
            else: # v <= alpha
                w = f(State, limit, alpha, beta)
                store_cache(State, limit, alpha, beta, w)
                return w
    else:
        v = f(State, limit, alpha, beta)
        store_cache(State, limit, alpha, beta, v)
        return v

The function `store_cache` is called with five arguments:
* `State` is a state of the game,
* `limit` is the search depth,
* `alpha` is a number,
* `beta`  is a number, and
* `value` is a number such that:
  $$\texttt{evaluate(State, limit, f, alpha, beta)} = \texttt{value}$$
  
The function stores the `value` in the dictionary `gCache` under the key `(State, limit)`.
It also stores an indicator that is either `'≤'`, `'='`, or `'≥'`.  The value that is stored 
satisfies the following conditions:
* If `gCache[(State, limit)] = ('≤', value)`, then `evaluate(State, limit, f) ≤ value`. 
* If `gCache[(State, limit)] = ('=', value)`, then `evaluate(State, limit, f) = value`. 
* If `gCache[(State, limit)] = ('≥', value)`, then `evaluate(State, limit, f) ≥ value`. 

In [None]:
def store_cache(State, limit, alpha, beta, value):
    global gCache
    if value <= alpha:
        gCache[(State, limit)] = ('≤', value)
    elif value < beta:
        gCache[(State, limit)] = ('=', value)
    else: # value >= beta
        gCache[(State, limit)] = ('≥', value)

The function `value_cache` receives a `State` and a `limit` as parameters.  If a *value* for `State` has been computed to the given evaluation depth, this value is returned. Otherwise, `0` is returned.

In [None]:
def value_cache(State, limit):
    flag, value = gCache.get((State, limit), ('=', 0))
    return value

The module [`heapq`](https://docs.python.org/3/library/heapq.html) implements [heaps](https://en.wikipedia.org/wiki/Heap_(data_structure)).  The implementations of `maxValue` and `minValue` use heaps as *priority queues* in order to try the most promising moves first.  This improves the performance of *alpha-beta pruning*.
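
As a minimal sketch of this idea, the following example orders three successor states by their cached values.  The dictionary `cached` and the state names are hypothetical and serve only as an illustration.

```python
import heapq

# Hypothetical cached values of three successor states, for illustration only.
cached = {'a': 0.2, 'b': -0.5, 'c': 0.9}

moves = []  # empty priority queue
for state, value in cached.items():
    # heapq always pops the smallest item, so the value is negated
    # in order to obtain the state with the largest value first
    heapq.heappush(moves, (-value, state))

order = []
while moves:
    _, state = heapq.heappop(moves)
    order.append(state)

print(order)   # ['c', 'a', 'b']
```

If two entries have the same priority, Python compares the second components of the tuples, so the states themselves have to be comparable.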

In [None]:
import heapq

The function `maxValue` satisfies the following specification:
- $\alpha \leq \texttt{value}(s) \leq \beta \;\rightarrow\;\texttt{maxValue}(s, l, \alpha, \beta) = \texttt{value}(s)$
- $\texttt{value}(s) < \alpha \;\rightarrow\; \texttt{maxValue}(s, l, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s) \;\rightarrow\; \beta \leq \texttt{maxValue}(s, l, \alpha, \beta)$

It assumes that `gPlayers[0]` is the maximizing player.  This function implements *alpha-beta pruning*.  After searching up to a depth of `limit`, the value is approximated using the function `heuristic`. 
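
The three cases of this specification can be checked on a toy tree.  The following sketch is a simplified stand-in for `maxValue` and `minValue` without depth limit, cache, or move ordering.  The names `toy_max` and `toy_min` and the tree are made up for illustration.

```python
def toy_max(node, alpha, beta):
    """Maximizer on a tree of nested lists; integers are leaves."""
    if isinstance(node, int):      # leaf: its value is known exactly
        return node
    value = alpha
    for child in node:
        value = max(value, toy_min(child, value, beta))
        if value >= beta:
            return value
    return value

def toy_min(node, alpha, beta):
    """Minimizer on a tree of nested lists; integers are leaves."""
    if isinstance(node, int):
        return node
    value = beta
    for child in node:
        value = min(value, toy_max(child, alpha, value))
        if value <= alpha:
            return value
    return value

tree = [[3, 5], [2, 9], [1, 7]]    # the true value for the maximizer is 3

print(toy_max(tree, 0, 10))   # 3: the value lies inside the window
print(toy_max(tree, 4, 10))   # 4: value < alpha, the result is at most alpha
print(toy_max(tree, 0, 2))    # 2: value > beta, the result is at least beta
```

Only the first call returns the true value.  The other two calls return a bound, which is exactly the information that the flags `'≤'` and `'≥'` record in the cache.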

In [None]:
def maxValue(State, limit, alpha=-1, beta=1):
    if finished(State):
        return utility(State)
    if limit == 0:
        return heuristic(State)
    value      = alpha
    NextStates = next_states(State, gPlayers[0])
    if len(NextStates) == 1:  # singular value extension
        return evaluate(NextStates[0], limit, minValue, value, beta)
    Moves      = []  # empty priority queue
    for ns in NextStates:
        # heaps are sorted ascendingly, hence the minus
        heapq.heappush(Moves, (-value_cache(ns, limit-1), ns))
    while Moves != []:
        _, ns = heapq.heappop(Moves)
        value = max(value, evaluate(ns, limit-1, minValue, value, beta))
        if value >= beta:
            return value
    return value

The function `minValue` satisfies the following specification:
- $\alpha \leq \texttt{value}(s) \leq \beta \;\rightarrow\;\texttt{minValue}(s, l, \alpha, \beta) = \texttt{value}(s)$
- $\texttt{value}(s) < \alpha \;\rightarrow\; \texttt{minValue}(s, l, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s) \;\rightarrow\; \beta \leq \texttt{minValue}(s, l, \alpha, \beta)$

It assumes that `gPlayers[1]` is the minimizing player.  This function implements *alpha-beta pruning*.  After searching up to a depth of `limit`, the value is approximated using the function `heuristic`. 

In [None]:
def minValue(State, limit, alpha=-1, beta=1):
    if finished(State):
        return utility(State)
    if limit == 0:
        return heuristic(State)
    value      = beta
    NextStates = next_states(State, gPlayers[1])
    if len(NextStates) == 1:
        return evaluate(NextStates[0], limit, maxValue, alpha, value)
    Moves      = []  # empty priority queue
    for ns in NextStates:
        heapq.heappush(Moves, (value_cache(ns, limit-1), ns))
    while Moves != []:
        _, ns = heapq.heappop(Moves)
        value = min(value, evaluate(ns, limit-1, maxValue, alpha, value))
        if value <= alpha:
            return value
    return value

In [None]:
%%capture
%run Connect-Four.ipynb

In the state shown below, `Red` can force a win by pushing his stones in the 6th row.  Due to this fact, *alpha-beta pruning* is able to prune large parts of the search tree and hence the evaluation is fast.

In [None]:
canvas = create_canvas()
draw(gTestState, canvas, '?')

In [None]:
gCache = {}

In [None]:
%%time
value, limit = pd_evaluate(gTestState, 5, maxValue)
value

In [None]:
len(gCache)

In [None]:
gCache = {}

In [None]:
%%time
value, limit = pd_evaluate(gStart, 5, maxValue)
value

In [None]:
len(gCache)

In order to evaluate the effect of *progressive deepening*, we reset the cache and then evaluate the test state to a fixed depth without progressive deepening.

In [None]:
gCache = {}

In [None]:
%%time
value = evaluate(gTestState, 8, maxValue)
value

In [None]:
len(gCache)

## Playing the Game

The function `best_move` takes two arguments:
- `State` is the current state of the game,
- `time_limit` is the time in seconds that is available for the search.

The function `best_move` returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state.  The state $s$ is a state that is reached from `State` if the player makes one of her optimal moves.  In order to have some variation in the game, the function randomly chooses any of the optimal moves.
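
The random choice among equally good moves can be sketched in isolation as follows.  The function `pick_best` and the dictionary `values`, which maps hypothetical moves to their values, are not part of this notebook.

```python
import random

def pick_best(values, rng=random):
    """Return the best value and a randomly chosen move that attains it."""
    best = max(values.values())
    candidates = [move for move, v in values.items() if v == best]
    return best, rng.choice(candidates)

# Hypothetical move values, for illustration only.
values = {'a': 0, 'b': 1, 'c': -1, 'd': 1}
best, move = pick_best(values, random.Random(0))
print(best, move)   # best is 1; move is either 'b' or 'd'
```

Passing a seeded `random.Random` instance makes the choice reproducible without changing the global random state.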

In [None]:
def best_move(State, time_limit):
    NextStates = next_states(State, gPlayers[0])
    bestValue, limit = pd_evaluate(State, time_limit, maxValue)
    BestMoves  = [s for s in NextStates 
                    if evaluate(s, limit-1, minValue) == bestValue
                 ]
    BestState  = random.choice(BestMoves)
    return bestValue, BestState

The next import provides the function `IPython.display.clear_output`, which is used to clear the output of a cell.

In [None]:
import IPython.display 

The function `play_game` plays on the given `canvas`.  The argument `time_limit` is the time in seconds that the computer may use for each of its moves.  The game played is specified indirectly by specifying the following:
- `gStart` is a global variable defining the start state of the game.
- `next_states` is a function such that $\texttt{next_states}(s, p)$ computes the list of all possible states that can be reached from state $s$ if player $p$ is next to move.
- `finished` is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$.
- `utility` is a function such that $\texttt{utility}(s)$ returns either `-1`, `0`, or `1` in the *terminal state* $s$.  The result is given from the perspective of the maximizing player `gPlayers[0]`.  We have that
  - $\texttt{utility}(s)= -1$ iff the game is lost for the maximizing player in state $s$, 
  - $\texttt{utility}(s)=  0$ iff the game is drawn, and 
  - $\texttt{utility}(s)=  1$ iff the game is won for the maximizing player in state $s$.

In [None]:
def play_game(canvas, time_limit):
    global gCache
    State = gStart
    while True:
        gCache = {}  # start every move with an empty cache
        val, State  = best_move(State, time_limit)
        draw(State, canvas, f'value = {round(val, 2)}.')
        if finished(State):
            final_msg(State)
            break
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            break

In [None]:
canvas = create_canvas()
draw(gStart, canvas, f'Current value of game for "X": {round(value, 2)}')

In [None]:
play_game(canvas, 2)

In [None]:
len(gCache)