-
So far, we have only fought the instance
-
Consider a two-player game with:
-
Perfect knowledge (for now)
-
No "luck" (for now)
-
Turn-based
-
Always ends
-
Well-defined final score + "zero sum"
-
-
Examples: Checkers, Chess, Connect-4, Tic-Tac-Toe
-
We can use state-space search to find "good" or maybe even "best" plays in that game at a given state
-
State:
-
"Board" position (where are the checkers)
-
Side on move (red or black)
-
Maybe history of game (ugh, not checkers)
-
-
State space is directed, neighbors are states after legal moves
-
State space may be cyclic (which is where history rules sometimes come from)
-
Minimax Theorem (Von Neumann, Nash): Any state in the game has a well-defined value assuming "best play" by both sides
-
Basic Negamax idea:
-
Choose a next state with minimum value for opponent (moves are boring)
-
Determine this by valuing each opponent state from opponent's point of view (recursively)
-
Zero-sum, so your value = negation of opponent's value
-
Thus, pick state that maximizes negation of opponent's value
-
-
If on some search you find a move (successor state) that is a win for the side on move — no point in searching further
-
This is true whether the win is immediate or by forced play
-
Can remove an "exponential" number of states from the search
-
Value of a game is value of starting state to side on move
-
Game's value is value of starting state
-
Game is "solved" if the value of every state is known (recorded or trivially calculated)
-
Game is "weakly solved" if the value of every state that could occur in perfect play is known
-
-
Value is known: Hex
-
Weakly solved: Checkers
-
Solved: Tic-Tac-Toe, Connect-4, Awari, many others
-
Game I found in some old Martin Gardner book
-
Rules:
-
Start with tiles labeled 1-9 in the center
-
Players take turns choosing a tile from the pile
-
Any time a player holds 3 tiles whose total is 15, they win
-
-
Perfect information, alternating, terminating, zero-sum: check
-
(There's a fantastic trick to playing this game)
-
Build a negamax search http://github.com/pdx-cs-ai/nkt
-
Can cheaply calculate value of any position in practice
-
Or can calculate the value of all possible positions and record them
- (How many possible positions are there?)
-
We will generalize to "NKT": N tiles, exactly K tiles in a win, winning total is T
-
Next natural size is N=15, K=4, T=28, so we'll call this game "28"
-
Calculating the value of the starting position is going to take a long time
-
We want an approximate value for a position, and will tolerate an approximately best move
-
For chess, we use "piece value" as a heuristic approximation. For "28", maybe number of "almost wins"?
-
Heuristic should look at both sides, so subtract on-move value from opponent
-
But just doing this right away might not be the best plan. Let's look ahead some fixed depth and use negamax to combine heuristic values
-
28 or fewer possible almost-wins, so we'll set the value of an actual win to 29
-
Generalization of Win Pruning. Hard to explain:
-
Imagine you find a forced score of 3 for some state s in the middle of the search
-
Imagine further that in some earlier state t at the same level you had found a score of 2
-
Then your opponent (parent) would never let you get to s: they would avoid it
-
So there's no point in searching s any further
-
-
"Alpha-Beta window" keeps track of best score for side not on move (alpha) and side on move (beta)
-
Prune when computed score exceeds beta
-
Game is zero-sum. On recursion, flip and negate the window: (alpha, beta) → (-beta, -alpha)
-
Window starts with (-max, max). When searching, keep bumping alpha up as better moves are found
-
Depends on move ordering: a good move for the side on move will "close the window" a lot, so want to try good moves first
-
With optimal move ordering, will effectively double the depth of the search by pruning; runtime of depth-4 without = runtime of depth-8 with
-
Note that alpha-beta never changes the answer: it only prunes states that provably don't matter