
# Minimax on Mancala (State-Based)

**Goal:** implement a correct, state-based **Minimax** for Mancala using the engine functions:
- `legal_actions(state)`
- `step(state, action)`
- `evaluate(state)` (scores from the current player's perspective)

We'll also handle Mancala's **extra turn** rule (a move that keeps the turn does not consume search depth) and count the nodes the search visits.



## 0) Setup

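The following cell makes sure Python can import from `src/`. It assumes the notebook is run from its own folder, two levels below the repo root (for example `tutorials/minimax_alpha/`); adjust `parents[1]` if your working directory is different.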


In [1]:
import os, sys, pathlib
REPO_ROOT = pathlib.Path().resolve().parents[1]  # .../tutorials/minimax_alpha/ -> repo root
SRC = REPO_ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

print(REPO_ROOT, SRC)

from mancala_ai.engine.core import new_game, legal_actions, step, evaluate
print("Imports OK. Repo root:", REPO_ROOT)


/mnt/c/Users/ndqua/OneDrive - Trent University/AMOD 5310H-AI/AI-algorithm-for-Mancala-game /mnt/c/Users/ndqua/OneDrive - Trent University/AMOD 5310H-AI/AI-algorithm-for-Mancala-game/src
Imports OK. Repo root: /mnt/c/Users/ndqua/OneDrive - Trent University/AMOD 5310H-AI/AI-algorithm-for-Mancala-game



## 1) Quick reminder: state shape

```python
state = {
  "pits": [[int]*6, [int]*6],   # row 0 = player_1, row 1 = player_2
  "stores": [int, int],         # stores[0] = player_1 store, stores[1] = player_2 store
  "current_player": 0 | 1       # 0 = player_1, 1 = player_2
}
```
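As a concrete instance of this shape, the sketch below builds the standard opening position by hand and checks the stone count. `make_initial_state` and `total_stones` are hypothetical helpers written for illustration; in the notebook the engine's `new_game()` plays the first role, and the 4-stones-per-pit default is taken from the board printed in section 4.

```python
def make_initial_state(stones_per_pit: int = 4) -> dict:
    """Hypothetical stand-in for the engine's new_game(): 6 pits per side, empty stores."""
    return {
        "pits": [[stones_per_pit] * 6, [stones_per_pit] * 6],
        "stores": [0, 0],
        "current_player": 0,
    }

def total_stones(state: dict) -> int:
    """Moves only relocate stones, so this total should stay constant during a game."""
    return sum(state["pits"][0]) + sum(state["pits"][1]) + sum(state["stores"])

state = make_initial_state()
print(total_stones(state))  # 48
```

Checking that `total_stones` stays at 48 after every `step` is a cheap way to catch bookkeeping bugs while experimenting.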



## 2) Helpers: pretty print + fixed-perspective scoring

We always score from the **root player's** perspective: `score_for` sets `current_player` to the root player on a copy of the state before calling `evaluate`, so the state being searched is never modified.


In [2]:

import copy

def score_for(state, root_idx: int) -> float:
    s = copy.deepcopy(state)
    s["current_player"] = root_idx
    return float(evaluate(s))

def print_state(s):
    p0, p1 = s["pits"][0], s["pits"][1]
    st0, st1 = s["stores"]
    turn = s["current_player"]
    # Top row is player 1 (reverse display for board-like view)
    print("+----- Mancala -----+")
    print("P1 store:", st1)
    print("P1 pits: ", list(reversed(p1)))
    print("P0 pits: ", p0)
    print("P0 store:", st0)
    print("Turn: P", turn, sep="")
    print("+-------------------+")
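To see what the fixed perspective does, here is a self-contained sketch that runs without the engine. `toy_evaluate` is an assumed stand-in (store difference for the player to move), not the engine's `evaluate`, and `score_from` mirrors `score_for` above but takes the evaluator as an argument.

```python
import copy

def toy_evaluate(state: dict) -> int:
    """Assumed heuristic: own store minus opponent's store, for the player to move."""
    me = state["current_player"]
    return state["stores"][me] - state["stores"][1 - me]

def score_from(state: dict, root_idx: int, evaluate_fn=toy_evaluate) -> float:
    """Same idea as score_for: evaluate a copy with current_player forced to root_idx."""
    s = copy.deepcopy(state)
    s["current_player"] = root_idx
    return float(evaluate_fn(s))

state = {
    "pits": [[1, 6, 6, 6, 0, 5], [0, 0, 5, 5, 5, 0]],
    "stores": [8, 1],
    "current_player": 0,
}

print(score_from(state, 0))     # 7.0  (player 0 is ahead by 7)
print(score_from(state, 1))     # -7.0 (same position seen by player 1)
print(state["current_player"])  # 0    (the original state is untouched)
```

Because the root index is fixed for the whole search, a leaf reached on the opponent's turn is still scored for the root player, which is what lets the MIN layers minimize the same quantity the MAX layers maximize.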



## 3) Plain Minimax (no alpha-beta)

**Key point:** If a move grants an **extra turn**, the ply did **not** change. So we **do not decrease depth** in that case.


In [3]:
import math
from typing import Dict, Optional, Tuple

class Stats:
    """Tiny container to track search statistics.

    Attributes
    ----------
    visits : int
        Number of nodes (states) expanded by the search.
    """
    def __init__(self):
        self.visits = 0


def minimax(state: Dict, depth: int, maximizing_player: int, stats: Stats) -> Tuple[float, Optional[int]]:
    """Plain Minimax (no alpha–beta) for Mancala using a state-based engine.

    Parameters
    ----------
    state : dict
        Current Mancala state:
        {
          "pits": [[int]*6, [int]*6],
          "stores": [int, int],
          "current_player": 0|1
        }
    depth : int
        Remaining search depth in plies (half-moves).
        NOTE: For Mancala, a move can grant an extra turn; when that happens
        we do NOT decrement depth (still the same ply).
    maximizing_player : int
        The index of the player we are maximizing for **at the root** (0 or 1).
        We score all leaf nodes from this fixed perspective.
    stats : Stats
        Mutable counter used to measure how many nodes we expand.

    Returns
    -------
    (value, best_move) : (float, Optional[int])
        - value: heuristic score from the root player's perspective.
        - best_move: pit index (0..5) for the CURRENT state's player,
          or None at leaves.

    Notes
    -----
    • This version does **not** perform alpha–beta pruning.
    • The evaluation function `score_for(...)` is assumed available in scope
      and must evaluate 'state' from the perspective of 'maximizing_player'.
    • The legal move generator `legal_actions(state)` and the state transition
      `step(state, action)` are also assumed available.
    """
    # Count this node expansion
    stats.visits += 1

    # --- Terminal / leaf check -----------------------------------------------
    # Stop if depth is exhausted OR either side has no stones in pits (terminal).
    if depth == 0 or sum(state["pits"][0]) == 0 or sum(state["pits"][1]) == 0:
        # Evaluate the position from the FIXED root player's perspective
        return score_for(state, maximizing_player), None

    # Generate legal actions for the current player (pit indices 0..5 that are non-empty)
    actions = legal_actions(state)
    if not actions:
        # No legal actions: treat as leaf and evaluate
        return score_for(state, maximizing_player), None

    # Decide whether this layer is maximizing or minimizing:
    # If it's the root player's turn, we maximize; otherwise minimize.
    is_max = (state["current_player"] == maximizing_player)

    # Initialize best move with something valid (used as fallback/tie-breaker).
    best_move = actions[0]

    if is_max:
        # ----------------------------- MAX layer ------------------------------
        best = -math.inf
        for a in actions:
            # Apply the action to get the next state. The returned (reward, done)
            # are ignored in tree search because reward here is simply the
            # difference in stores.
            ns, _, _ = step(state, a)

            # Mancala "extra turn" rule:
            # If current_player stays the same after the move, it's still the same ply.
            # Do NOT decrement depth in that case.
            reduce = 0 if ns["current_player"] == state["current_player"] else 1

            v, _ = minimax(ns, depth - reduce, maximizing_player, stats)

            # Pick the move that maximizes the root player's score
            if v > best:
                best, best_move = v, a

        return best, best_move

    else:
        # ----------------------------- MIN layer ------------------------------
        best = math.inf
        for a in actions:
            ns, _, _ = step(state, a)
            reduce = 0 if ns["current_player"] == state["current_player"] else 1

            v, _ = minimax(ns, depth - reduce, maximizing_player, stats)

            # Pick the move that minimizes the root player's score
            if v < best:
                best, best_move = v, a

        return best, best_move


def choose_move_minimax(state: Dict, depth: int = 5) -> Tuple[int, Stats]:
    """Convenience wrapper: search and return a move for the current player.

    Parameters
    ----------
    state : dict
        Current Mancala state (see `minimax` docstring).
    depth : int, default=5
        Search depth in plies.

    Returns
    -------
    (move, stats) : (int, Stats)
        - move: selected pit index (0..5). If no legal moves exist, returns 0.
        - stats: node expansion statistics for this search.

    Notes
    -----
    • If `minimax` returns None for the move (the root is already a leaf:
      depth 0 or a terminal position), we fall back to the first legal action
      to keep the game flowing.
    """
    stats = Stats()
    _, mv = minimax(state, depth, state["current_player"], stats)

    if mv is None:
        # Fallback safety: choose the first legal action if available
        acts = legal_actions(state)
        mv = int(acts[0]) if acts else 0

    return int(mv), stats
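The effect of the extra-turn depth rule is easiest to see on a tiny hand-made tree rather than on Mancala itself. Everything in the sketch below (`TREE`, `STATIC`, `search`) is hypothetical and only illustrates the rule: a child that keeps the same player to move costs no depth, any other child costs one ply.

```python
# Hypothetical toy tree (not Mancala). Internal nodes are named strings mapped to
# (player_to_move, {action: child}); leaves are numbers scored for player 0.
TREE = {
    "root": (0, {"a": "again", "b": "opp"}),
    "again": (0, {"c": 5, "d": 9}),  # player 0 moves again: an "extra turn"
    "opp": (1, {"e": 7, "f": 8}),    # the turn passed to player 1
}
# Assumed heuristic values used when the depth budget runs out at an internal node.
STATIC = {"root": 0, "again": 1, "opp": 3}

def search(node, depth: int, extra_turn_free: bool = True) -> float:
    if not isinstance(node, str):  # leaf: exact value
        return float(node)
    if depth == 0:                 # cutoff: fall back to the heuristic
        return float(STATIC[node])
    player, children = TREE[node]
    values = []
    for child in children.values():
        same_player = isinstance(child, str) and TREE[child][0] == player
        cost = 0 if (extra_turn_free and same_player) else 1
        values.append(search(child, depth - cost, extra_turn_free))
    return max(values) if player == 0 else min(values)

print(search("root", depth=1, extra_turn_free=True))   # 9.0
print(search("root", depth=1, extra_turn_free=False))  # 3.0
```

With the rule enabled, a depth-1 search still looks through the extra turn and finds the 9; without it, the search stops at the heuristic value of `again` and prefers the weaker branch. The price is that chains of extra turns make the tree larger than the nominal depth suggests.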



## 4) Try it


In [4]:

s = new_game()
print_state(s)
mv, st = choose_move_minimax(s, depth=5)
print("Chosen move:", mv, "| nodes visited:", st.visits)
ns, _, _ = step(s, mv)
print_state(ns)


+----- Mancala -----+
P1 store: 0
P1 pits:  [4, 4, 4, 4, 4, 4]
P0 pits:  [4, 4, 4, 4, 4, 4]
P0 store: 0
Turn: P0
+-------------------+
Chosen move: 4 | nodes visited: 160971
+----- Mancala -----+
P1 store: 0
P1 pits:  [4, 4, 4, 4, 5, 5]
P0 pits:  [4, 4, 4, 4, 0, 5]
P0 store: 1
Turn: P1
+-------------------+
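The node count above is worth a sanity check. Assuming at most 6 legal pits per position, a search in which every move consumed one ply could visit at most 1 + 6 + 6^2 + ... + 6^5 nodes at depth 5. The helper below (a made-up name for this check) computes that bound.

```python
def max_nodes_without_extra_turns(branching: int, depth: int) -> int:
    """Nodes in a full tree where every move costs one ply: sum of branching**k, k = 0..depth."""
    return sum(branching ** k for k in range(depth + 1))

bound = max_nodes_without_extra_turns(branching=6, depth=5)
print(bound)           # 9331
print(160971 > bound)  # True
```

The reported 160971 visits are far above 9331, which is consistent with the extra-turn rule: moves that keep the turn do not consume depth, so some branches are longer than 5 moves.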



## 5) Exercise — Full-game simulation

Play Minimax vs Random to see it in action.
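A finished game is easier to read with a final tally. The sketch below assumes the common end-of-game rule in which stones left in a side's pits are added to that side's store; whether the engine's `step` already performs this sweep is not shown in this notebook, so `final_scores` and `winner` are hypothetical helpers that give the same answer either way.

```python
from typing import Optional, Tuple

def final_scores(state: dict) -> Tuple[int, int]:
    """Assumed rule: each side's remaining pit stones count toward that side's store."""
    s0 = state["stores"][0] + sum(state["pits"][0])
    s1 = state["stores"][1] + sum(state["pits"][1])
    return s0, s1

def winner(state: dict) -> Optional[int]:
    """Return 0 or 1 for the side with more stones, or None for a draw."""
    s0, s1 = final_scores(state)
    if s0 == s1:
        return None
    return 0 if s0 > s1 else 1

# Made-up finished position: player 0's pits are empty, player 1 has 3 stones left.
finished = {
    "pits": [[0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 2, 0]],
    "stores": [27, 18],
    "current_player": 1,
}
print(final_scores(finished))  # (27, 21)
print(winner(finished))        # 0
```

Calling `winner` on the state returned by the simulation gives a one-line result for the game.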


In [10]:

import random

def random_move(state: Dict) -> int:
    acts = legal_actions(state)
    return random.choice(acts) if acts else 0

def play_game_minimax_vs_random(depth_minimax=5, verbose=False):
    s = new_game()
    # Play until one side's pits are empty (same terminal test as in minimax).
    while sum(s["pits"][0]) > 0 and sum(s["pits"][1]) > 0:
        player = s["current_player"]
        if player == 0:
            # Player 0 searches with Minimax
            mv, _ = choose_move_minimax(s, depth=depth_minimax)
        else:
            # Player 1 picks uniformly among the legal moves
            mv = random_move(s)
        if verbose:
            print_state(s)
            print(f"Player {player} moved {mv}")
        s, _, _ = step(s, mv)
    return s

final_state = play_game_minimax_vs_random(depth_minimax=5, verbose=True)
print_state(final_state)


+----- Mancala -----+
P1 store: 0
P1 pits:  [4, 4, 4, 4, 4, 4]
P0 pits:  [4, 4, 4, 4, 4, 4]
P0 store: 0
Turn: P0
+-------------------+
Player 0 moved 4
+----- Mancala -----+
P1 store: 0
P1 pits:  [4, 4, 4, 4, 5, 5]
P0 pits:  [4, 4, 4, 4, 0, 5]
P0 store: 1
Turn: P1
+-------------------+
Player 1 moved 0
+----- Mancala -----+
P1 store: 0
P1 pits:  [5, 5, 5, 5, 6, 0]
P0 pits:  [4, 4, 4, 4, 0, 5]
P0 store: 1
Turn: P0
+-------------------+
Player 0 moved 0
+----- Mancala -----+
P1 store: 0
P1 pits:  [5, 5, 5, 5, 0, 0]
P0 pits:  [0, 5, 5, 5, 0, 5]
P0 store: 8
Turn: P1
+-------------------+
Player 1 moved 5
+----- Mancala -----+
P1 store: 1
P1 pits:  [0, 5, 5, 5, 0, 0]
P0 pits:  [1, 6, 6, 6, 0, 5]
P0 store: 8
Turn: P0
+-------------------+
Player 0 moved 0
+----- Mancala -----+
P1 store: 1
P1 pits:  [0, 5, 5, 5, 0, 0]
P0 pits:  [0, 7, 6, 6, 0, 5]
P0 store: 8
Turn: P1
+-------------------+
Player 1 moved 4
+----- Mancala -----+
P1 store: 2
P1 pits:  [1, 0, 5, 5, 0, 0]
P0 pits:  [1, 8, 7, 6, 0,