
# Mancala Search Assignment (No Solutions) — Minimax, Alpha–Beta, Ordering, Simulation, Custom Heuristic

This is the **assignment-only** version. Cells contain **stubs** and **TODOs** for you to complete.

## Tasks
1. Implement `score_for(state, root_idx)`.
2. Implement **Minimax** (no pruning) with correct **extra-turn depth** handling.
3. Implement **Alpha–Beta** with the same extra-turn rule.
4. Implement **one-ply move ordering** (`order_moves_by_one_ply_score`) and compare **node visits**.
5. Run **100-game simulations** (Minimax vs Random; Alpha–Beta vs Random with/without ordering) and report:
   - win_rate, avg time per move, avg nodes per move, avg moves per game.
6. Implement your **own heuristic** and a **Minimax variant** that uses it; test vs Random.


## 0) Setup & Imports

In [None]:
# Add your repo's src/ to the Python path and import the engine.
# Assumes the notebook's working directory is the repo root; adjust REPO_ROOT otherwise.
import sys, pathlib
REPO_ROOT = pathlib.Path().resolve()
SRC = REPO_ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

# Engine API expected:
#   new_game(), legal_actions(state), step(state, action), evaluate(state)
#   step(...) returns a 3-tuple whose first item is the next state (see play_game below).
from mancala_ai.engine.core import new_game, legal_actions, step, evaluate

print("Engine imports OK from:", SRC)

## 1) Utilities and Stubs

In [None]:
import copy, math, time, random
from typing import Dict, List, Tuple, Optional, Callable

class Stats:
    """Tracks node expansions during search."""
    def __init__(self):
        self.visits = 0

def pretty(state: Dict):
    """Optional: quick text view of the board."""
    p0, p1 = state["pits"][0], state["pits"][1]
    st0, st1 = state["stores"]
    print("+----- Mancala -----+")
    print("P1 store:", st1)
    print("P1 pits: ", list(reversed(p1)))
    print("P0 pits: ", p0)
    print("P0 store:", st0)
    print("Turn: P", state["current_player"], sep="")
    print("+-------------------+")

### TODO A — `score_for(state, root_idx)`

In [None]:
def score_for(state: Dict, root_idx: int) -> float:
    """Return a score for 'state' from the perspective of 'root_idx' (0 or 1).
    HINT: Temporarily set state['current_player'] = root_idx before calling evaluate(state),
    then restore the original value so the caller's state is left unchanged.
    """
    # TODO: implement
    raise NotImplementedError("Implement score_for(state, root_idx)")
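
The "temporarily set, then restore" pattern from the hint can be illustrated on a stand-in evaluator. This is a minimal sketch, not the engine's code: `toy_evaluate` and `toy_score_for` are hypothetical names, and the assumption that the evaluator scores the position for whoever is to move is made only for this illustration.

```python
from typing import Dict

def toy_evaluate(state: Dict) -> float:
    """Stand-in evaluator (assumption): store difference for the player to move."""
    me = state["current_player"]
    return float(state["stores"][me] - state["stores"][1 - me])

def toy_score_for(state: Dict, root_idx: int) -> float:
    """Score 'state' for root_idx without leaving the state modified."""
    saved = state["current_player"]
    state["current_player"] = root_idx      # view the board as root_idx
    try:
        return toy_evaluate(state)
    finally:
        state["current_player"] = saved     # always restore the real mover

toy_state = {"stores": [5, 10], "current_player": 0}
print(toy_score_for(toy_state, 0), toy_score_for(toy_state, 1))  # -5.0 5.0
print(toy_state["current_player"])                               # 0
```

The `try`/`finally` keeps the caller's state intact even if the evaluator raises; working on a copy of the state would be an equally reasonable choice.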

## 2) Minimax (no pruning) — **Implement me**

In [None]:
def minimax(state: Dict, depth: int, root_idx: int, stats: Stats) -> Tuple[float, Optional[int]]:
    """Plain Minimax on the state-based Mancala engine.

    Requirements:
    - Terminal/leaf: depth==0 or one side's pits sum to 0 → return score_for(...), None
    - Generate actions: legal_actions(state)
    - Extra-turn rule: after next_state, _, _ = step(state, a) (step returns a 3-tuple, as in play_game),
        if next_state['current_player'] == state['current_player'],
        DO NOT decrement depth for that recursion (same ply).
    - If it's root_idx's turn → maximize; else minimize.
    - Increment stats.visits at each call.
    """
    # TODO: implement
    raise NotImplementedError("Implement minimax(...)")
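
The extra-turn depth rule is easiest to see on a tiny hand-built tree rather than on the Mancala engine. The sketch below is only an illustration under stated assumptions: `ToyNode` and `toy_minimax` are hypothetical, each node stores who moves next and a static score from player 0's perspective, and the function returns just the score (your `minimax` must also return the best move).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyNode:
    player: int                      # side to move at this node (0 or 1)
    value: float = 0.0               # static score from player 0's perspective
    children: List["ToyNode"] = field(default_factory=list)

def toy_minimax(node: ToyNode, depth: int, root_idx: int, visits: List[int]) -> float:
    visits[0] += 1                   # count every call, as Stats.visits would
    if depth == 0 or not node.children:
        return node.value if root_idx == 0 else -node.value
    scores = []
    for child in node.children:
        # Same mover again means an extra turn: stay on the same ply.
        next_depth = depth if child.player == node.player else depth - 1
        scores.append(toy_minimax(child, next_depth, root_idx, visits))
    return max(scores) if node.player == root_idx else min(scores)

# Player 0 moves at the root. The first move grants an extra turn (child.player == 0).
extra = ToyNode(0, 1.0, [ToyNode(1, 5.0), ToyNode(1, 2.0)])
normal = ToyNode(1, 3.0, [ToyNode(0, 4.0), ToyNode(0, 9.0)])
root = ToyNode(0, 0.0, [extra, normal])

visits = [0]
best = toy_minimax(root, depth=1, root_idx=0, visits=visits)
print(best, visits[0])  # 5.0 5
```

With `depth=1`, the extra-turn branch is searched one level further (reaching 5.0), while the normal branch stops at its static score of 3.0. If depth were decremented on the extra turn, that branch would be cut off at 1.0 and the root would return 3.0 instead.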

In [None]:
def choose_move_minimax(state: Dict, depth: int = 5) -> Tuple[int, Stats]:
    """Return (move, stats) using your minimax implementation.
    Fallback: if minimax returns no move (None), choose the first legal action, or 0 if there are none.
    """
    # TODO: call minimax(...) and handle fallbacks
    raise NotImplementedError("Implement choose_move_minimax(...)")

## 3) Alpha–Beta (with optional ordering) — **Implement me**

In [None]:
def alphabeta(state: Dict, depth: int, alpha: float, beta: float, root_idx: int,
              stats: Stats, ordering_fn: Optional[Callable[[Dict, int, List[int]], List[int]]] = None
             ) -> Tuple[float, Optional[int]]:
    """Alpha–Beta pruning with the same extra-turn rule as Minimax.

    Requirements:
    - Apply ordering function if provided:
        acts = ordering_fn(state, root_idx, acts) if ordering_fn else acts
    - Update alpha at maximizing nodes and beta at minimizing nodes;
      stop expanding a node's remaining actions once alpha >= beta.
    - Increment stats.visits at each call.
    """
    # TODO: implement
    raise NotImplementedError("Implement alphabeta(...)")
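
To see what the cutoff buys, here is a hedged sketch on a toy tree of nested lists, where leaves are numbers and players strictly alternate. It deliberately omits the extra-turn rule and move tracking that your `alphabeta` needs; `toy_alphabeta` and its `prune` flag are hypothetical names used only for this comparison.

```python
import math
from typing import List, Union

Tree = Union[float, list]

def toy_alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
                  visits: List[int], prune: bool = True) -> float:
    visits[0] += 1
    if not isinstance(node, list):          # leaf: return its score
        return float(node)
    best = -math.inf if maximizing else math.inf
    for child in node:
        val = toy_alphabeta(child, not maximizing, alpha, beta, visits, prune)
        if maximizing:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if prune and alpha >= beta:         # remaining siblings cannot change the result
            break
    return best

tree = [[3, 5], [2, 9], [0, 1]]             # root maximizes, its children minimize
full, cut = [0], [0]
v_full = toy_alphabeta(tree, True, -math.inf, math.inf, full, prune=False)
v_cut = toy_alphabeta(tree, True, -math.inf, math.inf, cut, prune=True)
print(v_full, full[0])  # 3.0 10
print(v_cut, cut[0])    # 3.0 8
```

Both runs return the same value; pruning only skips the leaves 9 and 1, which cannot raise the root's score above 3.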

In [None]:
def choose_move_alphabeta(state: Dict, depth: int = 7,
                          ordering_fn: Optional[Callable[[Dict, int, List[int]], List[int]]] = None
                         ) -> Tuple[int, Stats]:
    """Return (move, stats) using your alphabeta implementation."""
    # TODO: implement wrapper similar to choose_move_minimax
    raise NotImplementedError("Implement choose_move_alphabeta(...)")

## 4) Move Ordering — **Implement me**

In [None]:
def order_moves_by_one_ply_score(state: Dict, root_idx: int, acts: List[int]) -> List[int]:
    """Return actions sorted DESC by one-ply child score
    (apply score_for to the next state returned by step(state, a))."""
    # TODO: implement
    raise NotImplementedError("Implement order_moves_by_one_ply_score(...)")
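
Why ordering reduces node visits can be shown with a two-level toy search. In this sketch each root move is a pair `(guess, leaves)`, where `guess` stands in for the one-ply score; `root_search` and the numbers are hypothetical and chosen only to make the effect visible.

```python
import math
from typing import List, Tuple

Move = Tuple[float, List[float]]   # (one-ply guess, opponent replies as leaf scores)

def root_search(moves: List[Move]) -> Tuple[float, int]:
    """Max over moves of min over replies, with cutoffs; returns (value, leaves visited)."""
    alpha, visited = -math.inf, 0
    for _, replies in moves:
        child_best = math.inf
        for score in replies:
            visited += 1
            child_best = min(child_best, score)
            if child_best <= alpha:     # this move is already no better than the best so far
                break
        alpha = max(alpha, child_best)
    return alpha, visited

unordered = [(0, [0, 1]), (2, [2, 9]), (3, [3, 5])]
# Sort DESC by the one-ply guess so the most promising move is searched first.
ordered = sorted(unordered, key=lambda m: m[0], reverse=True)

print(root_search(unordered))  # (3, 6)
print(root_search(ordered))    # (3, 4)
```

The result is identical, but searching the strongest move first raises alpha early, so weaker moves are cut off after a single reply.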

## 5) Compare visits (with vs without ordering) — **Run after implementing**

In [None]:
# A reproducible mid-game challenge state
CHALLENGE_STATE = {
    "pits":   [[0, 3, 0, 5, 1, 7], [4, 0, 6, 0, 2, 5]],
    "stores": [5, 10],
    "current_player": 0,
}

def compare_visits_on_state(state: Dict, depth: int = 8):
    s1 = copy.deepcopy(state); s2 = copy.deepcopy(state)
    mv_no, st_no   = choose_move_alphabeta(s1, depth=depth, ordering_fn=None)
    mv_ord, st_ord = choose_move_alphabeta(s2, depth=depth, ordering_fn=order_moves_by_one_ply_score)
    print(f"Alpha–Beta d={depth} WITHOUT ordering: move={mv_no}, nodes={st_no.visits}")
    print(f"Alpha–Beta d={depth} WITH    ordering: move={mv_ord}, nodes={st_ord.visits}")

# After you implement Alpha–Beta and ordering, uncomment to run:
# compare_visits_on_state(CHALLENGE_STATE, depth=8)

## 6) Simulation Harness (provided)

In [None]:
def random_agent():
    def _fn(state: Dict):
        acts = legal_actions(state)
        mv = random.choice(acts) if acts else 0
        st = Stats()
        return mv, st
    return _fn

def minimax_agent(depth: int):
    def _fn(state: Dict):
        return choose_move_minimax(state, depth=depth)
    return _fn

def alphabeta_agent(depth: int, ordering: bool = False):
    def _fn(state: Dict):
        order = order_moves_by_one_ply_score if ordering else None
        return choose_move_alphabeta(state, depth=depth, ordering_fn=order)
    return _fn

def play_game(agent0, agent1, max_plies=500):
    s = new_game(); moves = 0
    times_0, times_1, nodes_0, nodes_1 = [], [], [], []
    while sum(s["pits"][0]) > 0 and sum(s["pits"][1]) > 0 and moves < max_plies:
        turn = s["current_player"]
        agent = agent0 if turn == 0 else agent1
        t0 = time.perf_counter()
        mv, st = agent(copy.deepcopy(s))
        dt = time.perf_counter() - t0
        s, _, _ = step(s, mv)
        moves += 1
        if turn == 0:
            times_0.append(dt); nodes_0.append(st.visits)
        else:
            times_1.append(dt); nodes_1.append(st.visits)
    st0, st1 = s["stores"]
    # Winner by store count: 0 or 1, or -1 for a draw.
    winner = 0 if st0 > st1 else (1 if st1 > st0 else -1)
    return {"winner": winner, "moves": moves,
            "times_0": times_0, "times_1": times_1,
            "nodes_0": nodes_0, "nodes_1": nodes_1}

def run_series(agentA, agentB, n_games=100, seed=42):
    random.seed(seed)
    wins_A = wins_B = draws = 0
    moves_list = []; tA=[]; tB=[]; nA=[]; nB=[]
    for g in range(n_games):
        if g % 2 == 0:
            res = play_game(agentA, agentB)
            if   res["winner"] == 0: wins_A += 1
            elif res["winner"] == 1: wins_B += 1
            else: draws += 1
            tA += res["times_0"]; tB += res["times_1"]
            nA += res["nodes_0"]; nB += res["nodes_1"]
        else:
            res = play_game(agentB, agentA)
            if   res["winner"] == 0: wins_B += 1
            elif res["winner"] == 1: wins_A += 1
            else: draws += 1
            tA += res["times_1"]; tB += res["times_0"]
            nA += res["nodes_1"]; nB += res["nodes_0"]
        moves_list.append(res["moves"])
    def _avg(xs): return (sum(xs)/len(xs)) if xs else 0.0
    return {
        "games": n_games,
        "wins_A": wins_A, "wins_B": wins_B, "draws": draws,
        "win_rate_A": wins_A/n_games, "win_rate_B": wins_B/n_games,
        "avg_moves_per_game": _avg(moves_list),
        "avg_time_per_move_A": _avg(tA), "avg_time_per_move_B": _avg(tB),
        "avg_nodes_per_move_A": _avg(nA), "avg_nodes_per_move_B": _avg(nB),
    }

### 6.1) Run: Minimax vs Alpha–Beta (after you implement both) and comment on the results

In [None]:
# Uncomment after implementing choose_move_minimax and choose_move_alphabeta:
# summary = run_series(minimax_agent(depth=5), alphabeta_agent(depth=5, ordering=False), n_games=100, seed=123)
# summary
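
For the report, the dictionary returned by `run_series` can be turned into a small Markdown table. The helper below is a hypothetical convenience, not part of the provided harness; it relies only on the keys that `run_series` returns, and the all-zero `placeholder` exists solely to show the output shape, not real results.

```python
from typing import Dict, List

def summary_rows(summary: Dict[str, float], label_a: str, label_b: str) -> List[str]:
    """Format a run_series summary as Markdown table rows (times are in seconds)."""
    rows = [
        "| agent | win_rate | avg_time_per_move (s) | avg_nodes_per_move |",
        "|---|---|---|---|",
    ]
    for side, label in (("A", label_a), ("B", label_b)):
        rows.append(
            f"| {label} | {summary[f'win_rate_{side}']:.2f} "
            f"| {summary[f'avg_time_per_move_{side}']:.4f} "
            f"| {summary[f'avg_nodes_per_move_{side}']:.1f} |"
        )
    rows.append(f"avg moves per game: {summary['avg_moves_per_game']:.1f}")
    return rows

# Placeholder values only; replace with the dict returned by run_series(...).
placeholder = {
    "win_rate_A": 0.0, "win_rate_B": 0.0,
    "avg_time_per_move_A": 0.0, "avg_time_per_move_B": 0.0,
    "avg_nodes_per_move_A": 0.0, "avg_nodes_per_move_B": 0.0,
    "avg_moves_per_game": 0.0,
}
rows = summary_rows(placeholder, "A", "B")
print("\n".join(rows))
```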

## 7) Your Heuristic — **Implement me**

In [None]:
def heuristic_score_for(state: Dict, root_idx: int) -> float:
    """Return a custom heuristic score from root_idx's perspective.
    Replace this with your design (store diff, material, capture threats, etc.).
    """
    # TODO: implement your heuristic
    raise NotImplementedError("Implement heuristic_score_for(...)")
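
As a starting point for thinking about features, the sketch below combines two of the ideas named in the docstring (store difference and material) into one weighted score. It is a baseline illustration, not a recommended answer: `toy_heuristic` and `pit_weight` are hypothetical, the weight of 0.25 is arbitrary, and the state layout is assumed to match `CHALLENGE_STATE`.

```python
from typing import Dict

def toy_heuristic(state: Dict, root_idx: int, pit_weight: float = 0.25) -> float:
    """Store difference plus a small bonus for stones still on root_idx's side."""
    opp = 1 - root_idx
    store_diff = state["stores"][root_idx] - state["stores"][opp]
    pit_diff = sum(state["pits"][root_idx]) - sum(state["pits"][opp])
    return store_diff + pit_weight * pit_diff

# Same layout as CHALLENGE_STATE, repeated here so the sketch runs on its own.
demo_state = {
    "pits":   [[0, 3, 0, 5, 1, 7], [4, 0, 6, 0, 2, 5]],
    "stores": [5, 10],
    "current_player": 0,
}
print(toy_heuristic(demo_state, 0))  # -5.25
print(toy_heuristic(demo_state, 1))  # 5.25
```

Because the score is antisymmetric (the value for one player is the negative of the other's), it plugs into a max/min search without extra sign handling. Your own heuristic should go further, for example by weighing capture threats or extra-turn opportunities.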

In [None]:
def minimax_with_advanced_heuristic(state: Dict, depth: int, root_idx: int, stats: Stats):
    """Minimax that uses heuristic_score_for at leaves/terminal instead of evaluate()."""
    # TODO: implement (mirror your minimax, but call heuristic_score_for at leaves)
    raise NotImplementedError("Implement minimax_with_advanced_heuristic(...)")

In [None]:
def choose_move_minimax_heuristic(state: Dict, depth: int = 5):
    # TODO: wrapper similar to choose_move_minimax but for your heuristic version
    raise NotImplementedError("Implement choose_move_minimax_heuristic(...)")

### 7.1) Run: Your Heuristic Minimax vs Random

In [None]:
# Uncomment after implementing your heuristic and wrapper:
# summary = run_series(lambda s: choose_move_minimax_heuristic(s, depth=5),
#                      random_agent(), n_games=100, seed=123)
# summary


---

## What to submit
1. **Code** for all TODOs.
2. A brief **report** including:
   - Visit comparison (Alpha–Beta w/ and w/o ordering) on `CHALLENGE_STATE`.
   - 100-game metrics for each matchup.
   - Description of your heuristic and its 100-game results vs Random.
