# Overview

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm for decision-making, most notably in games such as Go and Chess and in general game AI. It combines tree search with Monte Carlo simulation, estimating the value of actions through random sampling rather than exhaustive evaluation. Here's how MCTS works:
1. Basic Concept
MCTS is particularly well-suited for environments where the search space is too large to exhaustively explore, as it iteratively explores possible future states by simulating random play from different positions and gradually building a search tree.

The key idea is to build a search tree incrementally, using random simulations to estimate the value of each action and state. The tree grows as more simulations are performed, and the value estimates become more reliable, so the algorithm tends towards the best decision as the budget increases.

2. Four Key Steps
MCTS consists of four main steps that are repeated until a decision needs to be made or a computational budget is exhausted (e.g., time or iteration limit):

`Selection:` Starting from the root node (current state), the algorithm selects the child node to explore based on a selection strategy. A commonly used strategy is the Upper Confidence Bound applied to Trees (UCT) formula, which balances exploration (trying less visited nodes) and exploitation (choosing nodes with high win rates).

`Expansion:` If the selected node is not a terminal state (i.e., the end of the game or decision-making process), new child nodes are added to the tree, representing possible actions that have not been explored yet.

`Simulation:` From the newly expanded node, a simulation (or rollout) is performed. This involves playing the game or running the process to the end, typically using random moves or a simple policy. The outcome of the simulation is used to estimate the value of the node.

`Backpropagation:` The result of the simulation is then propagated back up the tree, updating the value estimates of all the nodes along the path from the expanded node to the root. This information is used to improve future decisions.
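
The four steps above can be sketched on a toy game. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a single-pile Nim variant (players alternately remove 1 to 3 stones, and whoever takes the last stone wins), and `Node`, `uct_child`, `rollout`, and `mcts` are hypothetical helpers introduced only for this example.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node


def uct_child(node, c=math.sqrt(2)):
    # Selection rule: mean reward plus an exploration bonus
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, player, rng):
    # Simulation: uniformly random moves; taking the last stone wins
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player


def mcts(stones, player=0, iterations=2000, seed=0):
    # Assumes stones >= 1 so the root has at least one legal move
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for a move not tried yet
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: play out the game from the new node
        if node.stones == 0:
            winner = 1 - node.player  # the previous mover took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Recommend the most visited move at the root
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, root
```

Calling `best_move, root = mcts(stones=5)` returns the recommended move together with the root node, whose children hold the visit and win counts gathered during the search. Choosing the most visited child, rather than the one with the highest win rate, is one common convention for the final decision.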

3. Key Formula: UCB1 (Upper Confidence Bound)
UCT applies the UCB1 bandit formula at each node during the selection phase to balance exploration and exploitation. For a child node i, it is defined as:

$$
\text{UCB1} = \frac{w_i}{n_i} + C \cdot \sqrt{\frac{\ln(N)}{n_i}}
$$

Where:
- $w_i$ is the total reward (or wins) for node $i$.
- $n_i$ is the number of times node $i$ has been visited.
- $N$ is the total number of simulations (visits) for the parent node.
- $C$ is a constant that controls the exploration-exploitation trade-off.

This formula lets the algorithm focus on the most promising moves (exploitation) while still occasionally trying less explored moves, so that potentially good moves are less likely to be missed (exploration).
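
As a worked illustration of the formula, the sketch below scores three hypothetical child nodes. Two choices here are assumptions rather than part of the definition above: unvisited nodes are given an infinite score so they are tried first, and `C` defaults to the square root of 2, a commonly used value. The function name `ucb1` and the sample statistics are made up for the example.

```python
import math


def ucb1(w_i, n_i, N, C=math.sqrt(2)):
    """Score a child with total reward w_i and n_i visits under a parent with N visits."""
    if n_i == 0:
        # Assumed convention: always try unvisited children first
        return math.inf
    exploitation = w_i / n_i                      # average reward so far
    exploration = C * math.sqrt(math.log(N) / n_i)  # bonus for rarely visited nodes
    return exploitation + exploration


# Hypothetical statistics: name -> (total reward, visits)
children = {"a": (6, 10), "b": (2, 3), "c": (0, 0)}
N = sum(n for _, n in children.values())  # parent visit count
scores = {name: ucb1(w, n, N) for name, (w, n) in children.items()}
selected = max(scores, key=scores.get)
```

Here the unvisited child `c` is selected. Among the visited children, `b` scores higher than `a`: its win rate is similar, but its smaller visit count gives it a larger exploration bonus.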

# Setup and Import Statements


In [None]:
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

# System and OS imports for environment configuration
import os
import sys
from typing import List, Tuple


# Numerical and data handling libraries
import numpy as np
import polars as pl  # polars is preferred for fast DataFrame operations
import pandas as pd

# Machine learning and model evaluation libraries
from sklearn.model_selection import KFold  # Cross-validation splitting strategy
import lightgbm as lgb  # LightGBM for gradient boosting
from lightgbm import early_stopping, log_evaluation  # LightGBM callbacks

# Kaggle-specific module for MCTS inference
import kaggle_evaluation.mcts_inference_server

# Defining Column Categories

In [None]:
# Columns that are irrelevant for model training or inference
irrelevant_cols = [
    'Id', 'Properties', 'Format', 'Time', 'Discrete', 'Realtime', 'Turns', 'Alternating', 
    'Simultaneous', 'HiddenInformation', 'Match', 'AsymmetricRules', 'AsymmetricPlayRules', 
    'AsymmetricEndRules', 'AsymmetricSetup', 'Players', 'NumPlayers', 'Simulation', 'Solitaire', 
    'TwoPlayer', 'Multiplayer', 'Coalition', 'Puzzle', 'DeductionPuzzle', 'PlanningPuzzle', 
    'Equipment', 'Container', 'Board', 'PrismShape', 'ParallelogramShape', 'RectanglePyramidalShape', 
    'TargetShape', 'BrickTiling', 'CelticTiling', 'QuadHexTiling', 'Hints', 'PlayableSites', 
    'Component', 'DiceD3', 'BiasedDice', 'Card', 'Domino', 'Rules', 'SituationalTurnKo', 
    'SituationalSuperko', 'InitialAmount', 'InitialPot', 'Play', 'BetDecision', 'BetDecisionFrequency', 
    'VoteDecisionFrequency', 'ChooseTrumpSuitDecision', 'ChooseTrumpSuitDecisionFrequency', 
    'LeapDecisionToFriend', 'LeapDecisionToFriendFrequency', 'HopDecisionEnemyToFriend', 
    'HopDecisionEnemyToFriendFrequency', 'HopDecisionFriendToFriend', 'FromToDecisionWithinBoard', 
    'FromToDecisionBetweenContainers', 'BetEffect', 'BetEffectFrequency', 'VoteEffectFrequency', 
    'SwapPlayersEffectFrequency', 'TakeControl', 'TakeControlFrequency', 'PassEffectFrequency', 
    'SetCost', 'SetCostFrequency', 'SetPhase', 'SetPhaseFrequency', 'SetTrumpSuit', 
    'SetTrumpSuitFrequency', 'StepEffectFrequency', 'SlideEffectFrequency', 'LeapEffectFrequency', 
    'HopEffectFrequency', 'FromToEffectFrequency', 'SwapPiecesEffect', 'SwapPiecesEffectFrequency', 
    'ShootEffect', 'ShootEffectFrequency', 'MaxCapture', 'OffDiagonalDirection', 'Information', 
    'HidePieceType', 'HidePieceOwner', 'HidePieceCount', 'HidePieceRotation', 'HidePieceValue', 
    'HidePieceState', 'InvisiblePiece', 'End', 'LineDrawFrequency', 'ConnectionDraw', 
    'ConnectionDrawFrequency', 'GroupLossFrequency', 'GroupDrawFrequency', 'LoopLossFrequency', 
    'LoopDraw', 'LoopDrawFrequency', 'PatternLoss', 'PatternLossFrequency', 'PatternDraw', 
    'PatternDrawFrequency', 'PathExtentEndFrequency', 'PathExtentWinFrequency', 
    'PathExtentLossFrequency', 'PathExtentDraw', 'PathExtentDrawFrequency', 'TerritoryLoss', 
    'TerritoryLossFrequency', 'TerritoryDraw', 'TerritoryDrawFrequency', 'CheckmateLoss', 
    'CheckmateLossFrequency', 'CheckmateDraw', 'CheckmateDrawFrequency', 'NoTargetPieceLoss', 
    'NoTargetPieceLossFrequency', 'NoTargetPieceDraw', 'NoTargetPieceDrawFrequency', 
    'NoOwnPiecesDraw', 'NoOwnPiecesDrawFrequency', 'FillLoss', 'FillLossFrequency', 'FillDraw', 
    'FillDrawFrequency', 'ScoringDrawFrequency', 'NoProgressWin', 'NoProgressWinFrequency', 
    'NoProgressLoss', 'NoProgressLossFrequency', 'SolvedEnd', 'Behaviour', 'StateRepetition', 
    'PositionalRepetition', 'SituationalRepetition', 'Duration', 'Complexity', 'BoardCoverage', 
    'GameOutcome', 'StateEvaluation', 'Clarity', 'Narrowness', 'Variance', 'Decisiveness', 
    'DecisivenessMoves', 'DecisivenessThreshold', 'LeadChange', 'Stability', 'Drama', 'DramaAverage', 
    'DramaMedian', 'DramaMaximum', 'DramaMinimum', 'DramaVariance', 'DramaChangeAverage', 
    'DramaChangeSign', 'DramaChangeLineBestFit', 'DramaChangeNumTimes', 'DramaMaxIncrease', 
    'DramaMaxDecrease', 'MoveEvaluation', 'MoveEvaluationAverage', 'MoveEvaluationMedian', 
    'MoveEvaluationMaximum', 'MoveEvaluationMinimum', 'MoveEvaluationVariance', 
    'MoveEvaluationChangeAverage', 'MoveEvaluationChangeSign', 'MoveEvaluationChangeLineBestFit', 
    'MoveEvaluationChangeNumTimes', 'MoveEvaluationMaxIncrease', 'MoveEvaluationMaxDecrease', 
    'StateEvaluationDifference', 'StateEvaluationDifferenceAverage', 'StateEvaluationDifferenceMedian', 
    'StateEvaluationDifferenceMaximum', 'StateEvaluationDifferenceMinimum', 
    'StateEvaluationDifferenceVariance', 'StateEvaluationDifferenceChangeAverage', 
    'StateEvaluationDifferenceChangeSign', 'StateEvaluationDifferenceChangeLineBestFit', 
    'StateEvaluationDifferenceChangeNumTimes', 'StateEvaluationDifferenceMaxIncrease', 
    'StateEvaluationDifferenceMaxDecrease', 'BoardSitesOccupied', 'BoardSitesOccupiedMinimum', 
    'BranchingFactor', 'BranchingFactorMinimum', 'DecisionFactor', 'DecisionFactorMinimum', 
    'MoveDistance', 'MoveDistanceMinimum', 'PieceNumber', 'PieceNumberMinimum', 
    'ScoreDifference', 'ScoreDifferenceMinimum', 'ScoreDifferenceChangeNumTimes', 'Roots', 
    'Cosine', 'Sine', 'Tangent', 'Exponential', 'Logarithm', 'ExclusiveDisjunction', 'Float', 
    'HandComponent', 'SetHidden', 'SetInvisible', 'SetHiddenCount', 'SetHiddenRotation', 
    'SetHiddenState', 'SetHiddenValue', 'SetHiddenWhat', 'SetHiddenWho'
]

# Columns that represent game rules and settings
game_cols = ['GameRulesetName', 'EnglishRules', 'LudRules']

# Target columns representing the outcome of the game for agent 1
output_cols = ['num_wins_agent1', 'num_draws_agent1', 'num_losses_agent1']

# Columns representing the participating agents
agent_cols = ['agent1', 'agent2']

# Columns to be dropped from the dataset (irrelevant, game, and output columns)
dropped_cols = output_cols + irrelevant_cols + game_cols


# Config Class

In [None]:
class Config:
    train_path = '/kaggle/input/um-game-playing-strength-of-mcts-variants/train.csv'
    
    early_stop = 50
    n_splits = 5
    seed = 1212
    split_agent_features = True
    
    lgbm_params = {
        'n_estimators': 10_000,  # upper bound; early stopping usually ends training sooner
        'random_state': 1212,
        'verbose': -1,
        'num_leaves': 63,
        'learning_rate': 0.05,
        'max_depth': 8,
        'reg_lambda': 1.0,
    }
#   Some common params to experiment with (here are default values):
#         'learning_rate': 0.1,
#         'reg_lambda': 0.0,
#         'num_leaves': 31,
#         'max_depth': -1,
#         'max_bin': 255,
#         'extra_trees': False,

# Preprocessing

In [None]:
def process_data(df: pl.DataFrame) -> pd.DataFrame:
    """
    Processes the input DataFrame by removing irrelevant columns, splitting agent features,
    and casting the appropriate data types to ensure proper processing.

    Parameters:
    ----------
    df : pl.DataFrame
        The input DataFrame to be processed.

    Returns:
    -------
    pd.DataFrame
        The processed DataFrame converted to a pandas DataFrame.
    """
    # Drop irrelevant columns based on `dropped_cols`
    df = df.drop([col for col in dropped_cols if col in df.columns])
    
    # Split agent features into separate columns if `split_agent_features` is True
    if Config.split_agent_features:
        for col in agent_cols:
            df = (
                df.with_columns(
                    pl.col(col).str.split(by="-").list.to_struct(
                        fields=lambda idx: f"{col}_{idx}"
                    )
                )
                .unnest(col)
                .drop(f"{col}_0")  # Drop the first token of the split agent name
            )
    
    # Cast agent-related columns to categorical and all other columns to float32
    df = df.with_columns(
        [pl.col(col).cast(pl.Categorical) for col in df.columns if col.startswith("agent")]
    )
    df = df.with_columns(
        [pl.col(col).cast(pl.Float32) for col in df.columns if not col.startswith("agent")]
    )
    
    # Print the shape of the processed DataFrame
    print(f'Data shape: {df.shape}')
    
    # Convert the Polars DataFrame to a pandas DataFrame
    return df.to_pandas()
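
To make the agent-splitting step concrete, the sketch below mirrors its logic in plain Python for a single value. It assumes agent names are hyphen-separated strings such as `MCTS-UCB1-0.6-Random200-false`; that sample string and the helper `split_agent` are illustrative assumptions, not taken from the notebook.

```python
def split_agent(col: str, value: str) -> dict:
    """Split one agent string into named parts, mirroring the Polars logic."""
    parts = value.split("-")
    # Name the fields col_0, col_1, ... in order of appearance
    fields = {f"{col}_{idx}": part for idx, part in enumerate(parts)}
    # Drop the first token, as process_data does with f"{col}_0"
    fields.pop(f"{col}_0")
    return fields


row = split_agent("agent1", "MCTS-UCB1-0.6-Random200-false")
# row == {"agent1_1": "UCB1", "agent1_2": "0.6",
#         "agent1_3": "Random200", "agent1_4": "false"}
```

Every resulting column name starts with `agent`, which is why the subsequent cast treats all of them as categorical, including numeric-looking parts such as `0.6`.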


# Model Training

In [None]:
def train_lgb(data: pd.DataFrame) -> List[lgb.LGBMRegressor]:
    """
    Trains a LightGBM model using KFold cross-validation.

    Parameters:
    ----------
    data : pd.DataFrame
        The input DataFrame containing features and the target variable 'utility_agent1'.

    Returns:
    -------
    List[lgb.LGBMRegressor]
        A list of trained LightGBM models, one for each fold.
    """
    # Separate features (X) and target variable (y)
    X = data.drop(['utility_agent1'], axis=1)
    y = data['utility_agent1']
    
    # Initialize KFold cross-validation
    cv = KFold(n_splits=Config.n_splits, shuffle=True, random_state=Config.seed)
    models: List[lgb.LGBMRegressor] = []
    
    # Train models for each fold
    for fi, (train_idx, valid_idx) in enumerate(cv.split(X, y)):
        print(f'Fold {fi+1}/{Config.n_splits} ...')
        model = lgb.LGBMRegressor(**Config.lgbm_params)
        model.fit(
            X.iloc[train_idx], y.iloc[train_idx],
            eval_set=[(X.iloc[valid_idx], y.iloc[valid_idx])],
            eval_metric='rmse',
            callbacks=[lgb.early_stopping(Config.early_stop)]
        )
        models.append(model)
    
    return models

def infer_lgb(data: pd.DataFrame, models: List[lgb.LGBMRegressor]) -> np.ndarray:
    """
    Makes predictions using the trained LightGBM models.

    Parameters:
    ----------
    data : pd.DataFrame
        The input DataFrame containing features for prediction.
    models : List[lgb.LGBMRegressor]
        A list of trained LightGBM models.

    Returns:
    -------
    np.ndarray
        The averaged predictions from all models.
    """
    # Generate predictions from each model and compute the average
    predictions = np.mean([model.predict(data) for model in models], axis=0)
    return predictions
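
The averaging in `infer_lgb` can be illustrated without trained models. In the sketch below, `ConstantOffsetModel` is a hypothetical stand-in for a fitted regressor; it only needs a `predict` method, and its predictions are made up for the example.

```python
import numpy as np


class ConstantOffsetModel:
    """Hypothetical stand-in for a fitted model exposing .predict()."""

    def __init__(self, offset: float):
        self.offset = offset

    def predict(self, X) -> np.ndarray:
        # Toy prediction: row sum plus a per-model offset
        return np.asarray(X).sum(axis=1) + self.offset


fold_models = [ConstantOffsetModel(o) for o in (-0.1, 0.0, 0.1)]
X = np.array([[0.2, 0.3], [0.5, 0.1]])

# Stack per-model predictions into shape (n_models, n_rows),
# then average over the model axis to get one value per row
stacked = np.array([m.predict(X) for m in fold_models])
avg = stacked.mean(axis=0)  # approximately [0.5, 0.6]
```

Because the offsets cancel out, the average recovers the row sums. With real fold models the same effect reduces the variance that comes from each model seeing a different training split.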

# Submission

In [None]:
# Global state: run counter and the models trained on the first call to predict()
run_i = 0
models: List[lgb.LGBMRegressor] = []

def predict(test_data: pl.DataFrame, submission: pl.DataFrame) -> pl.DataFrame:
    """
    Predicts the 'utility_agent1' values for the test data using pre-trained LightGBM models.

    Parameters:
    ----------
    test_data : pl.DataFrame
        The input DataFrame containing features for prediction.
    submission : pl.DataFrame
        The submission DataFrame to which the predictions will be added.

    Returns:
    -------
    pl.DataFrame
        The submission DataFrame with the predicted 'utility_agent1' values.
    """
    global run_i, models
    
    # Train models only if it's the first run
    if run_i == 0:
        train_df = pl.read_csv(Config.train_path)
        models = train_lgb(process_data(train_df))
        run_i += 1
    
    # Process test data and make predictions
    test_data = process_data(test_data)
    predictions = infer_lgb(test_data, models)
    
    return submission.with_columns(pl.Series('utility_agent1', predictions))

# Initialize the inference server
inference_server = kaggle_evaluation.mcts_inference_server.MCTSInferenceServer(predict)

# Serve the inference server or run local gateway based on the environment
if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        (
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/test.csv',
            '/kaggle/input/um-game-playing-strength-of-mcts-variants/sample_submission.csv'
        )
    )