# NBA Game Statistics Predictor
### CMPE 257 Project
Authors: Kaushika Uppu, Miranda Billawala, Yun Ei Hlaing, Iris Cheung

## Imports

In [None]:
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import seaborn as sns

import random
from datetime import datetime, timedelta
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier, XGBRegressor
from sklearn.metrics import mean_squared_error
import itertools

## NBA Game Data
First, we load all of the NBA game data from the CSV file. The code for gathering the data lives in a separate file and uses `nba_api`. Only games from the 1985-1986 season onward are loaded, because earlier seasons are missing a significant portion of the game statistics. We also want to map easily between team ID and abbreviation in both directions.

In [None]:
all_stats_cleaned = pd.read_csv('all_stats_cleaned.csv')
all_stats_cleaned['GAME_DATE'] = pd.to_datetime(all_stats_cleaned['GAME_DATE'], format='ISO8601') # convert date to datetime object

all_stats_cleaned.head()

In [None]:
team_id_to_abb = {} # dictionary to convert from team_id to team_abbreviation
team_abb_to_id = {} # dictionary to convert from team_abbreviation to team_id

teams = (all_stats_cleaned[['TEAM_ID', 'TEAM_ABBREVIATION']]).drop_duplicates()

for index, row in teams.iterrows() :
    if row['TEAM_ID'] not in team_id_to_abb.keys():
        team_id_to_abb[row['TEAM_ID']] = []
    team_id_to_abb[row['TEAM_ID']].append(row['TEAM_ABBREVIATION'])
    team_abb_to_id[row['TEAM_ABBREVIATION']] = row['TEAM_ID']
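
As a point of comparison, the same two lookups can be sketched without an explicit loop. This is a minimal illustration on made-up rows (the IDs and the `teams_demo` frame are hypothetical, not taken from the dataset); it shows why the ID-to-abbreviation direction holds a list, since one franchise can appear under several abbreviations over time.

```python
import pandas as pd

# Hypothetical rows: one team ID that appears under two abbreviations, and one that does not.
teams_demo = pd.DataFrame({
    "TEAM_ID": [1, 1, 2],
    "TEAM_ABBREVIATION": ["NJN", "BKN", "LAL"],
})

# ID -> list of every abbreviation seen for that ID
id_to_abbs = teams_demo.groupby("TEAM_ID")["TEAM_ABBREVIATION"].agg(list).to_dict()

# abbreviation -> ID (each abbreviation maps to a single ID)
abb_to_id = dict(zip(teams_demo["TEAM_ABBREVIATION"], teams_demo["TEAM_ID"]))

print(id_to_abbs)
print(abb_to_id)
```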

### Merging Home and Away Team Stats Into One Row
Currently, each game is represented by two separate rows in the dataset: one for the home team and one for the away team. To make the data clearer, we combine the two rows into a single row with statistics for both teams. Because a prediction request passes the teams in one fixed order (e.g. Lakers as Team One, Warriors as Team Two), we want the model to treat a game with the Lakers as Team Two and the Warriors as Team One as the same matchup. To do this, we duplicate each row with the teams flipped, so every game appears twice, once in each order.

First, we split the dataset into two parts: home games and away games. Then we join the two, matching each team with its opponent on the same date.
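
To make the split, join and flip concrete, here is a minimal sketch on a single made-up game. The point values and the helper name `pair_rows` are assumptions for illustration only; the column names follow the dataset.

```python
import pandas as pd

# Toy per-team rows for one game (values are invented for illustration).
games = pd.DataFrame({
    "GAME_DATE": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "TEAM_ABBREVIATION": ["LAL", "GSW"],
    "OPPONENT": ["GSW", "LAL"],
    "HOME": [1, 0],
    "PTS": [110, 104],
})

home_rows = games[games["HOME"] == 1]
away_rows = games[games["HOME"] == 0]

def pair_rows(left, right):
    # Match each row in `left` with its opponent's row from `right` on the same date.
    return pd.merge(
        left, right,
        left_on=["GAME_DATE", "OPPONENT"],
        right_on=["GAME_DATE", "TEAM_ABBREVIATION"],
        suffixes=("_ONE", "_TWO"),
    )

# One row with the home team as Team One, plus the flipped row with the away team as Team One.
paired = pd.concat([pair_rows(home_rows, away_rows), pair_rows(away_rows, home_rows)],
                   ignore_index=True)
print(paired[["TEAM_ABBREVIATION_ONE", "TEAM_ABBREVIATION_TWO", "PTS_ONE", "PTS_TWO"]])
```

The single game yields two rows, one per team order, which is the duplication described above.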

In [None]:
home = all_stats_cleaned[all_stats_cleaned.HOME == 1]
away = all_stats_cleaned[all_stats_cleaned.HOME == 0]

In [None]:
combined_stats_home = pd.merge(home, away, 
                          left_on=['GAME_DATE', 'OPPONENT'], 
                          right_on=['GAME_DATE', 'TEAM_ABBREVIATION'],
                          suffixes=('_ONE', '_TWO'))
combined_stats_away = pd.merge(away, home, 
                          left_on=['GAME_DATE', 'OPPONENT'], 
                          right_on=['GAME_DATE', 'TEAM_ABBREVIATION'],
                          suffixes=('_ONE', '_TWO'))

combined_stats = pd.concat([combined_stats_home, combined_stats_away], ignore_index = True)
combined_stats.head(5)

After merging the rows, some columns appear twice or are no longer needed: `MIN_ONE`/`MIN_TWO` (length of game in minutes), `SEASON_YEAR_ONE`/`SEASON_YEAR_TWO`, and `OPPONENT_ONE`/`OPPONENT_TWO`. We first checked whether `MIN_ONE` and `MIN_TWO` agree in every row. As seen below, there are 24 games where the minutes differ slightly. Since the differences are small, we keep one column and rename it `MIN`.

In [None]:
(combined_stats['MIN_ONE'] != combined_stats['MIN_TWO']).sum()

In [None]:
combined_stats[combined_stats['MIN_ONE'] != combined_stats['MIN_TWO']][['MIN_ONE','MIN_TWO']]

In [None]:
combined_stats = combined_stats.drop(columns = ['MIN_TWO', 'OPPONENT_ONE', 'OPPONENT_TWO', 'SEASON_YEAR_ONE'])
combined_stats.rename(columns={'MIN_ONE': 'MIN', 'SEASON_YEAR_TWO': 'SEASON_YEAR'}, inplace=True)

## Feature Engineering
Features to add : 
1) Win streak
2) Win percentage
3) ELO Scores
4) EFG%
5) TS%
6) Win last (who won the last game between the two teams playing)

### Win Streak and Win Percentage

In [None]:
def add_win_streak_and_percentage(df, combined=False):
    """
    Input: Dataframe with team one and team two data for each game and boolean to check if dataframe is combined with both team data
    Output: New dataframe with added win streak and win percentage for both teams
    """
    if combined :
        team_date_stats = df[['TEAM_ID_ONE', 'GAME_DATE', 'WIN_ONE']].sort_values(by=['TEAM_ID_ONE', 'GAME_DATE']).reset_index(drop=True)
        team_date_stats['WIN_STREAK'] = 0
        team_date_stats['WIN_PERCENTAGE'] = 0.0
        
        for team_id, group in team_date_stats.groupby('TEAM_ID_ONE'):
            streak = 0
            wins = 0
            total_games = 0
            indices = group.index
        
            for i in range(len(indices)):
                idx = indices[i]
        
                # WIN STREAK
                team_date_stats.at[idx, 'WIN_STREAK'] = streak
        
                if team_date_stats.at[idx, 'WIN_ONE'] == 1:
                    streak += 1
                else: 
                    streak = 0
        
                # WIN PERCENTAGE
                if total_games == 0:
                    team_date_stats.at[idx, 'WIN_PERCENTAGE'] = 0.0
                else: 
                    team_date_stats.at[idx, 'WIN_PERCENTAGE'] = wins / total_games
        
                total_games += 1
                if team_date_stats.at[idx, 'WIN_ONE'] == 1:
                    wins += 1
        # First, prepare a lookup table
        stats_lookup = team_date_stats.set_index(['TEAM_ID_ONE', 'GAME_DATE'])
        df['WIN_STREAK_ONE'] = df.set_index(['TEAM_ID_ONE', 'GAME_DATE']).index.map(stats_lookup['WIN_STREAK'])
        df['WIN_PERCENTAGE_ONE'] = df.set_index(['TEAM_ID_ONE', 'GAME_DATE']).index.map(stats_lookup['WIN_PERCENTAGE'])
    
        df['WIN_STREAK_TWO'] = df.set_index(['TEAM_ID_TWO', 'GAME_DATE']).index.map(stats_lookup['WIN_STREAK'])
        df['WIN_PERCENTAGE_TWO'] = df.set_index(['TEAM_ID_TWO', 'GAME_DATE']).index.map(stats_lookup['WIN_PERCENTAGE'])
    
    else :
        team_date_stats = df[['TEAM_ID', 'GAME_DATE', 'WIN']].sort_values(by=['TEAM_ID', 'GAME_DATE']).reset_index(drop=True)
        team_date_stats['WIN_STREAK'] = 0
        team_date_stats['WIN_PERCENTAGE'] = 0.0
        
        for team_id, group in team_date_stats.groupby('TEAM_ID'):
            streak = 0
            wins = 0
            total_games = 0
            indices = group.index
        
            for i in range(len(indices)):
                idx = indices[i]
        
                # WIN STREAK
                team_date_stats.at[idx, 'WIN_STREAK'] = streak
        
                if team_date_stats.at[idx, 'WIN'] == 1:
                    streak += 1
                else: 
                    streak = 0
        
                # WIN PERCENTAGE
                if total_games == 0:
                    team_date_stats.at[idx, 'WIN_PERCENTAGE'] = 0.0
                else: 
                    team_date_stats.at[idx, 'WIN_PERCENTAGE'] = wins / total_games
        
                total_games += 1
                if team_date_stats.at[idx, 'WIN'] == 1:
                    wins += 1
        # First, prepare a lookup table
        stats_lookup = team_date_stats.set_index(['TEAM_ID', 'GAME_DATE'])
        df['WIN_STREAK'] = df.set_index(['TEAM_ID', 'GAME_DATE']).index.map(stats_lookup['WIN_STREAK'])
        df['WIN_PERCENTAGE'] = df.set_index(['TEAM_ID', 'GAME_DATE']).index.map(stats_lookup['WIN_PERCENTAGE'])

    return df
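
The loop above records, for each game, the streak and win percentage a team carries into that game (so the current result is never included). A vectorized sketch of the same idea for a single team is shown below. The function name `prior_streak_and_pct` is hypothetical, and it assumes the results are already in chronological order.

```python
import pandas as pd

def prior_streak_and_pct(wins: pd.Series) -> pd.DataFrame:
    """wins: 0/1 results for one team, in chronological order."""
    wins = wins.astype(int)
    # Every loss starts a new block; inside a block the running sum of wins
    # is the streak *after* that game (the loss itself contributes 0).
    block = (wins == 0).cumsum()
    streak_after = wins.groupby(block).cumsum()

    games_before = pd.Series(range(len(wins)), index=wins.index)
    wins_before = wins.cumsum() - wins
    # Avoid dividing by zero for the first game; default that row to 0.0.
    pct = (wins_before / games_before.where(games_before > 0)).fillna(0.0)

    return pd.DataFrame({
        "WIN_STREAK": streak_after.shift(1, fill_value=0),  # value entering the game
        "WIN_PERCENTAGE": pct,
    })

demo = prior_streak_and_pct(pd.Series([1, 1, 0, 1]))
print(demo)
```

For a full dataset this would be applied per team, for example inside a `groupby('TEAM_ID')` after sorting by `GAME_DATE`.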

### ELO Score Before Current Game

In [None]:
def merge_opponent_points(df):
    df_opp = df[['TEAM_ABBREVIATION', 'GAME_DATE', 'PTS', 'TEAM_ID']].copy()
    merged_df = pd.merge(df, df_opp, 
                         how='left',
                          left_on=['GAME_DATE', 'OPPONENT'],
                            right_on=['GAME_DATE', 'TEAM_ABBREVIATION'],
                          suffixes=('', '_OPPONENT'))
    merged_df.drop(columns=['TEAM_ABBREVIATION_OPPONENT'], inplace=True)
    return merged_df

In [None]:
def add_elo_score(df, combined=False):
    """
    Input: Dataframe with team one and team two data for each game and boolean to check if dataframe is combined with both team data
    Output: New dataframe with elo scores for both teams added 
    """
    if combined:
        df['GAME_ID'] = df.apply(
        lambda row: '_'.join(sorted([str(row['TEAM_ID_ONE']), str(row['TEAM_ID_TWO'])]) + [str(row['GAME_DATE'])]),
        axis=1
    )
        df['ELO_ONE'] = np.nan
        df['ELO_TWO'] = np.nan
    else:
        df = merge_opponent_points(df)
        df['ELO'] = np.nan
        df['GAME_ID'] = df.apply(
        lambda row: '_'.join(sorted([str(row['TEAM_ID']), str(row['TEAM_ID_OPPONENT'])]) + [str(row['GAME_DATE'])]),
        axis=1
    )
    
    team_elos = {} # to use for checking if a team has appeared and track team last elo scores
    team_last_season = {} # to track last seasons of teams
    processed_games = set() # to track game id - handle duplicate game columns
    elo_map = {} # for faster computation
    df = df.sort_values(by='GAME_DATE').reset_index(drop=True)
    
    for i,row in df.iterrows():
        season = row['SEASON_YEAR']
        game_id = row['GAME_ID']

        if game_id in processed_games:
            continue
        processed_games.add(game_id)

        if combined:
            team_one, team_two = row['TEAM_ID_ONE'], row['TEAM_ID_TWO']
            points_one, points_two = row['PTS_ONE'], row['PTS_TWO']
            home_one = row['HOME_ONE']
        
            # Season adjustment formula for ELO : New Season ELO = 0.75 * Last Season ELO + 0.25 * Mean ELO, Mean ELO = 1505
            for team in [team_one, team_two]:
                # check if team has not appeared yet in the dataset
                if team not in team_elos:
                    team_elos[team] = 1505 
                    team_last_season[team] = season
                # check for new season, if yes, apply season adjustment
                elif team_last_season[team] != season:
                    team_elos[team] = 0.75 * team_elos[team] + 0.25 * 1505
                    team_last_season[team] = season
        
            # elo scores before game
            elo_one = team_elos[team_one]
            elo_two = team_elos[team_two]
        
            # Add 100 score to home team
            if home_one == 1:
                elo_one_after_home_adv = elo_one + 100 
                elo_two_after_home_adv = elo_two
            else:
                elo_one_after_home_adv = elo_one 
                elo_two_after_home_adv = elo_two + 100
        
            # Expected score of game formula : exp = 1/ (1+10^((ELO two after home advantage - ELO one after home advantage) / 400))
            exp = 1/ (1+10**((elo_two_after_home_adv - elo_one_after_home_adv) / 400))
        
            actual = 1 if points_one > points_two else 0
            margin_of_victory = abs(points_one - points_two)
        
            # Margin of Victory Multiplier formula : ((MOV + 3) ** 0.8) / (7.5 + 0.006 * (Elo team one - Elo team two))
            MOVM = ((margin_of_victory + 3) ** 0.8) / (7.5 + 0.006 * (elo_one - elo_two))
        
            # change in ELO: K * MOVM * (actual - exp), k -> attenuation factor -> higher means elo score adjusts quickly to changes in strength of team
            K = 20 # 20 is optimal for nba 
            change = K * MOVM * (actual - exp)
    
            # Update data for ELO ratings
            team_elos[team_one] += change
            team_elos[team_two] -= change
        
            # store elo score for game id at the table
            # df.at[i, 'ELO_ONE'] = elo_one
            # df.at[i, 'ELO_TWO'] = elo_two
            # df.loc[(df['GAME_ID'] == game_id) & df['TEAM_ID_ONE'] == team_two, 'ELO_ONE'] = elo_two
            # df.loc[(df['GAME_ID'] == game_id) & df['TEAM_ID_TWO'] == team_one, 'ELO_TWO'] = elo_one

            # store elo scores in dictionary
            elo_map[(game_id, team_one, team_two)] = elo_one
            elo_map[(game_id, team_two, team_one)] = elo_two
     
        else:
            team, team_opp = row['TEAM_ID'], row['TEAM_ID_OPPONENT']
            points_team, points_opp = row['PTS'], row['PTS_OPPONENT']
            home = row['HOME']
        
            # Season adjustment formula for ELO : New Season ELO = 0.75 * Last Season ELO + 0.25 * Mean ELO, Mean ELO = 1505
            for t in [team, team_opp]:
                # check if team has not appeared yet in the dataset
                if t not in team_elos:
                    team_elos[t] = 1505 
                    team_last_season[t] = season
                # check for new season, if yes, apply season adjustment
                elif team_last_season[t] != season:
                    team_elos[t] = 0.75 * team_elos[t] + 0.25 * 1505
                    team_last_season[t] = season
        
            # elo scores before game
            elo_team = team_elos[team]
            elo_opponent = team_elos[team_opp]
        
            # Add 100 score to home team
            if home == 1:
                elo_team_home = elo_team + 100 
                elo_opp_home = elo_opponent
            else:
                elo_team_home = elo_team 
                elo_opp_home = elo_opponent + 100
        
            # Expected score of game formula : exp = 1/ (1+10^((ELO two after home advantage - ELO one after home advantage) / 400))
            exp = 1/ (1+10**((elo_opp_home - elo_team_home) / 400))
        
            actual = 1 if points_team > points_opp else 0
            margin_of_victory = abs(points_team - points_opp)
        
            # Margin of Victory Multiplier formula : ((MOV + 3) ** 0.8) / (7.5 + 0.006 * (Elo team one - Elo team two))
            MOVM = ((margin_of_victory + 3) ** 0.8) / (7.5 + 0.006 * (elo_team - elo_opponent))
        
            # change in ELO: K * MOVM * (actual - exp), k -> attenuation factor -> higher means elo score adjusts quickly to changes in strength of team
            K = 20 # 20 is optimal for nba 
            change = K * MOVM * (actual - exp)

            # Update data for ELO ratings
            team_elos[team] += change
            team_elos[team_opp] -= change
        
            # store elo score for both row of game at the table
            # df.at[i, 'ELO'] = elo_team
            # df.loc[(df['GAME_ID'] == game_id) & df['TEAM_ID'] == team_opp, 'ELO'] = elo_opponent
            elo_map[(game_id, team)] = elo_team
            elo_map[(game_id, team_opp)] = elo_opponent

    # add data from elo dictionary into dataframe
    if not combined:
        df['ELO'] = df.apply(lambda x: elo_map.get((x['GAME_ID'], x['TEAM_ID']), np.nan), axis=1)
        df.drop(columns=['PTS_OPPONENT', 'TEAM_ID_OPPONENT'], axis=1, inplace=True)
    else: 
        df['ELO_ONE'] = df.apply(lambda x: elo_map.get((x['GAME_ID'], x['TEAM_ID_ONE'], x['TEAM_ID_TWO']), np.nan), axis=1)
        df['ELO_TWO'] = df.apply(lambda x: elo_map.get((x['GAME_ID'], x['TEAM_ID_TWO'], x['TEAM_ID_ONE']), np.nan), axis=1)
    df.drop(columns=['GAME_ID'], axis=1, inplace=True)
    
            
    return df                                   
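
To isolate the arithmetic inside `add_elo_score`, the update for one game can be sketched as a standalone function. The names `elo_update` and `new_season_elo` are hypothetical helpers; the constants (K = 20, a 100-point home bonus, a mean rating of 1505, and the 0.75/0.25 season carry-over) are the ones used in the function above.

```python
def elo_update(elo_one, elo_two, points_one, points_two, home_one, k=20, home_adv=100):
    """Return (new_elo_one, new_elo_two) after a single game."""
    # The home bonus only affects the expected score, not the stored ratings.
    adj_one = elo_one + (home_adv if home_one == 1 else 0)
    adj_two = elo_two + (0 if home_one == 1 else home_adv)
    expected = 1 / (1 + 10 ** ((adj_two - adj_one) / 400))

    actual = 1 if points_one > points_two else 0
    margin = abs(points_one - points_two)
    multiplier = ((margin + 3) ** 0.8) / (7.5 + 0.006 * (elo_one - elo_two))

    change = k * multiplier * (actual - expected)
    return elo_one + change, elo_two - change

def new_season_elo(elo, mean_elo=1505):
    """Pull a rating a quarter of the way back toward the mean at the start of a season."""
    return 0.75 * elo + 0.25 * mean_elo

# Two evenly rated teams; team one is at home and wins by 7.
new_one, new_two = elo_update(1505, 1505, 107, 100, home_one=1)
print(new_one, new_two)
print(new_season_elo(1605))
```

Because one team gains exactly what the other loses, the sum of the two ratings is unchanged by a game.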

### Effective Field Goal Percentage and True Shooting Percentage

In [None]:
def add_shooting_percentages(df, combined=False):
    """
    Adds effective field goal percentage (EFG%) and true shooting percentage (TS%).
    EFG% = (FGM + 0.5 * FG3M) / FGA   (FGM already counts made three-pointers)
    TS%  = PTS / (2 * (FGA + 0.44 * FTA))
    """
    if combined: 
        df['EFG%_ONE'] = (df['FGM_ONE'] + 0.5 * df['FG3M_ONE']) / df['FGA_ONE']
        df['EFG%_TWO'] = (df['FGM_TWO'] + 0.5 * df['FG3M_TWO']) / df['FGA_TWO']
        df['TS%_ONE'] = df['PTS_ONE'] / (2 * (df['FGA_ONE'] + 0.44 * df['FTA_ONE']))
        df['TS%_TWO'] = df['PTS_TWO'] / (2 * (df['FGA_TWO'] + 0.44 * df['FTA_TWO']))
    else:
        df['EFG%'] = (df['FGM'] + 0.5 * df['FG3M']) / df['FGA']
        df['TS%'] = df['PTS'] / (2 * (df['FGA'] + 0.44 * df['FTA']))
    return df
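
For reference, the commonly cited definitions are eFG% = (FGM + 0.5 × 3PM) / FGA and TS% = PTS / (2 × (FGA + 0.44 × FTA)). The short worked example below plugs in one made-up box score; the helper names `efg_pct` and `ts_pct` and the numbers are illustrative assumptions.

```python
def efg_pct(fgm, fg3m, fga):
    # Made three-pointers are already part of FGM, so each one earns an extra 0.5.
    return (fgm + 0.5 * fg3m) / fga

def ts_pct(pts, fga, fta):
    # 0.44 approximates the share of free-throw attempts that end a possession.
    return pts / (2 * (fga + 0.44 * fta))

# Made-up line: 40 made field goals (10 of them threes) on 80 attempts, 25 free-throw attempts, 110 points.
efg = efg_pct(40, 10, 80)
ts = ts_pct(110, 80, 25)
print(efg)           # 0.5625
print(round(ts, 3))  # 0.604
```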

### Win for Last Matchup Game

In [None]:
def add_win_last_game(df, combined=False):
    if combined:
        # Save original order
        df['__original_order'] = range(len(df))

        # Sort for correct shifting
        sorted_df = df.sort_values(by=['TEAM_ID_ONE', 'TEAM_ID_TWO', 'GAME_DATE'])
        # Compute WIN_LAST
        sorted_df['WIN_LAST_ONE'] = sorted_df.groupby(['TEAM_ID_ONE', 'TEAM_ID_TWO'])['WIN_ONE'].shift(1)
        sorted_df['WIN_LAST_TWO'] = 1 - sorted_df['WIN_LAST_ONE'] 
        
        # Restore original order and keep WIN_LAST
        df = sorted_df.sort_values('__original_order').drop(columns='__original_order')
        
    else:
        # Save original order
        df['__original_order'] = range(len(df))

        # Sort for correct shifting
        sorted_df = df.sort_values(by=['TEAM_ID', 'OPPONENT', 'GAME_DATE'])

        # Compute WIN_LAST
        sorted_df['WIN_LAST'] = sorted_df.groupby(['TEAM_ID', 'OPPONENT'])['WIN'].shift(1)

        # Restore original order and keep WIN_LAST
        df = sorted_df.sort_values('__original_order').drop(columns='__original_order')

    return df
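
The key step in `add_win_last_game` is a `shift(1)` inside each matchup group, which carries the previous result between the same two teams forward to the next meeting. A toy illustration with invented results:

```python
import pandas as pd

matchups = pd.DataFrame({
    "TEAM_ID": [1, 1, 1, 1],
    "OPPONENT": ["B", "C", "B", "B"],
    "GAME_DATE": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10", "2024-01-20"]),
    "WIN": [1, 0, 0, 1],
})

# Sort so that, within each (team, opponent) pair, games are in date order.
ordered = matchups.sort_values(["TEAM_ID", "OPPONENT", "GAME_DATE"])
ordered["WIN_LAST"] = ordered.groupby(["TEAM_ID", "OPPONENT"])["WIN"].shift(1)

# Restore the original row order.
result = ordered.sort_index()
print(result)
```

The first meeting of any pair has no previous game, so `WIN_LAST` is `NaN` there. Any downstream estimator that does not accept missing values would need those rows imputed or dropped.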

## Predict Game Statistics

### Get Validation Set
We take a subset of the games to check how well our predicted statistics match the actual statistics. Since the last five seasons are our test set, we do not sample games from that window.

In [None]:
def get_val_set (first_season, last_season, n = 1) :
    dates = []
    for season in range(first_season, last_season) :
        season_data = all_stats_cleaned[all_stats_cleaned['SEASON_YEAR'] == season]
        start_date = season_data['GAME_DATE'].min()
        end_date = season_data['GAME_DATE'].max()

        # day around the beginning of the season
        beg = season_data[season_data['GAME_DATE'].between(start_date, start_date + timedelta(weeks = 4))]

        # day around trade deadline (after about 2/3 of the season)
        delta = round((2/3)*(end_date-start_date).days)
        approx_deadline = start_date + timedelta(days = delta)
        mid = season_data[season_data['GAME_DATE'].between(approx_deadline, approx_deadline + timedelta(weeks = 4))]
        
        # day around the end of the season
        end = season_data[season_data['GAME_DATE'].between(end_date - timedelta(weeks = 4), end_date)]

        dates.extend(list(pd.concat([beg.sample(n)['GAME_DATE'], mid.sample(n)['GAME_DATE'], end.sample(n)['GAME_DATE']])))

    return dates
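
The three sampling windows in `get_val_set` can be inspected on their own. The sketch below reproduces only the date arithmetic; `season_windows` is a hypothetical helper, and the start and end dates are example inputs rather than values read from the dataset.

```python
from datetime import datetime, timedelta

def season_windows(start_date, end_date, weeks=4):
    """Return the (start, end) date ranges sampled at the beginning, deadline, and end of a season."""
    span = timedelta(weeks=weeks)
    # Approximate the trade deadline as two thirds of the way through the season.
    delta_days = round((2 / 3) * (end_date - start_date).days)
    approx_deadline = start_date + timedelta(days=delta_days)
    return {
        "beginning": (start_date, start_date + span),
        "deadline": (approx_deadline, approx_deadline + span),
        "end": (end_date - span, end_date),
    }

windows = season_windows(datetime(2017, 10, 17), datetime(2018, 4, 11))
for name, (lo, hi) in windows.items():
    print(name, lo.date(), hi.date())
```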

In [None]:
first_season = all_stats_cleaned['SEASON_YEAR'].min() + 1
last_season = all_stats_cleaned['SEASON_YEAR'].max() - 5
val_set = get_val_set(first_season, last_season)

We try two methods for predicting game statistics. As a baseline, we use a plain rolling-window average. Then we train a model that predicts a team's statistics. Both sets of predictions are later used to test an outcome-prediction model.
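
The rolling-window baseline relies on shifting each statistic by one game before averaging, so that the value attached to a game only uses games played before it. A minimal sketch on one made-up team and one statistic (the assist counts and the window size of 3 are illustrative):

```python
import pandas as pd

team_games = pd.DataFrame({
    "GAME_DATE": pd.date_range("2024-01-01", periods=5, freq="2D"),
    "AST": [20, 24, 28, 22, 26],
})

n = 3
# shift(1) drops the current game from its own window; rolling(n).mean() then averages the n games before it.
team_games["AST_ROLLING"] = team_games["AST"].shift(1).rolling(window=n).mean()
print(team_games)
```

The first `n` rows have no complete window and come out as `NaN`, which is why the baseline drops rows with missing values afterwards.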

In [None]:
# added shooting percentage
all_stats_cleaned = add_shooting_percentages(all_stats_cleaned)
# added win streak and win percentage
all_stats_cleaned = add_win_streak_and_percentage(all_stats_cleaned)
# added ELO score
all_stats_cleaned = add_elo_score(all_stats_cleaned)
# added win for last game
all_stats_cleaned = add_win_last_game(all_stats_cleaned)

In [None]:
def rolling_window(n, cols) :
    pred = None
    for team_id in all_stats_cleaned['TEAM_ID'].unique() :
        team_data = all_stats_cleaned[all_stats_cleaned['TEAM_ID'] == team_id].sort_values(by='GAME_DATE')
        for col in cols :
            shift = team_data[col].shift(1)
            team_data[col] = shift.rolling(window = n).mean()
        if pred is None :
            pred = team_data
        else :
            pred = pd.concat([pred, team_data], ignore_index = True)
    pred = pred.dropna(axis = 0)

    home = pred[pred['HOME'] == 1]
    away = pred[pred['HOME'] == 0]

    combined_pred_stats_home = pd.merge(home, away, 
                          left_on=['GAME_DATE', 'OPPONENT'], 
                          right_on=['GAME_DATE', 'TEAM_ABBREVIATION'],
                          suffixes=('_ONE', '_TWO'))
    combined_pred_stats_away = pd.merge(away, home, 
                          left_on=['GAME_DATE', 'OPPONENT'], 
                          right_on=['GAME_DATE', 'TEAM_ABBREVIATION'],
                          suffixes=('_ONE', '_TWO'))

    combined_pred_stats = pd.concat([combined_pred_stats_home, combined_pred_stats_away], ignore_index = True)
    combined_pred_stats.rename(columns={'MIN_ONE': 'MIN', 'SEASON_YEAR_TWO': 'SEASON_YEAR'}, inplace=True)
    combined_pred_stats = combined_pred_stats.drop(columns = ['MIN_TWO', 'OPPONENT_ONE', 'OPPONENT_TWO', 'SEASON_YEAR_ONE', 
                                                              'TEAM_ABBREVIATION_ONE', 'TEAM_NAME_ONE', 'MIN', 'FGM_ONE', 
                                                              'FGA_ONE', 'FG3M_ONE', 'FG3A_ONE', 'FTM_ONE', 'FTA_ONE', 'PTS_ONE', 
                                                              'PLUS_MINUS_ONE', 'TEAM_ABBREVIATION_TWO', 'TEAM_NAME_TWO', 'HOME_TWO',
                                                              'WIN_TWO', 'FGM_TWO', 'FGA_TWO', 'FG3M_TWO', 'FG3A_TWO', 'FTM_TWO', 
                                                              'FTA_TWO', 'PTS_TWO', 'PLUS_MINUS_TWO'])

    return combined_pred_stats

### Rolling Window Statistics (Baseline)

In [None]:
cols = ['FG_PCT', 'FG3_PCT', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'EFG%', 'TS%']
df_rolling = rolling_window(5, cols)
df_rolling.head()

In [None]:
df_rolling[df_rolling['SEASON_YEAR'].between(2019,2024)].shape

### Predicting Using ML Model

In [None]:
# get actual stats
combined_stats_training = add_shooting_percentages(combined_stats, combined = True)
combined_stats_training = combined_stats[['TEAM_ID_ONE', 'TEAM_ID_TWO', 'GAME_DATE', 'FG_PCT_ONE',
                                          'FG3_PCT_ONE','FT_PCT_ONE', 'OREB_ONE', 'DREB_ONE', 'REB_ONE',
                                          'AST_ONE', 'STL_ONE', 'BLK_ONE', 'TOV_ONE', 'PF_ONE', 'EFG%_ONE', 'TS%_ONE']]

In [None]:
# get rolling window stats
cols = ['FG_PCT', 'FG3_PCT', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'EFG%', 'TS%']
rolling_stats_training = rolling_window(5, cols)

In [None]:
# combine 
model_training_set = pd.merge(rolling_stats_training, combined_stats_training, 
                          left_on=['TEAM_ID_ONE', 'TEAM_ID_TWO', 'GAME_DATE'], 
                          right_on=['TEAM_ID_ONE', 'TEAM_ID_TWO', 'GAME_DATE'],
                          suffixes=('_PRED', '_ACT'))

In [None]:
model_training_set.columns

In [None]:
act_cols = ['FG_PCT_ONE_ACT', 'FG3_PCT_ONE_ACT', 'FT_PCT_ONE_ACT', 'OREB_ONE_ACT', 'DREB_ONE_ACT', 
                'REB_ONE_ACT','AST_ONE_ACT', 'STL_ONE_ACT', 'BLK_ONE_ACT', 'TOV_ONE_ACT', 
                'PF_ONE_ACT', 'EFG%_ONE_ACT', 'TS%_ONE_ACT'] 

def train_model(df, team_id, game_date, model_params = None):
    """
    Trains a model to predict team stats for a given game using past rolling averages of both teams.
    Features: past performance of TEAM_ONE and TEAM_TWO.
    Targets: actual stats of TEAM_ONE in the current game.
    """
    # determine season of the game
    season = game_date.year if game_date.month >= 10 else game_date.year - 1
    
    # get games for training
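    df_past = df[df['SEASON_YEAR'].between(season - 5, season)] # only look at the last 5 seasons
    # keep only this team's games before the game being predicted (filter df_past, not df, so the season limit is kept)
    df_past = df_past[(df_past['GAME_DATE'] < game_date) & (df_past['TEAM_ID_ONE'] == team_id)]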
    X = df_past.drop(columns = act_cols+['GAME_DATE'])
    # fitting a XGBoost model for each stat
    models = {}
    for col in act_cols:
        y = df_past[col]
        if model_params is None :
            model = XGBRegressor(n_estimators = 100, random_state = 33)
        else :
            model = XGBRegressor(**model_params[col], random_state = 33)
        model.fit(X, y)
        models[col] = model

    return models

def predict_game_stats(df, team_id, game_date, model_params = None) :
    """
    Predicts the statistics for given game.
    """
    df = df.drop(columns = 'WIN_ONE')
    df = pd.get_dummies(df, columns=['TEAM_ID_TWO'], drop_first=True)
    if model_params is None :
        models = train_model(df, team_id, game_date)
    else :
        models = train_model(df, team_id, game_date, model_params)

    pred = df[(df['GAME_DATE'] == game_date) & (df['TEAM_ID_ONE'] == team_id)].drop(columns = act_cols+['GAME_DATE'])

    
    prediction = {}
    for stat, model in models.items():
        prediction[stat] = model.predict(pred)[0]
    return prediction

def evaluate_stats_model(df, test_set, model_params = None):
    """
    Evaluates the stats model on every game played on the dates in `test_set`, using a single RMSE pooled over all statistics.
    """
    predictions = []
    actuals = []

    for day in test_set :
        print("Predicting...", day)
        games_on_day = df[df['GAME_DATE'] == day]
        for index, row in games_on_day.iterrows() :
            if model_params is None :
                pred = predict_game_stats(df, row['TEAM_ID_ONE'], day)
            else :
                pred = predict_game_stats(df, row['TEAM_ID_ONE'], day, model_params)
            pred = [pred[col] for col in act_cols]
            act = [row[col] for col in act_cols]
            predictions.append(pred)
            actuals.append(act)

    # evaluating model's predictions
    y_true = np.array(actuals)
    y_pred = np.array(predictions)
    total_rmse = np.sqrt(mean_squared_error(y_true.flatten(), y_pred.flatten()))
    return total_rmse
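
Because `evaluate_stats_model` flattens every statistic into one RMSE, count statistics such as rebounds weigh far more than percentages, which live on a 0 to 1 scale. A per-statistic breakdown is one possible complement. The sketch below uses invented numbers, and `rmse_by_stat` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def rmse_by_stat(y_true, y_pred, names):
    """Return ({stat: rmse}, pooled_rmse) for arrays shaped (n_games, n_stats)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    squared_error = (y_true - y_pred) ** 2
    per_stat = np.sqrt(squared_error.mean(axis=0))  # one RMSE per column
    pooled = np.sqrt(squared_error.mean())          # same as flattening both arrays
    return dict(zip(names, per_stat)), pooled

per_stat, pooled = rmse_by_stat(
    [[0.45, 40], [0.50, 44]],   # made-up actual FG_PCT and REB for two games
    [[0.47, 43], [0.46, 40]],   # made-up predictions
    ["FG_PCT", "REB"],
)
print(per_stat, pooled)
```

In this toy case the pooled value sits close to the rebound error, and the percentage error is barely visible in it.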

In [None]:
rmse = evaluate_stats_model(model_training_set, val_set)

In [None]:
rmse # np.float64(3.7292723986015033)

### Hyperparameter Tuning
To perform hyperparameter tuning, we are going to look at only a small subset of the validation set since each game to be predicted requires fitting a number of different models. For computational efficiency, we are going to make the validation subset include only dates from 2018.

In [None]:
param_test_set = [d for d in val_set if d.year == 2018]
param_test_set

In [None]:
def hyperparameter_tuning (df, params, test_set) :
    index = 1
    param_perf = None
    for p in params :
        print(f"Iteration {index} / {len(params)}")
        predictions = None 
        actual = None
        for day in test_set :
            games_on_day = df[df['GAME_DATE'] == day]
            for _, row in games_on_day.iterrows() :
                model_params = {col : p for col in act_cols}
                pred = pd.DataFrame([predict_game_stats(df, row['TEAM_ID_ONE'], day, model_params)])
                act = pd.DataFrame([{col : row[col] for col in act_cols}])
                predictions = pred if predictions is None else pd.concat([predictions, pred], ignore_index = True)
                actual = act if actual is None else pd.concat([actual, act], ignore_index = True)

        scores = {'params': p}
        for col in act_cols :
            scores[col] = np.sqrt(mean_squared_error(predictions[col], actual[col]))
        scores = pd.DataFrame([scores])
        param_perf = scores if param_perf is None else pd.concat([param_perf, scores], ignore_index = True)
        index += 1

    best_params = {}
    for col in act_cols :
        best_params[col] = param_perf.loc[param_perf[col].idxmin(), 'params']
    return best_params

In [None]:
param_grid = {
    "n_estimators": [50, 100, 150],
    "eta": [0.01, 0.05, 0.1], # learning rate
    "max_depth": [4, 6, 8], # maximum depth of a tree
    "subsample": [0.5, 0.7, 1], # fraction of observations randomly sampled for each tree
    "colsample_bytree": [0.5, 0.7, 1], # fraction of columns randomly sampled for each tree
    }

params = []
# Iterate over all combinations of hyperparameters
for values in itertools.product(*param_grid.values()):
    params.append(dict(zip(param_grid.keys(), values)))

best_params = hyperparameter_tuning(model_training_set, params, param_test_set)
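
A quick count shows why the search is restricted to a small set of dates. The sketch below rebuilds the same grid and multiplies the number of combinations by the 13 statistics in `act_cols`, since each predicted game fits one model per statistic for every candidate. The variable names `grid_demo`, `combos`, and `fits_per_game` are illustrative.

```python
import itertools

grid_demo = {
    "n_estimators": [50, 100, 150],
    "eta": [0.01, 0.05, 0.1],
    "max_depth": [4, 6, 8],
    "subsample": [0.5, 0.7, 1],
    "colsample_bytree": [0.5, 0.7, 1],
}

# Cartesian product of all value lists, turned back into keyword dictionaries.
combos = [dict(zip(grid_demo, values)) for values in itertools.product(*grid_demo.values())]

n_stats = 13  # one regressor per entry in act_cols
fits_per_game = len(combos) * n_stats
print(len(combos), fits_per_game)  # 243 3159
```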

In [None]:
for k, v in best_params.items():
    print(k,v)

In [None]:
# hard-coded output of the tuning run above; run this cell instead of the search to avoid rerunning it
best_params = {
    'FG_PCT_ONE_ACT': {'n_estimators': 50, 'eta': 0.1, 'max_depth': 8, 'subsample': 0.7, 'colsample_bytree': 0.5},
    'FG3_PCT_ONE_ACT': {'n_estimators': 50, 'eta': 0.1, 'max_depth': 6, 'subsample': 0.7, 'colsample_bytree': 1},
    'FT_PCT_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 4, 'subsample': 0.5, 'colsample_bytree': 0.5},
    'OREB_ONE_ACT': {'n_estimators': 50, 'eta': 0.1, 'max_depth': 8, 'subsample': 0.5, 'colsample_bytree': 0.5},
    'DREB_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 4, 'subsample': 0.5, 'colsample_bytree': 1},
    'REB_ONE_ACT': {'n_estimators': 150, 'eta': 0.05, 'max_depth': 8, 'subsample': 0.5, 'colsample_bytree': 0.7},
    'AST_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 8, 'subsample': 0.5, 'colsample_bytree': 0.5},
    'STL_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 8, 'subsample': 0.5, 'colsample_bytree': 0.5},
    'BLK_ONE_ACT': {'n_estimators': 150, 'eta': 0.1, 'max_depth': 6, 'subsample': 0.7, 'colsample_bytree': 1},
    'TOV_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 6, 'subsample': 0.7, 'colsample_bytree': 1},
    'PF_ONE_ACT': {'n_estimators': 150, 'eta': 0.05, 'max_depth': 4, 'subsample': 0.7, 'colsample_bytree': 1},
    'EFG%_ONE_ACT': {'n_estimators': 50, 'eta': 0.05, 'max_depth': 8, 'subsample': 0.7, 'colsample_bytree': 1},
    'TS%_ONE_ACT': {'n_estimators': 100, 'eta': 0.1, 'max_depth': 6, 'subsample': 1, 'colsample_bytree': 0.5}
}

In [None]:
rmse_tuned = evaluate_stats_model(model_training_set, val_set, best_params)

In [None]:
rmse_tuned #3.509182892225601

### Predict Training Set for Outcome Model

In [None]:
static_cols = ['TEAM_ID_ONE', 'SEASON_YEAR', 'HOME_ONE', 'WIN_ONE', 'ELO_ONE', 'WIN_STREAK_ONE', 'WIN_PERCENTAGE_ONE', 'WIN_LAST_ONE']
def pred_training_set (df, first_season, last_season, model_params) :
    days = all_stats_cleaned[all_stats_cleaned['SEASON_YEAR'].between(first_season, last_season)]['GAME_DATE'].unique()
    rows = []
    current_day = 1
    total_days = len(days)
    for d in days:
        print(f"Predicting Day {current_day} / {total_days} ")
        games_on_day = df[df['GAME_DATE'] == d]
        for _, row in games_on_day.iterrows() :
            pred = predict_game_stats(df, row['TEAM_ID_ONE'], d, model_params)
            pred['GAME_DATE'] = d
            pred['OPP'] = row['TEAM_ID_TWO']
            for s in static_cols :
                pred[s] = row[s]
            rows.append(pred)
        current_day += 1
            
    all_predictions = pd.DataFrame(rows)
    all_predictions.rename(columns=lambda col: col.replace('_ONE', ''), inplace=True)
    all_predictions.rename(columns=lambda col: col.replace('_ACT', ''), inplace=True)

    home = all_predictions[all_predictions.HOME == 1]
    away = all_predictions[all_predictions.HOME == 0]

    combined_pred_home = pd.merge(home, away, 
                          left_on=['GAME_DATE', 'OPP'], 
                          right_on=['GAME_DATE', 'TEAM_ID'],
                          suffixes=('_ONE', '_TWO'))
    combined_pred_away = pd.merge(away, home, 
                          left_on=['GAME_DATE', 'OPP'], 
                          right_on=['GAME_DATE', 'TEAM_ID'],
                          suffixes=('_ONE', '_TWO'))

    combined_pred = pd.concat([combined_pred_home, combined_pred_away], ignore_index = True)
    combined_pred = combined_pred.drop(columns = ['OPP_ONE', 'OPP_TWO', 'HOME_TWO', 'WIN_TWO', 'SEASON_YEAR_TWO'])
    combined_pred.rename(columns = {'SEASON_YEAR_ONE': 'SEASON_YEAR'}, inplace=True)
    combined_pred=combined_pred[rolling_stats_training.columns] # orient the columns nicely

    return combined_pred

In [None]:
time_horizon = 5
first_test_season = all_stats_cleaned['SEASON_YEAR'].max() - 5
last_test_season = all_stats_cleaned['SEASON_YEAR'].max()

In [None]:
df_model = pred_training_set(model_training_set, first_test_season - time_horizon, last_test_season, best_params)

In [None]:
df_model.to_csv('df_model_fixed.csv', index = False) # still need to add who wins

In [None]:
df_model.head()

## Export Training Statistics

In [None]:
df_model.to_csv('df_model_tuned.csv', index = False)
df_rolling.to_csv('df_rolling.csv', index = False)

## Predict Playoff Statistics
For each playoff game, we calculate the rolling statistics, predict the game's statistics, and treat those predictions as the actual values. We then append them to the full dataframe so that they are included in the rolling window for the next prediction.
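
The loop described above can be sketched on toy data. This is only a minimal illustration, not the notebook's pipeline: `rolling_mean_forecast` is a hypothetical stand-in for the trained model, and the `AST` values are made up.

```python
import pandas as pd

def rolling_mean_forecast(history, team_id, col, n=5):
    """Hypothetical stand-in for the model: mean of the team's last n values."""
    team_games = history[history['TEAM_ID'] == team_id].sort_values('GAME_DATE')
    return team_games[col].tail(n).mean()

# made-up history for a single team
history = pd.DataFrame({
    'TEAM_ID': [1, 1, 1],
    'GAME_DATE': pd.to_datetime(['2025-04-01', '2025-04-03', '2025-04-05']),
    'AST': [20.0, 30.0, 25.0],
})

upcoming = pd.to_datetime(['2025-04-07', '2025-04-09'])
for game_date in upcoming:
    # predict from the current history, then treat the prediction as if it had happened
    pred = rolling_mean_forecast(history, 1, 'AST', n=3)
    new_row = pd.DataFrame({'TEAM_ID': [1], 'GAME_DATE': [game_date], 'AST': [pred]})
    history = pd.concat([history, new_row], ignore_index=True)
```

Because each prediction is appended before the next one is made, the second forecast already averages over the first predicted value.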

In [None]:
cols = ['FG_PCT', 'FG3_PCT','FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'EFG%', 'TS%']

act_values = combined_stats[['TEAM_ID_ONE', 'TEAM_ID_TWO',  'HOME_ONE', 'WIN_ONE', 'GAME_DATE', 'FG_PCT_ONE',
                                          'FG3_PCT_ONE','FT_PCT_ONE', 'OREB_ONE', 'DREB_ONE', 'REB_ONE',
                                          'AST_ONE', 'STL_ONE', 'BLK_ONE', 'TOV_ONE', 'PF_ONE', 'EFG%_ONE', 
                                          'TS%_ONE', 'ELO_ONE']]
rolling_values = rolling_window(5, cols).drop(columns = 'WIN_ONE')

In [None]:
def get_input_values (df, team_id, opponent, game_date, home, n=5) :
    cols = ['FG_PCT', 'FG3_PCT','FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'EFG%', 'TS%']
    
    input_values = {
        'TEAM_ID': team_id,
        'OPPONENT': opponent,
        'GAME_DATE': game_date,
        'HOME': home
    }
    
    # average each statistic over the team's n most recent games
    team_data = df[df['TEAM_ID_ONE'] == team_id].sort_values(by='GAME_DATE')
    rolling_stats = team_data.tail(n)
    for col in cols :
        input_values[col] = rolling_stats[col+'_ONE'].mean()

    return pd.DataFrame(input_values, index = [0])

def pred_playoff_game (df_act, df_rolling, team_one, team_two, game_date, home_one) :
    team_one_stats = get_input_values(df_act, team_one, team_two, game_date, home_one)
    team_two_stats = get_input_values(df_act, team_two, team_one, game_date, int(not home_one))

    # to find win streak, win percentage, and win for last game
    combined = df_act.copy().rename(columns=lambda x: x.replace('_ONE', ''))
    combined.rename(columns = {'TEAM_ID_TWO': 'OPPONENT'}, inplace = True)
    combined = pd.concat([combined, team_one_stats, team_two_stats], axis = 0, ignore_index = True).fillna(0)
    combined = add_win_streak_and_percentage(combined)
    combined = add_win_last_game(combined)   

    # copy the slices so the ELO assignments below do not write to a view of `combined`
    team_one_stats = combined.iloc[[-2]].copy()
    team_two_stats = combined.iloc[[-1]].copy()

    team_one_stats.loc[:,'ELO'] = df_act[df_act['TEAM_ID_ONE'] == team_one].sort_values(by = 'GAME_DATE').iloc[-1]['ELO_ONE']
    team_two_stats.loc[:,'ELO'] = df_act[df_act['TEAM_ID_ONE'] == team_two].sort_values(by = 'GAME_DATE').iloc[-1]['ELO_ONE']

    pred_stats_one = pd.merge(team_one_stats, team_two_stats, 
                          left_on = ['GAME_DATE'],
                          right_on = ['GAME_DATE'],
                          suffixes = ('_ONE', '_TWO')).drop(columns = ['OPPONENT_ONE', 'OPPONENT_TWO', 'WIN_TWO', 'HOME_TWO'])
    pred_stats_two = pd.merge(team_two_stats, team_one_stats, 
                          left_on = ['GAME_DATE'],
                          right_on = ['GAME_DATE'],
                          suffixes = ('_ONE', '_TWO')).drop(columns = ['OPPONENT_ONE', 'OPPONENT_TWO', 'WIN_TWO', 'HOME_TWO'])
    
    season = game_date.year if game_date.month >= 10 else game_date.year - 1
    pred_stats_one['SEASON_YEAR'] = pred_stats_two['SEASON_YEAR'] = season
    df_rolling = pd.concat([df_rolling, pred_stats_one.drop(columns = 'WIN_ONE'), pred_stats_two.drop(columns = 'WIN_ONE')], ignore_index = True)

    training_set = pd.merge(df_rolling, df_act.drop(columns = ['HOME_ONE']),
                            how = 'left',
                            left_on = ['TEAM_ID_ONE','TEAM_ID_TWO', 'GAME_DATE'],
                            right_on = ['TEAM_ID_ONE', 'TEAM_ID_TWO', 'GAME_DATE'],
                            suffixes = ('_PRED', '_ACT'))
    
    pred_stats_one = pd.DataFrame(predict_game_stats(training_set, team_one, game_date, best_params), index = [0]).rename(columns=lambda col: col.replace('_ACT', ''))
    pred_stats_two = pd.DataFrame(predict_game_stats(training_set, team_two, game_date, best_params), index = [0]).rename(columns=lambda col: col.replace('_ACT', ''))
    
    static_cols = ['TEAM_ID', 'HOME', 'WIN', 'ELO', 'WIN_STREAK', 'WIN_PERCENTAGE', 'WIN_LAST']
    
    pred_stats_one['GAME_DATE'] = pred_stats_two['GAME_DATE'] = game_date
    pred_stats_one['SEASON_YEAR'] = pred_stats_two['SEASON_YEAR'] = season
    pred_stats_one['OPP'] = team_two
    pred_stats_two['OPP'] = team_one
    for s in static_cols :
        pred_stats_one[s] = team_one_stats.iloc[0][s]
        pred_stats_two[s] = team_two_stats.iloc[0][s]

    preds = pd.concat([pred_stats_one, pred_stats_two], ignore_index = True)
    return df_rolling, preds

def playoff_predictions(df_act, df_rolling, games) :
    all_predictions = None
    for _, row in games.iterrows() :
        df_rolling, pred = pred_playoff_game(df_act, df_rolling, row['TEAM_ONE'], row['TEAM_TWO'], row['GAME_DATE'], row['HOME_ONE'])
        all_predictions = pred if all_predictions is None else pd.concat([all_predictions, pred], ignore_index = True)

    all_predictions.rename(columns=lambda col: col.replace('_ACT', ''), inplace=True)    
    all_predictions.rename(columns=lambda col: col.replace('_ONE', ''), inplace=True)

    home = all_predictions[all_predictions.HOME == 1]
    away = all_predictions[all_predictions.HOME == 0]

    combined_pred = pd.merge(home, away, 
                          left_on=['GAME_DATE', 'OPP'], 
                          right_on=['GAME_DATE', 'TEAM_ID'],
                          suffixes=('_ONE', '_TWO'))

    combined_pred = combined_pred.drop(columns = ['OPP_ONE', 'OPP_TWO', 'HOME_TWO', 'WIN_TWO', 'SEASON_YEAR_TWO'])
    combined_pred.rename(columns = {'SEASON_YEAR_ONE': 'SEASON_YEAR'}, inplace=True)

    return combined_pred, df_rolling, all_predictions
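
The home/away merge used in `playoff_predictions` can be seen in isolation on a single made-up game. This is a small sketch with invented values, assuming one row per team with `TEAM_ID`, `OPP`, and `HOME` columns as in the function above.

```python
import pandas as pd

# one row per team per game (made-up values)
per_team = pd.DataFrame({
    'GAME_DATE': pd.to_datetime(['2025-04-20', '2025-04-20']),
    'TEAM_ID': [2, 11],
    'OPP': [11, 2],
    'HOME': [1, 0],
    'AST': [27.0, 22.0],
})

home = per_team[per_team['HOME'] == 1]
away = per_team[per_team['HOME'] == 0]

# match each home row to the away row whose TEAM_ID is the home team's opponent
game_rows = pd.merge(home, away,
                     left_on=['GAME_DATE', 'OPP'],
                     right_on=['GAME_DATE', 'TEAM_ID'],
                     suffixes=('_ONE', '_TWO'))
```

The result has one row per game, with the home team's columns suffixed `_ONE` and the away team's suffixed `_TWO`, while `GAME_DATE` stays unsuffixed because it is a shared key.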

### Round 1
In Round 1, we have the following series. Each team is listed with the letter that marks its home games in the schedule and its team id.

Eastern
1. Cleveland (C, 2) vs. Miami (H, 11): 4/20 C, 4/23 C, 4/26 H, 4/28 H, 4/30 C, 5/2 H, 5/4 C
2. Boston (C, 1) vs. Orlando (M, 16): 4/20 C, 4/23 C, 4/25 M, 4/27 M, 4/29 C, 5/1 M, 5/3 C
3. New York (K, 15) vs. Detroit (P, 28): 4/19 K, 4/21 K, 4/24 P, 4/27 P, 4/29 K, 5/1 P, 5/3 K
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/19 P, 4/22 P, 4/25 B, 4/27 B, 4/29 P, 5/2 B, 5/4 P

Western
1. Oklahoma City (T, 23) vs. Memphis (G, 26): 4/20 T, 4/22 T, 4/24 G, 4/26 G, 4/28 T, 5/1 G, 5/3 T
2. Houston (R, 8) vs. Golden State (W, 7): 4/20 R, 4/23 R, 4/26 W, 4/28 W, 4/30 R, 5/2 W, 5/4 R
3. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/19 L, 4/22 L, 4/25 T, 4/27 T, 4/30 L, 5/2 T, 5/4 L
4. Denver (N, 6) vs. LA Clippers (C, 9): 4/19 N, 4/21 N, 4/24 C, 4/26 C, 4/29 N, 5/1 C, 5/3 N

To simulate the playoffs properly, we predict the statistics and outcome for the first game of each series, append those predictions to our data, and then predict the next game. We continue until every matchup has a winner. First, though, we need to look up the id numbers of the teams playing.
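
Keeping track of when a matchup "has a winner" amounts to tallying wins until one team reaches four. The helper below is a hypothetical sketch (`series_record` is not part of the notebook); the example input is the Denver (6) vs. LA Clippers (9) game-by-game sequence reported later in this section.

```python
def series_record(game_winners, team_one, team_two, wins_needed=4):
    """Return (team_one wins, team_two wins, series winner or None)."""
    wins = {team_one: 0, team_two: 0}
    for winner in game_winners:
        if winner not in wins:
            raise ValueError(f"unexpected team id {winner}")
        wins[winner] += 1
        if wins[winner] == wins_needed:
            # series is decided; ignore any later entries
            return wins[team_one], wins[team_two], winner
    return wins[team_one], wins[team_two], None

# Denver (6) vs. LA Clippers (9), winners of games 1 through 7
denver_clippers = series_record([6, 6, 9, 9, 9, 6, 9], 6, 9)
```

A series that returns `None` as its winner still needs another game to be predicted.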

In [None]:
all_stats_cleaned[['TEAM_ID', 'TEAM_NAME']].sort_values(by = 'TEAM_ID').drop_duplicates(subset=['TEAM_ID'], keep = 'last')

#### Game 1

1. Cleveland (C, 2) vs. Miami (H, 11): 4/20 C
2. Boston (C, 1) vs. Orlando (M, 16): 4/20 C
3. New York (K, 15) vs. Detroit (P, 28): 4/19 K
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/19 P
5. Oklahoma City (T, 23) vs. Memphis (G, 26): 4/20 T
6. Houston (R, 8) vs. Golden State (W, 7): 4/20 R
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/19 L
8. Denver (N, 6) vs. LA Clippers (C, 9): 4/19 N

In [None]:
games = pd.DataFrame([[2, 11, pd.Timestamp('2025-04-20'), 1], 
         [1, 16, pd.Timestamp('2025-04-20'), 1], 
         [15, 28, pd.Timestamp('2025-04-19'), 1], 
         [17, 12, pd.Timestamp('2025-04-19'), 1], 
         [23, 26, pd.Timestamp('2025-04-20'), 1], 
         [8, 7, pd.Timestamp('2025-04-20'), 1], 
         [10, 13, pd.Timestamp('2025-04-19'), 1],
         [6, 9, pd.Timestamp('2025-04-19'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_one.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 1-0
2. Boston vs. Orlando : 1-0
3. New York vs. Detroit : 1-0
4. Indiana vs. Milwaukee : 1-0
5. Oklahoma City vs. Memphis : 1-0
6. Houston vs. Golden State : 1-0
7. LA Lakers vs. Minnesota : 0-1
8. Denver vs. LA Clippers : 1-0

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [2, 1, 23, 8, 15, 17, 13, 6]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)
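
Since this bookkeeping step is repeated after every game, it could be wrapped in a helper. The sketch below is one possible refactoring, not code from the notebook: `append_assumed_results` is a hypothetical name, and the two toy frames only mimic the column layout of `act_values` and `all_preds`.

```python
import pandas as pd

def append_assumed_results(act_values, all_preds, winners):
    """Reshape per-team predictions to the act_values layout, flag winners, and append."""
    new_rows = all_preds.add_suffix('_ONE')
    new_rows = new_rows.rename(columns={'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
    new_rows = new_rows[act_values.columns].copy()
    # 1 for teams predicted to win this game, 0 otherwise
    new_rows['WIN_ONE'] = new_rows['TEAM_ID_ONE'].isin(winners).astype(int)
    return pd.concat([act_values, new_rows], ignore_index=True)

# toy stand-ins with made-up values
toy_act = pd.DataFrame({
    'TEAM_ID_ONE': [2], 'TEAM_ID_TWO': [11],
    'GAME_DATE': pd.to_datetime(['2025-04-13']),
    'WIN_ONE': [1], 'AST_ONE': [28.0],
})
toy_preds = pd.DataFrame({
    'TEAM_ID': [2, 11], 'OPP': [11, 2],
    'GAME_DATE': pd.to_datetime(['2025-04-20', '2025-04-20']),
    'WIN': [0, 0], 'AST': [27.0, 22.0],
})

updated = append_assumed_results(toy_act, toy_preds, winners=[2])
```

With such a helper, each "assume the predictions are true" cell would reduce to a single call with that game's list of winners.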

#### Game 2
1. Cleveland (C, 2) vs. Miami (H, 11): 4/23 C
2. Boston (C, 1) vs. Orlando (M, 16): 4/23 C
3. New York (K, 15) vs. Detroit (P, 28): 4/21 K
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/22 P
5. Oklahoma City (T, 23) vs. Memphis (G, 26): 4/22 T
6. Houston (R, 8) vs. Golden State (W, 7): 4/23 R
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/22 L
8. Denver (N, 6) vs. LA Clippers (C, 9): 4/21 N


In [None]:
games = pd.DataFrame([[2, 11, pd.Timestamp('2025-04-23'), 1], 
         [1, 16, pd.Timestamp('2025-04-23'), 1], 
         [15, 28, pd.Timestamp('2025-04-21'), 1], 
         [17, 12, pd.Timestamp('2025-04-22'), 1], 
         [23, 26, pd.Timestamp('2025-04-22'), 1], 
         [8, 7, pd.Timestamp('2025-04-23'), 1], 
         [10, 13, pd.Timestamp('2025-04-22'), 1],
         [6, 9, pd.Timestamp('2025-04-21'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_two.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 2-0
2. Boston vs. Orlando : 2-0
3. New York vs. Detroit : 2-0
4. Indiana vs. Milwaukee : 1-1
5. Oklahoma City vs. Memphis : 2-0
6. Houston vs. Golden State : 2-0
7. LA Lakers vs. Minnesota : 0-2
8. Denver vs. LA Clippers : 2-0

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [2, 1, 8, 15, 6, 12, 23, 13]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 3
1. Cleveland (C, 2) vs. Miami (H, 11): 4/26 H
2. Boston (C, 1) vs. Orlando (M, 16): 4/25 M
3. New York (K, 15) vs. Detroit (P, 28): 4/24 P
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/25 B
5. Oklahoma City (T, 23) vs. Memphis (G, 26): 4/24 G
6. Houston (R, 8) vs. Golden State (W, 7): 4/26 W
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/25 T
8. Denver (N, 6) vs. LA Clippers (C, 9): 4/24 C

In [None]:
games = pd.DataFrame([[2, 11, pd.Timestamp('2025-04-26'), 0], 
         [1, 16, pd.Timestamp('2025-04-25'), 0], 
         [15, 28, pd.Timestamp('2025-04-24'), 0], 
         [17, 12, pd.Timestamp('2025-04-25'), 0], 
         [23, 26, pd.Timestamp('2025-04-24'), 0], 
         [8, 7, pd.Timestamp('2025-04-26'), 0], 
         [10, 13, pd.Timestamp('2025-04-25'), 0],
         [6, 9, pd.Timestamp('2025-04-24'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_three.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 3-0
2. Boston vs. Orlando : 3-0
3. New York vs. Detroit : 3-0
4. Indiana vs. Milwaukee : 2-1
5. Oklahoma City vs. Memphis : 3-0
6. Houston vs. Golden State : 3-0
7. LA Lakers vs. Minnesota : 1-2
8. Denver vs. LA Clippers : 2-1

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [2, 8, 1, 17, 10, 15, 23, 9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 4
1. Cleveland (C, 2) vs. Miami (H, 11): 4/28 H
2. Boston (C, 1) vs. Orlando (M, 16): 4/27 M
3. New York (K, 15) vs. Detroit (P, 28): 4/27 P
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/27 B
5. Oklahoma City (T, 23) vs. Memphis (G, 26): 4/26 G
6. Houston (R, 8) vs. Golden State (W, 7): 4/28 W
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/27 T
8. Denver (N, 6) vs. LA Clippers (C, 9): 4/26 C

In [None]:
games = pd.DataFrame([[2, 11, pd.Timestamp('2025-04-28'), 0], 
         [1, 16, pd.Timestamp('2025-04-27'), 0], 
         [15, 28, pd.Timestamp('2025-04-27'), 0], 
         [17, 12, pd.Timestamp('2025-04-27'), 0], 
         [23, 26, pd.Timestamp('2025-04-26'), 0], 
         [8, 7, pd.Timestamp('2025-04-28'), 0], 
         [10, 13, pd.Timestamp('2025-04-27'), 0],
         [6, 9, pd.Timestamp('2025-04-26'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_four.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 4-0
2. Boston vs. Orlando : 4-0
3. New York vs. Detroit : 4-0
4. Indiana vs. Milwaukee : 3-1
5. Oklahoma City vs. Memphis : 4-0
6. Houston vs. Golden State : 3-1
7. LA Lakers vs. Minnesota : 3-1
8. Denver vs. LA Clippers : 2-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [2, 1, 15, 17, 23, 7, 10, 9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 5
1. Cleveland (C, 2) vs. Miami (H, 11): COMPLETE
2. Boston (C, 1) vs. Orlando (M, 16): COMPLETE
3. New York (K, 15) vs. Detroit (P, 28): COMPLETE
4. Indiana (P, 17) vs. Milwaukee (B, 12): 4/29 P
5. Oklahoma City (T, 23) vs. Memphis (G, 26): COMPLETE
6. Houston (R, 8) vs. Golden State (W, 7): 4/30 R
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 4/30 L
8. Denver (N, 6) vs. LA Clippers (C, 9): 4/29 N

In [None]:
games = pd.DataFrame([[17, 12, pd.Timestamp('2025-04-29'), 1], 
         [8, 7, pd.Timestamp('2025-04-30'), 1], 
         [10, 13, pd.Timestamp('2025-04-30'), 1],
         [6, 9, pd.Timestamp('2025-04-29'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_five.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 4-0
2. Boston vs. Orlando : 4-0
3. New York vs. Detroit : 4-0
4. Indiana vs. Milwaukee : 3-2
5. Oklahoma City vs. Memphis : 4-0
6. Houston vs. Golden State : 3-2
7. LA Lakers vs. Minnesota : 3-2
8. Denver vs. LA Clippers : 2-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [12, 9, 7, 13]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 6
1. Cleveland (C, 2) vs. Miami (H, 11): COMPLETE
2. Boston (C, 1) vs. Orlando (M, 16): COMPLETE
3. New York (K, 15) vs. Detroit (P, 28): COMPLETE
4. Indiana (P, 17) vs. Milwaukee (B, 12): 5/2 B
5. Oklahoma City (T, 23) vs. Memphis (G, 26): COMPLETE
6. Houston (R, 8) vs. Golden State (W, 7): 5/2 W
7. LA Lakers (L, 10) vs. Minnesota (T, 13): 5/2 T
8. Denver (N, 6) vs. LA Clippers (C, 9): 5/1 C

In [None]:
games = pd.DataFrame([[17, 12, pd.Timestamp('2025-05-02'), 0], 
         [8, 7, pd.Timestamp('2025-05-02'), 0], 
         [10, 13, pd.Timestamp('2025-05-02'), 0],
         [6, 9, pd.Timestamp('2025-05-01'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_six.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 4-0
2. Boston vs. Orlando : 4-0
3. New York vs. Detroit : 4-0
4. Indiana vs. Milwaukee : 4-2
5. Oklahoma City vs. Memphis : 4-0
6. Houston vs. Golden State : 4-2
7. LA Lakers vs. Minnesota : 4-2
8. Denver vs. LA Clippers : 3-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [17, 8, 10, 6]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 7
1. Cleveland (C, 2) vs. Miami (H, 11): COMPLETE
2. Boston (C, 1) vs. Orlando (M, 16): COMPLETE
3. New York (K, 15) vs. Detroit (P, 28): COMPLETE
4. Indiana (P, 17) vs. Milwaukee (B, 12): COMPLETE
5. Oklahoma City (T, 23) vs. Memphis (G, 26): COMPLETE
6. Houston (R, 8) vs. Golden State (W, 7): COMPLETE
7. LA Lakers (L, 10) vs. Minnesota (T, 13): COMPLETE
8. Denver (N, 6) vs. LA Clippers (C, 9): 5/3 N

In [None]:
games = pd.DataFrame([[6, 9, pd.Timestamp('2025-05-03'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_one_seven.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Cleveland vs. Miami : 4-0
2. Boston vs. Orlando : 4-0
3. New York vs. Detroit : 4-0
4. Indiana vs. Milwaukee : 4-2
5. Oklahoma City vs. Memphis : 4-0
6. Houston vs. Golden State : 4-2
7. LA Lakers vs. Minnesota : 4-2
8. Denver vs. LA Clippers : 3-4

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

### Round 2
From these results, we get the following conference semifinal matchups. 
1. Oklahoma City (23) vs. LA Clippers (9) : OKC home first
2. Houston (8) vs. LA Lakers (10) : Houston home first
3. Cleveland (2) vs. Indiana (17) : Cleveland home first
4. Boston (1) vs. New York (15) : Boston home first

Games begin May 5-6, so for simplicity we start every series on May 5 and space the games two days apart.
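
The per-game `games` frames for a series could also be generated rather than typed by hand. The following is a sketch under the assumptions stated above (two-day spacing, and the higher seed hosting games 1, 2, 5, and 7, as in the cells below); `build_series_schedule` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def build_series_schedule(team_one, team_two, first_game, days_apart=2):
    """Seven-game schedule where team_one hosts games 1, 2, 5, and 7."""
    home_pattern = [1, 1, 0, 0, 1, 0, 1]
    rows = [[team_one, team_two, first_game + pd.Timedelta(days=days_apart * i), home]
            for i, home in enumerate(home_pattern)]
    return pd.DataFrame(rows, columns=['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])

# Oklahoma City (23) vs. LA Clippers (9), starting May 5
schedule = build_series_schedule(23, 9, pd.Timestamp('2025-05-05'))
```

Row `i` of `schedule` then corresponds to game `i + 1` of the series, so one row per series can be stacked into the `games` frame for each round of predictions.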

#### Game 1

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-05'), 1], 
         [8, 10, pd.Timestamp('2025-05-05'), 1], 
         [2, 17, pd.Timestamp('2025-05-05'), 1],
         [1, 15, pd.Timestamp('2025-05-05'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_one.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 0-1
2. Houston vs. LA Lakers : 0-1
3. Cleveland vs. Indiana : 0-1
4. Boston vs. New York : 0-1

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 10, 17, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 2

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-07'), 1], 
         [8, 10, pd.Timestamp('2025-05-07'), 1], 
         [2, 17, pd.Timestamp('2025-05-07'), 1],
         [1, 15, pd.Timestamp('2025-05-07'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_two.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 0-2
2. Houston vs. LA Lakers : 0-2
3. Cleveland vs. Indiana : 0-2
4. Boston vs. New York : 0-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 10, 17, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 3

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-09'), 0], 
         [8, 10, pd.Timestamp('2025-05-09'), 0], 
         [2, 17, pd.Timestamp('2025-05-09'), 0],
         [1, 15, pd.Timestamp('2025-05-09'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_three.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 1-2
2. Houston vs. LA Lakers : 1-2
3. Cleveland vs. Indiana : 1-2
4. Boston vs. New York : 1-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [23, 8, 2, 1]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 4

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-11'), 0], 
         [8, 10, pd.Timestamp('2025-05-11'), 0], 
         [2, 17, pd.Timestamp('2025-05-11'), 0],
         [1, 15, pd.Timestamp('2025-05-11'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_four.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 2-2
2. Houston vs. LA Lakers : 2-2
3. Cleveland vs. Indiana : 2-2
4. Boston vs. New York : 2-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [23, 8, 2, 1]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 5

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-13'), 1], 
         [8, 10, pd.Timestamp('2025-05-13'), 1], 
         [2, 17, pd.Timestamp('2025-05-13'), 1],
         [1, 15, pd.Timestamp('2025-05-13'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_five.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 2-3
2. Houston vs. LA Lakers : 2-3
3. Cleveland vs. Indiana : 2-3
4. Boston vs. New York : 2-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 10, 17, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 6

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-15'), 0], 
         [8, 10, pd.Timestamp('2025-05-15'), 0], 
         [2, 17, pd.Timestamp('2025-05-15'), 0],
         [1, 15, pd.Timestamp('2025-05-15'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_six.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 3-3
2. Houston vs. LA Lakers : 3-3
3. Cleveland vs. Indiana : 3-3
4. Boston vs. New York : 3-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [23, 8, 2, 1]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 7

In [None]:
games = pd.DataFrame([[23, 9, pd.Timestamp('2025-05-17'), 1], 
         [8, 10, pd.Timestamp('2025-05-17'), 1], 
         [2, 17, pd.Timestamp('2025-05-17'), 1],
         [1, 15, pd.Timestamp('2025-05-17'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_two_seven.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Oklahoma City vs. LA Clippers : 3-4
2. Houston vs. LA Lakers : 3-4
3. Cleveland vs. Indiana : 3-4
4. Boston vs. New York : 3-4

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 10, 17, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

### Round 3 : Conference Finals
From these results, we get the following conference final matchups. 
1. LA Lakers (10) vs. LA Clippers (9) : LA Lakers home first
2. New York (15) vs. Indiana (17) : New York home first

Games begin May 20-21 with the following schedule: 

Western: 5/20, 5/22, 5/24, 5/26, 5/28, 5/30, 6/1

Eastern: 5/21, 5/23, 5/25, 5/27, 5/29, 5/31, 6/2

#### Game 1

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-20'), 1], 
         [15, 17, pd.Timestamp('2025-05-21'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_one.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 0-1
2. New York (15) vs. Indiana (17) : 0-1

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 2

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-22'), 1], 
         [15, 17, pd.Timestamp('2025-05-23'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_two.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 0-2
2. New York (15) vs. Indiana (17) : 0-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 3

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-24'), 0], 
         [15, 17, pd.Timestamp('2025-05-25'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_three.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 1-2
2. New York (15) vs. Indiana (17) : 1-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [10, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 4

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-26'), 0], 
         [15, 17, pd.Timestamp('2025-05-27'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_four.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 2-2
2. New York (15) vs. Indiana (17) : 2-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [10, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 5

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-28'), 1],
                      [15, 17, pd.Timestamp('2025-05-29'), 1]],
                     columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_five.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 2-3
2. New York (15) vs. Indiana (17) : 2-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 6

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-05-30'), 0],
                      [15, 17, pd.Timestamp('2025-05-31'), 0]],
                     columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_six.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 3-3
2. New York (15) vs. Indiana (17) : 3-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [10, 15]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 7

In [None]:
games = pd.DataFrame([[10, 9, pd.Timestamp('2025-06-01'), 1],
                      [15, 17, pd.Timestamp('2025-06-02'), 1]],
                     columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_three_seven.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. LA Lakers (10) vs. LA Clippers (9) : 3-4
2. New York (15) vs. Indiana (17) : 3-4

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9, 17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

### Round 4 : Finals
From these results, the LA Clippers and Indiana each win their series 4-3, which gives the following Finals matchup:
1. Indiana (17) vs. LA Clippers (9) : Indiana home first

Games begin June 5 and follow this schedule: 6/5, 6/8, 6/11, 6/13, 6/16, 6/19, 6/22
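
The schedule above, together with the home/away flags used in the game cells of this section (home, home, away, away, home, away, home for the team with home-court advantage), could be turned into one table up front instead of typing each game by hand. This is a minimal sketch under that assumption; `series_schedule` and `HOME_PATTERN` are hypothetical names that do not appear elsewhere in the notebook.

```python
import pandas as pd

# 2-2-1-1-1 home pattern for TEAM_ONE, matching the HOME_ONE flags used per game
HOME_PATTERN = [1, 1, 0, 0, 1, 0, 1]

def series_schedule(team_one, team_two, dates, year=2025):
    """Hypothetical helper: build one row per game of a best-of-seven series."""
    rows = []
    for date, home in zip(dates, HOME_PATTERN):
        # Dates are given as 'month/day' strings, e.g. '6/5'
        month, day = (int(part) for part in date.split('/'))
        rows.append([team_one, team_two, pd.Timestamp(year=year, month=month, day=day), home])
    return pd.DataFrame(rows, columns=['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])

finals = series_schedule(17, 9, ['6/5', '6/8', '6/11', '6/13', '6/16', '6/19', '6/22'])

# A single game in the same layout as the `games` frames used in this section
game_one = finals.iloc[[0]].reset_index(drop=True)
print(finals)
```

Each one-row slice such as `game_one` has the same columns as the `games` frame passed to `playoff_predictions`, so the seven games could in principle be looped over rather than written out one cell at a time.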

#### Game 1

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-05'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_one.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 0-1

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 2

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-08'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_two.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 0-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 3

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-11'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_three.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 1-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 4

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-13'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_four.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 2-2

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 5

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-16'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_five.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 2-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 6

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-19'), 0]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_six.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 3-3

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [17]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)

#### Game 7

In [None]:
games = pd.DataFrame([[17, 9, pd.Timestamp('2025-06-22'), 1]], columns = ['TEAM_ONE', 'TEAM_TWO', 'GAME_DATE', 'HOME_ONE'])
preds, rolling_values, all_preds = playoff_predictions(act_values, rolling_values, games)
preds.to_csv('playoffs_round_four_seven.csv', index = False)

Running these predictions, we found the model predicted the following outcomes:
1. Indiana (17) vs. LA Clippers (9): 3-4

With this result, the model predicts that the LA Clippers win the Finals 4-3.

We then assume the predicted values to be true and add them to our actual values.

In [None]:
all_preds = all_preds.add_suffix('_ONE')
all_preds = all_preds.rename(columns = {'OPP_ONE': 'TEAM_ID_TWO', 'GAME_DATE_ONE': 'GAME_DATE'})
all_preds = all_preds[act_values.columns]

winners = [9]
all_preds['WIN_ONE'] = [1 if team in winners else 0 for team in all_preds['TEAM_ID_ONE']]

act_values = pd.concat([act_values, all_preds], ignore_index = True)
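
The series scores reported after each Finals game can be checked directly from the `winners` lists used above. The sketch below recomputes the running score; `series_score` is a hypothetical helper, and the list of winners is copied from the seven Finals cells in this section.

```python
def series_score(team_one, team_two, game_winners):
    """Hypothetical helper: running 'team_one-team_two' series score after each game."""
    wins = {team_one: 0, team_two: 0}
    running = []
    for winner in game_winners:
        wins[winner] += 1
        running.append(f"{wins[team_one]}-{wins[team_two]}")
    return running

# Predicted winner of each Finals game, in order (17 = Indiana, 9 = LA Clippers)
finals_winners = [9, 9, 17, 17, 9, 17, 9]
scores = series_score(17, 9, finals_winners)
print(scores)  # ['0-1', '0-2', '1-2', '2-2', '2-3', '3-3', '3-4']
```

A check like this would also make it easy to stop a simulated series as soon as one team reaches four wins, although with these predictions every series in this section goes to seven games.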