# Basketball Playoffs Qualification

## Task description

Basketball tournaments are usually split into two parts. First, all teams play each other, aiming to win as many games as possible. Then, at the end of this first part of the season, a predetermined number of teams with the most wins qualify for the playoffs, where they play knockout series for the trophy.

Over 10 seasons, data on players, teams, coaches, games and several other metrics were gathered and arranged in this dataset. The goal is to use this data to predict which teams will qualify for the playoffs in the next season.

## Data preparation

### Creating the database

First, we need to convert the CSV files into tables in an SQLite database, so that we can analyze, manipulate and prepare the data more easily. This was done with a few sqlite3 shell commands:

```
.mode csv
.import dataset/awards_players.csv awards_players
.import dataset/coaches.csv coaches
.import dataset/players.csv players
.import dataset/players_teams.csv players_teams
.import dataset/series_post.csv series_post
.import dataset/teams_post.csv teams_post
.import dataset/teams.csv teams
.save database.db
```
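If the sqlite3 shell is not at hand, the same import can be scripted with Python's standard library. The sketch below is our own approximation of what `.import` does when the target table does not exist yet: the header row supplies the column names and every value is stored as TEXT (which is why later cells cast columns with `astype(int)`). The helper name `import_csv` is hypothetical.

```python
import csv
import sqlite3


def import_csv(con: sqlite3.Connection, csv_path: str, table: str) -> int:
    """Create `table` from the CSV header and insert every data row as TEXT.

    Intended for tables that do not exist yet. Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote identifiers so column names such as "rank" or "min" are safe
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        rows = list(reader)
        con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    con.commit()
    return len(rows)
```

Calling `import_csv(con, f"dataset/{name}.csv", name)` for each of the seven table names on a connection to `database.db`, then closing it, mirrors the shell session above.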

### Filtering unneeded rows and columns

Upon closer inspection of the dataset, we found some rows that had no effect on, or could have a negative impact on, our models' training, such as rows in the players table that corresponded to current coaches and therefore had no information about their height, weight, etc.
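One possible way to apply this filter directly in the database is a single `DELETE` statement. This is a sketch under stated assumptions: a "coach row" is a `players` row whose `bioID` also appears as a `coaches.coachID`, and "no information" means `height` and `weight` are `NULL`, empty or zero. The helper name `drop_coach_only_players` is hypothetical.

```python
import sqlite3


def drop_coach_only_players(con: sqlite3.Connection) -> int:
    """Delete players rows that belong to coaches and carry no height or weight.

    Assumption: missing measurements appear as NULL, '' or 0 (CAST('' AS REAL) is 0).
    Returns the number of deleted rows.
    """
    cur = con.execute(
        """
        DELETE FROM players
        WHERE bioID IN (SELECT coachID FROM coaches)
          AND COALESCE(CAST(height AS REAL), 0) = 0
          AND COALESCE(CAST(weight AS REAL), 0) = 0
        """
    )
    con.commit()
    return cur.rowcount
```

Requiring both conditions keeps people who coached but also have real player measurements.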

## Player performance measures

### The Game Score measure
The Game Score measure, created by John Hollinger, attempts to estimate a player's productivity in a single game. We will start building our model on this measure, applying it to each player's whole-season totals and dividing the result by the number of games played.
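Hollinger's single-game formula is `GmSc = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA - FTM) + 0.7*ORB + 0.3*DRB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV`, with the same weights used throughout this notebook. A minimal sketch of the per-season variant described here (the function and parameter names are our own): because the formula is linear, applying it to season totals and dividing by games played equals the average of the per-game scores.

```python
def game_score(pts, fgm, fga, fta, ftm, orb, drb, stl, ast, blk, pf, tov, games=1):
    """Hollinger's Game Score; with season totals and games > 1 it returns the per-game average."""
    if games <= 0:
        raise ValueError("games must be positive")
    total = (pts + 0.4 * fgm - 0.7 * fga - 0.4 * (fta - ftm)
             + 0.7 * orb + 0.3 * drb + stl + 0.7 * ast + 0.7 * blk
             - 0.4 * pf - tov)
    return total / games


# Single game: 20 pts on 8/15 FG and 3/4 FT, 2 ORB, 5 DRB, 1 STL, 4 AST, 1 BLK, 2 PF, 3 TOV
print(round(game_score(20, 8, 15, 4, 3, 2, 5, 1, 4, 1, 2, 3), 1))  # 15.9
```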


## Data Preparation and Metrics

```python
import sqlite3
import pandas as pd
import numpy as np
```

Create DataFrames from the database tables and the relations between them.

```python
con = sqlite3.connect("database.db")

# Player <-> Awards
pl_aw = pd.read_sql_query('''
    SELECT players_teams.playerID, players_teams.tmID,
        awards_players.award, awards_players.year
    FROM awards_players
    LEFT JOIN players_teams
    ON (
        awards_players.playerID = players_teams.playerID
        AND awards_players.year = players_teams.year
    )''', con)

# Coach <-> Awards
# The subquery returns coachID, year and tmID once each; selecting both
# teams.year and coaches.year would give it two columns named "year".
cc_aw = pd.read_sql_query('''
    SELECT playerID, award, c.year, c.tmID
    FROM awards_players
    INNER JOIN
    (
        SELECT coaches.coachID, coaches.year, coaches.tmID
        FROM teams
        INNER JOIN coaches
        ON (coaches.tmID = teams.tmID AND coaches.year = teams.year)
    ) AS c
    ON (awards_players.playerID = c.coachID AND awards_players.year = c.year)
''', con)

# Player <-> Teams
pl_tm = pd.read_sql_query("SELECT * FROM players_teams INNER JOIN players ON players_teams.playerID = players.bioID", con)

# Teams <-> Post Season Results (aggregated)
tm_psa = pd.read_sql_query('''
    SELECT teams.year, teams.lgID, teams.tmID, franchID,
       confID, divID, rank, playoff, seeded, firstRound, semis,
       finals, name, o_fgm, o_fga, o_ftm, o_fta, o_3pm, o_3pa,
       o_oreb, o_dreb, o_reb, o_asts, o_pf, o_stl, o_to, o_blk,
       o_pts, d_fgm, d_fga, d_ftm, d_fta, d_3pm, d_3pa, d_oreb,
       d_dreb, d_reb, d_asts, d_pf, d_stl, d_to, d_blk, d_pts,
       tmORB, tmDRB, tmTRB, opptmORB, opptmDRB, opptmTRB, won,
       lost, GP, homeW, homeL, awayW, awayL, confW, confL,
       min, attend, arena, W, L
    FROM teams_post
    INNER JOIN teams
    ON (
        teams_post.tmID = teams.tmID
        AND teams_post.year = teams.year
    )''', con)

# Coach <-> Teams
cc_tm = pd.read_sql_query("SELECT * FROM coaches INNER JOIN teams ON (coaches.tmID = teams.tmID AND coaches.year = teams.year)", con)

# Teams <-> Post Series Results
tm_pss = pd.read_sql_query('''
    SELECT winners.winnersID, winners.year, winners.winnersPlayoff, winners.winnersRank, losers.tmID, losers.playoff, losers.rank
    FROM
    (
        SELECT teams.tmID AS winnersID, teams.year AS year, teams.playoff AS winnersPlayoff, teams.rank AS winnersRank, series_post.tmIDLoser AS tmIDLoser
        FROM series_post
        INNER JOIN teams
        ON
        (series_post.tmIDWinner = teams.tmID AND series_post.year = teams.year)
    ) AS winners
    JOIN teams AS losers
    ON
    (winners.tmIDLoser = losers.tmID AND winners.year = losers.year)
''', con)
```


Create the dataframe, `df`, to be used with the models

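```python
df = pd.read_sql_query("SELECT * FROM teams", con)
df['year'] = df['year'].astype(int)
df.sort_values(by=['year'], inplace=True)
df
```

| | year | lgID | tmID | franchID | confID | divID | rank | playoff | seeded | firstRound | ... | GP | homeW | homeL | awayW | awayL | confW | confL | min | attend | arena |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | WNBA | MIA | MIA | EA |  | 6 | N | 0 |  | ... | 32 | 9 | 7 | 4 | 12 | 9 | 12 | 6475 | 127721 | AmericanAirlines Arena |
| 24 | 1 | WNBA | DET | DET | EA |  | 5 | N | 0 |  | ... | 32 | 8 | 8 | 6 | 10 | 10 | 11 | 6425 | 107289 | The Palace of Auburn Hills |
| 89 | 1 | WNBA | PHO | PHO | WE |  | 4 | Y | 0 | L | ... | 32 | 11 | 5 | 9 | 7 | 11 | 10 | 6425 | 161075 | US Airways Center |
| 129 | 1 | WNBA | UTA | SAS | WE |  | 5 | N | 0 |  | ... | 32 | 12 | 4 | 6 | 10 | 13 | 8 | 6400 | 103442 | EnergySolutions Arena |
| 99 | 1 | WNBA | POR | POR | WE |  | 7 | N | 0 |  | ... | 32 | 6 | 10 | 4 | 12 | 4 | 17 | 6525 | 133076 | Rose Garden Arena |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 10 | WNBA | MIN | MIN | WE |  | 5 | N | 0 |  | ... | 34 | 9 | 8 | 5 | 12 | 7 | 13 | 6875 | 128127 | Target Center |
| 85 | 10 | WNBA | NYL | NYL | EA |  | 7 | N | 0 |  | ... | 34 | 8 | 9 | 5 | 12 | 9 | 13 | 6900 | 166604 | Madison Square Garden (IV) |
| 98 | 10 | WNBA | PHO | PHO | WE |  | 1 | Y | 0 | W | ... | 34 | 12 | 5 | 11 | 6 | 13 | 7 | 6900 | 144884 | US Airways Center |
| 52 | 10 | WNBA | IND | IND | EA |  | 1 | Y | 0 | W | ... | 34 | 14 | 3 | 8 | 9 | 17 | 5 | 6925 | 134964 | Conseco Fieldhouse |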


```python
# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)
pd.options.display.float_format = '{:.2f}'.format
```

```python
# The CSV import stored every column as TEXT, so cast the box-score totals to integers
offense_cols = ['o_pts', 'o_fgm', 'o_fga', 'o_3pm', 'o_fta', 'o_ftm', 'o_oreb', 'o_dreb',
                'o_stl', 'o_asts', 'o_blk', 'o_pf', 'o_to']
defense_cols = ['d_pts', 'd_fgm', 'd_fga', 'd_3pm', 'd_fta', 'd_ftm', 'd_oreb', 'd_dreb',
                'd_stl', 'd_asts', 'd_blk', 'd_pf', 'd_to']
for col in offense_cols + defense_cols + ['GP']:
    df[col] = df[col].astype(int)

# Game Score per game computed from the offensive totals (o_*)
df['metric_game_score'] = (df['o_pts'] + 0.4 * df['o_fgm'] - 0.7 * df['o_fga']
                           - 0.4 * (df['o_fta'] - df['o_ftm']) + 0.7 * df['o_oreb']
                           + 0.3 * df['o_dreb'] + df['o_stl'] + 0.7 * df['o_asts']
                           + 0.7 * df['o_blk'] - 0.4 * df['o_pf'] - df['o_to']) / df['GP']
# ... and from the defensive totals (d_*)
df['def_metric_game_score'] = (df['d_pts'] + 0.4 * df['d_fgm'] - 0.7 * df['d_fga']
                               - 0.4 * (df['d_fta'] - df['d_ftm']) + 0.7 * df['d_oreb']
                               + 0.3 * df['d_dreb'] + df['d_stl'] + 0.7 * df['d_asts']
                               + 0.7 * df['d_blk'] - 0.4 * df['d_pf'] - df['d_to']) / df['GP']
# The final metric is the average of the two
df['metric_game_score'] = (df['metric_game_score'] + df['def_metric_game_score']) / 2

print(df.sort_values(by='metric_game_score', ascending=False)['metric_game_score'])
```

```
98    66.12
96    62.60
97    62.56
95    61.68
1     57.69
       ... 
120   40.95
64    39.85
119   39.50
14    39.04
63    35.54
Name: metric_game_score, Length: 142, dtype: float64
```


Apply Game Score to each player's season totals (divided by games played), then average it over each team's players for every season:

```python
# Cast the player season totals (stored as TEXT) to integers
for col in ['points', 'fgMade', 'fgAttempted', 'ftAttempted', 'ftMade', 'oRebounds', 'dRebounds',
            'steals', 'assists', 'blocks', 'PF', 'turnovers', 'GP']:
    pl_tm[col] = pl_tm[col].astype(int)

# Per-game Game Score of each player-season
pl_tm['metric_game_score'] = (pl_tm['points'] + 0.4 * pl_tm['fgMade'] - 0.7 * pl_tm['fgAttempted']
                              - 0.4 * (pl_tm['ftAttempted'] - pl_tm['ftMade'])
                              + 0.7 * pl_tm['oRebounds'] + 0.3 * pl_tm['dRebounds']
                              + pl_tm['steals'] + 0.7 * pl_tm['assists'] + 0.7 * pl_tm['blocks']
                              - 0.4 * pl_tm['PF'] - pl_tm['turnovers']) / pl_tm['GP']

# Average the player Game Scores of each team-season
mean_gs = (pl_tm.groupby(['tmID', 'year'])["metric_game_score"].mean()
           .reset_index()
           .sort_values(by='metric_game_score', ascending=False))

# Copy each team-season average into the matching row of df
for idx, x in mean_gs.iterrows():
    year_condition = df['year'] == int(x['year'])
    tmID_condition = df['tmID'] == str(x['tmID'])
    df.loc[year_condition & tmID_condition, "mean player game score"] = x['metric_game_score']

df
```

| | year | lgID | tmID | franchID | confID | divID | rank | playoff | seeded | firstRound | ... | Coach of the Year | Defensive Player of the Year | Kim Perrot Sportsmanship Award | Most Improved Player | Most Valuable Player | Rookie of the Year | Sixth Woman of the Year | WNBA Finals Most Valuable Player | playoff_results | mean player game score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | WNBA | MIA | MIA | EA |  | 6 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 2.73 |
| 24 | 1 | WNBA | DET | DET | EA |  | 5 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 4.21 |
| 89 | 1 | WNBA | PHO | PHO | WE |  | 4 | 1 | 0 | L | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 4.13 |
| 129 | 1 | WNBA | UTA | SAS | WE |  | 5 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 4.69 |
| 99 | 1 | WNBA | POR | POR | WE |  | 7 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 3.53 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 10 | WNBA | MIN | MIN | WE |  | 5 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 5.74 |
| 85 | 10 | WNBA | NYL | NYL | EA |  | 7 | 0 | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 | 4.73 |
| 98 | 10 | WNBA | PHO | PHO | WE |  | 1 | 1 | 0 | W | ... | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 | 3 | 6.33 |
| 52 | 10 | WNBA | IND | IND | EA |  | 1 | 1 | 0 | W | ... | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2 | 4.86 |
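The roster aggregation above is a group-by mean over `(tmID, year)`. A dependency-free sketch of the same step, which additionally skips player-seasons with zero games played, where dividing by `GP` would not give a meaningful per-game value (whether such rows exist in `players_teams` is not checked here). The dictionary input format and the function name are assumptions for illustration; the keys reuse the `players_teams` column names.

```python
from collections import defaultdict


def mean_player_game_score(rows):
    """Average per-game Game Score over each (tmID, year), ignoring rows with GP == 0.

    `rows` is an iterable of dicts holding season totals under the players_teams
    column names used in this notebook.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        if r["GP"] <= 0:
            continue  # no games played: a per-game Game Score is undefined
        total = (r["points"] + 0.4 * r["fgMade"] - 0.7 * r["fgAttempted"]
                 - 0.4 * (r["ftAttempted"] - r["ftMade"]) + 0.7 * r["oRebounds"]
                 + 0.3 * r["dRebounds"] + r["steals"] + 0.7 * r["assists"]
                 + 0.7 * r["blocks"] - 0.4 * r["PF"] - r["turnovers"])
        key = (r["tmID"], r["year"])
        sums[key] += total / r["GP"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```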


Create features that represent each award

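```python
awards = pd.read_sql_query("SELECT * FROM awards_players", con)
players_teams = pd.read_sql_query("SELECT * FROM players_teams", con)
coaches = pd.read_sql_query("SELECT * FROM coaches", con)

# Drop the decade-level honours
awards = awards[~awards["award"].isin(["WNBA All-Decade Team", "WNBA All Decade Team Honorable Mention"])]

# Attach each award to the winner's team: players via players_teams, coaches via coaches
merged = awards.merge(players_teams, on=["playerID", "year"], how="left")
merged = merged.merge(coaches, left_on=["playerID", "year"], right_on=["coachID", "year"], how="left")

# Unify the two spellings of the same award
merged.replace("Kim Perrot Sportsmanship", "Kim Perrot Sportsmanship Award", inplace=True)

# tmID_x comes from players_teams, tmID_y from coaches; fall back to the coach's team
merged['tmID_x'] = merged['tmID_x'].fillna(merged['tmID_y'])

# Flag each award on the winner's team-season
# (a player who appears for several teams in that year flags all of them)
for idx, x in merged.iterrows():
    year_condition = df['year'] == int(x['year'])
    tmID_condition = df['tmID'] == str(x['tmID_x'])
    df.loc[year_condition & tmID_condition, x["award"]] = 1

awards = merged["award"].unique()

# Teams without a given award get 0 instead of NaN. Assigning the result avoids a
# chained inplace replace, which pandas warns about and which no longer updates df
# under Copy-on-Write.
for award in awards:
    df[award] = df[award].fillna(0)

df
```

| | year | lgID | tmID | franchID | confID | divID | rank | playoff | seeded | firstRound | ... | mean game score | All-Star Game Most Valuable Player | Coach of the Year | Defensive Player of the Year | Kim Perrot Sportsmanship Award | Most Improved Player | Most Valuable Player | Rookie of the Year | Sixth Woman of the Year | WNBA Finals Most Valuable Player |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | WNBA | MIA | MIA | EA |  | 6 | N | 0 |  | ... | 2.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 24 | 1 | WNBA | DET | DET | EA |  | 5 | N | 0 |  | ... | 4.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 89 | 1 | WNBA | PHO | PHO | WE |  | 4 | Y | 0 | L | ... | 4.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 129 | 1 | WNBA | UTA | SAS | WE |  | 5 | N | 0 |  | ... | 4.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 99 | 1 | WNBA | POR | POR | WE |  | 7 | N | 0 |  | ... | 3.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 10 | WNBA | MIN | MIN | WE |  | 5 | N | 0 |  | ... | 5.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 85 | 10 | WNBA | NYL | NYL | EA |  | 7 | N | 0 |  | ... | 4.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 98 | 10 | WNBA | PHO | PHO | WE |  | 1 | Y | 0 | W | ... | 6.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 |
| 52 | 10 | WNBA | IND | IND | EA |  | 1 | Y | 0 | W | ... | 4.86 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |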
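Conceptually, this step turns (team, season, award) triples into one 0/1 column per award, with 0 for every team-season that did not win it. A small standard-library sketch of the same idea; the input format and the function name `award_indicators` are assumptions, since the cell above works on DataFrames instead.

```python
def award_indicators(team_seasons, award_rows):
    """Return {(tmID, year): {award: 0 or 1}} with one key per distinct award.

    team_seasons: iterable of (tmID, year) pairs, one per row of the teams table.
    award_rows: iterable of (tmID, year, award) triples; tmID may be None when
    the winner could not be matched to a team.
    """
    award_rows = list(award_rows)
    award_names = sorted({award for _, _, award in award_rows})
    features = {ts: dict.fromkeys(award_names, 0) for ts in team_seasons}
    for tm_id, year, award in award_rows:
        if (tm_id, year) in features:  # unmatched winners are ignored
            features[(tm_id, year)][award] = 1
    return features


example = award_indicators(
    [("PHO", 10), ("IND", 10)],
    [("PHO", 10, "Most Valuable Player"), ("IND", 10, "Defensive Player of the Year")],
)
```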


Condense a team's playoff run into a single column

```python
def condense_playoff(x):
    """Map a team-season to -1 (missed the playoffs) or the number of playoff rounds won (0-3)."""
    if x['playoff'] == "N":
        return -1
    # One point for each round won: first round, semifinals, finals
    return sum(x[rnd] == "W" for rnd in ['firstRound', 'semis', 'finals'])

df['playoff_results'] = df.apply(condense_playoff, axis=1)

df
```

| | year | lgID | tmID | franchID | confID | divID | rank | playoff | seeded | firstRound | ... | All-Star Game Most Valuable Player | Coach of the Year | Defensive Player of the Year | Kim Perrot Sportsmanship Award | Most Improved Player | Most Valuable Player | Rookie of the Year | Sixth Woman of the Year | WNBA Finals Most Valuable Player | playoff_results |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | WNBA | MIA | MIA | EA |  | 6 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| 24 | 1 | WNBA | DET | DET | EA |  | 5 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| 89 | 1 | WNBA | PHO | PHO | WE |  | 4 | Y | 0 | L | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
| 129 | 1 | WNBA | UTA | SAS | WE |  | 5 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| 99 | 1 | WNBA | POR | POR | WE |  | 7 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 10 | WNBA | MIN | MIN | WE |  | 5 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| 85 | 10 | WNBA | NYL | NYL | EA |  | 7 | N | 0 |  | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1 |
| 98 | 10 | WNBA | PHO | PHO | WE |  | 1 | Y | 0 | W | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 | 3 |
| 52 | 10 | WNBA | IND | IND | EA |  | 1 | Y | 0 | W | ... | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2 |
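The stated goal is to predict qualification for the *next* season, whereas the features built so far describe the same season as the `playoff` label. One way to frame the task accordingly is to pair a team's features from season t with that team's `playoff` flag from season t + 1. A minimal sketch, assuming `tmID` identifies the same team across seasons (relocated franchises, such as the UTA row whose `franchID` is SAS, may call for `franchID` instead); the function name `next_season_pairs` is hypothetical.

```python
def next_season_pairs(rows, key="tmID"):
    """Pair features from season t with the same team's playoff flag from season t + 1.

    rows: iterable of dicts with at least `key`, "year" (int) and "playoff" ("Y"/"N" or 1/0).
    Returns a list of (features_row_t, label_t_plus_1); teams with no season t + 1 are dropped.
    """
    by_team_year = {(r[key], r["year"]): r for r in rows}
    pairs = []
    # Iterate in (year, team) order so the result stays chronological
    for (team, year), row in sorted(by_team_year.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        nxt = by_team_year.get((team, year + 1))
        if nxt is None:
            continue  # no following season (last year of the data, or the team folded)
        pairs.append((row, 1 if nxt["playoff"] in ("Y", 1) else 0))
    return pairs
```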


## Creating and training the model

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, f1_score
```
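With `shuffle=False`, `train_test_split` keeps the row order, so on the year-sorted `df` the test set is simply its last rows. For a float `test_size`, scikit-learn rounds the test size up: ceil(0.19 × 142) = 27 test rows and 115 training rows. That boundary need not coincide with the start of a season. The sketch below reproduces the row count and shows a hypothetical alternative that splits on whole seasons; both helper names are our own.

```python
import math


def chronological_split(rows, test_size):
    """Mimic train_test_split(..., shuffle=False) for a float test_size."""
    n_test = math.ceil(test_size * len(rows))
    return rows[:len(rows) - n_test], rows[len(rows) - n_test:]


def split_by_season(rows, last_train_year):
    """Seasons up to last_train_year go to training, later seasons to testing."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


train, test = chronological_split(list(range(142)), 0.19)
print(len(train), len(test))  # 115 27
```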

### Decision Tree Classifier

```python
from sklearn.tree import DecisionTreeClassifier

# Encode the target as 0/1
df["playoff"] = df["playoff"].replace({"N": 0, "Y": 1}).astype(int)

model = DecisionTreeClassifier(random_state=48)

# eFG% and FTA_rate were also listed as features, but df has no such columns,
# so metric_game_score is the only feature the tree actually receives
X_file, Y_file = df[["metric_game_score"]], df["playoff"]

# Chronological split: df is sorted by year, so the last 19% of rows form the test set
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))

for name, importance in zip(trained_model.feature_names_in_, trained_model.feature_importances_):
    print(f"{name}: {round(importance * 100, 2)}%")
```

```
0.5185185185185185
0.6
0.5806451612903225
metric_game_score: 100.0%
```
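The 100% importance of `metric_game_score` follows from the feature list: `eFG%` and `FTA_rate` were requested but are not columns of `df`, so only `metric_game_score` reached the tree. A small guard that fails loudly on missing names avoids this kind of silent reduction; `select_features` is a hypothetical helper, used e.g. as `df[select_features(list(df.columns), wanted)]`.

```python
def select_features(columns, wanted):
    """Return the wanted features in column order, raising KeyError if any is missing."""
    missing = [name for name in wanted if name not in columns]
    if missing:
        raise KeyError(f"features not found: {missing}")
    return [c for c in columns if c in wanted]
```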

### Naive Bayes: Gaussian and Multinomial

```python
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Chronological split: the last 19% of the year-sorted rows form the test set
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)

# Fit and evaluate both variants on the same split
for model in [GaussianNB(), MultinomialNB()]:
    trained_model = model.fit(x_train, y_train)
    y_prediction = trained_model.predict(x_test)

    print(accuracy_score(y_test, y_prediction))
    print(precision_score(y_test, y_prediction))
    print(f1_score(y_test, y_prediction))
    print()
```

```
0.5555555555555556
0.75
0.5

0.6666666666666666
0.7058823529411765
0.7272727272727272
```


### K-Nearest Neighbors

```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier()

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Fit the model to the training data (chronological split)
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))
```

```
0.6296296296296297
0.6875
0.6875
```
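k-nearest neighbours compares raw Euclidean distances, so season totals such as `d_fga` outweigh the per-game `metric_game_score` simply because of their scale. The usual remedy is to standardize each feature with statistics computed on the training rows only (in scikit-learn, `StandardScaler` inside a `Pipeline`); this was not done for the results above. A standard-library sketch of that transformation, with hypothetical helper names:

```python
import statistics


def fit_standardizer(train_rows):
    """Return per-column (mean, std) computed on the training rows only."""
    params = []
    for col in zip(*train_rows):
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population std, as StandardScaler uses
        params.append((mean, std if std > 0 else 1.0))  # constant column: leave unscaled
    return params


def standardize(rows, params):
    """Apply z-score scaling with the training statistics to any rows (train or test)."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]
```

Fitting on the training rows and reusing those parameters for the test rows keeps information from the test seasons out of the preprocessing.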


### Random Forest

```python
from sklearn.ensemble import RandomForestClassifier

# No random_state is set, so results can differ between runs
model = RandomForestClassifier()

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Fit the model to the training data (chronological split)
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))
```

```
0.6666666666666666
0.7058823529411765
0.7272727272727272
```


### Logistic Regression

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Fit the model to the training data (chronological split)
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))
```

```
0.8148148148148148
0.9230769230769231
0.8275862068965517
```


### Support Vector Machine

```python
from sklearn.svm import SVC

model = SVC()

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Fit the model to the training data (chronological split)
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))
```

```
0.5925925925925926
0.5925925925925926
0.7441860465116279
```
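These three SVC numbers are exactly those of a classifier that labels every test team as a playoff team: with 16 playoff teams among the 27 test rows, accuracy = precision = 16/27 ≈ 0.5926, recall = 1 and F1 = 2TP/(2TP + FP + FN) = 32/43 ≈ 0.7442. A short sketch computing the three metrics from confusion-matrix counts (the helper name is our own) makes this easy to check:

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision and F1 from confusion-matrix counts (positive class = playoff)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, precision, f1


# All 27 test teams predicted as playoff teams, 16 of which actually qualified
acc, prec, f1 = binary_metrics(tp=16, fp=11, tn=0, fn=0)
print(round(acc, 4), round(prec, 4), round(f1, 4))  # 0.5926 0.5926 0.7442
```

The same counts are consistent with the decision tree results: 9 true positives, 6 false positives, 5 true negatives and 7 false negatives give 14/27, 0.6 and 18/31 ≈ 0.5806.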


### Gradient Boosted Trees

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier()

# Defensive totals plus the combined Game Score metric (listed in df's column order)
features = ["d_fgm", "d_fga", "d_ftm", "d_fta", "d_3pm", "d_3pa", "d_oreb", "d_dreb",
            "d_reb", "d_asts", "d_pf", "d_stl", "d_to", "d_blk", "d_pts", "metric_game_score"]
X_file, Y_file = df[features], df["playoff"]

# Fit the model to the training data (chronological split)
x_train, x_test, y_train, y_test = train_test_split(X_file, Y_file, test_size=0.19, shuffle=False)
trained_model = model.fit(x_train, y_train)

# Predict using the trained model
y_prediction = trained_model.predict(x_test)

print(accuracy_score(y_test, y_prediction))
print(precision_score(y_test, y_prediction))
print(f1_score(y_test, y_prediction))

for name, importance in zip(trained_model.feature_names_in_, trained_model.feature_importances_):
    print(f"{name}: {importance}")
```

```
0.6296296296296297
0.65
0.7222222222222223
d_fgm: 0.022408136998388683
d_fga: 0.08818702413759011
d_ftm: 0.05770756478182807
d_fta: 0.06485657693430147
d_3pm: 0.10381169346162217
d_3pa: 0.1391152098677398
d_oreb: 0.03444211738802579
d_dreb: 0.021805594313196897
d_reb: 0.03263623030168877
d_asts: 0.04431618962656821
d_pf: 0.0481037554828007
d_stl: 0.0851991322698658
d_to: 0.033784423705231166
d_blk: 0.10332403300570689
d_pts: 0.061811361361126485
metric_game_score: 0.058490956364318926
```
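The importances are easier to compare once sorted: `d_3pa`, `d_3pm` and `d_blk` come out on top, while `metric_game_score` accounts for about 5.8%. A minimal sketch that ranks the values printed above (copied from that output; the helper name is our own):

```python
# Feature importances copied from the gradient-boosting output above
importances = {
    "d_fgm": 0.022408136998388683, "d_fga": 0.08818702413759011,
    "d_ftm": 0.05770756478182807, "d_fta": 0.06485657693430147,
    "d_3pm": 0.10381169346162217, "d_3pa": 0.1391152098677398,
    "d_oreb": 0.03444211738802579, "d_dreb": 0.021805594313196897,
    "d_reb": 0.03263623030168877, "d_asts": 0.04431618962656821,
    "d_pf": 0.0481037554828007, "d_stl": 0.0851991322698658,
    "d_to": 0.033784423705231166, "d_blk": 0.10332403300570689,
    "d_pts": 0.061811361361126485, "metric_game_score": 0.058490956364318926,
}


def rank_importances(importances, k=None):
    """Return (name, importance) pairs sorted from most to least important."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if k is None else ranked[:k]


for name, value in rank_importances(importances, k=5):
    print(f"{name}: {value:.1%}")
```

Since no `random_state` was set for the gradient-boosting model, a rerun may produce a slightly different ranking.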