# March Madness Bracket Predictor

### The objective of this project is to use machine learning models to predict the outcomes of March Madness games, then use those predictions to generate a bracket.

*This project is a work in progress. The initial goal is to get a minimum working example and then progressively improve the results via data exploration, additional models, further hyperparameter tuning, etc.*

## Outline:
1. Problem Definition
2. Data Explanation
3. Evaluation
4. Feature Selection
5. Modeling
6. Results and Summary

## 1. Problem Definition
March Madness is the postseason, 64-team bracket for (Men's) NCAA Basketball. The winner of the tournament is deemed the national champion. It is common practice for individuals to fill out these brackets before the first game starts. The objective of this project is to predict the outcome of a game between any two selected teams; this information is then used to generate an entire bracket of predictions.
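
To make the "game prediction to bracket" step concrete, here is a minimal sketch of how per-game predictions could be chained into a full bracket. The function names `simulate_bracket` and `favor_lower_seed` are hypothetical and not part of this project; the seed-based rule is only a stand-in for a trained model, and the team list is assumed to already be in bracket order.

```python
from typing import Callable, List


def simulate_bracket(
    teams: List[str], predict_winner: Callable[[str, str], str]
) -> List[List[str]]:
    """Advance predicted winners round by round until one team remains.

    `teams` must be in bracket order and have a power-of-two length.
    Returns the teams alive at the start of each round; the last
    entry holds only the predicted champion.
    """
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("number of teams must be a power of two (e.g. 64)")
    rounds = [list(teams)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        # Adjacent teams in the list play each other.
        winners = [
            predict_winner(current[i], current[i + 1])
            for i in range(0, len(current), 2)
        ]
        rounds.append(winners)
    return rounds


# Hypothetical stand-in for a trained model: the lower seed number wins.
seeds = {"A": 1, "B": 16, "C": 8, "D": 9}


def favor_lower_seed(team_a: str, team_b: str) -> str:
    return team_a if seeds[team_a] <= seeds[team_b] else team_b


rounds = simulate_bracket(["A", "B", "C", "D"], favor_lower_seed)
print(rounds)
```

Any function that takes two teams and returns the predicted winner can be passed in place of `favor_lower_seed`, so the bracket logic stays independent of the model used.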

In [1]:
# Basic imports, added as needed

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

## 2. Data Explanation
This project currently uses data from [RealGM](https://basketball.realgm.com/ncaa/team-stats/2022/Totals/Team_Totals/0). The data contains numerous statistics for each tournament team from the 2002-03 season through the 2021-22 season (excluding 2019-20 and 2020-21 due to COVID irregularities). These statistics include:
* seed = Seed in the NCAA tournament
* wins = Number of wins
* losses = Number of losses
* winp = Win percentage
* cwins = Number of conference wins
* closses = Number of conference losses
* cwinp = Conference win percentage
* pts = Number of points scored
* ptspg = Points per game
* fgm = Number of field goals made
* fga = Number of field goals attempted
* fgp = Field goal percentage
* 3pm = Number of 3-pointers made
* 3pa = Number of 3-pointers attempted
* 3pp = 3-point percentage
* ftm = Number of free throws made
* fta = Number of free throws attempted
* ftp = Free throw percentage
* orb = Number of offensive rebounds
* orbpg = Offensive rebounds per game
* drb = Number of defensive rebounds
* drbpg = Defensive rebounds per game
* reb = Number of rebounds
* rebpg = Rebounds per game
* ast = Number of assists
* astpg = Assists per game
* stl = Number of steals
* stlpg = Steals per game
* blk = Number of blocks
* blkpg = Blocks per game
* tov = Number of turnovers
* tovpg = Turnovers per game
* pf = Number of personal fouls
* pfpg = Personal fouls per game
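
Several of these columns are derived from the totals. As a rough illustration, the sketch below recomputes `winp` and `astpg` for two 2003 rows copied from the `stats_df` preview in this notebook. It assumes games played equals `wins + losses`, which is consistent with those rows but is not confirmed for every team or statistic.

```python
import pandas as pd

# Totals for two 2003 rows, copied from the stats_df preview in this notebook.
sample = pd.DataFrame(
    {
        "team": ["Oklahoma", "Kentucky"],
        "wins": [24, 28],
        "losses": [6, 4],
        "ast": [425.0, 519.0],
    }
)

# Assumption: games played = wins + losses.
games = sample["wins"] + sample["losses"]

# Percentage and per-game columns follow from the totals.
sample["winp"] = (sample["wins"] / games).round(3)
sample["astpg"] = (sample["ast"] / games).round(3)

print(sample)
```

Because the per-game and percentage columns carry much of the same information as the totals, this relationship is worth keeping in mind during feature selection.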

In [3]:
# load the stats dataframe
stats_df = pd.read_csv("stats-df.csv")
stats_df.columns

Index(['year', 'team', 'seed', 'wins', 'losses', 'winp', 'cwins', 'closses',
       'cwinp', 'pts', 'ptspg', 'fgm', 'fga', 'fgp', '3pm', '3pa', '3pp',
       'ftm', 'fta', 'ftp', 'orb', 'orbpg', 'drb', 'drbpg', 'reb', 'rebpg',
       'ast', 'astpg', 'stl', 'stlpg', 'blk', 'blkpg', 'tov', 'tovpg', 'pf',
       'pfpg'],
      dtype='object')
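
A quick sanity check after loading is to confirm which seasons are present. The helpers below are a hypothetical sketch, not part of the project code; they assume the `year` column labels each season by its ending year (as the 2003 and 2022 rows suggest) and are demonstrated on a tiny made-up frame rather than the real CSV.

```python
import pandas as pd


def season_summary(df: pd.DataFrame) -> pd.Series:
    """Count team rows per season, ordered by year."""
    return df.groupby("year")["team"].count().sort_index()


def missing_seasons(df: pd.DataFrame, first: int, last: int) -> list:
    """List the years in [first, last] that have no rows in df."""
    present = {int(year) for year in df["year"]}
    return [year for year in range(first, last + 1) if year not in present]


# Made-up demo data, used only to show the calls.
demo = pd.DataFrame({"year": [2019, 2019, 2022], "team": ["X", "Y", "Z"]})

print(season_summary(demo))
print(missing_seasons(demo, 2019, 2022))
```

On the real data, `missing_seasons(stats_df, 2003, 2022)` would be expected to report only the two seasons excluded above, if the year-labeling assumption holds.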

In [4]:
stats_df

| | year | team | seed | wins | losses | winp | cwins | closses | cwinp | pts | ... | ast | astpg | stl | stlpg | blk | blkpg | tov | tovpg | pf | pfpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003 | Oklahoma | 1 | 24 | 6 | 0.800 | 12 | 4 | 0.750 | 2135.0 | ... | 425.0 | 14.167 | 208.0 | 6.933 | 113.0 | 3.767 | 353.0 | 11.767 | 558.0 | 18.600 |
| 1 | 2003 | Kentucky | 1 | 28 | 4 | 0.875 | 16 | 0 | 1.000 | 2481.0 | ... | 519.0 | 16.219 | 248.0 | 7.750 | 166.0 | 5.188 | 447.0 | 13.969 | 557.0 | 17.406 |
| 2 | 2003 | Texas | 1 | 22 | 6 | 0.786 | 13 | 3 | 0.812 | 2208.0 | ... | 406.0 | 14.500 | 179.0 | 6.393 | 108.0 | 3.857 | 375.0 | 13.393 | 570.0 | 20.357 |
| 3 | 2003 | Arizona | 1 | 26 | 2 | 0.929 | 17 | 1 | 0.944 | 2386.0 | ... | 493.0 | 17.607 | 240.0 | 8.571 | 118.0 | 4.214 | 412.0 | 14.714 | 497.0 | 17.750 |
| 4 | 2003 | Wake Forest | 2 | 23 | 6 | 0.793 | 12 | 4 | 0.750 | 2274.0 | ... | 423.0 | 14.586 | 186.0 | 6.414 | 130.0 | 4.483 | 431.0 | 14.862 | 534.0 | 18.414 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1195 | 2022 | Texas A&M-CC | 16 | 23 | 11 | 0.676 | 7 | 7 | 0.500 | 2615.0 | ... | 511.0 | 15.029 | 289.0 | 8.500 | 59.0 | 1.735 | 483.0 | 14.206 | 689.0 | 20.265 |
| 1196 | 2022 | Bryant | 16 | 22 | 9 | 0.710 | 15 | 2 | 0.882 | 2415.0 | ... | 437.0 | 14.097 | 196.0 | 6.323 | 138.0 | 4.452 | 431.0 | 13.903 | 520.0 | 16.774 |
| 1197 | 2022 | Wright State | 16 | 21 | 13 | 0.618 | 15 | 7 | 0.682 | 2566.0 | ... | 474.0 | 13.941 | 196.0 | 5.765 | 100.0 | 2.941 | 418.0 | 12.294 | 489.0 | 14.382 |
| 1198 | 2022 | Texas Southern | 16 | 18 | 12 | 0.600 | 13 | 5 | 0.722 | 2077.0 | ... | 325.0 | 10.833 | 168.0 | 5.600 | 150.0 | 5.000 | 445.0 | 14.833 | 533.0 | 17.767 |
