# Predicting the NBA Draft

Welcome to our tutorial! In this project, our goal is to find out which factors matter most in determining where a player is picked in the NBA draft. First, a little about the NBA. The National Basketball Association, or NBA, is the highest-level professional basketball league in the world. It is composed of 30 teams that each play 82 games before the playoff bracket is set. The most common way for players to enter the league is to be selected in the NBA draft (this applies after 2006; read more about the rule change [here](https://en.wikipedia.org/wiki/Eligibility_for_the_NBA_draft)). The draft consists of 2 rounds, with each team getting 1 pick per round, for a total of 60 picks.

The end goal of this project is to predict the [2021 NBA draft](https://en.wikipedia.org/wiki/2021_NBA_draft), which will take place on July 29th. Being able to predict draft positions would be useful for those in the sports industry: a team could anticipate which players other teams will select or see whether players are over- or under-rated, and the predictions could possibly help set sports betting odds. However, before we can start predicting the draft, we must look at past drafts to see which factors matter most for an NBA draft pick. The project is broken up into 3 sections:

1. Data Wrangling
2. Analysis of Previous Drafts
3. Predictions of 2021 Draft

Over the course of this guide, we hope the reader comes to understand how and why data analysis is done, and is able to follow similar steps to do some data science on other topics!

## Part 0: Setup

The following imports and libraries will be used:

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
!pip3 install lxml



## Part 1: Data Wrangling

Perhaps the most important part is getting the data. The first dataset needed is a collection of college stats for past NBA draft picks. This is the data we will analyze and train our machine learning model on. We found the website [barttorvik.com](https://www.barttorvik.com/playerstat.php?link=y&minGP=15&year=all&start=-11101&end=all0501), which contains an interactive table of college players and their stats. To get the players we wanted, we used the website's filters to query all players since 2008 who were drafted (set drafted to <= 60). *More about this later....*

Second, a dataset is needed for the current NBA draft class. We can feed this data into our model to get predictions of the draft order.

### Note to group

The data on barttorvik is a table generated using JavaScript. I have absolutely no idea how to scrape data in this way. To get the data, I manually copied and pasted the table into a blank csv file, and thankfully pandas could interpret it decently enough. I still need to format the df. But if either of you knows a better way to import the data, please do it that way or tell me and I can fix it, because copy/pasting is definitely not the best.

### Step a: Previous drafts

Here, we will show how the data was acquired from previous years....

In [30]:
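# The pasted table is tab-separated; pass the separator by keyword, since newer pandas versions no longer accept it positionally
pastPicks_df = pd.read_csv('past_college_players.csv', sep='\t')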
pastPicks_df.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3.1,RK,PICK,PLAYER,Unnamed: 3,TEAM,CONF,MIN%,PRPG!,D-PRPG,BPM,...,AST,TO,BLK,STL,FTR,FC/40,2P,3P/100,3P,YEAR
1,25,So,6-7,Reggie Bullock (12),97.2,North Carolina,ACC,63.3,2.6,3.1,6.5,118.0,14.9,...,0.8,1.4,7.6,1.9,53-104,0.51,10.5,71-186,0.382,2012
2,11,Fr,6-6,Klay Thompson (09),91.8,Washington St.,P12,82.1,2.4,4.5,5.6,97.6,23.9,...,2.2,1.9,8.2,2.5,91-213,0.427,10.2,68-165,0.412,2009
3,46,Fr,6-2,Andrew Goudelock (08),0.0,College of Charlesto,SC,71.4,2.9,1.9,2.7,113.3,21.4,...,0.6,2.0,8.7,1.8,94-185,0.508,10.6,73-173,0.422,2008
4,46,Fr,6-3,Norman Powell (12),89.6,UCLA,P12,44.2,0.6,1.8,0.6,96.8,15.7,...,1.8,1.6,9.4,3.1,32-79,0.405,7.6,24-70,0.343,2012
5,19,Fr,6-7,Kevin Huerter (17),87.2,Maryland,B10,74.1,2.4,3.5,6.9,110.7,16.4,...,2.4,1.9,11.0,2.4,42-85,0.494,10.4,64-169,0.379,2017
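
The pasted table still needs some formatting before it can be analyzed. As a minimal sketch of what that cleanup could look like, the helpers below parse three kinds of raw values visible in the output above: player names with a two-digit suffix such as `Reggie Bullock (12)`, made-attempted strings such as `71-186`, and heights such as `6-7`. The function names are hypothetical, and reading the height as feet-inches is our assumption about the table.

```python
import re

def split_name_suffix(raw):
    """Split 'Reggie Bullock (12)' into ('Reggie Bullock', 12).

    Returns (name, None) when there is no two-digit suffix in parentheses.
    """
    match = re.match(r"^(.*?)\s*\((\d{2})\)\s*$", raw)
    if match is None:
        return raw.strip(), None
    return match.group(1), int(match.group(2))

def parse_made_attempted(raw):
    """Turn a 'made-attempted' string such as '71-186' into two integers."""
    made, attempted = raw.split("-")
    return int(made), int(attempted)

def height_to_inches(raw):
    """Convert a height such as '6-7' (assumed feet-inches) to total inches."""
    feet, inches = raw.split("-")
    return int(feet) * 12 + int(inches)

print(split_name_suffix("Reggie Bullock (12)"))
print(parse_made_attempted("71-186"))
print(height_to_inches("6-7"))
```

Each helper could be applied to a whole column with `Series.map` once the columns are lined up with the right headers.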


### Step b: This year's draft

Unfortunately, we will not be able to predict everyone in the draft. This is because the deadline for players to declare for the NBA draft is May 30th, and this project is due May 17th. However, most players who will be picked higher up in the rankings declare earlier for the draft, so we can use these early declarations for predictions. [This website](https://ca.nba.com/news/2021-nba-draft-ncaa-players-that-have-declared-for-the-2021-nba-draft-e/hphzsl6bo6dh16t9tpee5628y) contains a nice table of all the players who have declared early thus far. Next, we use this list of names as a cross-reference against the 2021 season stats, which we again get from [barttorvik.com](https://www.barttorvik.com/playerstat.php?link=y&minGP=15&year=2021&start=20201101&end=20210501), this time filtered to the 2021 season.

In [25]:
from io import StringIO

# Scrape the list of early draft declarations using requests, BeautifulSoup, and pandas
r = requests.get("https://ca.nba.com/news/2021-nba-draft-ncaa-players-that-have-declared-for-the-2021-nba-draft-e/hphzsl6bo6dh16t9tpee5628y")
soup = BeautifulSoup(r.content, 'lxml')

# Find the table in the HTML; thankfully there is only one
html_table = soup.find("table")
# Wrap the markup in StringIO, since newer pandas versions deprecate passing literal HTML strings
earlycomm_df = pd.read_html(StringIO(str(html_table)), header=0)[0]
earlycomm_df.head()

Unnamed: 0,Player,Position,Year,School
0,Ochai Agbaji,SG,Jr.,Kansas
1,James Akinjo,PG,Jr.,Arizona
2,Keve Aluma,PF,R-Jr.,Virginia Tech
3,Jose Alvarado,PG,Sr.,Georgia Tech
4,Avery Anderson III,PG,So.,Oklahoma State


In [27]:
# Now convert the player names to a list (note the parentheses: tolist is a method)
earlyComm_list = earlycomm_df['Player'].tolist()

### Note to group

Again, for this I manually copy/pasted the table into a blank csv and read it in. Again, the table is meh, still a little messed up, but easily fixable with some code. So once again, if either of you can figure out how to query or scrape this in a better way, please let me know or do it yourself.

In [32]:
stats2021_df = pd.read_csv('2021season.csv', sep='\t')
stats2021_df.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,RK,PLAYER,TEAM,CONF,MIN%,PRPG!,BPM,ORTG,USG,EFG,...,OR,DR,AST,TO,BLK,STL,FTR,2P,3P/100,3P
1,Sr,6-11,Luka Garza,Iowa,B10,78.8,7.0,12.5,124.0,30.4,59.6,62.0,10.4,...,12.1,8.9,5.2,1.2,39.2,237-408,0.581,5.7,44-100,0.44
2,So,6-1,Max Abmas,Oral Roberts,Sum,95.8,6.6,6.5,121.8,28.9,58.1,63.0,1.4,...,22.3,12.3,0.4,2.1,33.2,119-231,0.515,12.4,97-224,0.433
3,Jr,5-11,Kendric Davis,SMU,Amer,86.8,6.5,9.7,121.6,27.6,53.4,58.3,1.6,...,46.2,14.1,0.2,2.6,35.4,89-170,0.524,6.4,25-67,0.373
4,Fr,6-4,Cameron Thomas,LSU,SEC,84.6,6.0,5.3,114.6,29.2,47.4,55.3,2.0,...,8.3,9.4,0.7,1.4,44.0,135-291,0.464,11.7,68-209,0.325
5,So,6-10,Drew Timme,Gonzaga,WCC,70.1,5.9,10.9,129.3,26.5,66.3,67.7,10.4,...,14.4,12.8,2.5,1.3,50.8,231-340,0.679,1.3,6-22,0.273


In [33]:
# Next, cross-reference to get a smaller df of early declarations (assumes 'PLAYER' holds the names once the df is formatted)
draft2021_df = stats2021_df[stats2021_df['PLAYER'].isin(earlycomm_df['Player'])]
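
An exact string match between the two sources can miss players whose names are written slightly differently, for example with or without a suffix like `III`. The sketch below shows one possible way to make the cross-reference more forgiving by comparing normalized names. The toy frames, the `normalize_name` helper, and the idea that the stats site drops the suffix are all assumptions for illustration, not something we have verified against the real tables.

```python
import re
import pandas as pd

# Name suffixes to ignore when comparing (an assumed, non-exhaustive list)
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

def normalize_name(name):
    """Lowercase, drop punctuation, and remove generational suffixes."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    parts = [part for part in cleaned.split() if part not in SUFFIXES]
    return " ".join(parts)

# Toy stand-ins for earlycomm_df and stats2021_df (hypothetical rows)
declared = pd.DataFrame({"Player": ["Ochai Agbaji", "Avery Anderson III"]})
stats = pd.DataFrame({
    "PLAYER": ["Ochai Agbaji", "Avery Anderson", "Luka Garza"],
    "TEAM": ["Kansas", "Oklahoma St.", "Iowa"],
})

# Keep only the stats rows whose normalized name appears in the declared list
declared_keys = set(declared["Player"].map(normalize_name))
mask = stats["PLAYER"].map(normalize_name).isin(declared_keys)
draft_class = stats[mask].reset_index(drop=True)
print(draft_class)
```

Note that this simple normalization also strips accented letters, so names containing them would need extra handling.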

## Part 2: Analysis of Previous Drafts

Now that all the data has been collected, the data analysis can begin. The important things to analyze are.....
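
One possible starting point, sketched here under our own assumptions, is to check how strongly each stat is rank-correlated with the pick number. The column names `PICK`, `BPM`, and `AST` come from the table headers above, but the values in `toy_df` are made up purely for illustration and are not real player stats.

```python
import pandas as pd

# Made-up toy values for illustration only; these are NOT real player stats
toy_df = pd.DataFrame({
    "PICK": [1, 2, 3, 4, 5],
    "BPM": [9.0, 7.0, 6.0, 4.0, 2.0],
    "AST": [3.0, 1.0, 4.0, 2.0, 5.0],
})

# Spearman rank correlation is the Pearson correlation computed on ranks,
# so rank every column first and then correlate each stat with PICK
rank_corr = toy_df.rank().corr()["PICK"].drop("PICK")

# Order the stats by the absolute strength of their relationship with PICK
strongest_first = rank_corr.reindex(
    rank_corr.abs().sort_values(ascending=False).index
)
print(strongest_first)
```

A negative value means that higher values of the stat go with earlier (numerically lower) picks.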

## Part 3: Predictions of 2021 Draft

Next, we can try to train a model to predict the 2021 NBA draft. We have shown which parameters seem to be the most important, so we can train the model to look at those.......
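
The model itself has not been chosen yet. As a minimal sketch, assuming a plain least-squares linear fit as a stand-in, the example below fits pick number against a single stat on past picks and then orders new players by their predicted pick. All numbers and player labels are made up for illustration.

```python
import numpy as np

# Made-up toy data: one stat per past player, with the pick number as target
past_stat = np.array([10.0, 8.0, 6.0, 4.0])
past_pick = np.array([1.0, 11.0, 21.0, 31.0])

# Design matrix with an intercept column, solved by ordinary least squares
X = np.column_stack([np.ones_like(past_stat), past_stat])
coef, *_ = np.linalg.lstsq(X, past_pick, rcond=None)

# Hypothetical new draft class with the same stat
new_names = ["Player A", "Player B", "Player C"]
new_stat = np.array([9.0, 5.0, 7.0])
predicted_pick = coef[0] + coef[1] * new_stat

# A lower predicted pick number means the player is expected to go earlier
order = np.argsort(predicted_pick)
predicted_order = [new_names[i] for i in order]
print(predicted_order)
```

Only the relative order of the predictions is used here, so the predicted values do not need to be valid pick numbers.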
