# Player Statistics Scraping

There are a few possible sources for NBA data. The easiest to scrape is basketball-reference.com, but its data has several issues. The first, in my opinion, is that it uses estimated possessions to calculate per-possession statistics, whereas NBA.com statistics are calculated from actual possession counts. Second, NBA.com has more statistics, including tracking statistics.

The downside of NBA.com is that it is not a static HTML website. Its content is completely dynamic, meaning I need to use Selenium to load each page in a browser before scraping the content.

My general model for the NBA.com website is in 'NBA Project Organization.csv'.

![](2022-08-18-09-18-30.png)



### General Scraping and Creating Methodology

This was not decided until December 2022, so I still have some edits to make. Assuming they are all done correctly, the whole notebook can be run from start to finish.

The process is:
- Download all data into files / folders (initial scrape)
- Scrape the list of games that have occurred
- Check the list of games that have occurred against the data in the folders
- Scrape data where missing
- Append data
- Create Aggregate Files with appended data
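
The "check against data in folders" step can be sketched as a comparison between the file names that should exist and the CSV files already on disk. This is only a minimal illustration: `find_missing` is a hypothetical helper, not a function from this notebook, and the temporary folder stands in for a real data folder such as `data/player/general`.

```python
import tempfile
from pathlib import Path


def find_missing(expected_names, folder):
    """Return the expected file stems that have no matching .csv in `folder`."""
    # Path.stem drops the final '.csv' so names compare without the extension
    present = {p.stem for p in Path(folder).glob('*.csv')}
    return [name for name in expected_names if name not in present]


# Demonstration in a throwaway folder (stand-in for a real data folder)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'traditional_Regular_Season_2021-22.csv').write_text('')
    expected = ['traditional_Regular_Season_2021-22',
                'usage_Regular_Season_2019-20']
    missing = find_missing(expected, tmp)
    print(missing)
```

Only the names returned by the check need to be scraped again, which keeps reruns short.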

### To Scrape

- NBA Player Data
    - General
    - Clutch
    - Playtype (Offense)
    - Playtype (Defense)
    - Tracking
    - Defensive Dashboard
    - Shot Dashboard
        - General
        - Shotclock
        - Dribbles
        - Touch Time
        - Closest Defender
    - Box Scores
    - Box Scores (Advanced)
    - Shooting
    - Opponent Shooting
    - Hustle
    - Box Outs




In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.ticker as mtick
import sqlite3
import seaborn as sns
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests   
import shutil      
import datetime
from scipy.stats import norm
import os
import winsound

In [2]:
home_folder = 'C:\\Users\\Travis\\OneDrive\\Data Science\\Personal_Projects\\Sports\\NBA_Prediction_V3_1'
os.chdir(home_folder)

## Scraping Player Statistics

Note: Shooting and Opponent Shooting folders have been placed in "Shot Dashboard", because it seemed to make some sense.

### Define Functions

In [3]:
driver = webdriver.Chrome()

In [121]:
def replace_name_values(filename):
    # replace URL special characters with underscores for filename compatibility
    filename = filename.replace('%','_')
    filename = filename.replace('=','_')
    filename = filename.replace('?','_')
    filename = filename.replace('&','_')
    filename = filename.replace('20Season_','')
    filename = filename.replace('20Season','')
    return filename

In [620]:
def grab_player_data(url_list, file_folder):    
        
        # Scrape Season-Level player data from the url_list

        i = 0
        for u in url_list:
                
                driver.get(u)
                time.sleep(2)

                # if the page does not load, go to the next in the list
                try:
                        xpath = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath)))
                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue

                # click "all pages"
                xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                
                driver.find_element(by=By.XPATH, value=xpath_all).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[0:]] 
                row_names = table.findAll('a')                             # find rows
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:]
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]
                tot_cols = len(player_stats[1])                           #set the length to ignore hidden columns
                headerlist = headerlist[:tot_cols]   
                stats = pd.DataFrame(player_stats, columns = headerlist)

                # assign filename
                filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
                filename = replace_name_values(filename)
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                # report progress
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

        winsound.Beep(523, 500)
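
The header handling in `grab_player_data` can be illustrated without a browser. The rows are built from `td` cells, so the header `tr` (which presumably holds only `th` cells) comes back as an empty list, and the header list is cut to the width of a real data row to ignore hidden columns. The sketch below shows that logic on plain lists. `align_table` is a hypothetical helper, and the `PTS RANK` header is made up for illustration.

```python
def align_table(headers, rows):
    """Drop empty rows and trim headers to the width of the first data row."""
    # the header <tr> contains no <td> cells, so it shows up as an empty list
    data_rows = [r for r in rows if r]
    if not data_rows:
        return [], []
    # hidden columns have a header but no visible cell, so trim the headers
    width = len(data_rows[0])
    return headers[:width], data_rows


# 'PTS RANK' plays the role of a hidden column (hypothetical name)
headers = ['', 'Player', 'Team', 'PTS', 'PTS RANK']
rows = [[],
        ['1', 'LeBron James', 'MIA', '30.3'],
        ['2', 'Kobe Bryant', 'LAL', '30.0']]

visible_headers, data_rows = align_table(headers, rows)
print(visible_headers)
print(data_rows)
```

Filtering out the empty row before building the DataFrame would also avoid an all-empty first row in the saved CSV files.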

In [20]:
p = 'data/player/general/'
files = os.listdir(p)
# find the year in the filename
files = pd.DataFrame(files, columns = ['filename'])
files

Unnamed: 0,filename


In [21]:
# add a column that extracts the first run of digits (the year) from the filename
files['year'] = files['filename'].str.extract(r'(\d+)', expand = True)
files['year'] = files['year'].astype(int)
files

Unnamed: 0,filename,year


In [22]:
def append_the_data(folder, data_prefix, filename_selector):
    # Appending data together via folder and/or file name

    path = folder
    p = os.listdir(path)
    pf = pd.DataFrame(p)


    # filter for files that contain the filename_selector
    pf_reg = pf.loc[pf[0].astype(str).str.contains(filename_selector)] 

    appended_data = []
    for file in pf_reg[0]:
        data = pd.read_csv(folder + '/' + file)
        # if "Season" a column, drop it
        if 'Season' in data.columns:
            data = data.drop(columns = ['Season'])
        
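        # the season start year is the four characters starting at the first '20' in the filename
        data['season'] = file[(file.find('20')):(file.find('20'))+4]
        # label the season type based on the filename
        data['season_type'] = 'Regular' if 'Regular' in file else 'Playoffs'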
        # add prefix to columns
        data = data.add_prefix(data_prefix)
        data.columns = data.columns.str.lower()
        appended_data.append(data)
    
    appended_data = pd.concat(appended_data)
    return appended_data
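
`append_the_data` reads the season from the first `'20'` in the filename, which works for the file names used here but would break if `'20'` appeared earlier in a name. As a hedged alternative, the sketch below pulls the season start year with a regular expression that looks for the `2021-22` pattern. `parse_season` is a hypothetical helper, not part of the notebook.

```python
import re

# matches a season such as '2021-22' and captures the start year
SEASON_RE = re.compile(r'(\d{4})-\d{2}')


def parse_season(filename):
    """Return (season_start_year, season_type) parsed from a data filename."""
    match = SEASON_RE.search(filename)
    if match is None:
        raise ValueError(f'no season like 2021-22 found in {filename!r}')
    # same rule as append_the_data: anything not marked Regular is Playoffs
    season_type = 'Regular' if 'Regular' in filename else 'Playoffs'
    return match.group(1), season_type


print(parse_season('traditional_Regular_Season_2021-22.csv'))
print(parse_season('traditional_Playoffs_Season_2011-12.csv'))
```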

In [517]:
def replace_name_values2(filename):
    # replace URL special characters with underscores for filename compatibility
    filename = filename.replace('%','_')
    filename = filename.replace('=','_')
    filename = filename.replace('?','_')
    filename = filename.replace('&','_')
    filename = filename.replace('20Season_','')
    filename = filename.replace('_20Season','')
    filename = filename.replace('SeasonType_','')
    filename = filename.replace('sort_gdate_dir_-1_','')
    filename = filename.replace('SeasonYear_','')
    return filename
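
As a worked example of the naming convention, the sketch below copies `replace_name_values2` so the block runs on its own and applies it to two URLs of the form built in this notebook. `url_to_filename` is a hypothetical wrapper that strips the common URL prefix first and replaces any remaining slashes.

```python
PREFIX = 'https://www.nba.com/stats/players/'


def replace_name_values2(filename):
    # copy of the notebook function, repeated here so the example is self-contained
    filename = filename.replace('%', '_')
    filename = filename.replace('=', '_')
    filename = filename.replace('?', '_')
    filename = filename.replace('&', '_')
    filename = filename.replace('20Season_', '')
    filename = filename.replace('_20Season', '')
    filename = filename.replace('SeasonType_', '')
    filename = filename.replace('sort_gdate_dir_-1_', '')
    filename = filename.replace('SeasonYear_', '')
    return filename


def url_to_filename(url):
    # hypothetical helper: drop the prefix, clean the rest, then flatten slashes
    return replace_name_values2(url[len(PREFIX):]).replace('/', '_')


general = url_to_filename(PREFIX + 'traditional?SeasonType=Playoffs&Season=2021-22')
clutch = url_to_filename(
    PREFIX + 'clutch-traditional/?Season=2021-22&SeasonType=Regular%20Season')
print(general)  # traditional_Playoffs_Season_2021-22
print(clutch)   # clutch-traditional__Season_2021-22_Regular
```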

In [208]:
def grab_player_clutch_stats(url_list, file_folder):
        # grabbing clutch stats is a bit different than the rest of the data
        # because the table is not in the same format

        i = 0
        for u in url_list:
                driver.get(u)
                time.sleep(2)

                # if the page does not load, go to the next in the list
                try:
                        xpath = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath)))
                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue

                # click "all pages"
                xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                
                driver.find_element(by=By.XPATH, value=xpath_all).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[0:]] 
                
                # find rows
                row_names = table.findAll('a')               
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:]
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]

                #set the length to ignore hidden columns
                tot_cols = len(player_stats[2])                          
                headerlist = headerlist[:tot_cols]       
                stats = pd.DataFrame(player_stats, columns = headerlist)
                filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
                filename = replace_name_values2(filename)
                filename = filename.replace('SeasonType_Regular_20Season','Reg_Season')
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

In [444]:
def grab_player_playtype_defense(url_list, file_folder):
        driver = webdriver.Chrome()
        i = 0
        for u in url_list:
                driver.get(u)
                time.sleep(1)

                #  go to defense page
                try:
                        xpath_defense = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[1]/div/div/div[4]/label/div/select/option[2]'
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_defense)))

                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue
                
                driver.find_element(by=By.XPATH, value=xpath_defense).click()

                # select all
                # if the page does not load, go to the next in the list
                try:
                        xpath = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath)))
                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue

                driver.find_element(by=By.XPATH, value=xpath).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[0:]] 
                row_names = table.findAll('a')                             # find rows
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:]
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]
                tot_cols = len(player_stats[1])                           #set the length to ignore hidden columns
                headerlist = headerlist[:tot_cols]   
                stats = pd.DataFrame(player_stats, columns = headerlist)
                filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
                filename = replace_name_values2(filename)
                filename = filename.replace('SeasonYear_','')
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')
        winsound.Beep(523, 500)

In [25]:
def get_shot_dashboard(url_list, file_folder, option_numbers,option_names):
        driver = webdriver.Chrome()
        i = 0
        # get first option
        for u in url_list:
            for option in option_numbers:
                driver.get(u)
                time.sleep(1)
                # get option xpath
                op = option + 1
                xpath_option = '/html/body/main/div/div/div[2]/div/div/div[1]/div[4]/div/div/label/select/option[' + str(op) + ']'
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_option)))
                driver.find_element(by=By.XPATH, value=xpath_option).click()
                # all pages
                xpath_all = '/html/body/main/div/div/div[2]/div/div/nba-stat-table/div[1]/div/div/select/option[1]' # click "all pages"
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                driver.find_element(by=By.XPATH, value=xpath_all).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("div", attrs = {"class":"nba-stat-table__overflow"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[0:]] 
                row_names = table.findAll('a')                             # find rows
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:]
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]
                tot_cols = len(player_stats[2])                           #set the length to ignore hidden columns
                headerlist = headerlist[:tot_cols]   
                stats = pd.DataFrame(player_stats, columns = headerlist)
                filename = file_folder + str(u[34:]).replace('/', '_') + 'op_' + str(option_names[option]) + '.csv'
                filename = replace_name_values(filename)
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

In [26]:
# There are lots of triple-to-sextuple merges to perform on the data.

def quad_merge(df1,df2,df3,df4, prefix1, prefix2, prefix3, prefix4):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    merge_cols2 = [(prefix2 + 'player'), (prefix2 + 'season'), (prefix2 + 'season_type')]
    merge_cols3 = [(prefix3 + 'player'), (prefix3 + 'season'), (prefix3 + 'season_type')]
    merge_cols4 = [(prefix4 + 'player'), (prefix4 + 'season'), (prefix4 + 'season_type')]
    df = pd.merge(df1,df2, left_on=merge_cols1, right_on = merge_cols2, how='left')
    df = pd.merge(df,df3, left_on=merge_cols1, right_on = merge_cols3, how='left')
    df = pd.merge(df,df4, left_on=merge_cols1, right_on = merge_cols4, how='left')
    return df


def quintuple_merge(df1, df2, df3, df4, df5, prefix1, prefix2, prefix3, prefix4, prefix5):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    merge_cols2 = [(prefix2 + 'player'), (prefix2 + 'season'), (prefix2 + 'season_type')]
    merge_cols3 = [(prefix3 + 'player'), (prefix3 + 'season'), (prefix3 + 'season_type')]
    merge_cols4 = [(prefix4 + 'player'), (prefix4 + 'season'), (prefix4 + 'season_type')]
    merge_cols5 = [(prefix5 + 'player'), (prefix5 + 'season'), (prefix5 + 'season_type')]
    df = pd.merge(df1,df2, left_on=merge_cols1, right_on = merge_cols2, how='left')
    df = pd.merge(df,df3, left_on=merge_cols1, right_on = merge_cols3, how='left')
    df = pd.merge(df,df4, left_on=merge_cols1, right_on = merge_cols4, how='left')
    df = pd.merge(df,df5, left_on=merge_cols1, right_on = merge_cols5, how='left')
    return df

def sextuple_merge(df1, df2, df3, df4, df5, df6, prefix1, prefix2, prefix3, prefix4, prefix5, prefix6):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    merge_cols2 = [(prefix2 + 'player'), (prefix2 + 'season'), (prefix2 + 'season_type')]
    merge_cols3 = [(prefix3 + 'player'), (prefix3 + 'season'), (prefix3 + 'season_type')]
    merge_cols4 = [(prefix4 + 'player'), (prefix4 + 'season'), (prefix4 + 'season_type')]
    merge_cols5 = [(prefix5 + 'player'), (prefix5 + 'season'), (prefix5 + 'season_type')]
    merge_cols6 = [(prefix6 + 'player'), (prefix6 + 'season'), (prefix6 + 'season_type')]
    df = pd.merge(df1,df2, left_on=merge_cols1, right_on = merge_cols2, how='left')
    df = pd.merge(df,df3, left_on=merge_cols1, right_on = merge_cols3, how='left')
    df = pd.merge(df,df4, left_on=merge_cols1, right_on = merge_cols4, how='left')
    df = pd.merge(df,df5, left_on=merge_cols1, right_on = merge_cols5, how='left')
    df = pd.merge(df,df6, left_on=merge_cols1, right_on = merge_cols6, how='left')
    return df

def tripple_merge(df1, df2, df3, prefix1, prefix2, prefix3):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    merge_cols2 = [(prefix2 + 'player'), (prefix2 + 'season'), (prefix2 + 'season_type')]
    merge_cols3 = [(prefix3 + 'player'), (prefix3 + 'season'), (prefix3 + 'season_type')]
    df = pd.merge(df1,df2, left_on=merge_cols1, right_on = merge_cols2, how='left')
    df = pd.merge(df,df3, left_on=merge_cols1, right_on = merge_cols3, how='left')
    return df

def double_merge(df1, df2, prefix1, prefix2):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    merge_cols2 = [(prefix2 + 'player'), (prefix2 + 'season'), (prefix2 + 'season_type')]
    df = pd.merge(df1,df2, left_on=merge_cols1, right_on = merge_cols2, how='left')
    return df

def single_merge(df1, prefix1):
    merge_cols1 = [(prefix1 + 'player'), (prefix1 + 'season'), (prefix1 + 'season_type')]
    df = pd.merge(df1,df1, left_on=merge_cols1, right_on = merge_cols1, how='left')
    return df
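
The merge helpers above all follow one pattern: left-join each additional frame onto the first, matching on the prefixed `player`, `season`, and `season_type` columns. A single function that accepts any number of frames could express the same idea. The sketch below is one possible generalization using `functools.reduce`. `merge_many` is a hypothetical name and the two small frames are made-up data.

```python
from functools import reduce

import pandas as pd

KEYS = ('player', 'season', 'season_type')


def merge_many(frames, prefixes):
    """Left-join every frame onto the first, matching on the prefixed key columns."""
    if not frames or len(frames) != len(prefixes):
        raise ValueError('need one prefix per frame and at least one frame')
    base_keys = [prefixes[0] + k for k in KEYS]

    def join(left, item):
        right, prefix = item
        right_keys = [prefix + k for k in KEYS]
        return pd.merge(left, right, left_on=base_keys,
                        right_on=right_keys, how='left')

    # fold the remaining frames onto the first one, in order
    return reduce(join, zip(frames[1:], prefixes[1:]), frames[0])


# made-up example data
a = pd.DataFrame({'a_player': ['X', 'Y'], 'a_season': [2021, 2021],
                  'a_season_type': ['Regular', 'Regular'], 'a_pts': [10, 20]})
b = pd.DataFrame({'b_player': ['X'], 'b_season': [2021],
                  'b_season_type': ['Regular'], 'b_ast': [5]})

merged = merge_many([a, b], ['a_', 'b_'])
print(merged.shape)
```

With this shape, `merge_many([df1, df2, df3], [p1, p2, p3])` would play the role of `tripple_merge`, and longer lists would cover the larger merges.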

# Season-Level Player Stats (Complete 12.20)

This grabs season-level (or playoff-level) player data, i.e., one-season aggregates.

In [45]:
# This gets a list of all the urls for the player general stats for PREVIOUS YEARS

years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', 
            '2015-16','2014-15', '2013-14', '2012-13', '2011-12']
stat_types = ['traditional', 'advanced', 'misc', 'scoring', 'usage','opponent', 'defense']
season_types = ['Playoffs', 'Regular%20Season']

player_general_urls = []

for year in years:
    for stattype in stat_types:
        for s_types in season_types:
            url = 'https://www.nba.com/stats/players/'+ stattype +'?SeasonType=' + s_types + '&Season=' + year 
            player_general_urls.append(url)

In [46]:
# add the 2022-23 season to the list of urls
year = '2022-23'
season_type = 'Regular%20Season'
for stattype in stat_types:
    url = 'https://www.nba.com/stats/players/'+ stattype +'?SeasonType=' + season_type + '&Season=' + year 
    player_general_urls.append(url)
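
The nested loops above enumerate every combination of year, stat type, and season type. The same list can be sketched with `itertools.product`, which makes the expected number of URLs easy to check. `build_urls` is a hypothetical helper, and the lists are copied from the cells above.

```python
from itertools import product

BASE = 'https://www.nba.com/stats/players/'

years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17',
         '2015-16', '2014-15', '2013-14', '2012-13', '2011-12']
stat_types = ['traditional', 'advanced', 'misc', 'scoring', 'usage',
              'opponent', 'defense']
season_types = ['Playoffs', 'Regular%20Season']


def build_urls(years, stat_types, season_types):
    """Build one URL per (year, stat type, season type) combination."""
    return [f'{BASE}{stat}?SeasonType={stype}&Season={year}'
            for year, stat, stype in product(years, stat_types, season_types)]


# previous seasons: playoffs and regular season
urls = build_urls(years, stat_types, season_types)
# current season: regular season only
urls += build_urls(['2022-23'], stat_types, ['Regular%20Season'])

print(len(urls))  # 11 * 7 * 2 + 7 = 161
```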


In [47]:
# translate urls to naming convention
def trans_urls(url):
    new_url = str(url)[34:].replace('/', '_')
    filename = replace_name_values(new_url)
    return filename

In [48]:
pg_url = pd.DataFrame(player_general_urls, columns = ['url'])

In [49]:
# apply lambda function to get filename
pg_url['filename'] = pg_url.apply(lambda row: trans_urls(row['url']), axis=1)
pg_url.head(2)

Unnamed: 0,url,filename
0,https://www.nba.com/stats/players/traditional?...,traditional_Playoffs_Season_2021-22
1,https://www.nba.com/stats/players/traditional?...,traditional_Regular_Season_2021-22


In [50]:
# Get files in folder
folder = os.listdir('data/player/general')
folder = [x.replace('.csv', '') for x in folder]

In [51]:
# get list of files that need to be downloaded, the files that are not in the folder
to_download = pg_url[~pg_url['filename'].isin(folder)]
to_download


Unnamed: 0,url,filename
37,https://www.nba.com/stats/players/usage?Season...,usage_Regular_Season_2019-20


In [52]:
driver = webdriver.Chrome()

In [53]:
# if there are files to download, download them

# turn url to list
to_download_list = to_download['url'].tolist()

if len(to_download_list) > 0:
    grab_player_data(to_download_list, 'data/player/general/')
else:
    print('No files to download')

data/player/general/usage_Regular_Season_2019-20.csv Completed Successfully! 1 / 1 Complete!


##### Append Data

In [70]:
trad_data = append_the_data('data/player/general', 'trad_', 'traditional')
print(f' data shape: {trad_data.shape}')
trad_data.head(3)

 data shape: (8489, 33)


Unnamed: 0,trad_unnamed: 0,trad_unnamed: 1,trad_player,trad_team,trad_age,trad_gp,trad_w,trad_l,trad_min,trad_pts,...,trad_tov,trad_stl,trad_blk,trad_pf,trad_fp,trad_dd2,trad_td3,trad_+/-,trad_season,trad_season_type
0,0,,,,,,,,,,...,,,,,,,,,2011,Playoffs
1,1,1.0,LeBron James,MIA,27.0,23.0,16.0,7.0,42.8,30.3,...,3.5,1.9,0.7,2.0,54.6,11.0,1.0,8.7,2011,Playoffs
2,2,2.0,Kobe Bryant,LAL,33.0,12.0,5.0,7.0,39.7,30.0,...,2.8,1.3,0.2,2.8,44.0,0.0,0.0,-3.4,2011,Playoffs


In [71]:
trad_data.to_csv('data/player/aggregates/player_general_traditional_seasonview.csv')

In [73]:
adv_data = append_the_data('data/player/general/', 'adv_', 'advanced')
print(f' data shape: {adv_data.shape}')

 data shape: (8489, 27)


In [74]:
adv_data.to_csv('data/player/aggregates/player_general_advanced_seasonview.csv')

#### Defense stats

In [75]:
def_data = append_the_data('data/player/general', 'def_', 'defense')

In [76]:
def_data.to_csv('data/player/aggregates/player_general_defense_aggregates.csv')

#### Scoring Stats

In [77]:
scoring_data = append_the_data('data/player/general', 'scor_', 'scoring')

In [78]:
scoring_data.to_csv('data/player/aggregates/player_general_scoring_aggregates.csv')

#### Usage Stats

In [79]:
usage_data = append_the_data('data/player/general', 'usage_', 'usage')
usage_data.to_csv('data/player/aggregates/player_general_usage_aggregates.csv')

#### Opponent Stats

In [80]:
opponent_data = append_the_data('data/player/general', 'opp_', 'opponent')
opponent_data.to_csv('data/player/aggregates/player_general_opponent_aggregates.csv')

#### Misc stats

In [81]:
misc_data = append_the_data('data/player/general', 'misc_', 'misc')
misc_data.to_csv('data/player/aggregates/player_general_misc_aggregates.csv')

### Merge All General Stats in one File

In [83]:
trad_data

Unnamed: 0,trad_unnamed: 0,trad_unnamed: 1,trad_player,trad_team,trad_age,trad_gp,trad_w,trad_l,trad_min,trad_pts,...,trad_tov,trad_stl,trad_blk,trad_pf,trad_fp,trad_dd2,trad_td3,trad_+/-,trad_season,trad_season_type
0,0,,,,,,,,,,...,,,,,,,,,2011,Playoffs
1,1,1.0,LeBron James,MIA,27.0,23.0,16.0,7.0,42.8,30.3,...,3.5,1.9,0.7,2.0,54.6,11.0,1.0,8.7,2011,Playoffs
2,2,2.0,Kobe Bryant,LAL,33.0,12.0,5.0,7.0,39.7,30.0,...,2.8,1.3,0.2,2.8,44.0,0.0,0.0,-3.4,2011,Playoffs
3,3,3.0,Kevin Durant,OKC,23.0,20.0,13.0,7.0,41.8,28.5,...,3.2,1.5,1.2,2.6,47.7,7.0,0.0,3.5,2011,Playoffs
4,4,4.0,Carmelo Anthony,NYK,28.0,5.0,1.0,4.0,40.8,27.8,...,2.8,1.2,0.2,4.2,42.3,1.0,0.0,-12.8,2011,Playoffs
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
485,485,476.0,Marko Simonovic,CHI,23.0,1.0,0.0,1.0,1.7,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-8.0,2022,Regular
486,486,476.0,Michael Foster Jr.,PHI,19.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-2.0,2022,Regular
487,487,476.0,Trevelin Queen,IND,25.0,2.0,1.0,1.0,9.8,0.0,...,1.5,0.5,0.5,0.5,4.8,0.0,0.0,-1.0,2022,Regular
488,488,476.0,Trevor Keels,NYK,19.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,1.2,0.0,0.0,-2.0,2022,Regular


In [84]:
trad_data['trad_season'] = trad_data['trad_season'].astype(np.int64)

In [88]:
adv_data.adv_season = adv_data.adv_season.astype(np.int64)

In [91]:
# merge advanced and traditional data
all_gen_data = pd.merge(adv_data, trad_data.drop_duplicates(subset = ['trad_player','trad_season', 'trad_season_type']),
                 left_on= ['adv_player','adv_season', 'adv_season_type'], 
                 right_on= ['trad_player','trad_season', 'trad_season_type'], 
                 how = 'left')

In [92]:
def_data['def_season'] = def_data['def_season'].astype(np.int64)

In [93]:
# merge defense data with adv&trad data

all_gen_data = pd.merge(all_gen_data, def_data.drop_duplicates(subset = ['def_player','def_season', 'def_season_type']), 
    left_on= ['adv_player','adv_season', 'adv_season_type'], 
    right_on= ['def_player','def_season', 'def_season_type'], 
    how = 'left')

In [94]:
scoring_data['scor_season'] = scoring_data['scor_season'].astype(np.int64)

In [95]:
# merge scoring data with adv&trad&def data

all_gen_data = pd.merge(all_gen_data, scoring_data.drop_duplicates(subset = ['scor_player','scor_season', 'scor_season_type']), 
    left_on= ['adv_player','adv_season', 'adv_season_type'], 
    right_on= ['scor_player','scor_season', 'scor_season_type'], 
    how = 'left')

In [96]:
usage_data['usage_season'] = usage_data['usage_season'].astype(np.int64)

In [97]:
# merge usage data with adv&trad&def&scoring data

all_gen_data = pd.merge(all_gen_data, usage_data.drop_duplicates(subset = ['usage_player','usage_season', 'usage_season_type'] ), 
    left_on= ['adv_player','adv_season', 'adv_season_type'], 
    right_on= ['usage_player','usage_season', 'usage_season_type'], 
    how = 'left')

In [100]:
all_gen_data.to_csv('data/player/aggregates/player_general_all_aggregates_seasonview.csv')

## Player - Clutch Through 2021 Season (Complete 12.22)

In [275]:
# make clutch folder
if not os.path.isdir('data/player/clutch'):
    os.mkdir('data/player/clutch')

In [296]:
# URLS
traditional_clutch = 'https://www.nba.com/stats/players/clutch-traditional/?Season='    
advanced_clutch = 'https://www.nba.com/stats/players/clutch-advanced/?Season='    
misc_clutch = 'https://www.nba.com/stats/players/clutch-misc/?Season=' 
scoring_clutch = 'https://www.nba.com/stats/players/clutch-scoring/?Season='
usage_clutch = 'https://www.nba.com/stats/players/clutch-usage/?Season='

clutch_stats = [traditional_clutch, advanced_clutch, misc_clutch, scoring_clutch, usage_clutch]
seasonz = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16', '2014-15', '2013-14']

clutch_urls = []
for s in seasonz:
        for c in clutch_stats:
                clutch_urls.append(c + s + '&SeasonType=Regular%20Season')

clutch_urls_playoffs = []
for s in seasonz:
        for c in clutch_stats:
                clutch_urls_playoffs.append(c + s + '&SeasonType=Playoffs')

In [299]:
def clutch_url_to_filename(url):
    l = len('https://www.nba.com/stats/players/')
    url = url[l:]
    filename = replace_name_values2(url)
    filename = filename.replace('/', '_')
    return filename

In [302]:
# check urls against downloaded files (filenames are built without the .csv extension)
clutch_urls = pd.DataFrame(clutch_urls, columns = ['url'])
clutch_urls['filename'] = clutch_urls.apply(lambda row: clutch_url_to_filename(row['url']), axis = 1)

clutch_urls.head(3)

Unnamed: 0,url,filename
0,https://www.nba.com/stats/players/clutch-tradi...,clutch-traditional__Season_2021-22_Regular
1,https://www.nba.com/stats/players/clutch-advan...,clutch-advanced__Season_2021-22_Regular
2,https://www.nba.com/stats/players/clutch-misc/...,clutch-misc__Season_2021-22_Regular


In [303]:
files_in_folder = os.listdir('data/player/clutch/regular_season')

# remove .csv
files_in_folder = [f[:-4] for f in files_in_folder]

In [304]:
# get the files that are not in the clutch folder
missing = clutch_urls[~clutch_urls['filename'].isin(files_in_folder)]
missing

Unnamed: 0,url,filename


In [305]:
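if len(missing) > 0:
    driver = webdriver.Chrome()
    # pass the urls as a list; the trailing slash keeps the files inside the clutch folder
    grab_player_clutch_stats(missing['url'].tolist(), 'data/player/clutch/')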
else:
    print('no missing files')


no missing files


In [306]:
# check if data/player/clutch/regular_season exists
if not os.path.isdir('data/player/clutch/regular_season'):
    os.mkdir('data/player/clutch/regular_season')

# check if data/player/clutch/playoffs exists
if not os.path.isdir('data/player/clutch/playoffs'):
    os.mkdir('data/player/clutch/playoffs')

In [307]:
# move files to regular season folder or playoffs folder

# loop over the files in the folder (not over the characters of the path string)
for f in os.listdir('data/player/clutch'):
    if f.endswith('.csv'):
        # f already includes the .csv extension
        if 'Playoffs' in f:
            os.rename('data/player/clutch/' + f, 'data/player/clutch/playoffs/' + f)
        else:
            os.rename('data/player/clutch/' + f, 'data/player/clutch/regular_season/' + f)
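
The folder setup and file sorting can also be sketched together with `pathlib` and `shutil`, run here against a temporary folder so it is safe to try. `sort_by_season_type` is a hypothetical helper, and the two file names are illustrative examples in the clutch naming convention.

```python
import shutil
import tempfile
from pathlib import Path


def sort_by_season_type(root):
    """Move top-level CSV files in `root` into playoffs/ or regular_season/."""
    root = Path(root)
    targets = {'playoffs': root / 'playoffs',
               'regular_season': root / 'regular_season'}
    for folder in targets.values():
        # exist_ok avoids a separate "does the folder exist" check
        folder.mkdir(parents=True, exist_ok=True)

    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(root.glob('*.csv')):
        key = 'playoffs' if 'Playoffs' in path.name else 'regular_season'
        shutil.move(str(path), str(targets[key] / path.name))
        moved.append((path.name, key))
    return moved


with tempfile.TemporaryDirectory() as tmp:
    for name in ('clutch-misc__Season_2021-22_Playoffs.csv',
                 'clutch-misc__Season_2021-22_Regular.csv'):
        (Path(tmp) / name).write_text('')
    moved = sort_by_season_type(tmp)
    playoff_files = sorted(p.name for p in (Path(tmp) / 'playoffs').iterdir())
    regular_files = sorted(p.name for p in (Path(tmp) / 'regular_season').iterdir())
    print(moved)
```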


In [310]:
# append clutch advanced data
c_adv_reg = append_the_data('data/player/clutch/regular_season', 'c_adv_', 'advanced')
c_adv_reg.to_csv('data/player/aggregates/clutch_advanced_regular_season.csv')

c_adv_playoffs = append_the_data('data/player/clutch/playoffs', 'c_adv_', 'advanced')
c_adv_playoffs.to_csv('data/player/aggregates/clutch_advanced_playoffs.csv')

# merge these dataframes
clutch_advanced = pd.concat([c_adv_reg, c_adv_playoffs])
clutch_advanced.to_csv('data/player/aggregates/clutch_advanced_AllSeasons.csv')

# append clutch traditional data
c_trad_reg = append_the_data('data/player/clutch/regular_season', 'c_trad_', 'traditional')
c_trad_reg.to_csv('data/player/aggregates/clutch_traditional_regular_season.csv')

c_trad_playoffs = append_the_data('data/player/clutch/playoffs', 'c_trad_', 'traditional')
c_trad_playoffs.to_csv('data/player/aggregates/clutch_traditional_playoffs.csv')

# merge these dataframes
clutch_traditional = pd.concat([c_trad_reg, c_trad_playoffs])
clutch_traditional.to_csv('data/player/aggregates/clutch_traditional_AllSeasons.csv')

#append clutch misc data
c_misc_reg = append_the_data('data/player/clutch/regular_season', 'c_misc_', 'misc')
c_misc_reg.to_csv('data/player/aggregates/clutch_misc_regular_season.csv')

c_misc_playoffs = append_the_data('data/player/clutch/playoffs', 'c_misc_', 'misc')
c_misc_playoffs.to_csv('data/player/aggregates/clutch_misc_playoffs.csv')

# merge these dataframes
clutch_misc = pd.concat([c_misc_reg, c_misc_playoffs])
clutch_misc.to_csv('data/player/aggregates/clutch_misc_AllSeasons.csv')

# append clutch scoring data

c_scoring_reg = append_the_data('data/player/clutch/regular_season', 'c_scoring_', 'scoring')
c_scoring_reg.to_csv('data/player/aggregates/clutch_scoring_regular_season.csv')

c_scoring_playoffs = append_the_data('data/player/clutch/playoffs', 'c_scoring_', 'scoring')
c_scoring_playoffs.to_csv('data/player/aggregates/clutch_scoring_playoffs.csv')

# merge these dataframes
clutch_scoring = pd.concat([c_scoring_reg, c_scoring_playoffs])
clutch_scoring.to_csv('data/player/aggregates/clutch_scoring_AllSeasons.csv')

# append clutch usage data
c_usage_reg = append_the_data('data/player/clutch/regular_season', 'c_usage_', 'usage')
c_usage_reg.to_csv('data/player/aggregates/clutch_usage_regular_season.csv')

c_usage_playoffs = append_the_data('data/player/clutch/playoffs', 'c_usage_', 'usage')
c_usage_playoffs.to_csv('data/player/aggregates/clutch_usage_playoffs.csv')

# merge these dataframes
clutch_usage = pd.concat([c_usage_reg, c_usage_playoffs])
clutch_usage.to_csv('data/player/aggregates/clutch_usage_AllSeasons.csv')


In [315]:
# append all clutch data
adv = pd.read_csv('data/player/aggregates/clutch_advanced_AllSeasons.csv')
trad = pd.read_csv('data/player/aggregates/clutch_traditional_AllSeasons.csv')
misc = pd.read_csv('data/player/aggregates/clutch_misc_AllSeasons.csv')
scoring = pd.read_csv('data/player/aggregates/clutch_scoring_AllSeasons.csv')
usage = pd.read_csv('data/player/aggregates/clutch_usage_AllSeasons.csv')


In [316]:
print(f' Advanced is {adv.shape}, Traditional is {trad.shape}, Misc is {misc.shape}, Scoring is {scoring.shape}, Usage is {usage.shape}')

 Advanced is (5294, 27), Traditional is (5156, 34), Misc is (5294, 24), Scoring is (5294, 27), Usage is (5294, 30)


In [317]:
all_clutch = pd.merge(adv, trad.drop_duplicates(subset = ['c_trad_player', 'c_trad_season', 'c_trad_season_type']), 
                                                left_on= ['c_adv_player', 'c_adv_season', 'c_adv_season_type'], 
                                                right_on= ['c_trad_player', 'c_trad_season', 'c_trad_season_type'], 
                                                how= 'left')

In [318]:
all_clutch = pd.merge(all_clutch, misc.drop_duplicates(subset = ['c_misc_player', 'c_misc_season', 'c_misc_season_type']), 
                                                left_on= ['c_adv_player', 'c_adv_season', 'c_adv_season_type'], 
                                                right_on= ['c_misc_player', 'c_misc_season', 'c_misc_season_type'], 
                                                how= 'left')
all_clutch = pd.merge(all_clutch, scoring.drop_duplicates(subset = ['c_scoring_player', 'c_scoring_season', 'c_scoring_season_type']), 
                                                left_on= ['c_adv_player', 'c_adv_season', 'c_adv_season_type'], 
                                                right_on= ['c_scoring_player', 'c_scoring_season', 'c_scoring_season_type'], 
                                                how= 'left')
all_clutch = pd.merge(all_clutch, usage.drop_duplicates(subset = ['c_usage_player', 'c_usage_season', 'c_usage_season_type']), 
                                                left_on= ['c_adv_player', 'c_adv_season', 'c_adv_season_type'], 
                                                right_on= ['c_usage_player', 'c_usage_season', 'c_usage_season_type'], 
                                                how= 'left')
all_clutch.to_csv('data/player/aggregates/ALL_Clutch.csv')



## Player - Playtype - Offense (Complete 12.22)

Complete through the 2021-22 season.

In [346]:
years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16']
playtypes = ['isolation', 'transition', 'ball-handler', 'roll-man', 
            'playtype-post-up','spot-up', 'hand-off', 'cut',
            'off-screen', 'putbacks', 'misc'] 
season_types = ['Playoffs', 'Regular%20Season']

playtype_urlz = []

for year in years:
    for play in playtypes:
        for s_types in season_types:
            url = 'https://www.nba.com/stats/players/'+ play + '?SeasonType=' + s_types + '&SeasonYear=' + year
            playtype_urlz.append(str(url))

# delete any misc playoff urls, as they do not work
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2015-16')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2016-17')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2017-18')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2018-19')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2019-20')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2020-21')
playtype_urlz.remove('https://www.nba.com/stats/players/misc?SeasonType=Playoffs&SeasonYear=2021-22')

len(playtype_urlz)

147
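The same URL list can be built in one pass, skipping the misc playoff pages instead of removing them afterwards. This is a minimal sketch, not the code the notebook ran: `build_playtype_urls` and `BASE` are names introduced here for illustration, and the lists are copied from the cell above.

```python
from itertools import product

# Values copied from the cell above.
years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16']
playtypes = ['isolation', 'transition', 'ball-handler', 'roll-man',
             'playtype-post-up', 'spot-up', 'hand-off', 'cut',
             'off-screen', 'putbacks', 'misc']
season_types = ['Playoffs', 'Regular%20Season']

BASE = 'https://www.nba.com/stats/players/'  # hypothetical constant name


def build_playtype_urls(years, playtypes, season_types):
    """Return one URL per (year, playtype, season type), skipping misc playoffs."""
    urls = []
    # product() iterates in the same order as the nested for loops above
    for year, play, s_type in product(years, playtypes, season_types):
        if play == 'misc' and s_type == 'Playoffs':
            continue  # these pages do not work, so never add them
        urls.append(f'{BASE}{play}?SeasonType={s_type}&SeasonYear={year}')
    return urls


playtype_urls = build_playtype_urls(years, playtypes, season_types)
print(len(playtype_urls))
```

Filtering inside the loop means a new season only needs to be added to `years`; no extra `remove` line is required.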

In [348]:
import time
import winsound  # Windows only; used for the completion beep

def grab_playtype(url_list, file_folder):
    # Scrape season-level player data from each URL in url_list and save one CSV per URL.
    # Relies on a global Selenium `driver` and on `replace_name_values` from an earlier cell.

    # XPath of the "All" option in the page-size dropdown
    xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'
    total = len(url_list)
    completed = 0

    for u in url_list:
        driver.get(u)
        time.sleep(2)

        # if the page does not load, go to the next in the list
        try:
            option_all = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
        except Exception:
            print(f'{u} did not load. Moving to next url.')
            continue

        # click "All" so every player is shown on a single page
        option_all.click()

        parser = BeautifulSoup(driver.page_source, "lxml")
        table = parser.find("table", attrs={"class": "Crom_table__p1iZz"})
        headerlist = [h.text.strip() for h in table.find_all('th')]
        rows = table.find_all('tr')
        # rows[0] is the header row and has no <td> cells, so it becomes an empty first row
        player_stats = [[td.get_text().strip() for td in row.find_all('td')] for row in rows]
        tot_cols = len(player_stats[1])      # number of visible columns; drops hidden header cells
        headerlist = headerlist[:tot_cols]
        stats = pd.DataFrame(player_stats, columns=headerlist)

        # build the filename from the part of the URL after 'https://www.nba.com/stats/players/'
        filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
        filename = replace_name_values(filename)
        filename = filename.replace('SeasonYear_', '')
        stats.to_csv(filename)
        completed += 1
        print(f'{filename} Completed Successfully! {completed} / {total} Complete!')

    # audible signal that the whole run has finished
    winsound.Beep(523, 500)

In [349]:
to_download = pd.DataFrame(playtype_urlz, columns = ['url'])

In [350]:
def trans_urls(url):
    new_url = str(url)[34:].replace('/', '_')
    filename = replace_name_values2(new_url)
    filename = filename.replace('SeasonYear_', '')
    return filename

In [351]:
# create a new column with the filename
to_download['filename'] = to_download.apply(lambda row: trans_urls(row['url']), axis=1)
to_download

Unnamed: 0,url,filename
0,https://www.nba.com/stats/players/isolation?Se...,isolation_Playoffs_2021-22
1,https://www.nba.com/stats/players/isolation?Se...,isolation_Regular_2021-22
2,https://www.nba.com/stats/players/transition?S...,transition_Playoffs_2021-22
3,https://www.nba.com/stats/players/transition?S...,transition_Regular_2021-22
4,https://www.nba.com/stats/players/ball-handler...,ball-handler_Playoffs_2021-22
...,...,...
142,https://www.nba.com/stats/players/off-screen?S...,off-screen_Playoffs_2015-16
143,https://www.nba.com/stats/players/off-screen?S...,off-screen_Regular_2015-16
144,https://www.nba.com/stats/players/putbacks?Sea...,putbacks_Playoffs_2015-16
145,https://www.nba.com/stats/players/putbacks?Sea...,putbacks_Regular_2015-16
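The filenames above depend on slicing the URL at character 34 and on the helper `replace_name_values2`, which is defined elsewhere in this notebook. As a hedged alternative, the sketch below derives the same `playtype_SeasonType_SeasonYear` pattern by parsing the URL with the standard library. `url_to_filename` is a hypothetical name, and this is an assumption about the intended naming, not a copy of what `replace_name_values2` does.

```python
from urllib.parse import urlparse, parse_qs


def url_to_filename(url):
    """Build '<playtype>_<first word of SeasonType>_<SeasonYear>' from a stats URL."""
    parts = urlparse(url)
    # last path segment is the playtype; rstrip handles URLs ending in '/'
    playtype = parts.path.rstrip('/').split('/')[-1]
    query = parse_qs(parts.query)  # parse_qs also decodes %20 to a space
    season_type = query['SeasonType'][0].split()[0]  # 'Regular Season' -> 'Regular'
    season_year = query['SeasonYear'][0]
    return f'{playtype}_{season_type}_{season_year}'


print(url_to_filename(
    'https://www.nba.com/stats/players/isolation?SeasonType=Regular%20Season&SeasonYear=2021-22'
))
```

Because the query string is parsed by key, the result does not change if `SeasonYear` and `SeasonType` appear in a different order in the URL.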


In [352]:
# get list of all downloaded files
downloaded_files = os.listdir('data/player/playtype')
downloaded_files = [x.replace('.csv', '') for x in downloaded_files]

In [353]:
# get list of files not yet downloaded
to_download = to_download[~to_download['filename'].isin(downloaded_files)]
to_download_list = to_download.url.to_list()

In [354]:
to_download_list

['https://www.nba.com/stats/players/isolation?SeasonType=Playoffs&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/isolation?SeasonType=Regular%20Season&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/transition?SeasonType=Playoffs&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/transition?SeasonType=Regular%20Season&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/ball-handler?SeasonType=Playoffs&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/ball-handler?SeasonType=Regular%20Season&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/roll-man?SeasonType=Playoffs&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/roll-man?SeasonType=Regular%20Season&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/playtype-post-up?SeasonType=Playoffs&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/playtype-post-up?SeasonType=Regular%20Season&SeasonYear=2021-22',
 'https://www.nba.com/stats/players/spot-up?SeasonType=Playoffs&Season

In [355]:
len(to_download_list)

147

In [356]:
driver = webdriver.Chrome()
grab_playtype(to_download_list, 'data/player/playtype/')

data/player/playtype/isolation_Playoffs_2021-22.csv Completed Successfully! 1 / 147 Complete!
data/player/playtype/isolation_Regular_2021-22.csv Completed Successfully! 2 / 147 Complete!
data/player/playtype/transition_Playoffs_2021-22.csv Completed Successfully! 3 / 147 Complete!
data/player/playtype/transition_Regular_2021-22.csv Completed Successfully! 4 / 147 Complete!
data/player/playtype/ball-handler_Playoffs_2021-22.csv Completed Successfully! 5 / 147 Complete!
data/player/playtype/ball-handler_Regular_2021-22.csv Completed Successfully! 6 / 147 Complete!
data/player/playtype/roll-man_Playoffs_2021-22.csv Completed Successfully! 7 / 147 Complete!
data/player/playtype/roll-man_Regular_2021-22.csv Completed Successfully! 8 / 147 Complete!
data/player/playtype/playtype-post-up_Playoffs_2021-22.csv Completed Successfully! 9 / 147 Complete!
data/player/playtype/playtype-post-up_Regular_2021-22.csv Completed Successfully! 10 / 147 Complete!
data/player/playtype/spot-up_Playoffs_2021-2

In [357]:
# move files to proper respective folders
for file in os.listdir('data/player/playtype/'):
    if 'Playoffs' in file:
        shutil.move('data/player/playtype/' + file, 'data/player/playtype/playoffs/')
    elif 'Regular' in file:
        shutil.move('data/player/playtype/' + file, 'data/player/playtype/regular_season/')
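The move step assumes the `playoffs/` and `regular_season/` subfolders already exist. A minimal sketch of a variant that creates them on demand and only touches CSV files is shown here; `sort_by_season_type` is a name introduced for illustration and is not part of the original notebook.

```python
import shutil
from pathlib import Path


def sort_by_season_type(folder):
    """Move CSVs in `folder` into playoffs/ or regular_season/ subfolders."""
    folder = Path(folder)
    targets = {
        'Playoffs': folder / 'playoffs',
        'Regular': folder / 'regular_season',
    }
    moved = []
    # sorted() materialises the listing before any file is moved
    for path in sorted(folder.glob('*.csv')):
        for keyword, dest in targets.items():
            if keyword in path.name:
                dest.mkdir(exist_ok=True)  # create the subfolder if it is missing
                shutil.move(str(path), str(dest / path.name))
                moved.append(path.name)
                break
    return moved
```

Calling `sort_by_season_type('data/player/playtype/')` would then cover the same files as the loop above, leaving any non-CSV files and existing subfolders untouched.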


In [367]:
# agg each sub-category
ball_handler_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_ball_handler__', 'ball-handler')
ball_handler_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_ball_handler__', 'ball-handler')
ball_handler = pd.concat([ball_handler_reg, ball_handler_playoffs], axis = 0)

ball_handler_reg.to_csv('data/player/aggregates/Playtype_Offense_Ball_Handler_Regular_Season.csv')
ball_handler_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Ball_Handler_Playoffs.csv')
ball_handler.to_csv('data/player/aggregates/Playtype_Offense_Ball_Handler_ALL.csv')

cutter_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_cut__', 'cut')
cutter_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_cut__', 'cut')
cutter = pd.concat([cutter_reg, cutter_playoffs], axis = 0)

cutter_reg.to_csv('data/player/aggregates/Playtype_Offense_Cutter_Regular_Season.csv')
cutter_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Cutter_Playoffs.csv')
cutter.to_csv('data/player/aggregates/Playtype_Offense_Cutter_ALL.csv')

hand_off_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_hand_off__', 'hand-off')
hand_off_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_hand_off__', 'hand-off')
hand_off = pd.concat([hand_off_reg, hand_off_playoffs], axis = 0)

hand_off_reg.to_csv('data/player/aggregates/Playtype_Offense_Hand_Off_Regular_Season.csv')
hand_off_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Hand_Off_Playoffs.csv')
hand_off.to_csv('data/player/aggregates/Playtype_Offense_Hand_Off_ALL.csv')

iso_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_iso__', 'isolation')
iso_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_iso__', 'isolation')
iso = pd.concat([iso_reg, iso_playoffs], axis = 0)

iso_reg.to_csv('data/player/aggregates/Playtype_Offense_Isolation_Regular_Season.csv')
iso_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Isolation_Playoffs.csv')
iso.to_csv('data/player/aggregates/Playtype_Offense_Isolation_ALL.csv')

# Note: NO PLAYOFF MISC
misc_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_misc__', 'misc')
misc = misc_reg

misc_reg.to_csv('data/player/aggregates/Playtype_Offense_Misc_Regular_Season.csv')
misc.to_csv('data/player/aggregates/Playtype_Offense_Misc_ALL.csv')

off_screen_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_off_screen__', 'off-screen')
off_screen_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_off_screen__', 'off-screen')
off_screen = pd.concat([off_screen_reg, off_screen_playoffs], axis = 0)

off_screen_reg.to_csv('data/player/aggregates/Playtype_Offense_Off_Screen_Regular_Season.csv')
off_screen_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Off_Screen_Playoffs.csv')
off_screen.to_csv('data/player/aggregates/Playtype_Offense_Off_Screen_ALL.csv')

postup_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_postup__', 'post-up')
postup_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_postup__', 'post-up')
postup = pd.concat([postup_reg, postup_playoffs], axis = 0)

postup_reg.to_csv('data/player/aggregates/Playtype_Offense_Post_Up_Regular_Season.csv')
postup_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Post_Up_Playoffs.csv')
postup.to_csv('data/player/aggregates/Playtype_Offense_Post_Up_ALL.csv')

putback_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_putback__', 'putback')
putback_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_putback__', 'putback')
putback = pd.concat([putback_reg, putback_playoffs], axis = 0)

putback_reg.to_csv('data/player/aggregates/Playtype_Offense_Putback_Regular_Season.csv')
putback_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Putback_Playoffs.csv')
putback.to_csv('data/player/aggregates/Playtype_Offense_Putback_ALL.csv')

rollman_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_rollman__', 'roll-man')
rollman_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_rollman__', 'roll-man')
rollman = pd.concat([rollman_reg, rollman_playoffs], axis = 0)

rollman_reg.to_csv('data/player/aggregates/Playtype_Offense_Roll_Man_Regular_Season.csv')
rollman_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Roll_Man_Playoffs.csv')
rollman.to_csv('data/player/aggregates/Playtype_Offense_Roll_Man_ALL.csv')

spotups_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_spot_up__', 'spot-up')
spotups_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_spot_up__', 'spot-up')
spotups = pd.concat([spotups_reg, spotups_playoffs], axis = 0)

spotups_reg.to_csv('data/player/aggregates/Playtype_Offense_Spot_Up_Regular_Season.csv')
spotups_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Spot_Up_Playoffs.csv')
spotups.to_csv('data/player/aggregates/Playtype_Offense_Spot_Up_ALL.csv')

transition_reg = append_the_data('data/player/playtype/regular_season/', 'playtype_transition__', 'transition')
transition_playoffs = append_the_data('data/player/playtype/playoffs/', 'playtype_transition__', 'transition')
transition = pd.concat([transition_reg, transition_playoffs], axis = 0)

transition_reg.to_csv('data/player/aggregates/Playtype_Offense_Transition_Regular_Season.csv')
transition_playoffs.to_csv('data/player/aggregates/Playtype_Offense_Transition_Playoffs.csv')
transition.to_csv('data/player/aggregates/Playtype_Offense_Transition_ALL.csv')
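The eleven blocks above repeat the same three steps, so they could be driven from a small table. The sketch below is one possible shape for that, under stated assumptions: `aggregate_all` and `SPECS` are hypothetical names, only three of the eleven playtypes are listed, and the aggregation function is passed in as `append_fn` (in this notebook that would be `append_the_data`, which is defined in an earlier cell).

```python
import pandas as pd

# (column prefix, file search term, output label, has playoff data)
# A subset of the calls in the cell above; extend with the remaining playtypes.
SPECS = [
    ('playtype_ball_handler__', 'ball-handler', 'Ball_Handler', True),
    ('playtype_cut__', 'cut', 'Cutter', True),
    ('playtype_misc__', 'misc', 'Misc', False),  # no playoff misc pages
]


def aggregate_all(append_fn, specs, base='data/player/playtype/', out_dir=None):
    """Aggregate each playtype; CSVs are written only when out_dir is given."""
    combined = {}
    for prefix, term, label, has_playoffs in specs:
        frames = {'Regular_Season': append_fn(base + 'regular_season/', prefix, term)}
        if has_playoffs:
            frames['Playoffs'] = append_fn(base + 'playoffs/', prefix, term)
        # stack regular season and playoffs (or just regular season for misc)
        combined[label] = pd.concat(list(frames.values()), axis=0)
        if out_dir is not None:
            for split, frame in frames.items():
                frame.to_csv(f'{out_dir}Playtype_Offense_{label}_{split}.csv')
            combined[label].to_csv(f'{out_dir}Playtype_Offense_{label}_ALL.csv')
    return combined
```

A call such as `aggregate_all(append_the_data, SPECS, out_dir='data/player/aggregates/')` would return a dictionary of combined dataframes keyed by label, which also makes the later merges easier to loop over.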

In [372]:
# get df sizes
print(f' ball_handler: {ball_handler.shape}, cutter: {cutter.shape}, hand_off: {hand_off.shape}, iso: {iso.shape}, misc: {misc.shape}, off_screen: {off_screen.shape}, postup: {postup.shape}, putbacks: {putback.shape}, rollman: {rollman.shape}, spotups: {spotups.shape}, transition: {transition.shape}')

 ball_handler: (3025, 20), cutter: (3475, 20), hand_off: (2666, 20), iso: (2826, 20), misc: (3430, 23), off_screen: (2421, 20), postup: (1916, 20), putbacks: (3078, 20), rollman: (2379, 20), spotups: (4406, 20), transition: (4339, 20)


Because 'spotups' has the most observations, and I want to use a left join for every merge, I will start with spotups.
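A left join keeps every row of the left table and fills unmatched columns with NaN, which is why the largest table is the natural starting point. The toy example below illustrates this with made-up data (the players, columns, and values are invented for illustration). Note that the reverse is not covered: a player-season that appears only in a smaller table is dropped by a left join.

```python
import pandas as pd

# Made-up data: three rows on the left, only two of them present on the right.
left = pd.DataFrame({
    'player': ['A', 'B', 'C'],
    'season': [2021, 2021, 2021],
    'spot_up_poss': [4.6, 4.3, 1.0],
})
right = pd.DataFrame({
    'player': ['A', 'B'],
    'season': [2021, 2021],
    'transition_poss': [2.1, 3.0],
})

merged = pd.merge(
    left, right,
    how='left',
    on=['player', 'season'],
    validate='one_to_one',  # raise if a key is duplicated on either side
    indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
)

print(len(merged) == len(left))  # the left table's row count is preserved
print(merged[['player', 'transition_poss', '_merge']])
```

The `_merge` column makes it easy to count how many left rows found no match before the result is written out.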

#### Add Transition to Spotups

In [381]:
# turn season into int for both dfs
spotups['playtype_spot_up__season'] = spotups['playtype_spot_up__season'].astype(int)
transition['playtype_transition__season'] = transition['playtype_transition__season'].astype(int)

In [393]:
# A player who changed teams has one row per team within a season. I believe we want to keep these,
# so the duplicate check below includes the team in the key.

dups = spotups[spotups.duplicated(subset = ['playtype_spot_up__season', 'playtype_spot_up__player', 'playtype_spot_up__season_type', 'playtype_spot_up__team'], keep = False)]
dups = dups.sort_values(by = ['playtype_spot_up__player', 'playtype_spot_up__season'])
dups

Unnamed: 0,playtype_spot_up__unnamed: 0,playtype_spot_up__player,playtype_spot_up__team,playtype_spot_up__gp,playtype_spot_up__poss,playtype_spot_up__freq%,playtype_spot_up__ppp,playtype_spot_up__pts,playtype_spot_up__fgm,playtype_spot_up__fga,playtype_spot_up__fg%,playtype_spot_up__efg%,playtype_spot_up__ftfreq%,playtype_spot_up__tovfreq%,playtype_spot_up__sffreq%,playtype_spot_up__and onefreq%,playtype_spot_up__scorefreq%,playtype_spot_up__percentile,playtype_spot_up__season,playtype_spot_up__season_type


In [394]:
spotups_dd = spotups.drop_duplicates(subset = ['playtype_spot_up__season', 'playtype_spot_up__player', 'playtype_spot_up__season_type', 'playtype_spot_up__team'])
spotups_dd

Unnamed: 0,playtype_spot_up__unnamed: 0,playtype_spot_up__player,playtype_spot_up__team,playtype_spot_up__gp,playtype_spot_up__poss,playtype_spot_up__freq%,playtype_spot_up__ppp,playtype_spot_up__pts,playtype_spot_up__fgm,playtype_spot_up__fga,playtype_spot_up__fg%,playtype_spot_up__efg%,playtype_spot_up__ftfreq%,playtype_spot_up__tovfreq%,playtype_spot_up__sffreq%,playtype_spot_up__and onefreq%,playtype_spot_up__scorefreq%,playtype_spot_up__percentile,playtype_spot_up__season,playtype_spot_up__season_type
0,0,,,,,,,,,,,,,,,,,,2015,Regular
1,1,Wesley Matthews,DAL,78.0,4.6,36.2,1.11,5.1,1.7,4.3,39.3,56.1,2.8,3.4,2.2,0.6,39.4,86.0,2015,Regular
2,2,Kawhi Leonard,SAS,72.0,4.3,23.5,1.25,5.4,2.0,4.0,49.3,63.5,5.1,3.9,4.8,1.0,49.5,96.3,2015,Regular
3,3,Marvin Williams,CHA,81.0,4.3,39.5,1.12,4.8,1.7,4.0,41.5,56.7,2.9,2.9,2.6,0.6,41.6,86.9,2015,Regular
4,4,JR Smith,CLE,77.0,4.1,33.7,1.18,4.9,1.7,4.0,42.7,60.0,0.6,1.9,0.6,0.0,42.3,93.2,2015,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
183,183,Vlatko Cancar,DEN,2.0,1.0,40.0,0.00,0.0,0.0,0.5,0.0,0.0,0.0,50.0,0.0,0.0,0.0,0.0,2021,Playoffs
184,184,George Hill,MIL,5.0,0.4,28.6,0.00,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2021,Playoffs
185,185,Jarrett Culver,MEM,3.0,1.0,23.1,0.00,0.0,0.0,0.7,0.0,0.0,0.0,33.3,0.0,0.0,0.0,0.0,2021,Playoffs
186,186,Malachi Flynn,TOR,6.0,0.8,71.4,0.00,0.0,0.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2021,Playoffs


In [395]:
# check for duplicates
spotups_dd[spotups_dd.duplicated(subset = ['playtype_spot_up__season', 'playtype_spot_up__player', 'playtype_spot_up__season_type', 'playtype_spot_up__team'], keep = False)]

Unnamed: 0,playtype_spot_up__unnamed: 0,playtype_spot_up__player,playtype_spot_up__team,playtype_spot_up__gp,playtype_spot_up__poss,playtype_spot_up__freq%,playtype_spot_up__ppp,playtype_spot_up__pts,playtype_spot_up__fgm,playtype_spot_up__fga,playtype_spot_up__fg%,playtype_spot_up__efg%,playtype_spot_up__ftfreq%,playtype_spot_up__tovfreq%,playtype_spot_up__sffreq%,playtype_spot_up__and onefreq%,playtype_spot_up__scorefreq%,playtype_spot_up__percentile,playtype_spot_up__season,playtype_spot_up__season_type


In [396]:
all_playtype_data = pd.merge(spotups, transition,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_transition__player', 'playtype_transition__season', 'playtype_transition__season_type', 'playtype_transition__team'])

In [397]:
all_playtype_data

Unnamed: 0,playtype_spot_up__unnamed: 0,playtype_spot_up__player,playtype_spot_up__team,playtype_spot_up__gp,playtype_spot_up__poss,playtype_spot_up__freq%,playtype_spot_up__ppp,playtype_spot_up__pts,playtype_spot_up__fgm,playtype_spot_up__fga,...,playtype_transition__fg%,playtype_transition__efg%,playtype_transition__ftfreq%,playtype_transition__tovfreq%,playtype_transition__sffreq%,playtype_transition__and onefreq%,playtype_transition__scorefreq%,playtype_transition__percentile,playtype_transition__season,playtype_transition__season_type
0,0,,,,,,,,,,...,,,,,,,,,2015.0,Regular
1,1,Wesley Matthews,DAL,78.0,4.6,36.2,1.11,5.1,1.7,4.3,...,41.9,53.8,13.0,3.7,11.1,2.8,46.3,51.3,2015.0,Regular
2,2,Kawhi Leonard,SAS,72.0,4.3,23.5,1.25,5.4,2.0,4.0,...,57.5,62.3,15.9,10.0,11.2,4.7,55.9,73.2,2015.0,Regular
3,3,Marvin Williams,CHA,81.0,4.3,39.5,1.12,4.8,1.7,4.0,...,45.9,55.7,9.9,4.2,7.0,0.0,49.3,53.9,2015.0,Regular
4,4,JR Smith,CLE,77.0,4.1,33.7,1.18,4.9,1.7,4.0,...,46.2,59.8,4.2,3.5,4.2,0.0,45.1,56.1,2015.0,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4401,183,Vlatko Cancar,DEN,2.0,1.0,40.0,0.00,0.0,0.0,0.5,...,,,,,,,,,,
4402,184,George Hill,MIL,5.0,0.4,28.6,0.00,0.0,0.0,0.4,...,50.0,75.0,0.0,0.0,0.0,0.0,50.0,0.0,2021.0,Playoffs
4403,185,Jarrett Culver,MEM,3.0,1.0,23.1,0.00,0.0,0.0,0.7,...,50.0,50.0,0.0,0.0,0.0,0.0,50.0,0.0,2021.0,Playoffs
4404,186,Malachi Flynn,TOR,6.0,0.8,71.4,0.00,0.0,0.0,0.8,...,,,,,,,,,,


In [398]:
print(f' Spotups shape: {spotups.shape}, Transition shape: {transition.shape}, All Playtype shape: {all_playtype_data.shape}')

 Spotups shape: (4406, 20), Transition shape: (4339, 20), All Playtype shape: (4406, 40)


#### Add Cutter

In [406]:
cutter.head(2)

Unnamed: 0,playtype_cut__unnamed: 0,playtype_cut__player,playtype_cut__team,playtype_cut__gp,playtype_cut__poss,playtype_cut__freq%,playtype_cut__ppp,playtype_cut__pts,playtype_cut__fgm,playtype_cut__fga,playtype_cut__fg%,playtype_cut__efg%,playtype_cut__ftfreq%,playtype_cut__tovfreq%,playtype_cut__sffreq%,playtype_cut__and onefreq%,playtype_cut__scorefreq%,playtype_cut__percentile,playtype_cut__season,playtype_cut__season_type
0,0,,,,,,,,,,,,,,,,,,2015,Regular
1,1,Marcin Gortat,WAS,75.0,4.1,31.4,1.13,4.6,2.0,3.5,57.7,57.7,11.4,5.2,10.4,1.3,58.3,35.3,2015,Regular


In [405]:
cutter['playtype_cut__season'] = cutter['playtype_cut__season'].astype(int)

In [407]:
all_playtype_data2 = pd.merge(all_playtype_data, cutter,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_cut__player', 'playtype_cut__season', 'playtype_cut__season_type', 'playtype_cut__team'])

print(f' previous shape: {all_playtype_data.shape}, cutter shape: {cutter.shape}, new shape: {all_playtype_data2.shape}')

 previous shape: (4406, 40), cutter shape: (3475, 20), new shape: (4406, 60)


#### Add Ball Handler

In [409]:
ball_handler['playtype_ball_handler__season'] = ball_handler['playtype_ball_handler__season'].astype(int)

In [410]:
all_playtype_data3 = pd.merge(all_playtype_data2, ball_handler,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_ball_handler__player', 'playtype_ball_handler__season', 'playtype_ball_handler__season_type', 'playtype_ball_handler__team'])

print(f' previous shape: {all_playtype_data2.shape}, ball_handler shape: {ball_handler.shape}, new shape: {all_playtype_data3.shape}')

 previous shape: (4406, 60), ball_handler shape: (3025, 20), new shape: (4406, 80)


#### Add Hand Off

In [411]:
hand_off['playtype_hand_off__season'] = hand_off['playtype_hand_off__season'].astype(int)

In [412]:
all_playtype_data4 = pd.merge(all_playtype_data3, hand_off,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_hand_off__player', 'playtype_hand_off__season', 'playtype_hand_off__season_type', 'playtype_hand_off__team'])

print(f' previous shape: {all_playtype_data3.shape}, hand_off shape: {hand_off.shape}, new shape: {all_playtype_data4.shape}')

 previous shape: (4406, 80), hand_off shape: (2666, 20), new shape: (4406, 100)


#### Add Isolation

In [414]:
iso['playtype_iso__season'] = iso['playtype_iso__season'].astype(int)

In [415]:
all_playtype_data5 = pd.merge(all_playtype_data4, iso,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_iso__player', 'playtype_iso__season', 'playtype_iso__season_type', 'playtype_iso__team'])

print(f' previous shape: {all_playtype_data4.shape}, iso shape: {iso.shape}, new shape: {all_playtype_data5.shape}')

 previous shape: (4406, 100), iso shape: (2826, 20), new shape: (4406, 120)


#### Add Off Screen

In [416]:
off_screen['playtype_off_screen__season'] = off_screen['playtype_off_screen__season'].astype(int)

In [417]:
all_playtype_data6 = pd.merge(all_playtype_data5, off_screen,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_off_screen__player', 'playtype_off_screen__season', 'playtype_off_screen__season_type', 'playtype_off_screen__team'])

print(f' previous shape: {all_playtype_data5.shape}, off_screen shape: {off_screen.shape}, new shape: {all_playtype_data6.shape}')

 previous shape: (4406, 120), off_screen shape: (2421, 20), new shape: (4406, 140)


#### Add Postup

In [418]:
postup['playtype_postup__season'] = postup['playtype_postup__season'].astype(int)

In [419]:
all_playtype_data7 = pd.merge(all_playtype_data6, postup,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_postup__player', 'playtype_postup__season', 'playtype_postup__season_type', 'playtype_postup__team'])

print(f' previous shape: {all_playtype_data6.shape}, postup shape: {postup.shape}, new shape: {all_playtype_data7.shape}')

 previous shape: (4406, 140), postup shape: (1916, 20), new shape: (4406, 160)


#### Add Putbacks

In [422]:
putback['playtype_putback__season'] = putback['playtype_putback__season'].astype(int)

In [423]:
all_playtype_data8 = pd.merge(all_playtype_data7, putback,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_putback__player', 'playtype_putback__season', 'playtype_putback__season_type', 'playtype_putback__team'])

print(f' previous shape: {all_playtype_data7.shape}, putback shape: {putback.shape}, new shape: {all_playtype_data8.shape}')

 previous shape: (4406, 160), putback shape: (3078, 20), new shape: (4406, 180)


#### Add Roll Man

In [424]:
rollman['playtype_rollman__season'] = rollman['playtype_rollman__season'].astype(int)

In [425]:
all_playtype_data9 = pd.merge(all_playtype_data8, rollman,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_rollman__player', 'playtype_rollman__season', 'playtype_rollman__season_type', 'playtype_rollman__team'])

print(f' previous shape: {all_playtype_data8.shape}, rollman shape: {rollman.shape}, new shape: {all_playtype_data9.shape}')

 previous shape: (4406, 180), rollman shape: (2379, 20), new shape: (4406, 200)


#### Add Misc

In [426]:
misc['playtype_misc__season'] = misc['playtype_misc__season'].astype(int)

In [427]:
all_playtype_data10 = pd.merge(all_playtype_data9, misc,
                            how = 'left',
                            left_on = ['playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type', 'playtype_spot_up__team'],
                            right_on = ['playtype_misc__player', 'playtype_misc__season', 'playtype_misc__season_type', 'playtype_misc__team'])

print(f' previous shape: {all_playtype_data9.shape}, misc shape: {misc.shape}, new shape: {all_playtype_data10.shape}')

 previous shape: (4406, 200), misc shape: (3430, 23), new shape: (4406, 223)


In [428]:
all_playtype_data10.to_csv('data/player/aggregates/ALL_Playtypes_Offense.csv')
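Each of the merges above follows the same pattern: left-join a playtype table onto the running result using player, season, season type, and team, then confirm that the row count has not changed. A hedged sketch of that pattern as a loop is given here. `left_merge_all` and `KEYS` are hypothetical names, and the demo frames use invented prefixes and values rather than the real data.

```python
import pandas as pd

KEYS = ['player', 'season', 'season_type', 'team']


def left_merge_all(base, base_prefix, others):
    """Left-join each (prefix, frame) in `others` onto `base` via prefixed key columns."""
    merged = base
    left_keys = [base_prefix + k for k in KEYS]
    for prefix, frame in others:
        right_keys = [prefix + k for k in KEYS]
        before = len(merged)
        merged = pd.merge(merged, frame, how='left',
                          left_on=left_keys, right_on=right_keys)
        # a changed row count means the right table had duplicate keys
        if len(merged) != before:
            raise ValueError(f'{prefix} changed the row count from {before} to {len(merged)}')
    return merged


# Tiny demo with invented data.
spot = pd.DataFrame({
    'su__player': ['A', 'B'], 'su__season': [2021, 2021],
    'su__season_type': ['Regular', 'Regular'], 'su__team': ['DAL', 'SAS'],
    'su__poss': [4.6, 4.3],
})
cut = pd.DataFrame({
    'cut__player': ['A'], 'cut__season': [2021],
    'cut__season_type': ['Regular'], 'cut__team': ['DAL'],
    'cut__poss': [1.2],
})
out = left_merge_all(spot, 'su__', [('cut__', cut)])
print(out.shape)
```

With the real tables, `others` would be a list such as `[('playtype_transition__', transition), ('playtype_cut__', cutter), ...]`, which removes the need for the numbered intermediate dataframes.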

## Player - Playtype - Defense (Complete 12.22)

In [445]:
# create the defense folder if it does not exist yet
os.makedirs('data/player/playtype/defense', exist_ok=True)

def_playtypes = ['isolation', 'ball-handler', 'roll-man', 
            'playtype-post-up','spot-up', 'hand-off','off-screen']

yearz = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16']
def_playtype_urls = []
for year in yearz:
    for play in def_playtypes:
        for s_types in season_types:
            url = 'https://www.nba.com/stats/players/'+ play +'/?SeasonYear=' + year + '&SeasonType=' + s_types
            def_playtype_urls.append(str(url))

In [446]:
def_playtypes = pd.DataFrame(def_playtype_urls, columns=['url'])
def_playtypes.head(2)

Unnamed: 0,url
0,https://www.nba.com/stats/players/isolation/?S...
1,https://www.nba.com/stats/players/isolation/?S...


In [447]:
def_playtypes['filename'] = def_playtypes.apply(lambda row: trans_urls(row['url']), axis=1)
def_playtypes.head(2)

Unnamed: 0,url,filename
0,https://www.nba.com/stats/players/isolation/?S...,isolation__2021-22_Playoffs
1,https://www.nba.com/stats/players/isolation/?S...,isolation__2021-22_Regular


In [448]:
# find already downloaded files
already_downloaded = os.listdir('data/player/playtype/defense')
already_downloaded = [x.replace('.csv', '') for x in already_downloaded]

# remove already downloaded files from the list
def_playtypes_not_dl = def_playtypes[~def_playtypes['filename'].isin(already_downloaded)]

need_to_dl = def_playtypes_not_dl['url'].to_list()

len(need_to_dl)

98

In [449]:
grab_player_playtype_defense(need_to_dl, 'data/player/playtype/defense/')

data/player/playtype/defense/isolation__2021-22_Playoffs.csv Completed Successfully! 1 / 98 Complete!
data/player/playtype/defense/isolation__2021-22_Regular.csv Completed Successfully! 2 / 98 Complete!
data/player/playtype/defense/ball-handler__2021-22_Playoffs.csv Completed Successfully! 3 / 98 Complete!
data/player/playtype/defense/ball-handler__2021-22_Regular.csv Completed Successfully! 4 / 98 Complete!
data/player/playtype/defense/roll-man__2021-22_Playoffs.csv Completed Successfully! 5 / 98 Complete!
data/player/playtype/defense/roll-man__2021-22_Regular.csv Completed Successfully! 6 / 98 Complete!
data/player/playtype/defense/playtype-post-up__2021-22_Playoffs.csv Completed Successfully! 7 / 98 Complete!
data/player/playtype/defense/playtype-post-up__2021-22_Regular.csv Completed Successfully! 8 / 98 Complete!
data/player/playtype/defense/spot-up__2021-22_Playoffs.csv Completed Successfully! 9 / 98 Complete!
data/player/playtype/defense/spot-up__2021-22_Regular.csv Completed Su

In [450]:
# move files to the correct folder
for file in os.listdir('data/player/playtype/defense/'):
    if file.endswith('.csv'):
        if 'Playoffs' in file:
            shutil.move('data/player/playtype/defense/'+ file, 'data/player/playtype/defense/playoffs/')

        elif 'Regular' in file:
            shutil.move('data/player/playtype/defense/'+ file, 'data/player/playtype/defense/regular_season/')
            

### Aggregate

In [451]:
playtype_defense_reg = os.listdir('data/player/playtype/defense/regular_season/')
playtype_defense_playoffs = os.listdir('data/player/playtype/defense/playoffs/')

In [452]:
# aggregate each defensive playtype sub-category:
# regular season, playoffs, and both combined

d_ball_handler_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_ball_handler__', 'ball-handler')
d_ball_handler_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_ball_handler__', 'ball-handler')
d_ball_handler = pd.concat([d_ball_handler_reg, d_ball_handler_playoffs], axis = 0)

d_ball_handler_reg.to_csv('data/player/aggregates/Playtype_Defense_Ball_Handler_Regular_Season.csv')
d_ball_handler_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Ball_Handler_Playoffs.csv')
d_ball_handler.to_csv('data/player/aggregates/Playtype_Defense_Ball_Handler_ALL.csv')

d_hand_off_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_hand_off__', 'hand-off')
d_hand_off_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_hand_off__', 'hand-off')
d_hand_off = pd.concat([d_hand_off_reg, d_hand_off_playoffs], axis = 0)

d_hand_off_reg.to_csv('data/player/aggregates/Playtype_Defense_Hand_Off_Regular_Season.csv')
d_hand_off_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Hand_Off_Playoffs.csv')
d_hand_off.to_csv('data/player/aggregates/Playtype_Defense_Hand_Off_ALL.csv')

d_iso_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_isolation__', 'isolation')
d_iso_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_isolation__', 'isolation')
d_iso = pd.concat([d_iso_reg, d_iso_playoffs], axis = 0)

d_iso_reg.to_csv('data/player/aggregates/Playtype_Defense_Isolation_Regular_Season.csv')
d_iso_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Isolation_Playoffs.csv')
d_iso.to_csv('data/player/aggregates/Playtype_Defense_Isolation_ALL.csv')

d_off_screen_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_off_screen__', 'off-screen')
d_off_screen_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_off_screen__', 'off-screen')
d_off_screen = pd.concat([d_off_screen_reg, d_off_screen_playoffs], axis = 0)

d_off_screen_reg.to_csv('data/player/aggregates/Playtype_Defense_Off_Screen_Regular_Season.csv')
d_off_screen_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Off_Screen_Playoffs.csv')
d_off_screen.to_csv('data/player/aggregates/Playtype_Defense_Off_Screen_ALL.csv')

d_postup_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_post_up__', 'playtype-post-up')
d_postup_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_post_up__', 'playtype-post-up')
d_postup = pd.concat([d_postup_reg, d_postup_playoffs], axis = 0)

d_postup_reg.to_csv('data/player/aggregates/Playtype_Defense_Post_Up_Regular_Season.csv')
d_postup_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Post_Up_Playoffs.csv')
d_postup.to_csv('data/player/aggregates/Playtype_Defense_Post_Up_ALL.csv')

d_rollman_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_roll_man__', 'roll-man')
d_rollman_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_roll_man__', 'roll-man')
d_rollman = pd.concat([d_rollman_reg, d_rollman_playoffs], axis = 0)

d_rollman_reg.to_csv('data/player/aggregates/Playtype_Defense_Roll_Man_Regular_Season.csv')
d_rollman_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Roll_Man_Playoffs.csv')
d_rollman.to_csv('data/player/aggregates/Playtype_Defense_Roll_Man_ALL.csv')

d_spotup_reg = append_the_data('data/player/playtype/defense/regular_season/', 'playtype_spot_up__', 'spot-up')
d_spotup_playoffs = append_the_data('data/player/playtype/defense/playoffs/', 'playtype_spot_up__', 'spot-up')
d_spotup = pd.concat([d_spotup_reg, d_spotup_playoffs], axis = 0)

d_spotup_reg.to_csv('data/player/aggregates/Playtype_Defense_Spot_Up_Regular_Season.csv')
d_spotup_playoffs.to_csv('data/player/aggregates/Playtype_Defense_Spot_Up_Playoffs.csv')
d_spotup.to_csv('data/player/aggregates/Playtype_Defense_Spot_Up_ALL.csv')
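
The seven blocks above repeat the same three steps (append, concat, save), so they could be driven from one table. A minimal sketch, assuming `append_fn` behaves like this notebook's `append_the_data(folder, prefix, key)`; the helper name `aggregate_defense_playtypes` and the `DEFENSE_PLAYTYPES` table are hypothetical.

```python
import os

import pandas as pd

# (column prefix, file-name key, label for the output file), copied from the calls above
DEFENSE_PLAYTYPES = [
    ('playtype_ball_handler__', 'ball-handler', 'Ball_Handler'),
    ('playtype_hand_off__', 'hand-off', 'Hand_Off'),
    ('playtype_isolation__', 'isolation', 'Isolation'),
    ('playtype_off_screen__', 'off-screen', 'Off_Screen'),
    ('playtype_post_up__', 'playtype-post-up', 'Post_Up'),
    ('playtype_roll_man__', 'roll-man', 'Roll_Man'),
    ('playtype_spot_up__', 'spot-up', 'Spot_Up'),
]


def aggregate_defense_playtypes(append_fn, base_dir, out_dir, playtypes=DEFENSE_PLAYTYPES):
    """Append, combine, and save every playtype; return {label: combined frame}."""
    combined = {}
    for prefix, key, label in playtypes:
        reg = append_fn(os.path.join(base_dir, 'regular_season/'), prefix, key)
        playoffs = append_fn(os.path.join(base_dir, 'playoffs/'), prefix, key)
        both = pd.concat([reg, playoffs], axis=0)

        # same three output files per playtype as in the cell above
        stem = os.path.join(out_dir, f'Playtype_Defense_{label}')
        reg.to_csv(f'{stem}_Regular_Season.csv')
        playoffs.to_csv(f'{stem}_Playoffs.csv')
        both.to_csv(f'{stem}_ALL.csv')
        combined[label] = both
    return combined
```

A call such as `frames = aggregate_defense_playtypes(append_the_data, 'data/player/playtype/defense/', 'data/player/aggregates/')` would then give `frames['Spot_Up']` in place of `d_spotup`.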


In [459]:
d_ball_handler

Unnamed: 0,playtype_ball_handler__unnamed: 0,playtype_ball_handler__player,playtype_ball_handler__team,playtype_ball_handler__gp,playtype_ball_handler__poss,playtype_ball_handler__freq%,playtype_ball_handler__ppp,playtype_ball_handler__pts,playtype_ball_handler__fgm,playtype_ball_handler__fga,playtype_ball_handler__fg%,playtype_ball_handler__efg%,playtype_ball_handler__ftfreq%,playtype_ball_handler__tovfreq%,playtype_ball_handler__sffreq%,playtype_ball_handler__and onefreq%,playtype_ball_handler__scorefreq%,playtype_ball_handler__percentile,playtype_ball_handler__season,playtype_ball_handler__season_type
0,0,,,,,,,,,,,,,,,,,,2015,Regular
1,1,Reggie Jackson,DET,79.0,11.3,55.9,0.88,9.9,4.0,9.1,44.5,47.6,7.9,14.2,7.3,2.0,41.3,77.2,2015,Regular
2,2,Damian Lillard,POR,75.0,10.9,43.0,0.92,10.0,3.5,8.4,41.2,47.8,11.4,13.6,9.6,2.3,40.9,84.8,2015,Regular
3,3,Chris Paul,LAC,75.0,10.0,51.9,0.94,9.4,3.8,8.1,47.0,50.3,7.7,12.6,3.9,0.9,44.5,89.0,2015,Regular
4,4,Kemba Walker,CHA,81.0,9.7,46.7,0.89,8.7,3.2,7.8,41.5,46.2,10.1,10.9,8.4,0.9,41.8,80.7,2015,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46,46,Patty Mills,BKN,4.0,1.8,21.0,0.29,0.0,0.3,1.3,20.0,20.0,0.0,29.0,0.0,0.0,14.0,0.0,2021,Playoffs
47,47,Damion Lee,GSW,16.0,0.6,23.0,0.20,0.0,0.1,0.5,12.0,12.0,0.0,20.0,0.0,0.0,10.0,100.0,2021,Playoffs
48,48,Nikola Vucevic,CHI,5.0,0.2,1.0,3.00,1.0,0.2,0.2,100.0,150.0,0.0,0.0,0.0,0.0,100.0,0.0,2021,Playoffs
49,49,Andre Drummond,BKN,4.0,0.3,7.0,3.00,1.0,0.3,0.3,100.0,150.0,0.0,0.0,0.0,0.0,100.0,0.0,2021,Playoffs


In [461]:
print(f' ball handler size: {d_ball_handler.shape}, hand off size: {d_hand_off.shape}, iso size: {d_iso.shape}, off screen size: {d_off_screen.shape}, post up size: {d_postup.shape}, rollman: {d_rollman.shape}, spotup: {d_spotup.shape}')

 ball handler size: (2577, 20), hand off size: (2280, 20), iso size: (2306, 20), off screen size: (2295, 20), post up size: (1814, 20), rollman: (1931, 20), spotup: (3141, 20)


In [466]:
# Merge the data

defensive_playtypes1 = pd.merge(d_spotup, d_ball_handler, 
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_ball_handler__team', 'playtype_ball_handler__player', 'playtype_ball_handler__season', 'playtype_ball_handler__season_type'],
                                how = 'left')

defensive_playtypes2 = pd.merge(defensive_playtypes1, d_hand_off,
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_hand_off__team', 'playtype_hand_off__player', 'playtype_hand_off__season', 'playtype_hand_off__season_type'],
                                how = 'left')

defensive_playtypes3 = pd.merge(defensive_playtypes2, d_iso,
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_isolation__team', 'playtype_isolation__player', 'playtype_isolation__season', 'playtype_isolation__season_type'],
                                how = 'left')

defensive_playtypes4 = pd.merge(defensive_playtypes3, d_off_screen,
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_off_screen__team', 'playtype_off_screen__player', 'playtype_off_screen__season', 'playtype_off_screen__season_type'],
                                how = 'left')

defensive_playtypes5 = pd.merge(defensive_playtypes4, d_postup,
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_post_up__team', 'playtype_post_up__player', 'playtype_post_up__season', 'playtype_post_up__season_type'],
                                how = 'left')

defensive_playtypes6 = pd.merge(defensive_playtypes5, d_rollman,
                                left_on = ['playtype_spot_up__team', 'playtype_spot_up__player', 'playtype_spot_up__season', 'playtype_spot_up__season_type'],
                                right_on = ['playtype_roll_man__team', 'playtype_roll_man__player', 'playtype_roll_man__season', 'playtype_roll_man__season_type'],
                                how = 'left')

 defensive playtypes3 size: (3141, 80), defensive playtypes2 size: (3141, 60), iso size: (2306, 20)


In [471]:
# save to csv
defensive_playtypes6.to_csv('data/player/aggregates/Playtype_Defense_ALL.csv')
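
The chain of left merges above always joins on the same four fields (team, player, season, season type) with a different column prefix per table, so it can be folded into one call. A sketch under that assumption; `left_merge_all` and `KEY_FIELDS` are hypothetical names, not part of the notebook.

```python
from functools import reduce

import pandas as pd

KEY_FIELDS = ['team', 'player', 'season', 'season_type']


def left_merge_all(base, base_prefix, others):
    """Left-merge every (frame, prefix) pair in `others` onto `base`, in order."""
    left_keys = [base_prefix + field for field in KEY_FIELDS]

    def step(acc, item):
        frame, prefix = item
        right_keys = [prefix + field for field in KEY_FIELDS]
        return pd.merge(acc, frame, left_on=left_keys, right_on=right_keys, how='left')

    return reduce(step, others, base)
```

With the frames above this might read `left_merge_all(d_spotup, 'playtype_spot_up__', [(d_ball_handler, 'playtype_ball_handler__'), (d_hand_off, 'playtype_hand_off__'), ...])`, keeping spot-up as the base table just as the cell does.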

## Player - Tracking (Complete 12.22)

In [479]:
def grab_player_tracking_stats(url_list, file_folder):
        driver = webdriver.Chrome()
        i = 0
        for u in url_list:
                driver.get(u)
                time.sleep(1)
                # if the page does not load, go to the next in the list
                xpath = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'
                try:
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath)))
                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue

                # click the first <option> of the dropdown located above (no need to wait for it twice)
                elem.click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                
                # get the headers

                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[0:]] 

                # if there are repeated headers in headerlist, keep only the first occurrence
                headerlist = [h for n, h in enumerate(headerlist) if h not in headerlist[:n]]

                # one list of cell values per <tr>; a row without <td> cells (presumably the
                # header row) comes through as an empty list and ends up as an all-NaN row
                rows = table.findAll('tr')
                player_stats = [[td.getText().strip() for td in row.findAll('td')] for row in rows]
                stats = pd.DataFrame(player_stats, columns = headerlist)


                filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
                filename = replace_name_values2(filename)
                filename = filename.replace('_Season_','')

                # save
                stats.to_csv(filename)
                
                # increment counter
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

        # close the browser once every url has been visited
        driver.quit()

In [480]:
# Base URLs; the season and season type are appended in the loop below,
# e.g. drives + '2018-19&SeasonType=Regular%20Season'

drives = 'https://www.nba.com/stats/players/drives/?Season='
defensive_impact = 'https://www.nba.com/stats/players/defensive-impact/?Season='
catch_n_shoot = 'https://www.nba.com/stats/players/catch-shoot/?Season=' 
passing = 'https://www.nba.com/stats/players/passing/?Season='
touches = 'https://www.nba.com/stats/players/touches/?Season='
pullup_shooting = 'https://www.nba.com/stats/players/pullup/?Season='
rebounds = 'https://www.nba.com/stats/players/rebounding/?Season='
offensive_rebounding = 'https://www.nba.com/stats/players/offensive-rebounding/?Season='
defensive_rebounding = 'https://www.nba.com/stats/players/defensive-rebounding/?Season='
shooting_efficiency = 'https://www.nba.com/stats/players/shooting-efficiency/?Season='
speed_distance = 'https://www.nba.com/stats/players/speed-distance/?Season='
elbow_touch = 'https://www.nba.com/stats/players/elbow-touch/?Season='
postups= 'https://www.nba.com/stats/players/tracking-post-ups/?Season='
paint_touches = 'https://www.nba.com/stats/players/paint-touch/?Season='

tracking_stats = [drives, defensive_impact, catch_n_shoot, passing, touches, pullup_shooting, rebounds, offensive_rebounding, defensive_rebounding, shooting_efficiency, speed_distance, elbow_touch, postups, paint_touches]
seasonz = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16', '2014-15', '2013-14']

tracking_urls = []
for s in seasonz:
        for t in tracking_stats:
                tracking_urls.append(t + s + '&SeasonType=Regular%20Season')

In [481]:
grab_player_tracking_stats(tracking_urls, 'data/player/tracking/')

data/player/tracking/drives_2021-22_Regular.csv Completed Successfully! 1 / 126 Complete!
data/player/tracking/defensive-impact_2021-22_Regular.csv Completed Successfully! 2 / 126 Complete!
data/player/tracking/catch-shoot_2021-22_Regular.csv Completed Successfully! 3 / 126 Complete!
data/player/tracking/passing_2021-22_Regular.csv Completed Successfully! 4 / 126 Complete!
data/player/tracking/touches_2021-22_Regular.csv Completed Successfully! 5 / 126 Complete!
data/player/tracking/pullup_2021-22_Regular.csv Completed Successfully! 6 / 126 Complete!
data/player/tracking/rebounding_2021-22_Regular.csv Completed Successfully! 7 / 126 Complete!
data/player/tracking/offensive-rebounding_2021-22_Regular.csv Completed Successfully! 8 / 126 Complete!
data/player/tracking/defensive-rebounding_2021-22_Regular.csv Completed Successfully! 9 / 126 Complete!
data/player/tracking/shooting-efficiency_2021-22_Regular.csv Completed Successfully! 10 / 126 Complete!
data/player/tracking/speed-distance_2

### Folders

In [482]:
# move files to proper folders
files = os.listdir('data/player/tracking/')
for f in files:
    if '.csv' in f:
        if 'Regular' in f:
            shutil.move('data/player/tracking/' + f, 'data/player/tracking/regular_season/' + f)
        elif 'Playoffs' in f:
            shutil.move('data/player/tracking/' + f, 'data/player/tracking/playoffs/' + f)

In [486]:
# Append
drives_df = append_the_data('data/player/tracking/regular_season/', 'tracking_drives__', 'drives')
defensive_impact_df = append_the_data('data/player/tracking/regular_season/', 'tracking_defensive_impact__', 'defensive-impact')
catch_n_shoot_df = append_the_data('data/player/tracking/regular_season/', 'tracking_catch_n_shoot__', 'catch-shoot')
passing_df = append_the_data('data/player/tracking/regular_season/', 'tracking_passing__', 'passing')
touches_df = append_the_data('data/player/tracking/regular_season/', 'tracking_touches__', 'touches')
pullup_shooting_df = append_the_data('data/player/tracking/regular_season/', 'tracking_pullup_shooting__', 'pullup')
offensive_rebounding_df = append_the_data('data/player/tracking/regular_season/', 'tracking_offensive_rebounding__', 'offensive-rebounding')
defensive_rebounding_df = append_the_data('data/player/tracking/regular_season/', 'tracking_defensive_rebounding__', 'defensive-rebounding')
shooting_efficiency_df = append_the_data('data/player/tracking/regular_season/', 'tracking_shooting_efficiency__', 'shooting-efficiency')
speed_distance_df = append_the_data('data/player/tracking/regular_season/', 'tracking_speed_distance__', 'speed-distance')
elbow_touch_df = append_the_data('data/player/tracking/regular_season/', 'tracking_elbow_touch__', 'elbow-touch')
postups_df = append_the_data('data/player/tracking/regular_season/', 'tracking_postups__', 'post-ups')
paint_touches_df = append_the_data('data/player/tracking/regular_season/', 'tracking_paint_touches__', 'paint-touch')

# Save
drives_df.to_csv('data/player/aggregates/tracking_drives.csv')
defensive_impact_df.to_csv('data/player/aggregates/tracking_defensive_impact.csv')
catch_n_shoot_df.to_csv('data/player/aggregates/tracking_catch_n_shoot.csv')
passing_df.to_csv('data/player/aggregates/tracking_passing.csv')
touches_df.to_csv('data/player/aggregates/tracking_touches.csv')
pullup_shooting_df.to_csv('data/player/aggregates/tracking_pullup_shooting.csv')
offensive_rebounding_df.to_csv('data/player/aggregates/tracking_offensive_rebounding.csv')
defensive_rebounding_df.to_csv('data/player/aggregates/tracking_defensive_rebounding.csv')
shooting_efficiency_df.to_csv('data/player/aggregates/tracking_shooting_efficiency.csv')
speed_distance_df.to_csv('data/player/aggregates/tracking_speed_distance.csv')
elbow_touch_df.to_csv('data/player/aggregates/tracking_elbow_touch.csv')
postups_df.to_csv('data/player/aggregates/tracking_postups.csv')
paint_touches_df.to_csv('data/player/aggregates/tracking_paint_touches.csv')

print(f' drives: {drives_df.shape}, defensive impact: {defensive_impact_df.shape}, catch n shoot: {catch_n_shoot_df.shape}, passing: {passing_df.shape}, touches: {touches_df.shape}, pullup shooting: {pullup_shooting_df.shape}, offensive rebounding: {offensive_rebounding_df.shape}, defensive rebounding: {defensive_rebounding_df.shape}, shooting efficiency: {shooting_efficiency_df.shape}, speed distance: {speed_distance_df.shape}, elbow touch: {elbow_touch_df.shape}, postups: {postups_df.shape}, paint touches: {paint_touches_df.shape}')

 drives: (4689, 26), defensive impact: (4689, 15), catch n shoot: (4689, 15), passing: (4689, 18), touches: (4689, 22), pullup shooting: (4689, 17), offensive rebounding: (4689, 17), defensive rebounding: (4689, 17), shooting efficiency: (4689, 23), speed distance: (4689, 16), elbow touch: (4083, 27), postups: (4689, 27), paint touches: (4689, 27)


In [487]:
drives_df.head()

Unnamed: 0,tracking_drives__unnamed: 0,tracking_drives__player,tracking_drives__team,tracking_drives__gp,tracking_drives__w,tracking_drives__l,tracking_drives__min,tracking_drives__drives,tracking_drives__fgm,tracking_drives__fga,...,tracking_drives__pass,tracking_drives__pass%,tracking_drives__ast,tracking_drives__ast%,tracking_drives__to,tracking_drives__tov%,tracking_drives__pf,tracking_drives__pf%,tracking_drives__season,tracking_drives__season_type
0,0,,,,,,,,,,...,,,,,,,,,2013,Regular
1,1,AJ Price,MIN,28.0,15.0,13.0,3.5,0.6,0.1,0.1,...,0.3,56.3,0.1,12.5,0.0,6.3,0.0,0.0,2013,Regular
2,2,Aaron Brooks,DEN,72.0,42.0,30.0,21.6,7.2,1.1,3.0,...,2.6,35.7,0.7,10.2,0.4,5.8,0.3,3.6,2013,Regular
3,3,Aaron Gray,SAC,36.0,11.0,25.0,9.7,0.1,0.0,0.0,...,0.0,50.0,0.0,0.0,0.0,0.0,0.0,0.0,2013,Regular
4,4,Adonis Thomas,PHI,6.0,2.0,4.0,6.3,0.3,0.0,0.2,...,0.2,50.0,0.0,0.0,0.0,0.0,0.0,0.0,2013,Regular


In [490]:
all_tracking_data = pd.merge(drives_df, defensive_impact_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_defensive_impact__player', 'tracking_defensive_impact__team', 'tracking_defensive_impact__season', 'tracking_defensive_impact__season_type'],
                                how = 'left')

print(f' drives: {drives_df.shape}, defensive impact: {defensive_impact_df.shape}, all_tracking_data: {all_tracking_data.shape}')

 drives: (4689, 26), defensive impact: (4689, 15), all_tracking_data: (4689, 41)


In [491]:
# add catch n shoot
all_tracking_data2 = pd.merge(all_tracking_data, catch_n_shoot_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_catch_n_shoot__player', 'tracking_catch_n_shoot__team', 'tracking_catch_n_shoot__season', 'tracking_catch_n_shoot__season_type'],
                                how = 'left')

print(f' drives: {drives_df.shape}, defensive impact: {defensive_impact_df.shape}, catch n shoot: {catch_n_shoot_df.shape}, all_tracking_data2: {all_tracking_data2.shape}')

 drives: (4689, 26), defensive impact: (4689, 15), catch n shoot: (4689, 15), all_tracking_data2: (4689, 56)


In [492]:
# add passing
all_tracking_data3 = pd.merge(all_tracking_data2, passing_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_passing__player', 'tracking_passing__team', 'tracking_passing__season', 'tracking_passing__season_type'],
                                how = 'left')

print(f' drives: {drives_df.shape}, defensive impact: {defensive_impact_df.shape}, catch n shoot: {catch_n_shoot_df.shape}, passing: {passing_df.shape}, all_tracking_data3: {all_tracking_data3.shape}')

 drives: (4689, 26), defensive impact: (4689, 15), catch n shoot: (4689, 15), passing: (4689, 18), all_tracking_data3: (4689, 74)


In [493]:
# add touches
all_tracking_data4 = pd.merge(all_tracking_data3, touches_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_touches__player', 'tracking_touches__team', 'tracking_touches__season', 'tracking_touches__season_type'],
                                how = 'left')

In [494]:
# add pullup shooting
all_tracking_data5 = pd.merge(all_tracking_data4, pullup_shooting_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_pullup_shooting__player', 'tracking_pullup_shooting__team', 'tracking_pullup_shooting__season', 'tracking_pullup_shooting__season_type'],
                                how = 'left')

In [495]:
# add offensive rebounding
all_tracking_data6 = pd.merge(all_tracking_data5, offensive_rebounding_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_offensive_rebounding__player', 'tracking_offensive_rebounding__team', 'tracking_offensive_rebounding__season', 'tracking_offensive_rebounding__season_type'],
                                how = 'left')

In [496]:
# add defensive rebounding
all_tracking_data7 = pd.merge(all_tracking_data6, defensive_rebounding_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_defensive_rebounding__player', 'tracking_defensive_rebounding__team', 'tracking_defensive_rebounding__season', 'tracking_defensive_rebounding__season_type'],
                                how = 'left')

In [497]:
# add shooting efficiency
all_tracking_data8 = pd.merge(all_tracking_data7, shooting_efficiency_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_shooting_efficiency__player', 'tracking_shooting_efficiency__team', 'tracking_shooting_efficiency__season', 'tracking_shooting_efficiency__season_type'],
                                how = 'left')

In [498]:
# add speed distance
all_tracking_data9 = pd.merge(all_tracking_data8, speed_distance_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_speed_distance__player', 'tracking_speed_distance__team', 'tracking_speed_distance__season', 'tracking_speed_distance__season_type'],
                                how = 'left')

In [501]:
# add elbow touch (catch n shoot was already merged into all_tracking_data2)
all_tracking_data10 = pd.merge(all_tracking_data9, elbow_touch_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_elbow_touch__player', 'tracking_elbow_touch__team', 'tracking_elbow_touch__season', 'tracking_elbow_touch__season_type'],
                                how = 'left')

In [502]:
# add paint touches
all_playtype_data11 = pd.merge(all_tracking_data10, paint_touches_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_paint_touches__player', 'tracking_paint_touches__team', 'tracking_paint_touches__season', 'tracking_paint_touches__season_type'],
                                how = 'left')

In [505]:
# add post ups
all_playtype_data_final = pd.merge(all_playtype_data11, postups_df,
                                left_on = ['tracking_drives__player', 'tracking_drives__team' ,'tracking_drives__season', 'tracking_drives__season_type'],
                                right_on = ['tracking_postups__player', 'tracking_postups__team', 'tracking_postups__season', 'tracking_postups__season_type'],
                                how = 'left')

In [506]:
all_playtype_data_final.to_csv('data/player/aggregates/All_tracking_data.csv', index = False)

## Player - Defensive Dashboard (Complete 12.22)

In [507]:
# This URL pattern is unusual: it takes both SeasonYear and Season, e.g.
# https://www.nba.com/stats/players/defense-dash-overall/?SeasonYear=2018-19&SeasonType=Regular%20Season&Season=2018-19
years =['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16','2014-15', '2013-14']
types = ['defense-dash-overall', 'defense-dash-3pt', 'defense-dash-2pt', 'defense-dash-lt6',
         'defense-dash-lt10', 'defense-dash-gt15' ]
season_types = ['Playoffs', 'Regular%20Season']

def_dash_urls = []

for year in years:
    for typ in types:
        for s_types in season_types:
            url = 'https://www.nba.com/stats/players/'+ typ +'/?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
            def_dash_urls.append(str(url))
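
The three nested loops above can also be written with `itertools.product`, which yields the same year, type, season-type order. A small sketch; `build_def_dash_urls` and `BASE` are hypothetical names.

```python
from itertools import product

BASE = 'https://www.nba.com/stats/players/'


def build_def_dash_urls(years, types, season_types):
    """Build one URL per (year, dashboard type, season type) combination."""
    return [
        f'{BASE}{typ}/?SeasonYear={year}&SeasonType={s_type}&Season={year}'
        for year, typ, s_type in product(years, types, season_types)
    ]
```

Calling `build_def_dash_urls(years, types, season_types)` with the lists defined in the cell should give the same list as `def_dash_urls`.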

In [508]:
# create the folder if it does not exist yet
os.makedirs('data/player/defensive_dashboard', exist_ok=True)

In [515]:
driver = webdriver.Chrome()

In [519]:
grab_player_data2(def_dash_urls, 'data/player/defensive_dashboard/')

data/player/defensive_dashboard/defense-dash-overall__2021-22_Playoffs_Season_2021-22.csv Completed Successfully! 1 / 108 Complete!
data/player/defensive_dashboard/defense-dash-overall__2021-22_Regular_Season_2021-22.csv Completed Successfully! 2 / 108 Complete!
data/player/defensive_dashboard/defense-dash-3pt__2021-22_Playoffs_Season_2021-22.csv Completed Successfully! 3 / 108 Complete!
data/player/defensive_dashboard/defense-dash-3pt__2021-22_Regular_Season_2021-22.csv Completed Successfully! 4 / 108 Complete!
data/player/defensive_dashboard/defense-dash-2pt__2021-22_Playoffs_Season_2021-22.csv Completed Successfully! 5 / 108 Complete!
data/player/defensive_dashboard/defense-dash-2pt__2021-22_Regular_Season_2021-22.csv Completed Successfully! 6 / 108 Complete!
data/player/defensive_dashboard/defense-dash-lt6__2021-22_Playoffs_Season_2021-22.csv Completed Successfully! 7 / 108 Complete!
data/player/defensive_dashboard/defense-dash-lt6__2021-22_Regular_Season_2021-22.csv Completed Succ

### Folders

In [520]:
# create the season-type folders if they do not exist yet
os.makedirs('data/player/defensive_dashboard/regular_season', exist_ok=True)
os.makedirs('data/player/defensive_dashboard/playoffs', exist_ok=True)

In [521]:
# move files to correct folders
for file in os.listdir('data/player/defensive_dashboard'):
    if 'Regular' in file:
        shutil.move('data/player/defensive_dashboard/' + file, 'data/player/defensive_dashboard/regular_season/')
    elif 'Playoffs' in file:
        shutil.move('data/player/defensive_dashboard/' + file, 'data/player/defensive_dashboard/playoffs/')
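
The "create the folders, then move each file by season type" step recurs in every section, so it could live in one function. A sketch only: `sort_by_season_type` is a hypothetical helper, and it assumes the file names contain either `Regular` or `Playoffs` as they do in the outputs above.

```python
import shutil
from pathlib import Path


def sort_by_season_type(folder):
    """Move the .csv files in `folder` into regular_season/ or playoffs/ subfolders."""
    folder = Path(folder)
    targets = {'Regular': folder / 'regular_season', 'Playoffs': folder / 'playoffs'}
    for target in targets.values():
        target.mkdir(parents=True, exist_ok=True)

    moved = {marker: 0 for marker in targets}
    # list() so the directory listing is fixed before files start moving
    for csv_file in list(folder.glob('*.csv')):
        for marker, target in targets.items():
            if marker in csv_file.name:
                shutil.move(str(csv_file), str(target / csv_file.name))
                moved[marker] += 1
                break
    return moved
```

For this section the call would be `sort_by_season_type('data/player/defensive_dashboard')`; the returned counts give a quick check against the number of URLs scraped.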


### Append Together

In [522]:
# append the files by subcat
dd_2_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_2pt_', '2pt')
dd_2_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_2pt_', '2pt')
dd_2_tot = pd.concat([dd_2_reg, dd_2_play])
dd_2_tot



Unnamed: 0,def_dash_2pt_unnamed: 0,def_dash_2pt_player,def_dash_2pt_team,def_dash_2pt_age,def_dash_2pt_position,def_dash_2pt_gp,def_dash_2pt_g,def_dash_2pt_freq%,def_dash_2pt_dfgm,def_dash_2pt_dfga,def_dash_2pt_dfg%,def_dash_2pt_fg%,def_dash_2pt_diff%,def_dash_2pt_season,def_dash_2pt_season_type
0,0,,,,,,,,,,,,,2013,Regular
1,1,DeAndre Jordan,LAC,25.0,C,82.0,82.0,91.3,7.2,15.3,46.7,50.0,-3.2,2013,Regular
2,2,Robin Lopez,POR,26.0,C,82.0,82.0,93.2,6.6,14.9,44.3,49.7,-5.4,2013,Regular
3,3,Spencer Hawes,CLE,26.0,C,80.0,80.0,87.5,6.7,14.1,47.6,49.7,-2.1,2013,Regular
4,4,Marcin Gortat,WAS,30.0,C,80.0,80.0,91.7,6.7,14.0,48.1,49.5,-1.5,2013,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207,207,Kevin Knox II,ATL,22.0,F,2.0,1.0,33.3,0.5,0.5,100.0,55.0,45.0,2021,Playoffs
208,208,Luca Vildoza,MIL,26.0,G,2.0,1.0,33.3,0.0,0.5,0.0,16.7,-16.7,2021,Playoffs
209,209,Zeke Nnaji,DEN,21.0,F-C,2.0,1.0,50.0,0.5,0.5,100.0,53.7,46.3,2021,Playoffs
210,210,Juwan Morgan,BOS,25.0,F,5.0,2.0,33.3,0.2,0.4,50.0,54.8,-4.8,2021,Playoffs


In [523]:
dd_3_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_3pt_', '3pt')
dd_3_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_3pt_', '3pt')
dd_3_tot = pd.concat([dd_3_reg, dd_3_play])

lt_6_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_lt6_', 'lt6')
lt_6_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_lt6_', 'lt6')
lt_6_tot = pd.concat([lt_6_reg, lt_6_play])

lt_10_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_lt10_', 'lt10')
lt_10_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_lt10_', 'lt10')
lt_10_tot = pd.concat([lt_10_reg, lt_10_play])

gt_15_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_gt15_', 'gt15')
gt_15_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_gt15_', 'gt15')
gt_15_tot = pd.concat([gt_15_reg, gt_15_play])

dd_overall_reg = append_the_data('data/player/defensive_dashboard/regular_season/', 'def_dash_overall_', 'overall')
dd_overall_play = append_the_data('data/player/defensive_dashboard/playoffs/', 'def_dash_overall_', 'overall')
dd_overall_tot = pd.concat([dd_overall_reg, dd_overall_play])

In [524]:
# check tot dataframe sizes
print(dd_2_tot.shape, dd_3_tot.shape, lt_6_tot.shape, lt_10_tot.shape, gt_15_tot.shape, dd_overall_tot.shape)


(5778, 15) (5413, 15) (6398, 15) (5755, 15) (6488, 15) (6561, 15)


In [526]:
# start with overall
all_def_dash = pd.merge(dd_overall_tot, dd_2_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_2pt_player', 'def_dash_2pt_team', 'def_dash_2pt_season', 'def_dash_2pt_season_type'],
                                how = 'left')

all_def_dash2 = pd.merge(all_def_dash, dd_3_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_3pt_player', 'def_dash_3pt_team', 'def_dash_3pt_season', 'def_dash_3pt_season_type'],
                                how = 'left')

all_def_dash3 = pd.merge(all_def_dash2, lt_6_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_lt6_player', 'def_dash_lt6_team', 'def_dash_lt6_season', 'def_dash_lt6_season_type'],
                                how = 'left')

all_def_dash4 = pd.merge(all_def_dash3, lt_10_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_lt10_player', 'def_dash_lt10_team', 'def_dash_lt10_season', 'def_dash_lt10_season_type'],
                                how = 'left')

all_def_dash5 = pd.merge(all_def_dash4, gt_15_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_gt15_player', 'def_dash_gt15_team', 'def_dash_gt15_season', 'def_dash_gt15_season_type'],
                                how = 'left')


 init_size = (6567, 30)


In [530]:
all_def_dash5 = pd.merge(all_def_dash4, gt_15_tot,
                                left_on = ['def_dash_overall_player', 'def_dash_overall_team' ,'def_dash_overall_season', 'def_dash_overall_season_type'],
                                right_on = ['def_dash_gt15_player', 'def_dash_gt15_team', 'def_dash_gt15_season', 'def_dash_gt15_season_type'],
                                how = 'left')

print(f' init_size = {all_def_dash5.shape}')

 init_size = (6715, 90)


In [531]:
all_def_dash5.to_csv('data/player/aggregates/All_Defensive_Dashboard.csv', index = False)
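
The five left merges above share the same left-hand keys and differ only in the right-hand prefix, so the chain can be written as a loop. The sketch below is one way to do that, assuming every frame names its key columns `<prefix>player`, `<prefix>team`, `<prefix>season` and `<prefix>season_type` as in the cells above. `merge_on_prefixed_keys` is a hypothetical helper, not part of this notebook, and the two small frames are made-up data. Printing the shape after each step shows where rows are gained: here the result grows from 6561 to 6715 rows, which suggests that some key combinations repeat in the right-hand frames.

```python
import pandas as pd

KEYS = ['player', 'team', 'season', 'season_type']

def merge_on_prefixed_keys(base, base_prefix, others):
    """Left-merge each (frame, prefix) pair in `others` onto `base`.

    Hypothetical helper: assumes each frame's key columns are named
    prefix + key, e.g. 'def_dash_2pt_player'.
    """
    merged = base
    left_keys = [base_prefix + k for k in KEYS]
    for frame, prefix in others:
        right_keys = [prefix + k for k in KEYS]
        merged = pd.merge(merged, frame, left_on=left_keys,
                          right_on=right_keys, how='left')
        # a left merge keeps the row count unless right-hand keys repeat
        print(f'{prefix}: {merged.shape}')
    return merged

# made-up frames that only mimic the column naming
overall = pd.DataFrame({
    'def_dash_overall_player': ['A', 'B'],
    'def_dash_overall_team': ['X', 'Y'],
    'def_dash_overall_season': [2021, 2021],
    'def_dash_overall_season_type': ['Regular', 'Regular'],
    'def_dash_overall_value': [45.0, 50.0],
})
two_pt = pd.DataFrame({
    'def_dash_2pt_player': ['A'],
    'def_dash_2pt_team': ['X'],
    'def_dash_2pt_season': [2021],
    'def_dash_2pt_season_type': ['Regular'],
    'def_dash_2pt_value': [48.0],
})

combined = merge_on_prefixed_keys(overall, 'def_dash_overall_',
                                  [(two_pt, 'def_dash_2pt_')])
```

With the real data the call would take `dd_overall_tot` as `base` and the other five `_tot` frames with their prefixes as `others`.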

### Doubleheader Data Function

Some tables have a two-row header. This is the general code to scrape them:

In [566]:
def get_data_w_double_header(url_list, file_folder, option_numbers,option_names, headers_to_skip): 
        i = 0
        optionz = np.arange(0,option_numbers, 1)
        # get first option
        for u in url_list:
            for option in optionz:
                driver.get(u)
                time.sleep(1)
                # get option xpath
                op = option + 1

                xpath_option = '/html/body/div[1]/div[2]/div[2]/div[3]/section[1]/div/div/div[4]/label/div/select/option[' + str(op) + ']'
                try:
                        # wait for the dropdown option to appear, then click it once
                        WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_option))).click()
                except Exception:
                        print(f'{u} option {op} did not load. Skipping it.')
                        continue
                
                # all pages
                xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                driver.find_element(by=By.XPATH, value=xpath_all).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[headers_to_skip:]]  # THIS IS THE DOUBLE HEADER THAT WE ARE SKIPPING
                row_names = table.findAll('a')                             # find rows
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:] 
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]
                tot_cols = len(player_stats[2])                           #set the length to ignore hidden columns
                headerlist = headerlist[:tot_cols]   
                stats = pd.DataFrame(player_stats, columns = headerlist)
                filename = file_folder + str(u[34:]).replace('/', '_') + 'op_' + str(option_names[option]) + '.csv'
                filename = replace_name_values2(filename)
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')
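
The `headers_to_skip` argument exists because `table.findAll('th')` returns the cells of both header rows in document order: first the group labels, then the real column names, followed by the hidden columns that the function trims off. The sketch below isolates that slicing step on plain lists so it can be checked without a browser. `align_headers` is a hypothetical helper, and the cell texts are made up for illustration rather than copied from NBA.com.

```python
def align_headers(th_texts, headers_to_skip, n_data_cols):
    """Drop the leading group-header cells, then trim to the data width."""
    column_names = th_texts[headers_to_skip:]
    return column_names[:n_data_cols]

# made-up example: 2 group labels, 4 real columns, 2 hidden columns
th_texts = ['', 'Overall', 'PLAYER', 'TEAM', 'FGM', 'FGA', 'FGM RANK', 'FGA RANK']
data_row = ['Player A', 'XYZ', '5.5', '10.9']

columns = align_headers(th_texts, headers_to_skip=2, n_data_cols=len(data_row))
print(columns)  # ['PLAYER', 'TEAM', 'FGM', 'FGA']
```

In the function above the data width is taken from `player_stats[2]`, presumably because the two header rows contain no `td` cells and therefore produce two empty lists at positions 0 and 1.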

## Player - Shot Dashboard

In [542]:
player_general = 'https://www.nba.com/stats/players/shots-general/'      # 4 options - Overall, Catch&Shoot, Pullups, Less than 10 feet
player_shotclock = 'https://www.nba.com/stats/players/shots-shotclock/'  # 6 options - 24-22, 22-18, 18-15, 15-7, 7-4, 4-0
player_dribbles = 'https://www.nba.com/stats/players/shots-dribbles/'    # 5 options - 0-drib, 1-drib, 2-drib, 3-6-drib, 7-drib
player_touchtime = 'https://www.nba.com/stats/players/shots-touch-time/' # 3 options - 0-2, 2-6, 6+ seconds
player_closest_defender = 'https://www.nba.com/stats/players/shots-closest-defender/' # 4 options - 0-2, 2-4, 4-6, 6+ feet
player_closest_defender_plus10 = 'https://www.nba.com/stats/players/shots-closest-defender-10/' # 4 options - 0-2, 2-4, 4-6, 6+ feet

In [None]:
driver = webdriver.Chrome()

In [543]:
def get_shot_dash_doubleheader(url_list, file_folder, option_numbers,option_names, headers_to_skip): 
        i = 0
        optionz = np.arange(0,option_numbers, 1)
        # get first option
        for u in url_list:
            for option in optionz:
                driver.get(u)
                time.sleep(1)
                # get option xpath
                op = option + 1

                try:
                        xpath_option = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[1]/div/div/div[4]/label/div/select/option[' + str(op) + ']'
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_option)))
                        driver.find_element(by=By.XPATH, value=xpath_option).click()
                except:
                        print(f'{u} did not load. Moving to next url.')
                        continue
                
                
                # all pages
                xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                driver.find_element(by=By.XPATH, value=xpath_all).click()
                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")
                table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                headers = table.findAll('th')
                headerlist = [h.text.strip() for h in headers[headers_to_skip:]]  # THIS IS THE DOUBLE HEADER THAT WE ARE SKIPPING
                row_names = table.findAll('a')                             # find rows
                row_list = [b.text.strip() for b in row_names[0:]] 
                rows = table.findAll('tr')[0:] 
                player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]
                tot_cols = len(player_stats[2])                           #set the length to ignore hidden columns
                headerlist = headerlist[:tot_cols]   
                stats = pd.DataFrame(player_stats, columns = headerlist)
                filename = file_folder + str(u[34:]).replace('/', '_') + 'op_' + str(option_names[option]) + '.csv'
                filename = replace_name_values(filename)
                pd.DataFrame.to_csv(stats, filename)
                i += 1
                lu = len(url_list)
                print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

#### Shot Dashboard -- General

In [545]:
gen_urls = []
years =['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16','2014-15', '2013-14']
season_types = ['Regular Season', 'Playoffs']
for year in years:
    for s_types in season_types:
        url = player_general + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        gen_urls.append(str(url))
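
The season URLs are built by string concatenation, which leaves the space in `Regular Season` unencoded. The run below shows that the browser copes with that, but the query string can also be encoded explicitly. The sketch uses the standard library's `urllib.parse.urlencode`; `build_season_urls` is a hypothetical helper name. Note that the scraping functions derive file names from the URL text, so switching to encoded URLs would change the saved file names (a `%20` in place of the space).

```python
from urllib.parse import urlencode, quote

def build_season_urls(base_url, years, season_types):
    """Return one URL per (year, season type), with an encoded query string."""
    urls = []
    for year in years:
        for season_type in season_types:
            query = urlencode(
                {'SeasonYear': year, 'SeasonType': season_type, 'Season': year},
                quote_via=quote,  # encode spaces as %20 rather than '+'
            )
            urls.append(f'{base_url}?{query}')
    return urls

urls = build_season_urls(
    'https://www.nba.com/stats/players/shots-general/',
    ['2021-22', '2020-21'],
    ['Regular Season', 'Playoffs'],
)
print(len(urls))
print(urls[0])
```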

In [546]:
get_shot_dash_doubleheader(gen_urls, 'data/player/shot_dashboard/general/', 4, ['Overall', 'Catch&Shoot', 'Pullups', 'Less_than_10_feet'], 4)

data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Regular Season_Season_2021-22op_Overall.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Regular Season_Season_2021-22op_Catch_Shoot.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Regular Season_Season_2021-22op_Pullups.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Regular Season_Season_2021-22op_Less_than_10_feet.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Playoffs_Season_2021-22op_Overall.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_Playoffs_Season_2021-22op_Catch_Shoot.csv Completed Successfully! 6 / 18 Complete!
data/player/shot_dashboard/general/shots-general__SeasonYear_2021-22_P

In [548]:
general_dir = 'data/player/shot_dashboard/general/'
# make sure the destination folders exist before moving anything
os.makedirs(general_dir + 'regular_season/', exist_ok=True)
os.makedirs(general_dir + 'playoffs/', exist_ok=True)

shotdash_files = os.listdir(general_dir)
for file in shotdash_files:
    if '.csv' in file:
        # move to correct folder
        if 'Regular' in file:
            shutil.move(general_dir + file, general_dir + 'regular_season/')
        else:
            shutil.move(general_dir + file, general_dir + 'playoffs/')

##### Append

In [None]:
# TODO: Append

In [None]:
# TODO: Save to csv

In [None]:
# TODO: Append total file, save to CSV

#### Shot Dashboard -- Shotclock

In [552]:
shotclock_urls = []
for year in years:
    for s_types in season_types:
        url = player_shotclock + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        shotclock_urls.append(str(url))

In [558]:
driver = webdriver.Chrome()

In [559]:
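# NOTE: option_numbers is 5 but six ranges are listed, so '4-0' is not scraped in this run; pass 6 to include it
get_data_w_double_header(shotclock_urls, 'data/player/shot_dashboard/shotclock/', 5, ['24-22', '22-18', '18-15', '15-7', '7-4', '4-0'], 6)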

data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Regular Season_Season_2021-22op_24-22.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Regular Season_Season_2021-22op_22-18.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Regular Season_Season_2021-22op_18-15.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Regular Season_Season_2021-22op_15-7.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Regular Season_Season_2021-22op_7-4.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Playoffs_Season_2021-22op_24-22.csv Completed Successfully! 6 / 18 Complete!
data/player/shot_dashboard/shotclock/shots-shotclock__2021-22_Playoffs_Season_2021-22op_22-18.csv Completed Successfully! 7 / 18 Complete!


In [560]:
player_general = 'https://www.nba.com/stats/players/shots-general/'      # 4 options - Overall, Catch&Shoot, Pullups, Less than 10 feet
player_shotclock = 'https://www.nba.com/stats/players/shots-shotclock/'  # 6 options - 24-22, 22-18, 18-15, 15-7, 7-4, 4-0
player_dribbles = 'https://www.nba.com/stats/players/shots-dribbles/'    # 5 options - 0-drib, 1-drib, 2-drib, 3-6-drib, 7-drib
player_touchtime = 'https://www.nba.com/stats/players/shots-touch-time/' # 3 options - 0-2, 2-6, 6+ seconds
player_closest_defender = 'https://www.nba.com/stats/players/shots-closest-defender/' # 4 options - 0-2, 2-4, 4-6, 6+ feet
player_plus10 = 'https://www.nba.com/stats/players/shots-closest-defender-10/' # 4 options - 0-2, 2-4, 4-6, 6+ feet


#### Shot Dashboard -- Dribbles

In [581]:
dribbles_urls = []
for year in years:
    for s_types in season_types:
        url = player_dribbles + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        dribbles_urls.append(str(url))

In [582]:
driver = webdriver.Chrome()

In [583]:
get_data_w_double_header(dribbles_urls, 'data/player/shot_dashboard/dribbles/', 5, ['0-drib', '1-drib', '2-drib', '3-6-drib', '7-drib'], 6)

data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Regular Season_Season_2021-22op_0-drib.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Regular Season_Season_2021-22op_1-drib.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Regular Season_Season_2021-22op_2-drib.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Regular Season_Season_2021-22op_3-6-drib.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Regular Season_Season_2021-22op_7-drib.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Playoffs_Season_2021-22op_0-drib.csv Completed Successfully! 6 / 18 Complete!
data/player/shot_dashboard/dribbles/shots-dribbles__2021-22_Playoffs_Season_2021-22op_1-drib.csv Completed Successfully! 7 / 18 Complete!
da

In [584]:
# check if dribbles/regular_season folder exists
if not os.path.exists('data/player/shot_dashboard/dribbles/regular_season/'):
    os.makedirs('data/player/shot_dashboard/dribbles/regular_season/')

# check if dribbles/playoffs folder exists
if not os.path.exists('data/player/shot_dashboard/dribbles/playoffs/'):
    os.makedirs('data/player/shot_dashboard/dribbles/playoffs/')
    

In [585]:
# move files to correct folder
dribbles_files = os.listdir('data/player/shot_dashboard/dribbles/')

for file in dribbles_files:
    if '.csv' in file:
        # move to correct folder
        if 'Regular' in file:
            shutil.move('data/player/shot_dashboard/dribbles/' + file, 'data/player/shot_dashboard/dribbles/regular_season/')
        else:
            shutil.move('data/player/shot_dashboard/dribbles/' + file, 'data/player/shot_dashboard/dribbles/playoffs/')

In [586]:
# append the data
drib_1_reg = append_the_data('data/player/shot_dashboard/dribbles/regular_season/', 'sd_drib_0_dribbles__', '0-drib')
drib_2_reg = append_the_data('data/player/shot_dashboard/dribbles/regular_season/', 'sd_drib_1_dribbles__', '1-drib')
drib_3_reg = append_the_data('data/player/shot_dashboard/dribbles/regular_season/', 'sd_drib_2_dribbles__', '2-drib')
drib_4_reg = append_the_data('data/player/shot_dashboard/dribbles/regular_season/', 'sd_drib_3-6_dribbles__', '3-6-drib')
drib_5_reg = append_the_data('data/player/shot_dashboard/dribbles/regular_season/', 'sd_drib_7_dribbles__', '7-drib')

In [587]:
drib_1_play = append_the_data('data/player/shot_dashboard/dribbles/playoffs/', 'sd_drib_0_dribbles__', '0-drib')
drib_2_play = append_the_data('data/player/shot_dashboard/dribbles/playoffs/', 'sd_drib_1_dribbles__', '1-drib')
drib_3_play = append_the_data('data/player/shot_dashboard/dribbles/playoffs/', 'sd_drib_2_dribbles__', '2-drib')
drib_4_play = append_the_data('data/player/shot_dashboard/dribbles/playoffs/', 'sd_drib_3-6_dribbles__', '3-6-drib')
drib_5_play = append_the_data('data/player/shot_dashboard/dribbles/playoffs/', 'sd_drib_7_dribbles__', '7-drib')

In [588]:
drib_1_reg.columns

Index(['sd_drib_0_dribbles__unnamed: 0', 'sd_drib_0_dribbles__player',
       'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__age',
       'sd_drib_0_dribbles__gp', 'sd_drib_0_dribbles__g',
       'sd_drib_0_dribbles__freq%', 'sd_drib_0_dribbles__fgm',
       'sd_drib_0_dribbles__fga', 'sd_drib_0_dribbles__fg%',
       'sd_drib_0_dribbles__efg%', 'sd_drib_0_dribbles__2fg freq%',
       'sd_drib_0_dribbles__2fgm', 'sd_drib_0_dribbles__2fga',
       'sd_drib_0_dribbles__2fg%', 'sd_drib_0_dribbles__3fg freq%',
       'sd_drib_0_dribbles__3pm', 'sd_drib_0_dribbles__3pa',
       'sd_drib_0_dribbles__3p%', 'sd_drib_0_dribbles__season',
       'sd_drib_0_dribbles__season_type'],
      dtype='object')

In [589]:
all_dribbles_reg1 = pd.merge(drib_1_reg, drib_2_reg,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_1_dribbles__player', 'sd_drib_1_dribbles__team', 'sd_drib_1_dribbles__season', 'sd_drib_1_dribbles__season_type'],
                                how = 'left')

all_dribbles_reg1

Unnamed: 0,sd_drib_0_dribbles__unnamed: 0,sd_drib_0_dribbles__player,sd_drib_0_dribbles__team,sd_drib_0_dribbles__age,sd_drib_0_dribbles__gp,sd_drib_0_dribbles__g,sd_drib_0_dribbles__freq%,sd_drib_0_dribbles__fgm,sd_drib_0_dribbles__fga,sd_drib_0_dribbles__fg%,...,sd_drib_1_dribbles__2fg freq%,sd_drib_1_dribbles__2fgm,sd_drib_1_dribbles__2fga,sd_drib_1_dribbles__2fg%,sd_drib_1_dribbles__3fg freq%,sd_drib_1_dribbles__3pm,sd_drib_1_dribbles__3pa,sd_drib_1_dribbles__3p%,sd_drib_1_dribbles__season,sd_drib_1_dribbles__season_type
0,0,,,,,,,,,,...,,,,,,,,,2013,Regular
1,0,,,,,,,,,,...,,,,,,,,,2013,Regular
2,1,,,,,,,,,,...,,,,,,,,,2013,Regular
3,1,,,,,,,,,,...,,,,,,,,,2013,Regular
4,2,Kevin Love,MIN,25.0,77.0,77.0,62.0,5.5,10.9,50.5,...,17.6,1.6,3.1,50.0,5.8,0.3,1.0,25.3,2013,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4643,583,Wayne Selden,NYK,27.0,3.0,1.0,25.0,0.0,0.3,0.0,...,0.0,0.0,0.0,-,25.0,0.3,0.3,100,2021,Regular
4644,584,Craig Sword,WAS,28.0,3.0,1.0,25.0,0.0,0.3,0.0,...,50.0,0.7,0.7,100,0.0,0.0,0.0,-,2021,Regular
4645,585,Sharife Cooper,ATL,21.0,13.0,2.0,21.4,0.0,0.2,0.0,...,0.0,0.0,0.0,-,14.3,0.1,0.1,50.0,2021,Regular
4646,586,Myles Powell,PHI,24.0,11.0,1.0,11.8,0.1,0.2,50.0,...,,,,,,,,,,


In [590]:
all_dribbles_reg2 = pd.merge(all_dribbles_reg1, drib_3_reg,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_2_dribbles__player', 'sd_drib_2_dribbles__team', 'sd_drib_2_dribbles__season', 'sd_drib_2_dribbles__season_type'],
                                how = 'left')

all_dribbles_reg3 = pd.merge(all_dribbles_reg2, drib_4_reg,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_3-6_dribbles__player', 'sd_drib_3-6_dribbles__team', 'sd_drib_3-6_dribbles__season', 'sd_drib_3-6_dribbles__season_type'],
                                how = 'left')

all_dribbles_reg4 = pd.merge(all_dribbles_reg3, drib_5_reg,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_7_dribbles__player', 'sd_drib_7_dribbles__team', 'sd_drib_7_dribbles__season', 'sd_drib_7_dribbles__season_type'],
                                how = 'left')

all_dribbles_reg4

Unnamed: 0,sd_drib_0_dribbles__unnamed: 0,sd_drib_0_dribbles__player,sd_drib_0_dribbles__team,sd_drib_0_dribbles__age,sd_drib_0_dribbles__gp,sd_drib_0_dribbles__g,sd_drib_0_dribbles__freq%,sd_drib_0_dribbles__fgm,sd_drib_0_dribbles__fga,sd_drib_0_dribbles__fg%,...,sd_drib_7_dribbles__2fg freq%,sd_drib_7_dribbles__2fgm,sd_drib_7_dribbles__2fga,sd_drib_7_dribbles__2fg%,sd_drib_7_dribbles__3fg freq%,sd_drib_7_dribbles__3pm,sd_drib_7_dribbles__3pa,sd_drib_7_dribbles__3p%,sd_drib_7_dribbles__season,sd_drib_7_dribbles__season_type
0,0,,,,,,,,,,...,,,,,,,,,2013,Regular
1,0,,,,,,,,,,...,,,,,,,,,2013,Regular
2,0,,,,,,,,,,...,,,,,,,,,2013,Regular
3,0,,,,,,,,,,...,,,,,,,,,2013,Regular
4,0,,,,,,,,,,...,,,,,,,,,2013,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4895,583,Wayne Selden,NYK,27.0,3.0,1.0,25.0,0.0,0.3,0.0,...,,,,,,,,,,
4896,584,Craig Sword,WAS,28.0,3.0,1.0,25.0,0.0,0.3,0.0,...,,,,,,,,,,
4897,585,Sharife Cooper,ATL,21.0,13.0,2.0,21.4,0.0,0.2,0.0,...,14.3,0.0,0.1,0.0,0.0,0.0,0.0,-,2021,Regular
4898,586,Myles Powell,PHI,24.0,11.0,1.0,11.8,0.1,0.2,50.0,...,29.4,0.3,0.5,60.0,11.8,0.0,0.2,0.0,2021,Regular


In [591]:
# drop nan in player column
all_dribbles_reg4 = all_dribbles_reg4.dropna(subset = ['sd_drib_0_dribbles__player'])
all_dribbles_reg4

Unnamed: 0,sd_drib_0_dribbles__unnamed: 0,sd_drib_0_dribbles__player,sd_drib_0_dribbles__team,sd_drib_0_dribbles__age,sd_drib_0_dribbles__gp,sd_drib_0_dribbles__g,sd_drib_0_dribbles__freq%,sd_drib_0_dribbles__fgm,sd_drib_0_dribbles__fga,sd_drib_0_dribbles__fg%,...,sd_drib_7_dribbles__2fg freq%,sd_drib_7_dribbles__2fgm,sd_drib_7_dribbles__2fga,sd_drib_7_dribbles__2fg%,sd_drib_7_dribbles__3fg freq%,sd_drib_7_dribbles__3pm,sd_drib_7_dribbles__3pa,sd_drib_7_dribbles__3p%,sd_drib_7_dribbles__season,sd_drib_7_dribbles__season_type
32,2,Kevin Love,MIN,25.0,77.0,77.0,62.0,5.5,10.9,50.5,...,0.3,0.0,0.1,75.0,0.0,0.0,0.0,-,2013,Regular
33,3,Ryan Anderson,NOP,26.0,22.0,22.0,68.9,4.8,10.7,44.7,...,,,,,,,,,,
34,4,Al Horford,ATL,28.0,29.0,29.0,74.6,6.5,10.4,61.7,...,0.7,0.0,0.1,0.0,0.0,0.0,0.0,-,2013,Regular
35,5,LaMarcus Aldridge,POR,28.0,69.0,69.0,50.5,5.3,10.1,52.3,...,0.4,0.0,0.1,33.3,0.0,0.0,0.0,-,2013,Regular
36,6,Serge Ibaka,OKC,24.0,81.0,81.0,83.3,5.5,9.7,56.9,...,0.1,0.0,0.0,0.0,0.0,0.0,0.0,-,2013,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4895,583,Wayne Selden,NYK,27.0,3.0,1.0,25.0,0.0,0.3,0.0,...,,,,,,,,,,
4896,584,Craig Sword,WAS,28.0,3.0,1.0,25.0,0.0,0.3,0.0,...,,,,,,,,,,
4897,585,Sharife Cooper,ATL,21.0,13.0,2.0,21.4,0.0,0.2,0.0,...,14.3,0.0,0.1,0.0,0.0,0.0,0.0,-,2021,Regular
4898,586,Myles Powell,PHI,24.0,11.0,1.0,11.8,0.1,0.2,50.0,...,29.4,0.3,0.5,60.0,11.8,0.0,0.2,0.0,2021,Regular
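
The blank rows at the top of the merged output come from how pandas treats missing keys: when both sides of a merge have NaN in a key column, those rows are matched against each other. Each scraped file appears to carry a couple of empty rows (likely the header rows of the HTML table, which have no `td` cells), so every additional merge multiplies them. One way to avoid this, sketched below with made-up data, is to drop rows without a player name before merging rather than after. The column names `a_player` and `b_player` are placeholders.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({
    'a_player': [np.nan, np.nan, 'Player A'],
    'a_season': [2013, 2013, 2013],
    'a_fga': [np.nan, np.nan, 10.9],
})
right = pd.DataFrame({
    'b_player': [np.nan, np.nan, 'Player A'],
    'b_season': [2013, 2013, 2013],
    'b_fga': [np.nan, np.nan, 3.1],
})

# merging as-is: the NaN keys match each other, giving 2 x 2 blank rows
naive = pd.merge(left, right, left_on=['a_player', 'a_season'],
                 right_on=['b_player', 'b_season'], how='left')
print(naive.shape)

# dropping the blank rows first keeps one row per player
clean = pd.merge(left.dropna(subset=['a_player']),
                 right.dropna(subset=['b_player']),
                 left_on=['a_player', 'a_season'],
                 right_on=['b_player', 'b_season'], how='left')
print(clean.shape)
```

Dropping early also keeps the intermediate frames smaller, since the blank rows no longer double with each of the four merges.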


In [592]:
all_dribbles_reg4.to_csv('data/player/aggregates/ShotDash_Dribbles_Regular_Season.csv', index = False)

In [593]:
all_dribbles_play1 = pd.merge(drib_1_play, drib_2_play,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_1_dribbles__player', 'sd_drib_1_dribbles__team', 'sd_drib_1_dribbles__season', 'sd_drib_1_dribbles__season_type'],
                                how = 'left')

all_dribbles_play2 = pd.merge(all_dribbles_play1, drib_3_play,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_2_dribbles__player', 'sd_drib_2_dribbles__team', 'sd_drib_2_dribbles__season', 'sd_drib_2_dribbles__season_type'],
                                how = 'left')

all_dribbles_play3 = pd.merge(all_dribbles_play2, drib_4_play,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_3-6_dribbles__player', 'sd_drib_3-6_dribbles__team', 'sd_drib_3-6_dribbles__season', 'sd_drib_3-6_dribbles__season_type'],
                                how = 'left')

all_dribbles_play4 = pd.merge(all_dribbles_play3, drib_5_play,
                                left_on = ['sd_drib_0_dribbles__player', 'sd_drib_0_dribbles__team', 'sd_drib_0_dribbles__season', 'sd_drib_0_dribbles__season_type'],
                                right_on = ['sd_drib_7_dribbles__player', 'sd_drib_7_dribbles__team', 'sd_drib_7_dribbles__season', 'sd_drib_7_dribbles__season_type'],
                                how = 'left')

all_dribbles_play4

Unnamed: 0,sd_drib_0_dribbles__unnamed: 0,sd_drib_0_dribbles__player,sd_drib_0_dribbles__team,sd_drib_0_dribbles__age,sd_drib_0_dribbles__gp,sd_drib_0_dribbles__g,sd_drib_0_dribbles__freq%,sd_drib_0_dribbles__fgm,sd_drib_0_dribbles__fga,sd_drib_0_dribbles__fg%,...,sd_drib_7_dribbles__2fg freq%,sd_drib_7_dribbles__2fgm,sd_drib_7_dribbles__2fga,sd_drib_7_dribbles__2fg%,sd_drib_7_dribbles__3fg freq%,sd_drib_7_dribbles__3pm,sd_drib_7_dribbles__3pa,sd_drib_7_dribbles__3p%,sd_drib_7_dribbles__season,sd_drib_7_dribbles__season_type
0,0,,,,,,,,,,...,,,,,,,,,2013,Playoffs
1,0,,,,,,,,,,...,,,,,,,,,2013,Playoffs
2,0,,,,,,,,,,...,,,,,,,,,2013,Playoffs
3,0,,,,,,,,,,...,,,,,,,,,2013,Playoffs
4,0,,,,,,,,,,...,,,,,,,,,2013,Playoffs
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1856,200,Facundo Campazzo,DEN,31.0,4.0,1.0,50.0,0.0,0.3,0.0,...,,,,,,,,,,
1857,201,Malik Fitts,BOS,24.0,9.0,2.0,50.0,0.2,0.2,100.0,...,,,,,,,,,,
1858,202,Jaden Springer,PHI,19.0,5.0,1.0,16.7,0.0,0.2,0.0,...,,,,,,,,,,
1859,203,Isaiah Joe,PHI,22.0,7.0,1.0,20.0,0.0,0.1,0.0,...,,,,,,,,,,


In [594]:
# drop nan in player column
all_dribbles_play4 = all_dribbles_play4.dropna(subset = ['sd_drib_0_dribbles__player'])
all_dribbles_play4

Unnamed: 0,sd_drib_0_dribbles__unnamed: 0,sd_drib_0_dribbles__player,sd_drib_0_dribbles__team,sd_drib_0_dribbles__age,sd_drib_0_dribbles__gp,sd_drib_0_dribbles__g,sd_drib_0_dribbles__freq%,sd_drib_0_dribbles__fgm,sd_drib_0_dribbles__fga,sd_drib_0_dribbles__fg%,...,sd_drib_7_dribbles__2fg freq%,sd_drib_7_dribbles__2fgm,sd_drib_7_dribbles__2fga,sd_drib_7_dribbles__2fg%,sd_drib_7_dribbles__3fg freq%,sd_drib_7_dribbles__3pm,sd_drib_7_dribbles__3pa,sd_drib_7_dribbles__3p%,sd_drib_7_dribbles__season,sd_drib_7_dribbles__season_type
32,2,Al Jefferson,CHA,29.0,3.0,3.0,66.7,6.0,11.3,52.9,...,,,,,,,,,,
33,3,LaMarcus Aldridge,POR,28.0,11.0,11.0,43.2,4.8,9.6,50.5,...,1.6,0.4,0.4,100.0,0.0,0.0,0.0,-,2013,Playoffs
34,4,Marc Gasol,MEM,29.0,7.0,7.0,56.5,4.0,8.7,45.9,...,,,,,,,,,,
35,5,Kevin Durant,OKC,25.0,19.0,19.0,38.3,4.3,8.3,51.3,...,7.5,0.6,1.6,38.7,1.7,0.1,0.4,28.6,2013,Playoffs
36,6,Nene,WAS,31.0,10.0,10.0,63.8,4.0,8.3,48.2,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1856,200,Facundo Campazzo,DEN,31.0,4.0,1.0,50.0,0.0,0.3,0.0,...,,,,,,,,,,
1857,201,Malik Fitts,BOS,24.0,9.0,2.0,50.0,0.2,0.2,100.0,...,,,,,,,,,,
1858,202,Jaden Springer,PHI,19.0,5.0,1.0,16.7,0.0,0.2,0.0,...,,,,,,,,,,
1859,203,Isaiah Joe,PHI,22.0,7.0,1.0,20.0,0.0,0.1,0.0,...,,,,,,,,,,


In [595]:
all_dribbles_play4.to_csv('data/player/aggregates/ShotDash_Dribbles_Playoffs.csv', index = False)

In [597]:
# concat all dribbles
all_dribbles = pd.concat([all_dribbles_reg4, all_dribbles_play4])
all_dribbles.to_csv('data/player/aggregates/ShotDash_All_Dribbles.csv', index = False)

#### Shot Dashboard -- TouchTime

In [599]:
touchtime_urls = []
for year in years:
    for s_types in season_types:
        url = player_touchtime + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        touchtime_urls.append(str(url))

In [602]:
get_data_w_double_header(touchtime_urls, 'data/player/shot_dashboard/touch_time/', 3, ['0-2', '2-6', '6+ seconds'], 6)

data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Regular Season_Season_2021-22op_0-2.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Regular Season_Season_2021-22op_2-6.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Regular Season_Season_2021-22op_6+ seconds.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Playoffs_Season_2021-22op_0-2.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Playoffs_Season_2021-22op_2-6.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2021-22_Playoffs_Season_2021-22op_6+ seconds.csv Completed Successfully! 6 / 18 Complete!
data/player/shot_dashboard/touch_time/shots-touch-time__2020-21_Regular Season_Season_2020-21op_0-2.csv Completed Successfully! 7 / 18

In [603]:
# check if regular season or playoffs folders exist
if not os.path.exists('data/player/shot_dashboard/touch_time/regular_season/'):
    os.makedirs('data/player/shot_dashboard/touch_time/regular_season/')
if not os.path.exists('data/player/shot_dashboard/touch_time/playoffs/'):
    os.makedirs('data/player/shot_dashboard/touch_time/playoffs/')

In [604]:
# move files to regular season or playoffs folders
for file in os.listdir('data/player/shot_dashboard/touch_time/'):
    if '.csv' in file:
        if 'Regular' in file:
            shutil.move('data/player/shot_dashboard/touch_time/' + file, 'data/player/shot_dashboard/touch_time/regular_season/' + file)
        elif 'Playoffs' in file:
            shutil.move('data/player/shot_dashboard/touch_time/' + file, 'data/player/shot_dashboard/touch_time/playoffs/' + file)
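
The create-folders-then-move pattern is repeated for each shot dashboard category. A small helper along these lines could replace the repeated cells. `sort_by_season_type` is a hypothetical name, and the demonstration runs in a temporary directory with empty placeholder files rather than on the real data folders.

```python
import os
import shutil
import tempfile

def sort_by_season_type(folder):
    """Move CSVs in `folder` into regular_season/ or playoffs/ subfolders."""
    targets = {'Regular': 'regular_season', 'Playoffs': 'playoffs'}
    for subfolder in targets.values():
        os.makedirs(os.path.join(folder, subfolder), exist_ok=True)
    moved = 0
    for name in os.listdir(folder):
        if not name.endswith('.csv'):
            continue  # skips the subfolders and any non-CSV files
        for marker, subfolder in targets.items():
            if marker in name:
                shutil.move(os.path.join(folder, name),
                            os.path.join(folder, subfolder, name))
                moved += 1
                break
    return moved

# demonstration with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ['x_Regular Season_op_0-2.csv', 'x_Playoffs_op_0-2.csv', 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()
    moved = sort_by_season_type(tmp)
    regular = os.listdir(os.path.join(tmp, 'regular_season'))
    playoffs = os.listdir(os.path.join(tmp, 'playoffs'))
    print(moved, regular, playoffs)
```

Unlike the `else` branch used for the general and dribbles folders, this version leaves a CSV in place when its name contains neither marker, which makes stray files easier to notice.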
            

In [None]:
# TODO: Append

# TODO: SAVE


#### Shot Dashboard -- Closest Defenders

In [605]:
closest_defender_urls = []
for year in years:
    for s_types in season_types:
        url = player_closest_defender + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        closest_defender_urls.append(str(url))

In [608]:
driver = webdriver.Chrome()

In [609]:
get_data_w_double_header(closest_defender_urls, 'data/player/shot_dashboard/closest_defender/', 4, ['0-2', '2-4', '4-6', '6+ feet'], 6)

data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Regular Season_Season_2021-22op_0-2.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Regular Season_Season_2021-22op_2-4.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Regular Season_Season_2021-22op_4-6.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Regular Season_Season_2021-22op_6+ feet.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Playoffs_Season_2021-22op_0-2.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender__2021-22_Playoffs_Season_2021-22op_2-4.csv Completed Successfully! 6 / 18 Complete!
data/player/shot_dashboard/closest_defender/shots-closest-defender

In [610]:
closest_defender_plus10_urls = []
for year in years:
    for s_types in season_types:
        url = player_closest_defender_plus10 + '?SeasonYear=' + year + '&SeasonType=' + s_types + '&Season=' + year
        closest_defender_plus10_urls.append(str(url))

In [611]:
get_data_w_double_header(closest_defender_plus10_urls, 'data/player/shot_dashboard/closest_defender_plus10/', 4, ['0-2', '2-4', '4-6', '6+ feet'], 6)

data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Regular Season_Season_2021-22op_0-2.csv Completed Successfully! 1 / 18 Complete!
data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Regular Season_Season_2021-22op_2-4.csv Completed Successfully! 2 / 18 Complete!
data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Regular Season_Season_2021-22op_4-6.csv Completed Successfully! 3 / 18 Complete!
data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Regular Season_Season_2021-22op_6+ feet.csv Completed Successfully! 4 / 18 Complete!
data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Playoffs_Season_2021-22op_0-2.csv Completed Successfully! 5 / 18 Complete!
data/player/shot_dashboard/closest_defender_plus10/shots-closest-defender-10__2021-22_Playoffs_Season_2021-22op_2-4.csv Completed Successfully! 6 / 18 Complete!
data/p

TimeoutException: Message: 


##### Combine Closest_Defender data

In [None]:
# combine closest defender dfs
cd2 = append_the_data('data/player/shot_dashboard/closest_defender/0-2ft/', 'closest_D_0-2ft__', '0-2')
cd2 = cd2.dropna(subset=['closest_d_0-2ft__player'])
cd4 = append_the_data('data/player/shot_dashboard/closest_defender/2-4ft/', 'closest_D_2-4ft__', '2-4')
cd4 = cd4.dropna(subset=['closest_d_2-4ft__player'])
cd6 = append_the_data('data/player/shot_dashboard/closest_defender/4-6ft/', 'closest_D_4-6ft__', '4-6')
cd6 = cd6.dropna(subset=['closest_d_4-6ft__player'])
cd10 = append_the_data('data/player/shot_dashboard/closest_defender/6_plusft/', 'closest_D_6_plusft__', '6')
cd10 = cd10.dropna(subset=['closest_d_6_plusft__player'])
cd10

In [None]:
print(f' cd2: {cd2.shape}, cd4: {cd4.shape}, cd6: {cd6.shape}, cd10: {cd10.shape}')

In [None]:
mergecols = ['player', 'season', 'season_type']
mergecols[2]

In [None]:
all_closest_defender = quad_merge(cd2,cd4,cd6,cd10, 'closest_d_0-2ft__', 'closest_d_2-4ft__', 'closest_d_4-6ft__', 'closest_d_6_plusft__')

if not os.path.isfile('data/player/shot_dashboard/closest_defender/All_Closest_Defender.csv'):
    all_closest_defender.to_csv('data/player/shot_dashboard/closest_defender/All_Closest_Defender.csv')

#### Shotclock Folders

In [613]:
# create the folders only if they do not exist yet
os.makedirs('data/player/shot_dashboard/shotclock/regular_season', exist_ok=True)
os.makedirs('data/player/shot_dashboard/shotclock/playoffs', exist_ok=True)


In [None]:
sd1 = append_the_data('data/player/shot_dashboard/shotclock/4-0sec/', 'shotclock_4-0_sec__', '4')
sd1 = sd1.dropna(subset=['shotclock_4-0_sec__player'])
sd2 = append_the_data('data/player/shot_dashboard/shotclock/7-4sec/', 'shotclock_7-4_sec__', '4')
sd2 = sd2.dropna(subset=['shotclock_7-4_sec__player'])
sd3 = append_the_data('data/player/shot_dashboard/shotclock/15-7sec/', 'shotclock_15-7_sec__', '7')
sd3 = sd3.dropna(subset=['shotclock_15-7_sec__player'])
sd4 = append_the_data('data/player/shot_dashboard/shotclock/18-15sec/', 'shotclock_18-15_sec__', '15')
sd4 = sd4.dropna(subset=['shotclock_18-15_sec__player'])
sd5 = append_the_data('data/player/shot_dashboard/shotclock/22-18sec/', 'shotclock_22-18_sec__', '18')
sd5 = sd5.dropna(subset=['shotclock_22-18_sec__player'])
sd6 = append_the_data('data/player/shot_dashboard/shotclock/24-22sec/', 'shotclock_24-22_sec__', '22')
sd6 = sd6.dropna(subset=['shotclock_24-22_sec__player'])

In [None]:
sd1.to_csv('data/player/shot_dashboard/shotclock/4-0sec/ALL_Shotclock_4-0_sec.csv')
sd2.to_csv('data/player/shot_dashboard/shotclock/7-4sec/ALL_Shotclock_7-4_sec.csv')
sd3.to_csv('data/player/shot_dashboard/shotclock/15-7sec/ALL_Shotclock_15-7_sec.csv')
sd4.to_csv('data/player/shot_dashboard/shotclock/18-15sec/ALL_Shotclock_18-15_sec.csv')
sd5.to_csv('data/player/shot_dashboard/shotclock/22-18sec/ALL_Shotclock_22-18_sec.csv')
sd6.to_csv('data/player/shot_dashboard/shotclock/24-22sec/ALL_Shotclock_24-22_sec.csv')

#### Touchtime Folders

In [614]:
# create the folders only if they do not exist yet
os.makedirs('data/player/shot_dashboard/touch_time/regular_season', exist_ok=True)
os.makedirs('data/player/shot_dashboard/touch_time/playoffs', exist_ok=True)

In [None]:
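# TODO: unfinished. The condition for handling each touch-time CSV was never written.

touchtime_files = os.listdir('data/player/shot_dashboard/touch_time')

for file in touchtime_files:
    if file.endswith('.csv'):
        pass  # TODO: decide what to do with each file
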

In [None]:
tt1 = append_the_data('data/player/shot_dashboard/touch_time/0-2sec/', 'touch_time_0-2_sec__', '2')
tt1 = tt1.dropna(subset=['touch_time_0-2_sec__player'])
tt2 = append_the_data('data/player/shot_dashboard/touch_time/2-6sec/', 'touch_time_2-6_sec__', '6')
tt2 = tt2.dropna(subset=['touch_time_2-6_sec__player'])
tt3 = append_the_data('data/player/shot_dashboard/touch_time/6_plussec/', 'touch_time_6_plus_sec__', '6')
tt3 = tt3.dropna(subset=['touch_time_6_plus_sec__player'])

In [None]:
all_touchtime = tripple_merge(tt1,tt2,tt3, 'touch_time_0-2_sec__', 'touch_time_2-6_sec__', 'touch_time_6_plus_sec__')
all_touchtime

In [None]:
if not os.path.isfile('data/player/shot_dashboard/touch_time/All_TouchTime.csv'):
    all_touchtime.to_csv('data/player/shot_dashboard/touch_time/All_TouchTime.csv')

# Game-Level Player Stats (Player Box Scores)

In [518]:
def grab_player_data2(url_list, file_folder):
    # Scrape game-level player box scores from each url in url_list
    # and write one CSV per url into file_folder.

    # "All" entry of the page-size dropdown
    xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]'

    i = 0
    lu = len(url_list)
    for u in url_list:

        driver.get(u)
        time.sleep(2)

        # if the page does not load, go to the next in the list
        try:
            elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
        except Exception:
            print(f'{u} did not load. Moving to next url.')
            continue

        # click "all pages" so the whole table is in the page source
        driver.find_element(by=By.XPATH, value=xpath_all).click()
        src = driver.page_source
        parser = BeautifulSoup(src, "lxml")
        table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
        headerlist = [h.text.strip() for h in table.findAll('th')]
        rows = table.findAll('tr')
        # a row without <td> cells (such as a header row) yields an empty list here
        player_stats = [[td.getText().strip() for td in row.findAll('td')] for row in rows]
        tot_cols = len(player_stats[1])                           # set the length to ignore hidden columns
        headerlist = headerlist[:tot_cols]
        stats = pd.DataFrame(player_stats, columns = headerlist)

        # assign filename (u[34:] drops the 'https://www.nba.com/stats/players/' prefix)
        filename = file_folder + str(u[34:]).replace('/', '_') + '.csv'
        filename = replace_name_values2(filename)
        stats.to_csv(filename)
        i += 1
        print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

    # audible signal that the run finished (winsound is Windows-only)
    winsound.Beep(523, 500)

In [150]:
today = datetime.date.today()
print(today)
month = today.month
print(month)

2022-12-21
12


In [170]:
# number of months played so far this season, counting October 2022 as month 1
today = datetime.date.today()
months_since_oct_2022 = (today.year - 2022) * 12 + today.month - 9
this_season_months = np.arange(1, months_since_oct_2022+1, 1)

s_17 = '2017-18'
s_18 = '2018-19'
s_19 = '2019-20'
s_20 = '2020-21'
s_21 = '2021-22'
s_22 = '2022-23'

s_16 = '2016-17'
s_15 = '2015-16'
s_14 = '2014-15'
s_13 = '2013-14'
s_12 = '2012-13'
s_11 = '2011-12'

yearz = ['2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011']
s_years = [s_22, s_21, s_20, s_19, s_18, s_17, s_16, s_15, s_14, s_13, s_12, s_11]

#months played by year -- the months that HAD GAMES during said season
# October is counted as Month #1 --- Most should go 1-7. 
months_2022 = this_season_months
months_2021 = ['1', '2','3', '4', '5', '6', '7']
months_2020 = ['3', '4', '5', '6', '7','8']
months_2019 = ['1', '2','3', '4', '5', '6']
months_2018 = ['1', '2','3', '4', '5', '6', '7']
months_2017 = ['1', '2','3', '4', '5', '6', '7']
months_2016 = ['1', '2','3', '4', '5', '6', '7']
months_2015 = ['1', '2','3', '4', '5', '6', '7']
months_2014 = ['1', '2','3', '4', '5', '6', '7']
months_2013 = ['1', '2','3', '4', '5', '6', '7']
months_2012 = ['1', '2','3', '4', '5', '6', '7']
months_2011 = ['1', '2','3', '4', '5', '6', '7']
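
The month numbering above (October as month 1) can be written as a small helper. This is a minimal sketch under that assumption; `season_month_number` and its parameters are hypothetical names, not part of the notebook. It applies the same arithmetic as `months_since_oct_2022` and does not handle seasons that started late, such as 2020-21, whose list above begins at `'3'`.

```python
import datetime


def season_month_number(day, season_start_year):
    """Return the 1-based month index within a season.

    Assumption: October of `season_start_year` is month 1, matching the
    numbering used for the months_20XX lists.
    """
    return (day.year - season_start_year) * 12 + day.month - 9


# Month index of the run date shown earlier in the notebook
print(season_month_number(datetime.date(2022, 12, 21), 2022))
```

A list such as `this_season_months` could then be built with `range(1, season_month_number(today, 2022) + 1)`.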



In [171]:
years = ['2022-23','2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16', '2014-15', '2013-14', '2012-13', '2011-12']
types = ['boxscores-advanced', 'boxscores-traditional']
season_types = ['Regular%20Season', 'Playoffs']


In [172]:
# List all months to scrape
# (season, months that had games); 2011-12 is not scraped here
season_months = [
    ('2022-23', months_2022),
    ('2021-22', months_2021),
    ('2020-21', months_2020),
    ('2019-20', months_2019),
    ('2018-19', months_2018),
    ('2017-18', months_2017),
    ('2016-17', months_2016),
    ('2015-16', months_2015),
    ('2014-15', months_2014),
    ('2013-14', months_2013),
    ('2012-13', months_2012),
]

urls1 = []
for season, months in season_months:
    for m in months:
        for t in types:
            url = 'https://www.nba.com/stats/players/' + str(t) + '/?Season=' + season + '&sort=gdate&dir=-1&Month=' + str(m) + '&SeasonType=Regular%20Season'
            urls1.append(url)

lendo = len(urls1)
lendo

142

In [173]:
# get list of files in player/box_scores
files = os.listdir('data/player/box_scores')

# only csv files
files = [f for f in files if f.endswith('.csv')]
files


['boxscores-advanced__Season_2021-22_Month_1_Regular.csv',
 'boxscores-advanced__Season_2021-22_Month_2_Regular.csv',
 'boxscores-advanced__Season_2021-22_Month_3_Regular.csv',
 'boxscores-advanced__Season_2022-23_Month_1_Regular.csv',
 'boxscores-advanced__Season_2022-23_Month_2_Regular.csv',
 'boxscores-advanced__Season_2022-23_Month_3_Regular.csv',
 'boxscores-traditional__Season_2021-22_Month_1_Regular.csv',
 'boxscores-traditional__Season_2021-22_Month_2_Regular.csv',
 'boxscores-traditional__Season_2021-22_Month_3_Regular.csv',
 'boxscores-traditional__Season_2022-23_Month_1_Regular.csv',
 'boxscores-traditional__Season_2022-23_Month_2_Regular.csv',
 'boxscores-traditional__Season_2022-23_Month_3_Regular.csv']
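
The file listing can be used to skip URLs that were already scraped. The sketch below is one possible approach, not the notebook's method: `expected_filename` and `missing_urls` are hypothetical helpers, and the filename pattern is inferred only from the listing above (the real names come from `replace_name_values2`, which may behave differently in other cases).

```python
from urllib.parse import urlparse, parse_qs


def expected_filename(url):
    """Guess the CSV name for a box-score url (pattern inferred from the listing)."""
    parsed = urlparse(url)
    stat_type = parsed.path.rstrip('/').split('/')[-1]   # e.g. 'boxscores-advanced'
    query = parse_qs(parsed.query)                        # decodes %20 to a space
    season = query['Season'][0]
    month = query['Month'][0]
    season_type = query['SeasonType'][0].split()[0]       # 'Regular Season' -> 'Regular'
    return f'{stat_type}__Season_{season}_Month_{month}_{season_type}.csv'


def missing_urls(urls, existing_files):
    """Return the urls whose expected CSV is not in existing_files."""
    existing = set(existing_files)
    return [u for u in urls if expected_filename(u) not in existing]


example = ('https://www.nba.com/stats/players/boxscores-advanced/'
           '?Season=2022-23&sort=gdate&dir=-1&Month=1&SeasonType=Regular%20Season')
print(expected_filename(example))
```

Passing `missing_urls(urls1, files)` to the scraper instead of `urls1` would limit the run to the months that have no file yet.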

In [174]:
driver = webdriver.Chrome()

In [175]:
grab_player_data2(urls1, 'data/player/box_scores/')

data/player/box_scores/boxscores-advanced__Season_2022-23_Month_1_Regular.csv Completed Successfully! 1 / 142 Complete!
data/player/box_scores/boxscores-traditional__Season_2022-23_Month_1_Regular.csv Completed Successfully! 2 / 142 Complete!
data/player/box_scores/boxscores-advanced__Season_2022-23_Month_2_Regular.csv Completed Successfully! 3 / 142 Complete!
data/player/box_scores/boxscores-traditional__Season_2022-23_Month_2_Regular.csv Completed Successfully! 4 / 142 Complete!
data/player/box_scores/boxscores-advanced__Season_2022-23_Month_3_Regular.csv Completed Successfully! 5 / 142 Complete!
data/player/box_scores/boxscores-traditional__Season_2022-23_Month_3_Regular.csv Completed Successfully! 6 / 142 Complete!
data/player/box_scores/boxscores-advanced__Season_2021-22_Month_1_Regular.csv Completed Successfully! 7 / 142 Complete!
data/player/box_scores/boxscores-traditional__Season_2021-22_Month_1_Regular.csv Completed Successfully! 8 / 142 Complete!
data/player/box_scores/boxsc

### Append boxscore data

In [177]:
all_boxes = append_the_data('data/player/box_scores', 'trad_', 'traditional')
all_boxes

Unnamed: 0,trad_unnamed: 0,trad_player,trad_team,trad_match up,trad_game date,trad_w/l,trad_min,trad_pts,trad_fgm,trad_fga,...,trad_dreb,trad_reb,trad_ast,trad_stl,trad_blk,trad_tov,trad_pf,trad_+/-,trad_season,trad_season_type
0,0,,,,,,,,,,...,,,,,,,,,2012,Regular
1,1,Damian Lillard,POR,POR vs. LAL,10/31/2012,W,35.0,23.0,7.0,17.0,...,3.0,3.0,11.0,1.0,0.0,6.0,2.0,16.0,2012,Regular
2,2,Kobe Bryant,LAL,LAL @ POR,10/31/2012,L,38.0,30.0,10.0,20.0,...,4.0,6.0,3.0,0.0,0.0,7.0,5.0,9.0,2012,Regular
3,3,Nicolas Batum,POR,POR vs. LAL,10/31/2012,W,40.0,26.0,9.0,16.0,...,2.0,6.0,1.0,3.0,1.0,1.0,2.0,0.0,2012,Regular
4,4,Al Jefferson,UTA,UTA vs. DAL,10/31/2012,W,29.0,12.0,4.0,11.0,...,9.0,14.0,2.0,3.0,0.0,1.0,2.0,8.0,2012,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2960,2960,Killian Hayes,DET,DET vs. DAL,12/01/2022,W,33.0,22.0,10.0,13.0,...,4.0,4.0,8.0,0.0,0.0,2.0,5.0,9.0,2022,Regular
2961,2961,Christian Wood,DAL,DAL @ DET,12/01/2022,L,35.0,25.0,10.0,13.0,...,7.0,8.0,3.0,0.0,0.0,2.0,3.0,-9.0,2022,Regular
2962,2962,Marvin Bagley III,DET,DET vs. DAL,12/01/2022,W,34.0,19.0,7.0,10.0,...,9.0,13.0,2.0,1.0,0.0,1.0,3.0,10.0,2022,Regular
2963,2963,Tim Hardaway Jr.,DAL,DAL @ DET,12/01/2022,L,40.0,26.0,9.0,20.0,...,3.0,5.0,2.0,2.0,1.0,0.0,4.0,0.0,2022,Regular


In [178]:
all_boxes.to_csv('data/player/aggregates/trad_box_scores.csv', index=False)

In [180]:
all_boxes_adv = append_the_data('data/player/box_scores', 'adv_', 'advanced')
all_boxes_adv

Unnamed: 0,adv_unnamed: 0,adv_player,adv_team,adv_match up,adv_game date,adv_w/l,adv_min,adv_offrtg,adv_defrtg,adv_netrtg,...,adv_dreb%,adv_reb%,adv_to ratio,adv_efg%,adv_ts%,adv_usg%,adv_pace,adv_pie,adv_season,adv_season_type
0,0,,,,,,,,,,...,,,,,,,,,2012,Regular
1,1,Wesley Matthews,POR,POR vs. LAL,10/31/2012,W,37.0,118.9,113.7,5.2,...,7.1,3.2,0.0,81.8,80.6,15.5,95.74,14.3,2012,Regular
2,2,Sasha Pavlovic,POR,POR vs. LAL,10/31/2012,W,18.0,132.4,88.9,43.5,...,0.0,3.2,0.0,87.5,78.8,10.8,92.90,14.0,2012,Regular
3,3,Steve Blake,LAL,LAL @ POR,10/31/2012,L,28.0,117.0,107.3,9.7,...,8.0,6.4,0.0,87.5,87.5,6.5,93.41,13.6,2012,Regular
4,4,Brandan Wright,DAL,DAL @ UTA,10/31/2012,L,26.0,107.8,110.2,-2.4,...,8.1,5.1,9.1,87.5,84.5,18.2,92.72,16.9,2012,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2960,2960,Jalen Duren,DET,DET vs. DAL,12/01/2022,W,18.0,105.1,109.8,-4.6,...,6.7,6.5,10.0,71.4,62.8,23.8,104.35,10.2,2022,Regular
2961,2961,Christian Wood,DAL,DAL @ DET,12/01/2022,L,35.0,104.3,115.7,-11.4,...,20.6,12.5,9.5,80.8,79.9,25.0,96.46,18.9,2022,Regular
2962,2962,Killian Hayes,DET,DET vs. DAL,12/01/2022,W,33.0,131.3,117.2,14.1,...,12.9,7.0,8.7,84.6,84.6,20.3,93.14,17.0,2022,Regular
2963,2963,Dwight Powell,DAL,DAL @ DET,12/01/2022,L,18.0,129.7,135.1,-5.4,...,7.7,3.6,44.4,100.0,73.5,23.1,101.26,-3.1,2022,Regular


In [181]:
all_boxes_adv.to_csv('data/player/aggregates/adv_box_scores.csv', index=False)

In [183]:
#merge
all_da_boxes = pd.merge(all_boxes, all_boxes_adv, 
                        left_on=['trad_player', 'trad_team', 'trad_game date'],
                        right_on=['adv_player', 'adv_team', 'adv_game date'],
                        how = 'left')

In [184]:
all_da_boxes = all_da_boxes.dropna(subset = ['trad_player'])
all_da_boxes

Unnamed: 0,trad_unnamed: 0,trad_player,trad_team,trad_match up,trad_game date,trad_w/l,trad_min,trad_pts,trad_fgm,trad_fga,...,adv_dreb%,adv_reb%,adv_to ratio,adv_efg%,adv_ts%,adv_usg%,adv_pace,adv_pie,adv_season,adv_season_type
71,1,Damian Lillard,POR,POR vs. LAL,10/31/2012,W,35.0,23.0,7.0,17.0,...,10.7,4.9,16.2,44.1,56.0,32.1,95.36,16.3,2012,Regular
72,2,Kobe Bryant,LAL,LAL @ POR,10/31/2012,L,38.0,30.0,10.0,20.0,...,10.3,8.6,21.2,60.0,65.0,31.9,99.16,10.3,2012,Regular
73,3,Nicolas Batum,POR,POR vs. LAL,10/31/2012,W,40.0,26.0,9.0,16.0,...,6.1,8.2,4.8,65.6,68.1,21.1,99.17,14.8,2012,Regular
74,4,Al Jefferson,UTA,UTA vs. DAL,10/31/2012,W,29.0,12.0,4.0,11.0,...,36.0,23.0,6.3,36.4,47.0,18.4,97.68,15.9,2012,Regular
75,5,Paul Millsap,UTA,UTA vs. DAL,10/31/2012,W,33.0,13.0,5.0,12.0,...,22.6,21.1,0.0,41.7,47.2,16.1,96.97,12.9,2012,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
266339,2960,Killian Hayes,DET,DET vs. DAL,12/01/2022,W,33.0,22.0,10.0,13.0,...,12.9,7.0,8.7,84.6,84.6,20.3,93.14,17.0,2022,Regular
266340,2961,Christian Wood,DAL,DAL @ DET,12/01/2022,L,35.0,25.0,10.0,13.0,...,20.6,12.5,9.5,80.8,79.9,25.0,96.46,18.9,2022,Regular
266341,2962,Marvin Bagley III,DET,DET vs. DAL,12/01/2022,W,34.0,19.0,7.0,10.0,...,28.1,21.3,6.3,75.0,75.2,17.7,93.97,15.1,2022,Regular
266342,2963,Tim Hardaway Jr.,DAL,DAL @ DET,12/01/2022,L,40.0,26.0,9.0,20.0,...,7.7,6.8,0.0,60.0,59.7,24.4,96.64,9.9,2022,Regular
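
The index of the merged frame starts at 71 rather than 0. That is consistent with blank-key rows being paired with each other before the `dropna`, because pandas matches null merge keys against each other. A minimal sketch with made-up toy frames (the values and the two-column layout are assumptions for illustration, not the scraped data):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the traditional and advanced frames (invented values)
trad = pd.DataFrame({'trad_player': [np.nan, 'A', 'B'],
                     'trad_pts': [np.nan, 10.0, 20.0]})
adv = pd.DataFrame({'adv_player': [np.nan, np.nan, 'A', 'B'],
                    'adv_pie': [np.nan, np.nan, 1.0, 2.0]})

# NaN keys match each other, so the one blank row on the left
# pairs with both blank rows on the right.
naive = pd.merge(trad, adv, left_on='trad_player', right_on='adv_player', how='left')

# Dropping the blank rows before merging avoids those extra pairings.
clean = pd.merge(trad.dropna(subset=['trad_player']),
                 adv.dropna(subset=['adv_player']),
                 left_on='trad_player', right_on='adv_player', how='left')

print(len(naive), len(clean))
```

Dropping the blank rows from both frames before the merge keeps the intermediate result smaller than dropping them afterwards.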


In [186]:
all_da_boxes.to_csv('data/player/aggregates/Trad&Adv_box_scores_GameView.csv')

# In-Shot Dashboard
## Player - Shooting (in shot dashboard)

End Result: All CSVs in player/shot_dashboard/shooting

In [187]:
player_shooting = 'https://www.nba.com/stats/players/shooting/'
shooting_urls = []
years =['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16']

for year in years:
    for s_types in season_types:
        url = player_shooting + '?Season=' + year + '&SeasonType=' + s_types
        shooting_urls.append(str(url))

In [188]:
shooting_urls

['https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Playoffs',
 'https://www.nba.com/stats/players/shooting/?Season=2020-21&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/shooting/?Season=2020-21&SeasonType=Playoffs',
 'https://www.nba.com/stats/players/shooting/?Season=2019-20&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/shooting/?Season=2019-20&SeasonType=Playoffs',
 'https://www.nba.com/stats/players/shooting/?Season=2018-19&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/shooting/?Season=2018-19&SeasonType=Playoffs',
 'https://www.nba.com/stats/players/shooting/?Season=2017-18&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/shooting/?Season=2017-18&SeasonType=Playoffs',
 'https://www.nba.com/stats/players/shooting/?Season=2016-17&SeasonType=Regular%20Season',
 'https://www.nba.com/stats/players/sho
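
The same URL list can be produced with the standard library, which takes care of the `%20` encoding. This is a sketch only; `build_urls` is a hypothetical helper, and it takes the season type as plain text (`'Regular Season'`) rather than the pre-encoded strings in `season_types`.

```python
from itertools import product
from urllib.parse import quote, urlencode

BASE = 'https://www.nba.com/stats/players/shooting/'


def build_urls(base, seasons, season_types):
    """Build one url per (season, season type) pair, in nested-loop order."""
    urls = []
    for season, season_type in product(seasons, season_types):
        # quote_via=quote encodes spaces as %20 instead of '+'
        query = urlencode({'Season': season, 'SeasonType': season_type}, quote_via=quote)
        urls.append(base + '?' + query)
    return urls


urls = build_urls(BASE, ['2021-22', '2020-21'], ['Regular Season', 'Playoffs'])
for u in urls:
    print(u)
```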

In [192]:
def get_scoring2(url_list, file_folder, option_numbers, option_names): 

        # Function scrapes data from nba.com/stats/players/shooting/ and saves to csv files 
        # Scrapes both 5-ft and by-zone shooting data

        i = 0
        optionz = np.arange(0,option_numbers, 1)

        # get first option
        for u in url_list:
            for option in optionz:
                driver.get(u)
                time.sleep(1)
                # get option xpath
                op = option + 1
                print(f'Getting {option_names[option]} for {u}...')
                print(f' option is {op}')

                try:
                        # click OPTION
                        xpath_option = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[' + str(op) + ']' 
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_option)))
                except Exception:
                        print(f'{u} did not load. Moving to next url.')
                        continue
                
                driver.find_element(by=By.XPATH, value=xpath_option).click()

                # all pages
                xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                driver.find_element(by=By.XPATH, value=xpath_all).click()

                src = driver.page_source
                parser = BeautifulSoup(src, "lxml")

                

                # if there are repeated headers in headerlist, delete them
                #headerlist = [i for n, i in enumerate(headerlist) if i not in headerlist[:n]]

                # deal with options
                
                if op == 1:
                        # Find Table
                        table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                        headers = table.findAll('th')
                        # Read every header cell (none are skipped)
                        headerlist = [h.text.strip() for h in headers[:]]  

                        # Find rows, cols, stats
                        row_names = table.findAll('a')                      
                        row_list = [b.text.strip() for b in row_names[0:]] 
                        rows = table.findAll('tr')[0:] 
                        player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]

                        cols = ['player', 'team', 'age', 'lessthan5_FGM', 'lessthan5_FGA', 'lessthan5_FG%', '5-9ft_FGM', '5-9ft_FGA', '5-9ft_FG%',
                        '10-14ft_FGM', '10-14ft_FGA', '10-14ft_FG%', '15-19ft_FGM', '15-19ft_FGA', '15-19ft_FG%', '20-24ft_FGM', '20-24ft_FGA',
                        '20-24ft_FG%', '25-29ft_FGM', '25-29ft_FGA', '25-29ft_FG%']

                        #set the length to ignore hidden columns
                        tot_cols = 21                        
                        headerlist = headerlist[:tot_cols]

                        stats = pd.DataFrame(player_stats, columns = cols)

                        filename = file_folder + str(u[34:]).replace('/', '_') + 'op_' + str(option_names[option]) + '.csv'
                        filename = replace_name_values(filename)
                        pd.DataFrame.to_csv(stats, filename)

                        i += 1
                        lu = len(url_list)
                        print(f'{filename} Completed Successfully! {i} / {lu} Complete!')
                        
                elif op == 2: 
                        print('option 2 - do nothing')

                elif op == 3:
                        

                        u2 = u + '&DistanceRange=By+Zone'
                        driver.get(u2)
                        time.sleep(1)
                        src = driver.page_source
                        parser = BeautifulSoup(src, "lxml")

                        try:
                                # click OPTION
                                xpath_option = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[' + str(op) + ']' 
                                elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_option)))
                        except Exception:
                                print(f'{u} did not load. Moving to next url.')
                                continue
                        
                        driver.find_element(by=By.XPATH, value=xpath_option).click()

                        # all pages
                        xpath_all = '//*[@id="__next"]/div[2]/div[2]/div[3]/section[2]/div/div[2]/div[2]/div[1]/div[3]/div/label/div/select/option[1]' 
                        elem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, xpath_all)))
                        driver.find_element(by=By.XPATH, value=xpath_all).click()

                        src = driver.page_source
                        parser = BeautifulSoup(src, "lxml")

                        # Find Table
                        table = parser.find("table", attrs = {"class":"Crom_table__p1iZz"})
                        headers = table.findAll('th')
                        # Read every header cell (none are skipped)
                        headerlist = [h.text.strip() for h in headers[:]]  

                        # Find rows, cols, stats
                        row_names = table.findAll('a')                      
                        row_list = [b.text.strip() for b in row_names[0:]] 
                        rows = table.findAll('tr')[0:] 
                        player_stats = [[td.getText().strip() for td in rows[i].findAll('td')[0:]] for i in range(len(rows))]

                        cols1 = ['player', 'team', 'age', 'restricted_fgm', 'restricted_fga', 'restricted_fg%', 'paint_fgm', 'paint_fga', 'paint_fg%',
                                'mid_range_fgm', 'mid_range_fga', 'mid_range_fg%', 'left_corn3_fgm', 'left_corn3_fga', 'left_corn3_fg%', 'right_corn3_fgm',
                                'right_corn3_fga', 'right_corn3_fg%', 'corner3_fgm', 'corner3_fga', 'corner3_fg%', 'above_break3_fgm', 'above_break3_fga', 'above_break3_fg%']
                        
                        #set the length to ignore hidden columns
                        tot_cols = 24                      
                        headerlist = headerlist[:tot_cols]
                        stats = pd.DataFrame(player_stats, columns = cols1)
                        filename = file_folder + str(u[34:]).replace('/', '_') + 'op_' + str(option_names[option]) + '.csv'
                        filename = replace_name_values2(filename)
                        pd.DataFrame.to_csv(stats, filename)
                        i += 1
                        lu = len(url_list)
                        print(f'{filename} Completed Successfully! {i} / {lu} Complete!')

In [193]:
driver = webdriver.Chrome()

In [194]:
get_scoring2(shooting_urls, 'data/player/shot_dashboard/shooting/', 3, ['5ft_range', '8ft_range', 'by_zone']) 

Getting 5ft_range for https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 1
data/player/shot_dashboard/shooting/shooting__Season_2021-22_Regular_20Seasonop_5ft_range.csv Completed Successfully! 1 / 14 Complete!
Getting 8ft_range for https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 2
option 2 - do nothing
Getting by_zone for https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 3
data/player/shot_dashboard/shooting/shooting__Season_2021-22_Regularop_by_zone.csv Completed Successfully! 2 / 14 Complete!
Getting 5ft_range for https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Playoffs...
 option is 1
data/player/shot_dashboard/shooting/shooting__Season_2021-22_Playoffsop_5ft_range.csv Completed Successfully! 3 / 14 Complete!
Getting 8ft_range for https://www.nba.com/stats/players/shooting/?Season=2021-22&SeasonType=Playoffs..

In [532]:
shootin_files = os.listdir('data/player/shot_dashboard/shooting')

#### Append Shooting (Not Done Dec22)

In [534]:
shot1 = append_the_data('data/player/shot_dashboard/shooting/', 'shooting_5ft__', '5ft')
shot1 = shot1.dropna(subset=['shooting_5ft__player'])
shot1 = shot1.dropna(subset=['shooting_5ft__lessthan5_fgm'])
shot1.to_csv('data/player/aggregates/ShotDash_Shooting_5ft.csv')


In [535]:
shot3 = append_the_data('data/player/shot_dashboard/shooting/', 'shooting_by_zone__', 'zone')
shot3 = shot3.dropna(subset=['shooting_by_zone__player'])
shot3 = shot3.dropna(subset=['shooting_by_zone__restricted_fgm'])
shot3.to_csv('data/player/aggregates/ShotDash_Shooting_by_zone.csv')

In [536]:
shootingz = double_merge(shot1, shot3, 'shooting_5ft__', 'shooting_by_zone__',)
shootingz.to_csv('data/player/aggregates/ShotDash_All_Shooting.csv')

## Player - Opponent Shooting  (ISD) (Done 12.22)

ISD = In Shot Dashboard

In [198]:
player_opp_shooting = 'https://www.nba.com/stats/players/opponent-shooting/'
opp_shooting_urls = []
years =['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17', '2015-16',]

for year in years:
    for s_types in season_types:
        url = player_opp_shooting + '?Season=' + year + '&SeasonType=' + s_types
        opp_shooting_urls.append(str(url))

In [199]:
get_scoring2(opp_shooting_urls, 'data/player/shot_dashboard/opponent_shooting/', 3, ['5ft_range', '8ft_range', 'by_zone'])

Getting 5ft_range for https://www.nba.com/stats/players/opponent-shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 1
data/player/shot_dashboard/opponent_shooting/opponent-shooting__Season_2021-22_Regular_20Seasonop_5ft_range.csv Completed Successfully! 1 / 14 Complete!
Getting 8ft_range for https://www.nba.com/stats/players/opponent-shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 2
option 2 - do nothing
Getting by_zone for https://www.nba.com/stats/players/opponent-shooting/?Season=2021-22&SeasonType=Regular%20Season...
 option is 3
data/player/shot_dashboard/opponent_shooting/opponent-shooting__Season_2021-22_Regularop_by_zone.csv Completed Successfully! 2 / 14 Complete!
Getting 5ft_range for https://www.nba.com/stats/players/opponent-shooting/?Season=2021-22&SeasonType=Playoffs...
 option is 1
data/player/shot_dashboard/opponent_shooting/opponent-shooting__Season_2021-22_Playoffsop_5ft_range.csv Completed Successfully! 3 / 14 Complete!
Getting 8ft_

#### Append Opp Shooting

In [538]:
opshot1 = append_the_data('data/player/shot_dashboard/opponent_shooting/', 'opp_shooting_5ft__', '5ft')
opshot1 = opshot1.dropna(subset=['opp_shooting_5ft__player'])
opshot1 = opshot1.dropna(subset=['opp_shooting_5ft__lessthan5_fgm'])

opshot1.to_csv('data/player/aggregates/ShotDash_Opponent_Shooting_5ft.csv')

In [539]:
opshot3 = append_the_data('data/player/shot_dashboard/opponent_shooting/', 'opp_shooting_by_zone__', 'zone')
opshot3 = opshot3.dropna(subset=['opp_shooting_by_zone__player'])
opshot3 = opshot3.dropna(subset=['opp_shooting_by_zone__restricted_fgm'])

opshot3.to_csv('data/player/aggregates/ShotDash_Opponent_Shooting_by_zone.csv')

In [540]:
opshootinz = double_merge(opshot1, opshot3, 'opp_shooting_5ft__', 'opp_shooting_by_zone__',)
opshootinz.to_csv('data/player/aggregates/ShotDash_All_Opponent_Shooting.csv')

### ALL Shot Dashboard Data -- Combine (NOTE: AM I MISSING SOME? Dec22)

In [None]:
closest_def = pd.read_csv('data/player/shot_dashboard/closest_defender/All_Closest_Defender.csv')
dribblez = pd.read_csv('data/player/shot_dashboard/dribbles/All_Dribbles.csv')
generall = pd.read_csv('data/player/shot_dashboard/general/All_shotdash_General.csv')
opp_shootingz = pd.read_csv('data/player/shot_dashboard/opponent_shooting/All_Opp_Shooting.csv')
shootingz = pd.read_csv('data/player/shot_dashboard/shooting/All_Shooting.csv')
shotclockz = pd.read_csv('data/player/shot_dashboard/shotclock/All_ShotClock.csv')
touchtimez = pd.read_csv('data/player/shot_dashboard/touch_time/All_TouchTime.csv')


In [None]:
print(f' closest defender: {closest_def.columns[2]}, {closest_def.shape} \n drib: {dribblez.columns[2]}, {dribblez.shape} \n general: {generall.columns[2]}, {generall.shape} \n opp_shooting: {opp_shootingz.columns[2]}, {opp_shootingz.shape} \n shooting: {shootingz.columns[2]}, {shootingz.shape} \n shotclock: {shotclockz.columns[2]}, {shotclockz.shape} \n touchtime: {touchtimez.columns[2]}, {touchtimez.shape}')

In [None]:
print(f' closest def nas: {closest_def.isnull().sum()} \n drib nas: {dribblez.isnull().sum()} \n general nas: {generall.isnull().sum()} \n opp_shooting nas: {opp_shootingz.isnull().sum()} \n shooting nas: {shootingz.isnull().sum()} \n shotclock nas: {shotclockz.isnull().sum()} \n touchtime nas: {touchtimez.isnull().sum()}')

In [None]:
closest_def['closest_d_0-2ft__season'] = closest_def['closest_d_0-2ft__season'].astype(int)
dribblez['sd_drib_0_dribbles__season'] = dribblez['sd_drib_0_dribbles__season'].astype(int)
generall['shot_dash_gen_overall__season'] = generall['shot_dash_gen_overall__season'].astype(int)
opp_shootingz['opp_shooting_5ft__season'] = opp_shootingz['opp_shooting_5ft__season'].astype(int)
shootingz['shooting_5ft__season'] = shootingz['shooting_5ft__season'].astype(int)
shotclockz['shotclock_4-0_sec__season'] = shotclockz['shotclock_4-0_sec__season'].astype(int)
touchtimez['touch_time_0-2_sec__season'] = touchtimez['touch_time_0-2_sec__season'].astype(int)

In [None]:
first_six = sextuple_merge(closest_def, dribblez, generall, opp_shootingz, shootingz, shotclockz, 'closest_d_0-2ft__', 'sd_drib_0_dribbles__', 'shot_dash_gen_overall__', 'opp_shooting_5ft__', 'shooting_5ft__', 'shotclock_4-0_sec__')


In [None]:
# Named all_shot_dash rather than `all` so the Python builtin all() is not shadowed.
touch_keys = ['touch_time_0-2_sec__player', 'touch_time_0-2_sec__season', 'touch_time_0-2_sec__season_type']
all_shot_dash = pd.merge(first_six, touchtimez.drop_duplicates(subset=touch_keys),
                left_on=['closest_d_0-2ft__player', 'closest_d_0-2ft__season', 'closest_d_0-2ft__season_type'],
                right_on=touch_keys)
all_shot_dash

In [None]:
all_shot_dash.to_csv('data/player/shot_dashboard/All_shot_dashboard.csv')
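
The combine step above joins seven frames on player, season, and season type, where each frame stores those keys under its own prefix. The body of `sextuple_merge` is not shown in this section, so the following is only a sketch of how the same kind of join could be written as one fold. The helper name `merge_on_prefixed_keys` is hypothetical, and it assumes every frame names its keys `<prefix>player`, `<prefix>season`, and `<prefix>season_type` and that an inner join is wanted.

```python
from functools import reduce

import pandas as pd

KEYS = ['player', 'season', 'season_type']


def merge_on_prefixed_keys(frames_with_prefixes):
    """Inner-join several frames whose key columns carry a per-frame prefix.

    frames_with_prefixes: list of (DataFrame, prefix) pairs, for example
    (closest_def, 'closest_d_0-2ft__'). Hypothetical helper, not the
    notebook's own sextuple_merge.
    """
    normalized = []
    for frame, prefix in frames_with_prefixes:
        # Rename '<prefix>player' etc. to shared key names so one `on=` works.
        renamed = frame.rename(columns={prefix + key: key for key in KEYS})
        # Drop index columns left over from to_csv/read_csv round trips,
        # which would otherwise collide across frames.
        renamed = renamed.loc[:, ~renamed.columns.str.startswith('Unnamed')]
        # Keep one row per player/season/season type so the join cannot fan out.
        normalized.append(renamed.drop_duplicates(subset=KEYS))
    return reduce(lambda left, right: pd.merge(left, right, on=KEYS), normalized)
```

Called with all seven frames and their prefixes, this would replace both the six-way merge and the follow-up touch-time merge, at the cost of renaming the key columns to unprefixed names.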

## Player - Hustle (Done 12.23)

In [616]:
player_hustle = 'https://www.nba.com/stats/players/hustle/'
hustle_urls = []
years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18', '2016-17']

for year in years:
    for s_types in season_types:
        url = player_hustle + '?Season=' + year + '&SeasonType=' + s_types
        hustle_urls.append(str(url))
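
The URL loop above concatenates the query string by hand. As a hedged alternative, the sketch below builds the same kind of URL with `urllib.parse.urlencode`, which percent-encodes the space in a season type. The helper name `build_stat_urls` is hypothetical, and the season type values `'Regular Season'` and `'Playoffs'` are an assumption, since `season_types` is defined outside this section.

```python
from itertools import product
from urllib.parse import quote, urlencode

BASE_URL = 'https://www.nba.com/stats/players/hustle/'


def build_stat_urls(base_url, years, season_types):
    """Return one URL per (season, season type) pair, percent-encoding spaces."""
    urls = []
    for year, season_type in product(years, season_types):
        # quote_via=quote encodes a space as %20 rather than '+'.
        query = urlencode({'Season': year, 'SeasonType': season_type}, quote_via=quote)
        urls.append(f'{base_url}?{query}')
    return urls


# Assumed season type labels; the notebook's own list may differ.
example_urls = build_stat_urls(BASE_URL, ['2021-22', '2020-21'], ['Regular Season', 'Playoffs'])
```

The same function could be pointed at the box-outs base URL, so each stat page would need only its base URL and list of seasons.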


In [618]:
driver = webdriver.Chrome()

In [622]:
grab_player_data(hustle_urls, 'data/player/hustle/')

data/player/hustle/hustle__Season_2021-22_Regular Season.csv Completed Successfully! 1 / 12 Complete!
data/player/hustle/hustle__Season_2021-22_Playoffs.csv Completed Successfully! 2 / 12 Complete!
data/player/hustle/hustle__Season_2020-21_Regular Season.csv Completed Successfully! 3 / 12 Complete!
data/player/hustle/hustle__Season_2020-21_Playoffs.csv Completed Successfully! 4 / 12 Complete!
data/player/hustle/hustle__Season_2019-20_Regular Season.csv Completed Successfully! 5 / 12 Complete!
data/player/hustle/hustle__Season_2019-20_Playoffs.csv Completed Successfully! 6 / 12 Complete!
data/player/hustle/hustle__Season_2018-19_Regular Season.csv Completed Successfully! 7 / 12 Complete!
data/player/hustle/hustle__Season_2018-19_Playoffs.csv Completed Successfully! 8 / 12 Complete!
data/player/hustle/hustle__Season_2017-18_Regular Season.csv Completed Successfully! 9 / 12 Complete!
data/player/hustle/hustle__Season_2017-18_Playoffs.csv Completed Successfully! 10 / 12 Complete!
data/play

In [623]:
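import os

# move the files to the correct folder
hustle_dir = 'data/player/hustle/'
# Create the target folders first; os.rename fails if they do not exist.
os.makedirs(hustle_dir + 'playoffs/', exist_ok=True)
os.makedirs(hustle_dir + 'regular_season/', exist_ok=True)
for file in os.listdir(hustle_dir):
    if file.endswith('.csv'):
        if 'Playoffs' in file:
            os.rename(hustle_dir + file, hustle_dir + 'playoffs/' + file)
        else:
            os.rename(hustle_dir + file, hustle_dir + 'regular_season/' + file)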

In [624]:
hustle_data = append_the_data('data/player/hustle/regular_season/', 'hust_', 'hustle')
hustle_data

Unnamed: 0,hust_unnamed: 0,hust_player,hust_team,hust_age,hust_gp,hust_min,hust_screenassists,hust_screenassists pts,hust_deflections,hust_off loose ballsrecovered,hust_def loose ballsrecovered,hust_loose ballsrecovered,hust_% loose ballsrecovered off,hust_% loose ballsrecovered def,hust_chargesdrawn,hust_contested2pt shots,hust_contested3pt shots,hust_contestedshots,hust_season,hust_season_type
0,0,,,,,,,,,,,,,,,,,,2016,Regular
1,1,AJ Hammons,DAL,24.0,22.0,7.4,1.0,2.3,0.2,0.0,0.0,0.1,0.0,0.0,0.00,2.0,0.4,2.4,2016,Regular
2,2,Aaron Brooks,IND,32.0,65.0,13.7,0.2,0.6,0.8,0.0,0.0,0.3,0.0,0.0,0.12,1.6,1.6,3.1,2016,Regular
3,3,Aaron Gordon,ORL,21.0,80.0,28.7,0.5,1.1,1.4,0.0,0.0,0.8,0.0,0.0,0.06,4.0,2.2,6.2,2016,Regular
4,4,Aaron Harrison,CHA,22.0,5.0,3.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.4,0.0,0.4,2016,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
601,601,Zach LaVine,CHI,27.0,67.0,34.7,0.5,0.9,1.0,0.4,0.2,0.6,61.9,38.1,0.01,2.5,3.4,5.9,2021,Regular
602,602,Zavier Simpson,OKC,25.0,4.0,43.5,0.0,0.0,2.3,0.3,0.8,1.0,25.0,75.0,0.00,3.8,6.0,9.8,2021,Regular
603,603,Zeke Nnaji,DEN,21.0,41.0,17.0,0.9,2.4,0.8,0.2,0.1,0.4,62.5,37.5,0.00,2.0,1.7,3.7,2021,Regular
604,604,Ziaire Williams,MEM,20.0,62.0,21.7,0.1,0.2,1.0,0.1,0.2,0.3,42.1,57.9,0.00,2.1,2.5,4.6,2021,Regular


In [625]:
hustle_data.to_csv('data/player/aggregates/All_Hustle.csv')

## Player - Box Outs (Done 12.23)

In [629]:
player_boxouts = 'https://www.nba.com/stats/players/box-outs/'
boxouts_urls = []
years = ['2021-22', '2020-21', '2019-20', '2018-19', '2017-18']

for year in years:
    for s_types in season_types:
        url = player_boxouts + '?Season=' + year + '&SeasonType=' + s_types
        boxouts_urls.append(str(url))

In [630]:
grab_player_data(boxouts_urls, 'data/player/boxouts/')

data/player/boxouts/box-outs__Season_2021-22_Regular Season.csv Completed Successfully! 1 / 10 Complete!
data/player/boxouts/box-outs__Season_2021-22_Playoffs.csv Completed Successfully! 2 / 10 Complete!
data/player/boxouts/box-outs__Season_2020-21_Regular Season.csv Completed Successfully! 3 / 10 Complete!
data/player/boxouts/box-outs__Season_2020-21_Playoffs.csv Completed Successfully! 4 / 10 Complete!
data/player/boxouts/box-outs__Season_2019-20_Regular Season.csv Completed Successfully! 5 / 10 Complete!
data/player/boxouts/box-outs__Season_2019-20_Playoffs.csv Completed Successfully! 6 / 10 Complete!
data/player/boxouts/box-outs__Season_2018-19_Regular Season.csv Completed Successfully! 7 / 10 Complete!
data/player/boxouts/box-outs__Season_2018-19_Playoffs.csv Completed Successfully! 8 / 10 Complete!
data/player/boxouts/box-outs__Season_2017-18_Regular Season.csv Completed Successfully! 9 / 10 Complete!
data/player/boxouts/box-outs__Season_2017-18_Playoffs.csv Completed Successfull

In [631]:
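import os

# move the files to the correct folder
boxouts_dir = 'data/player/boxouts/'
# Create the target folders first; os.rename fails if they do not exist.
os.makedirs(boxouts_dir + 'playoffs/', exist_ok=True)
os.makedirs(boxouts_dir + 'regular_season/', exist_ok=True)
for file in os.listdir(boxouts_dir):
    if file.endswith('.csv'):
        if 'Playoffs' in file:
            os.rename(boxouts_dir + file, boxouts_dir + 'playoffs/' + file)
        else:
            os.rename(boxouts_dir + file, boxouts_dir + 'regular_season/' + file)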
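
The hustle and box-out sections both sort freshly scraped CSVs into `playoffs/` and `regular_season/` with near-identical loops. A small shared helper could cover both. The sketch below is one possible shape for it; the function name `sort_csvs_by_season_type` is hypothetical, and it assumes, as the loops above do, that playoff files contain `Playoffs` in their name.

```python
import os
import shutil


def sort_csvs_by_season_type(folder):
    """Move each CSV in `folder` into its playoffs/ or regular_season/ subfolder.

    Returns the list of destination paths. Hypothetical helper.
    """
    moved = []
    # Create both targets up front so the moves cannot fail on a missing folder.
    for subfolder in ('playoffs', 'regular_season'):
        os.makedirs(os.path.join(folder, subfolder), exist_ok=True)
    for name in sorted(os.listdir(folder)):
        if not name.endswith('.csv'):
            continue  # skips the two subfolders and any non-CSV files
        subfolder = 'playoffs' if 'Playoffs' in name else 'regular_season'
        destination = os.path.join(folder, subfolder, name)
        shutil.move(os.path.join(folder, name), destination)
        moved.append(destination)
    return moved
```

With this in place, each section would reduce to a single call such as `sort_csvs_by_season_type('data/player/boxouts/')`.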

In [632]:
boxouts = append_the_data('data/player/boxouts/regular_season/', 'boxouts_', 'box-outs')
boxouts

Unnamed: 0,boxouts_unnamed: 0,boxouts_player,boxouts_team,boxouts_age,boxouts_gp,boxouts_min,boxouts_box outs,boxouts_off box outs,boxouts_def box outs,boxouts_team rebon box outs,boxouts_player rebon box outs,boxouts_% box outs off,boxouts_% box outs def,boxouts_% team rebwhen box out,boxouts_% player rebwhen box out,boxouts_season,boxouts_season_type
0,0,,,,,,,,,,,,,,,2017,Regular
1,1,Aaron Brooks,MIN,33.0,32.0,5.9,0.2,0.0,0.2,0.1,0.0,0.0,100.0,75.0,25.0,2017,Regular
2,2,Aaron Gordon,ORL,22.0,58.0,32.9,2.7,0.2,2.5,1.3,0.4,7.0,93.0,85.2,23.9,2017,Regular
3,3,Aaron Harrison,DAL,23.0,9.0,25.9,0.8,0.0,0.8,0.7,0.0,0.0,100.0,85.7,0.0,2017,Regular
4,4,Aaron Jackson,HOU,32.0,1.0,34.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017,Regular
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
601,601,Zach LaVine,CHI,27.0,67.0,34.7,0.4,0.0,0.4,0.4,0.2,10.7,89.3,96.4,50.0,2021,Regular
602,602,Zavier Simpson,OKC,25.0,4.0,43.5,0.3,0.0,0.3,0.3,0.0,0.0,100.0,100.0,0.0,2021,Regular
603,603,Zeke Nnaji,DEN,21.0,41.0,17.0,1.3,0.5,0.7,1.1,0.8,42.3,57.7,97.9,64.6,2021,Regular
604,604,Ziaire Williams,MEM,20.0,62.0,21.7,0.2,0.0,0.2,0.2,0.1,0.0,100.0,100.0,25.0,2021,Regular


In [633]:
boxouts.to_csv('data/player/aggregates/All_Boxouts.csv')

## Player - Matchups

In [None]:
# load from my player matchups file

## Player - Images

I now have the player names and IDs. From browsing NBA.com, I know each player's headshot is available at a URL of this general form:
https://cdn.nba.com/headshots/nba/latest/260x190/[PLAYER_ID].png

From here, I add a photo URL column to the player table so the photos can be downloaded.
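
As an illustration of that step, the sketch below fills a `photo_url` column from the headshot pattern. The cell that built the column is not shown in this section, so the helper name `add_photo_urls` and the ID column name `player_id` are assumptions, and the IDs in the example are made-up placeholders rather than real NBA IDs.

```python
import pandas as pd

HEADSHOT_URL = 'https://cdn.nba.com/headshots/nba/latest/260x190/{player_id}.png'


def add_photo_urls(players, id_column='player_id'):
    """Return a copy of `players` with a `photo_url` column built from the ID column."""
    with_urls = players.copy()
    with_urls['photo_url'] = [
        HEADSHOT_URL.format(player_id=player_id) for player_id in with_urls[id_column]
    ]
    return with_urls


# Placeholder rows for illustration only.
example_players = pd.DataFrame({'player': ['Player A', 'Player B'], 'player_id': [101, 102]})
example_with_urls = add_photo_urls(example_players)
```

Saving the result with `to_csv` would produce a file of the kind loaded in the next cell.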


In [None]:
stats = pd.read_csv('data/player/nba_players_and_photo_urls.csv')

Next, download the images:

In [None]:
pic_url = stats['photo_url']
pic_url = pd.DataFrame(pic_url)
pic_url.head()

In [None]:
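import os
import shutil

import requests

photo_dir = 'data/player/photos'
os.makedirs(photo_dir, exist_ok=True)   # no error if the folder already exists

# Download all listed pictures. Takes about 18 minutes.
for image_url in pic_url['photo_url']:
    filename = image_url.split("/")[-1]
    r = requests.get(image_url, stream=True)
    if r.status_code == 200:                         # Check if image found
        r.raw.decode_content = True                  # Decode gzip/deflate before copying
        # Write into photo_dir directly instead of os.chdir, so later relative paths still work.
        with open(os.path.join(photo_dir, filename), 'wb') as f:
            shutil.copyfileobj(r.raw, f)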