# PSTAT134 Assignment 2
## Creating an Interactive Dashboard for NBA Statistics
### Andrew Zhang

## Data Download
In this step, we download NBA data using the get_nba_data function. For this assignment, we are interested in LeBron James's game data from the 2016-17 regular season, so we use the *playergamelog* endpoint.

In [None]:
import pandas as pd

def get_nba_data(endpt, params, return_url=False):

    ## endpt: https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
    ## params: dictionary of parameters: i.e., {'LeagueID':'00'}
    
    from pandas import DataFrame
    from urllib.parse import urlencode
    import json
    
    useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""
    dataurl = "\"" + "http://stats.nba.com/stats/" + endpt + "?" + urlencode(params) + "\""
    
    # for debugging: just return the url
    if return_url:
        return(dataurl)
    
    jsonstr = !wget -q -O - --user-agent={useragent} {dataurl}
    
    data = json.loads(jsonstr[0])
    
    h = data['resultSets'][0]['headers']
    d = data['resultSets'][0]['rowSet']
    
    return(DataFrame(d, columns=h))
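
As a rough illustration of the two steps inside the function, the sketch below builds the query string with urlencode and turns a payload shaped like `resultSets` into rows. The payload is a hand-made stand-in (its header names and values are invented for illustration), not real stats.nba.com output, and `rows_from_payload` is a hypothetical helper that does not appear in the assignment code.

```python
import json
from urllib.parse import urlencode

# Same parameters as the LeBron James query in this notebook
params = {'PlayerID': '2544', 'Season': '2016-17', 'SeasonType': 'Regular Season'}
query = urlencode(params)  # spaces are encoded as '+'
url = "http://stats.nba.com/stats/" + "playergamelog" + "?" + query

# Hand-made stand-in for a response body (values are made up)
sample = json.dumps({
    'resultSets': [
        {'headers': ['GAME_ID', 'PTS'],
         'rowSet': [['g1', 25], ['g2', 31]]}
    ]
})

def rows_from_payload(jsonstr):
    """Pair each row of the first result set with its headers."""
    first = json.loads(jsonstr)['resultSets'][0]
    return [dict(zip(first['headers'], row)) for row in first['rowSet']]

rows = rows_from_payload(sample)
print(url)
print(rows)
```

Calling get_nba_data with return_url=True is the notebook's own way of inspecting the URL without making a request.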

In [None]:
params = {'PlayerID':'2544',
          'Season':'2016-17',
          'SeasonType':'Regular Season'}

gamedata = get_nba_data('playergamelog', params)
gamedata.head()

In [None]:
# get all teams
params = {'LeagueID':'00'}
teams = get_nba_data('commonTeamYears', params)
teams = teams[teams.MAX_YEAR == '2017']  # keep the filtered result instead of discarding it
teams.head()

In [None]:
# get all players
params = {'LeagueID':'00', 'Season': '2016-17', 'IsOnlyCurrentSeason': '0'}
players = get_nba_data('commonallplayers', params)
players = players[players.TO_YEAR == '2017']
players.head()

## Developing Interactive Widgets
Continuing from the data download, we develop interactive widgets that adjust the parameters of the original query for the 2016-17 regular season. The widgets let the user choose the season, the season type (regular season, playoffs, and so on), and the player of interest.

## Season Widget
LeBron James's regular-season statistics across different seasons.

In [None]:
from ipywidgets import interact, Dropdown, Button
import ipywidgets as widgets

def update_season(season):
    params_season = {'PlayerID':'2544',
                     'Season':season,
                     'SeasonType':'Regular Season'
                    }
    gamedata_update = get_nba_data('playergamelog', params_season)
    display(gamedata_update.head())
    
drop_season = {'2012-13': '2012-13', '2013-14': '2013-14' , '2014-15': '2014-15', '2015-16': '2015-16'}

interact(update_season, season=drop_season)
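
The season options above are typed by hand. A minimal sketch of generating the same labels, assuming the `Season` parameter always follows the 'YYYY-YY' pattern used in this notebook. `season_label` and `season_options` are hypothetical names, not part of the assignment code.

```python
def season_label(start_year):
    """Format the season that starts in start_year, e.g. 2016 -> '2016-17'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"

# Same four seasons as drop_season, built from the starting years 2012..2015
season_options = {label: label for label in map(season_label, range(2012, 2016))}
print(season_options)
```

The resulting dictionary could be passed to interact in place of drop_season, and widening the range adds more seasons without retyping each label.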

## Season Type Widget
LeBron James's statistics for the 2016-17 season, by season type.

In [None]:
def update_type(season_type):
    params_type = {'PlayerID':'2544',
                   'Season':'2016-17',
                   'SeasonType':season_type}
    gamedata_update = get_nba_data('playergamelog', params_type)
    display(gamedata_update.head())
    
drop_type = {'Pre': 'Pre Season', 'Regular': 'Regular Season', 'Post': 'Playoffs', 'All-Star': 'All Star' }

interact(update_type, season_type=drop_type)

## Player Widget
Regular-season statistics for the 2016-17 season for various players.

In [None]:
def update_player(ID):
    params_ID = {'PlayerID':ID,
                 'Season':'2016-17',
                 'SeasonType':'Regular Season'}
    gamedata_update = get_nba_data('playergamelog', params_ID)
    display(gamedata_update)
    
play_dd_text = players.DISPLAY_LAST_COMMA_FIRST
play_ID = dict(zip(play_dd_text, players.PERSON_ID))

interact(update_player, ID=play_ID)

## Changing Widget States
Here we combine the widgets: choosing a team narrows the player dropdown to that team's roster, and the button fetches game statistics for the selected player.

In [None]:
plyr_by_team_dd = dict()

for t, p in players.groupby('TEAM_ID'):
    
    plyr_by_team_dd[t] = dict(zip(p.DISPLAY_LAST_COMMA_FIRST, p.PERSON_ID))

plyr_by_team_dd
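
To make the shape of this nested dictionary concrete, here is a small sketch on a toy table. The team IDs and the 'Player, Example' row are made up. Only the column names and the two player IDs that appear elsewhere in this notebook are taken from it.

```python
import pandas as pd

# Toy stand-in for the players table (team IDs and one player are invented)
toy = pd.DataFrame({
    'TEAM_ID': [1, 1, 2],
    'DISPLAY_LAST_COMMA_FIRST': ['James, LeBron', 'Player, Example', 'Crowder, Jae'],
    'PERSON_ID': [2544, 999, 203109],
})

# Outer key: team ID; inner dict: display name -> player ID
by_team = {
    team_id: dict(zip(group.DISPLAY_LAST_COMMA_FIRST, group.PERSON_ID))
    for team_id, group in toy.groupby('TEAM_ID')
}
print(by_team)
```

Each inner dictionary has the label-to-value form that a Dropdown accepts as its options.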

In [None]:
# Build team dictionary
team_dd_text = players.TEAM_CITY+' '+players.TEAM_NAME
team_dd = dict(zip(team_dd_text, players.TEAM_ID))
team_dd

In [None]:
selected = "Cleveland Cavaliers"

season = {'2016-2017': '2016-17'}

season_menu = Dropdown(options=season)
team_menu = Dropdown(options=team_dd, label = selected)
play_menu = Dropdown(options=play_ID, label='James, LeBron')
type_menu = Dropdown(options=drop_type)

fetch_button = Button(description="Get Stats")

display(season_menu, type_menu, team_menu, play_menu, fetch_button)

def update_team(change):
    play_menu.options = plyr_by_team_dd[change['new']]

def update_play(change):
    print(play_menu.label)
    params_ID = {'PlayerID':play_menu.value,
                 'Season':season_menu.value,
                 'SeasonType':type_menu.value}
    gamedata_update = get_nba_data('playergamelog', params_ID)
    if gamedata_update.empty:
        print('Not Applicable to Player')
    else:
        display(gamedata_update)
    
team_menu.observe(update_team, names='value')

fetch_button.on_click(update_play)
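
The team callback relies on the `change` argument that observe passes to its handler, which carries the newly selected value under the key 'new'. The sketch below isolates that lookup as a plain function so it can be tried without a live widget. `roster_for_change`, the toy team IDs, and the fallback to an empty dictionary for an unknown team are all assumptions for illustration, not the notebook's behavior.

```python
def roster_for_change(change, rosters):
    """Return the player options for the team selected in a change event."""
    return rosters.get(change['new'], {})

# Toy rosters keyed by made-up team IDs
rosters = {1: {'James, LeBron': 2544}, 2: {'Crowder, Jae': 203109}}

# Hand-built stand-in for the change event of a value update
fake_change = {'name': 'value', 'old': 1, 'new': 2}
print(roster_for_change(fake_change, rosters))
```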


## Data Transformations and Visualizations
Comparing points, rebounds, assists, and turnovers in both losses/wins and home/away games.

### Wins vs. Losses
Comparing LeBron James's productivity in wins and losses.

In [None]:
for r, d in gamedata.groupby('WL'):
    if r == 'L':
        loses = d
    else:
        wins = d
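
The same split can also be written with boolean masks, which avoids relying on the loop's else branch to catch wins. A minimal sketch on a toy table, where the point values are made up and only the WL and PTS column names come from the game log:

```python
import pandas as pd

# Toy game log (values are invented)
toy = pd.DataFrame({'WL': ['W', 'L', 'W', 'L'], 'PTS': [30, 22, 27, 25]})

# Select rows by outcome directly
wins_toy = toy[toy.WL == 'W']
loses_toy = toy[toy.WL == 'L']
print(len(wins_toy), len(loses_toy))
```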

In [None]:
loses.head()

In [None]:
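# Average only the numeric columns; text columns such as MATCHUP cannot be averaged
gamedata.groupby('WL').mean(numeric_only=True)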

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Game index and per-game statistics in losses
a = range(len(loses))
b = loses.PTS
c = loses.REB
d = loses.AST
e = loses.TOV

# Game index and per-game statistics in wins
f = range(len(wins))
g = wins.PTS
h = wins.REB
i = wins.AST
j = wins.TOV

# Label each series so that plt.legend() has entries to show
plt.scatter(a, b, label='PTS')
plt.scatter(a, c, label='REB')
plt.scatter(a, d, label='AST')
plt.scatter(a, e, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('LeBron James in Losses')
plt.show()

plt.scatter(f, g, label='PTS')
plt.scatter(f, h, label='REB')
plt.scatter(f, i, label='AST')
plt.scatter(f, j, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('LeBron James in Wins')
plt.show()

In terms of averages, James performs noticeably better in wins than in losses, as indicated by his plus-minus statistic. His other in-game statistics, such as points, rebounds, and assists, do not differ much. Even in the scatterplots, it is difficult to discern differences between his performances in wins and losses. However, the plus-minus statistic reflects overall impact during a game, which supports the conclusion that he performs better in wins than in losses.
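
The plus-minus comparison can be checked numerically by averaging each statistic per outcome and subtracting. The sketch below uses made-up values on a toy table. Only the column names WL, PTS, and PLUS_MINUS are assumed to match the game log.

```python
import pandas as pd

# Toy game log (values are invented for illustration)
toy = pd.DataFrame({
    'WL': ['W', 'W', 'L', 'L'],
    'PTS': [30, 26, 27, 25],
    'PLUS_MINUS': [12, 8, -6, -10],
})

# Mean of each statistic within wins and within losses
means = toy.groupby('WL')[['PTS', 'PLUS_MINUS']].mean()

# Wins minus losses: a large gap in one column and a small gap in another
gap = means.loc['W'] - means.loc['L']
print(means)
print(gap)
```

On the real data, a small gap in PTS next to a large gap in PLUS_MINUS would match the pattern described in the paragraph.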

### Home vs. Away
Comparing LeBron James's productivity in home and away games to Jae Crowder's productivity in home and away games.

In [None]:
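# Label each game as home or away; an '@' in MATCHUP marks an away game.
# Building the whole column at once avoids chained assignment
# (gamedata['location'][i] = ...), which may not update the DataFrame.
gamedata['location'] = ['away' if '@' in m else 'home' for m in gamedata.MATCHUP]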

In [None]:
for i, s in gamedata.groupby('location'):
    if i == 'away':
        away = s
    else:
        home = s
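
Because the same labeling and split is repeated for a second player, it could be wrapped in a small helper. This is a sketch under the assumption that an '@' in MATCHUP marks an away game, as in the cell above. `split_home_away` is a hypothetical name and the matchup strings are placeholders.

```python
import pandas as pd

def split_home_away(games):
    """Return (home, away) subsets of a game log based on '@' in MATCHUP."""
    is_away = games['MATCHUP'].str.contains('@', regex=False)
    return games[~is_away], games[is_away]

# Toy game log with placeholder matchups and made-up points
toy = pd.DataFrame({
    'MATCHUP': ['AAA @ BBB', 'AAA vs. BBB', 'AAA @ CCC'],
    'PTS': [20, 31, 18],
})
home_toy, away_toy = split_home_away(toy)
print(len(home_toy), len(away_toy))
```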

In [None]:
a = range(len(away))
b = away.PTS
c = away.REB
d = away.AST
e = away.TOV

f = range(len(home))
g = home.PTS
h = home.REB
i = home.AST
j = home.TOV

# Label each series so that plt.legend() has entries to show
plt.scatter(a, b, label='PTS')
plt.scatter(a, c, label='REB')
plt.scatter(a, d, label='AST')
plt.scatter(a, e, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('LeBron James Away Games')
plt.show()

plt.scatter(f, g, label='PTS')
plt.scatter(f, h, label='REB')
plt.scatter(f, i, label='AST')
plt.scatter(f, j, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('LeBron James Home Games')
plt.show()

In [None]:
params_jc = {'PlayerID':'203109',
             'Season':'2016-17',
             'SeasonType':'Regular Season'}

gamedata_jc = get_nba_data('playergamelog', params_jc)
gamedata_jc.head()

In [None]:
# Label each game as home or away; an '@' in MATCHUP marks an away game.
# Building the whole column at once avoids chained assignment.
gamedata_jc['location'] = ['away' if '@' in m else 'home' for m in gamedata_jc.MATCHUP]

In [None]:
for i, s in gamedata_jc.groupby('location'):
    if i == 'away':
        away_jc = s
    else:
        home_jc = s

In [None]:
a_jc = range(len(away_jc))
b_jc = away_jc.PTS
c_jc = away_jc.REB
d_jc = away_jc.AST
e_jc = away_jc.TOV

f_jc = range(len(home_jc))
g_jc = home_jc.PTS
h_jc = home_jc.REB
i_jc = home_jc.AST
j_jc = home_jc.TOV

# Label each series so that plt.legend() has entries to show
plt.scatter(a_jc, b_jc, label='PTS')
plt.scatter(a_jc, c_jc, label='REB')
plt.scatter(a_jc, d_jc, label='AST')
plt.scatter(a_jc, e_jc, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('Jae Crowder Away Games')
plt.show()

plt.scatter(f_jc, g_jc, label='PTS')
plt.scatter(f_jc, h_jc, label='REB')
plt.scatter(f_jc, i_jc, label='AST')
plt.scatter(f_jc, j_jc, label='TOV')
plt.xlabel('Games')
plt.ylabel('Statistics')
plt.legend()
plt.title('Jae Crowder Home Games')
plt.show()
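
The four scatter plots in this section differ only in the data and the title, so the pattern could be captured in one function. The following is a sketch, not the notebook's own code: `plot_stats` is a hypothetical helper and the toy values are made up. The `matplotlib.use('Agg')` line is only there so the sketch runs outside a notebook and can be dropped when `%matplotlib inline` is active.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_stats(games, title, stats=('PTS', 'REB', 'AST', 'TOV')):
    """Scatter each statistic against the game index, with a labeled legend."""
    fig, ax = plt.subplots()
    x = range(len(games))
    for stat in stats:
        ax.scatter(x, games[stat], label=stat)
    ax.set_xlabel('Games')
    ax.set_ylabel('Statistics')
    ax.set_title(title)
    ax.legend()
    return ax

# Toy game log (values are invented)
toy = pd.DataFrame({'PTS': [20, 31], 'REB': [7, 9], 'AST': [8, 6], 'TOV': [3, 4]})
ax = plot_stats(toy, 'Toy player, home games')
```

With such a helper, each plot in this section would reduce to a single call, for example with the home and away subsets of a player's game log.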

In [None]:
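# Average only the numeric columns; text columns such as MATCHUP cannot be averaged
gamedata.groupby('location').mean(numeric_only=True)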

In [None]:
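# Average only the numeric columns; text columns such as MATCHUP cannot be averaged
gamedata_jc.groupby('location').mean(numeric_only=True)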

Looking at James's data, it is difficult to discern any meaningful difference between his home and away statistics. One explanation is simply that he is good enough to perform wherever he plays: his game statistics are roughly the same in both settings, and his plus-minus indicates that he is productive both at home and on the road. For this reason, it is useful to look at another player to see whether James's numbers reflect his skill or some other factor. We selected Jae Crowder, a more "average" player who is known more for his defensive prowess and grittiness. Upon analysis of his numbers, his statistics are also indicative of a great player. Although James's numbers are better than Crowder's, being on a top-tier team, as both the Cavaliers and the Celtics were in 2016-17, could contribute to both players' high statistics, since they are surrounded by better teammates.
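
To compare the two players side by side rather than in two separate tables, the home and away averages can be placed in one summary. The numbers below are made up purely to show the mechanics. Only the location and PTS column names are taken from the cells above.

```python
import pandas as pd

# Toy game logs for two players (values are invented)
james = pd.DataFrame({'location': ['home', 'away', 'home', 'away'],
                      'PTS': [28, 26, 30, 24]})
crowder = pd.DataFrame({'location': ['home', 'away', 'home', 'away'],
                        'PTS': [14, 12, 16, 14]})

# Stack both logs with a player column, then average per player and location
combined = pd.concat([james.assign(player='James'),
                      crowder.assign(player='Crowder')], ignore_index=True)
summary = combined.groupby(['player', 'location'])['PTS'].mean().unstack('location')
print(summary)
```

Each row of the summary is one player and each column one location, so the home/away gap for each player can be read across a single row.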