## Using nba-dataloader

This notebook takes you through a few examples of downloading data from http://stats.nba.com and using it to do some simple data analysis.

### Installation
Make sure you have installed the nba-dataloader package.

In [1]:
%pip install nba-dataloader

Collecting nba-dataloader
  Downloading nba_dataloader-1.0.2-py3-none-any.whl.metadata (5.5 kB)
Downloading nba_dataloader-1.0.2-py3-none-any.whl (17 kB)
Installing collected packages: nba-dataloader
Successfully installed nba-dataloader-1.0.2
Note: you may need to restart the kernel to use updated packages.


## Downloading data
Let's download some data. For this exercise we will query the following resources:
- [LeagueDashTeamStats](https://any-api.com/nba_com/nba_com/docs/_leaguedashteamstats/GET)
- [LeagueDashPlayerStats](https://any-api.com/nba_com/nba_com/docs/_leaguedashplayerstats/GET)

However, before we do that, let's see how to use the module by running it with the --help option.

In [2]:
%run -m nba_dataloader -h

usage: __main__.py [-h] [--params PARAMS] [--partition_by PARTITION_BY]
                   [--mode {overwrite,append,error,ignore}]
                   [--location LOCATION]
                   resource

Downloads data from stats.nba.com and persists on disk as delta tables

positional arguments:
  resource              Will make a request to -->
                        https://stats.nba.com/stats/<endpoint>

options:
  -h, --help            show this help message and exit
  --params PARAMS       A python module containing variable 'params' for the
                        query
  --partition_by PARTITION_BY
                        The column to partition by
  --mode {overwrite,append,error,ignore}
                        The write mode
  --location LOCATION   Location to write the fetched data, defaults to tmp/


You can see that the nba_dataloader package requires one positional argument, ```resource```. Examining the [LeagueDashPlayerStats](https://any-api.com/nba_com/nba_com/docs/_leaguedashplayerstats/GET) resource, we see that it takes a few required request parameters.
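To see how the ```resource``` argument and the request parameters fit together, the sketch below builds request URLs with the standard library only. It follows the URL pattern from the help text (```https://stats.nba.com/stats/<endpoint>```) and sends nothing over the network. ```build_url``` is a hypothetical helper introduced here, and the query is abbreviated; how nba-dataloader actually issues its requests is not shown in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://stats.nba.com/stats"  # pattern taken from the help text


def build_url(resource, query):
    """Return the GET URL for one dict of request parameters (illustrative only)."""
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


# Abbreviated queries; the real resource requires more parameters.
params = [
    {"Season": "2021-22", "LeagueID": "00", "SeasonType": "Regular Season"},
    {"Season": "2022-23", "LeagueID": "00", "SeasonType": "Regular Season"},
]

# One URL per dict of request parameters.
urls = [build_url("leaguedashplayerstats", query) for query in params]
for url in urls:
    print(url)
```

Each dict becomes the query string of one request, so a list with two dicts corresponds to two requests.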

### Specifying parameters
There are a few ways to specify the request parameters:
#### Using --params
You can provide a Python module containing a variable named ```params``` whose value is a list of dicts, one dict per request.
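A quick check of that shape can catch mistakes before any request is sent. The helper below is a hypothetical sketch and not part of nba-dataloader; ```validate_params``` is a name introduced here for illustration, and the package may validate its input differently or not at all.

```python
from collections.abc import Mapping


def validate_params(params):
    """Return params as a list, raising if any entry is not a dict-like mapping.

    Hypothetical helper for illustration only.
    """
    items = list(params)  # also accepts generators or map objects
    if not items:
        raise ValueError("params must contain at least one dict")
    for i, item in enumerate(items):
        if not isinstance(item, Mapping):
            raise TypeError(f"params[{i}] is {type(item).__name__}, expected a dict")
    return items


# One dict per request; the keys are request parameters of the resource.
params = validate_params([
    {"LeagueID": "00", "Season": "2021-22"},
    {"LeagueID": "00", "Season": "2022-23"},
])
print(len(params))
```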
#### Using Defaults
If no parameters are provided, the script looks for a module called ```request_params.<resource>_params```.

In this case, if we use the default, the script will try to load the request parameters from ```request_params.leaguedashplayerstats_params```.
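The lookup described above could be sketched with ```importlib``` as follows. This is an assumption about the mechanism, not the package's actual code: ```resolve_params_module``` and ```load_params``` are hypothetical names, and the default follows the ```request_params.<resource>_params``` pattern stated above.

```python
import importlib


def resolve_params_module(resource, params_module=None):
    """Pick the module name: the --params value if given, else the default."""
    if params_module:
        return params_module
    return f"request_params.{resource}_params"


def load_params(resource, params_module=None):
    """Import the chosen module and return its `params` variable as a list."""
    module = importlib.import_module(resolve_params_module(resource, params_module))
    return list(module.params)


print(resolve_params_module("leaguedashplayerstats"))
print(resolve_params_module("drafthistory", "drafthistory_params"))
```

Calling ```load_params``` only works when the named module is importable from the current working directory or elsewhere on ```sys.path```.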

If you started the Jupyter server in the git repo that you cloned, you will find the module under ```request_params/leaguedashplayerstats.py```. Have a look at the file.

In [None]:
default_params = {
        "LastNGames": 0,
        "LeagueID": "00",
        "MeasureType": "Base",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "PaceAdjust": "N",
        "PerMode": "Totals",
        "Period": 0,
        "PlusMinus": "N",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "TeamID": 0
}
seasons = {'1996-97', '1997-98', '1998-99', '1999-00', '2000-01', '2001-02', '2002-03', '2003-04', '2004-05',
           '2005-06', '2006-07', '2007-08', '2008-09', '2009-10', '2010-11', '2011-12', '2012-13', '2013-14',
           '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23'}
params = map(lambda season: {'Season': season} | default_params, seasons)

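The ```map(...)``` operation produces one dict per season by merging the key/value pair ```"Season": <season>``` into ```default_params``` for each of the 27 seasons from 1996-97 to 2022-23. Note that ```map``` returns a lazy iterator rather than a list, and ```seasons``` is a set, so the dicts come out in no particular order. The result can be seen by running the code below.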

In [None]:
list(params)
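Because a ```map``` object can be consumed only once, calling ```list(params)``` a second time returns an empty list. The sketch below is an alternative formulation, not the repository's code: it builds the same 27 season strings in chronological order and keeps the result in a reusable list. ```season_label``` is a hypothetical helper, and ```default_params``` is abbreviated for the example.

```python
def season_label(start_year):
    """Format a season from its starting year, e.g. 1999 -> '1999-00'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


# Abbreviated for the example; the real module defines more keys.
default_params = {"LeagueID": "00", "PerMode": "Totals", "SeasonType": "Regular Season"}

seasons = [season_label(year) for year in range(1996, 2023)]  # 1996-97 .. 2022-23
# Same merge as `{'Season': season} | default_params`, written with ** unpacking.
params = [{"Season": season, **default_params} for season in seasons]

print(len(params), params[0]["Season"], params[-1]["Season"])  # 27 1996-97 2022-23

# By contrast, a map object is exhausted after one pass.
lazy = map(season_label, range(1996, 1998))
print(list(lazy))  # ['1996-97', '1997-98']
print(list(lazy))  # []
```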

Let's run the package using the defaults

In [3]:
%run -m nba_dataloader leaguedashplayerstats

2023-11-07 17:11:57,646 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265


    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2021-22   
1          Base  Totals         N          N    N       00  2021-22   
2          Base  Totals         N          N    N       00  2021-22   
3          Base  Totals         N          N    N       00  2021-22   
4          Base  Totals         N          N    N       00  2021-22   
..          ...     ...       ...        ...  ...      ...      ...   
600        Base  Totals         N          N    N       00  2021-22   
601        Base  Totals         N          N    N       00  2021-22   
602        Base  Totals         N          N    N       00  2021-22   
603        Base  Totals         N          N    N       00  2021-22   
604        Base  Totals         N          N    N       00  2021-22   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...        69        568      492   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2016-17   
1          Base  Totals         N          N    N       00  2016-17   
2          Base  Totals         N          N    N       00  2016-17   
3          Base  Totals         N          N    N       00  2016-17   
4          Base  Totals         N          N    N       00  2016-17   
..          ...     ...       ...        ...  ...      ...      ...   
481        Base  Totals         N          N    N       00  2016-17   
482        Base  Totals         N          N    N       00  2016-17   
483        Base  Totals         N          N    N       00  2016-17   
484        Base  Totals         N          N    N       00  2016-17   
485        Base  Totals         N          N    N       00  2016-17   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       253         93       84   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2018-19   
1          Base  Totals         N          N    N       00  2018-19   
2          Base  Totals         N          N    N       00  2018-19   
3          Base  Totals         N          N    N       00  2018-19   
4          Base  Totals         N          N    N       00  2018-19   
..          ...     ...       ...        ...  ...      ...      ...   
525        Base  Totals         N          N    N       00  2018-19   
526        Base  Totals         N          N    N       00  2018-19   
527        Base  Totals         N          N    N       00  2018-19   
528        Base  Totals         N          N    N       00  2018-19   
529        Base  Totals         N          N    N       00  2018-19   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...        50        471      439   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2009-10   
1          Base  Totals         N          N    N       00  2009-10   
2          Base  Totals         N          N    N       00  2009-10   
3          Base  Totals         N          N    N       00  2009-10   
4          Base  Totals         N          N    N       00  2009-10   
..          ...     ...       ...        ...  ...      ...      ...   
437        Base  Totals         N          N    N       00  2009-10   
438        Base  Totals         N          N    N       00  2009-10   
439        Base  Totals         N          N    N       00  2009-10   
440        Base  Totals         N          N    N       00  2009-10   
441        Base  Totals         N          N    N       00  2009-10   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       357        205      107   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2001-02   
1          Base  Totals         N          N    N       00  2001-02   
2          Base  Totals         N          N    N       00  2001-02   
3          Base  Totals         N          N    N       00  2001-02   
4          Base  Totals         N          N    N       00  2001-02   
..          ...     ...       ...        ...  ...      ...      ...   
435        Base  Totals         N          N    N       00  2001-02   
436        Base  Totals         N          N    N       00  2001-02   
437        Base  Totals         N          N    N       00  2001-02   
438        Base  Totals         N          N    N       00  2001-02   
439        Base  Totals         N          N    N       00  2001-02   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       284         50       74   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2002-03   
1          Base  Totals         N          N    N       00  2002-03   
2          Base  Totals         N          N    N       00  2002-03   
3          Base  Totals         N          N    N       00  2002-03   
4          Base  Totals         N          N    N       00  2002-03   
..          ...     ...       ...        ...  ...      ...      ...   
423        Base  Totals         N          N    N       00  2002-03   
424        Base  Totals         N          N    N       00  2002-03   
425        Base  Totals         N          N    N       00  2002-03   
426        Base  Totals         N          N    N       00  2002-03   
427        Base  Totals         N          N    N       00  2002-03   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       390         21        1   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  1998-99   
1          Base  Totals         N          N    N       00  1998-99   
2          Base  Totals         N          N    N       00  1998-99   
3          Base  Totals         N          N    N       00  1998-99   
4          Base  Totals         N          N    N       00  1998-99   
..          ...     ...       ...        ...  ...      ...      ...   
435        Base  Totals         N          N    N       00  1998-99   
436        Base  Totals         N          N    N       00  1998-99   
437        Base  Totals         N          N    N       00  1998-99   
438        Base  Totals         N          N    N       00  1998-99   
439        Base  Totals         N          N    N       00  1998-99   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       216        269      213   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  1996-97   
1          Base  Totals         N          N    N       00  1996-97   
2          Base  Totals         N          N    N       00  1996-97   
3          Base  Totals         N          N    N       00  1996-97   
4          Base  Totals         N          N    N       00  1996-97   
..          ...     ...       ...        ...  ...      ...      ...   
436        Base  Totals         N          N    N       00  1996-97   
437        Base  Totals         N          N    N       00  1996-97   
438        Base  Totals         N          N    N       00  1996-97   
439        Base  Totals         N          N    N       00  1996-97   
440        Base  Totals         N          N    N       00  1996-97   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       202        342      272   

    MeasureType PerMode PlusMinus PaceAdjust Rank LeagueID   Season  \
0          Base  Totals         N          N    N       00  2011-12   
1          Base  Totals         N          N    N       00  2011-12   
2          Base  Totals         N          N    N       00  2011-12   
3          Base  Totals         N          N    N       00  2011-12   
4          Base  Totals         N          N    N       00  2011-12   
..          ...     ...       ...        ...  ...      ...      ...   
473        Base  Totals         N          N    N       00  2011-12   
474        Base  Totals         N          N    N       00  2011-12   
475        Base  Totals         N          N    N       00  2011-12   
476        Base  Totals         N          N    N       00  2011-12   
477        Base  Totals         N          N    N       00  2011-12   

         SeasonType  PORound  Month  ...  BLK_RANK  BLKA_RANK  PF_RANK  \
0    Regular Season        0      0  ...       387         83      105   

What just happened?

The script queried the endpoint https://stats.nba.com/stats/leaguedashplayerstats 27 times, once for each dict in ```params```. Player stats for every season from 1996-97 to 2022-23 were fetched, and the results are stored in ```tmp/leaguedashplayerstats``` as a delta table. Let's examine the contents of the delta table using the delta-rs package.

In [4]:
from deltalake import DeltaTable
import pandas as pd

pd.set_option('display.max_columns', None)

dt = DeltaTable("tmp/leaguedashplayerstats")
display(dt.to_pandas())

Unnamed: 0,MeasureType,PerMode,PlusMinus,PaceAdjust,Rank,LeagueID,Season,SeasonType,PORound,Month,OpponentTeamID,TeamID,Period,LastNGames,PLAYER_ID,PLAYER_NAME,NICKNAME,TEAM_ID,TEAM_ABBREVIATION,AGE,GP,W,L,W_PCT,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,TOV,STL,BLK,BLKA,PF,PFD,PTS,PLUS_MINUS,NBA_FANTASY_PTS,DD2,TD3,WNBA_FANTASY_PTS,GP_RANK,W_RANK,L_RANK,W_PCT_RANK,MIN_RANK,FGM_RANK,FGA_RANK,FG_PCT_RANK,FG3M_RANK,FG3A_RANK,FG3_PCT_RANK,FTM_RANK,FTA_RANK,FT_PCT_RANK,OREB_RANK,DREB_RANK,REB_RANK,AST_RANK,TOV_RANK,STL_RANK,BLK_RANK,BLKA_RANK,PF_RANK,PFD_RANK,PTS_RANK,PLUS_MINUS_RANK,NBA_FANTASY_PTS_RANK,DD2_RANK,TD3_RANK,WNBA_FANTASY_PTS_RANK
0,Base,Totals,N,N,N,00,2021-22,Regular Season,0,0,0,0,0,0,203932,Aaron Gordon,Aaron,1610612743,DEN,26.0,75,46,29,0.613,2375.418333,434,834,0.520,87,260,0.335,171,230,0.743,125,314,439,188,133,44,44,52,148,200,1126,321,2065.8,6,0,2016.0,46,32,417,167,32,55,69,106,134,125,265,63,61,327,42,67,51,106,60,177,69,568,492,60,64,24,71,85,40,70
1,Base,Totals,N,N,N,00,2021-22,Regular Season,0,0,0,0,0,0,1630565,Aaron Henry,Aaron,1610612755,PHI,22.0,6,6,0,1.000,16.983333,1,5,0.200,0,1,0.000,0,0,0.000,0,1,1,0,2,0,2,2,2,0,2,-20,7.2,0,0,7.0,521,472,1,1,564,561,557,569,514,538,514,543,547,543,557,569,574,569,516,554,457,110,47,564,570,359,573,268,40,571
2,Base,Totals,N,N,N,00,2021-22,Regular Season,0,0,0,0,0,0,1628988,Aaron Holiday,Aaron,1610612756,PHX,25.0,63,34,29,0.540,1020.750000,151,338,0.447,39,103,0.379,59,68,0.868,24,98,122,153,67,42,9,18,92,74,400,-39,861.9,0,0,816.0,184,155,417,266,266,257,250,287,258,274,126,238,264,90,322,311,321,143,177,192,335,338,354,238,263,409,268,268,40,270
3,Base,Totals,N,N,N,00,2021-22,Regular Season,0,0,0,0,0,0,1630174,Aaron Nesmith,Aaron,1610612738,BOS,22.0,52,32,20,0.615,573.878333,72,182,0.396,31,115,0.270,21,26,0.808,15,74,89,22,31,20,5,2,70,28,196,22,379.8,0,0,388.0,273,177,259,162,361,361,351,443,282,262,420,364,377,199,390,352,368,401,336,342,406,110,298,373,363,200,378,268,40,371
4,Base,Totals,N,N,N,00,2021-22,Regular Season,0,0,0,0,0,0,1630598,Aaron Wiggins,Aaron,1610612760,OKC,23.0,50,13,37,0.260,1208.750000,156,337,0.463,42,138,0.304,62,85,0.729,51,127,178,68,54,30,10,21,93,63,416,-235,797.6,0,0,784.0,289,400,518,526,230,254,251,223,245,235,368,228,222,342,173,261,244,277,225,252,321,385,358,268,255,571,286,268,40,282
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12841,Base,Totals,N,N,N,00,2003-04,Regular Season,0,0,0,0,0,0,2585,Zaza Pachulia,Zaza,1610612753,ORL,20.0,59,13,46,0.220,658.916667,68,175,0.389,0,0,0.000,58,90,0.644,69,105,174,13,34,21,12,25,87,1,194,-65,487.3,0,0,447.0,232,337,394,422,296,306,296,329,289,361,289,235,219,338,148,249,222,356,291,279,231,247,173,21,299,312,292,235,19,292
12842,Base,Totals,N,N,N,00,2003-04,Regular Season,0,0,0,0,0,0,1442,Zeljko Rebraca,Zeljko,1610612737,ATL,32.0,24,9,15,0.375,269.838333,34,77,0.442,0,0,0.000,23,30,0.767,23,35,58,6,17,5,11,4,52,3,91,-37,200.6,0,0,187.0,357,368,101,334,358,351,355,178,289,361,289,314,325,177,299,347,340,385,348,369,243,64,121,2,356,259,358,235,19,357
12843,Base,Totals,N,N,N,00,2003-04,Regular Season,0,0,0,0,0,0,1985,Zendon Hamilton,Zendon,1610612755,PHI,29.0,46,16,30,0.348,470.736667,51,95,0.537,0,0,0.000,67,96,0.698,49,97,146,13,27,8,8,13,74,0,169,-66,384.7,2,0,360.0,292,319,247,349,329,330,349,11,289,361,289,209,208,286,205,266,249,356,317,351,273,161,156,79,312,315,316,146,19,319
12844,Base,Totals,N,N,N,00,2003-04,Regular Season,0,0,0,0,0,0,2565,Zoran Planinic,Zoran,1610612751,NJN,21.0,49,31,18,0.633,466.743333,53,129,0.411,9,32,0.281,38,60,0.633,14,41,55,68,36,13,3,11,70,0,153,-77,333.0,0,0,317.0,284,183,124,97,330,325,327,265,187,194,208,278,267,344,334,340,343,222,288,320,342,142,148,79,319,329,331,235,19,328


You will see a total of 12846 rows, one for each player in each season from 1996-97 to 2022-23.
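Each row in the table is one player in one season, so the row count is larger than the number of distinct players. A short pandas sketch of the difference follows. It uses a tiny made-up frame in place of ```dt.to_pandas()```; the IDs, seasons, and team codes are placeholders, not real query results.

```python
import pandas as pd

# Toy stand-in for dt.to_pandas(); all values are made up for illustration.
stats = pd.DataFrame({
    "PLAYER_ID": [1, 1, 2],
    "Season": ["2020-21", "2021-22", "2021-22"],
    "TEAM_ABBREVIATION": ["AAA", "AAA", "BBB"],
})

player_seasons = len(stats)                      # one row per player per season
distinct_players = stats["PLAYER_ID"].nunique()  # each player counted once
rows_per_season = stats.groupby("Season").size()  # rows in each season

print(player_seasons, distinct_players)
print(rows_per_season.to_dict())
```

The same three expressions can be applied to the real frame by replacing ```stats``` with ```dt.to_pandas()```.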

### Querying using Spark
Let's see how many players played for each team during this period. This time we will use Spark to query the delta table. Before that, we need to install two more Python packages: pyspark and delta-spark. Make sure the two versions are compatible by checking [here](https://docs.delta.io/latest/releases.html). We will be installing
 - [delta-spark==2.4.0](https://pypi.org/project/delta-spark/2.4.0/)
 - [pyspark==3.4.1](https://pypi.org/project/pyspark/3.4.1/)

Installing delta-spark should also install the correct version of pyspark.

In [None]:
%pip install delta-spark==2.4.0
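After the install, it can be worth confirming which versions actually ended up in the environment. The sketch below uses the standard library's ```importlib.metadata```; ```installed_version``` is a hypothetical helper name, and the printed values depend on your environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pyspark", "delta-spark"):
    print(name, installed_version(name))
```

If either line prints ```None```, or the pair does not match the compatibility table linked above, reinstall before creating the Spark session.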

In [3]:
import os
import sys
import pyspark
from delta.pip_utils import configure_spark_with_delta_pip
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
    
builder = pyspark.sql.SparkSession.builder.appName("NBA Analytics")\
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")\
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    
spark = configure_spark_with_delta_pip(builder).getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

df = spark.read.format("delta").load("tmp/leaguedashplayerstats")
df.createOrReplaceTempView("PLAYER_STATS")
spark.sql("SELECT TEAM_ABBREVIATION, count(TEAM_ABBREVIATION) as NUM_PLAYERS from PLAYER_STATS"\
          " group by TEAM_ABBREVIATION order by NUM_PLAYERS").show(50)

+-----------------+-----------+
|TEAM_ABBREVIATION|NUM_PLAYERS|
+-----------------+-----------+
|              NOK|         32|
|              VAN|         73|
|              CHH|         89|
|              NOH|        143|
|              NOP|        176|
|              SEA|        182|
|              BKN|        200|
|              OKC|        254|
|              NJN|        257|
|              CHA|        305|
|              MEM|        371|
|              PHX|        415|
|              UTA|        416|
|              MIN|        417|
|              SAC|        418|
|              DET|        420|
|              CHI|        423|
|              NYK|        425|
|              BOS|        425|
|              GSW|        427|
|              MIL|        427|
|              POR|        428|
|              DEN|        428|
|              IND|        428|
|              LAL|        429|
|              ORL|        429|
|              SAS|        433|
|              HOU|        434|
|       

Let us query another resource this time: [drafthistory](https://any-api.com/nba_com/nba_com/docs/_drafthistory/GET). This time we will use a custom Python module to pass the parameters.

Run the cell below to create a Python file ```drafthistory_params.py``` with the following content.

In [5]:
%%writefile drafthistory_params.py

params = [{
    "LeagueID":"00"
}]

Writing drafthistory_params.py


Now run the package to fetch the draft history data

In [6]:
%run -m nba_dataloader drafthistory --params drafthistory_params

2023-11-07 17:12:36,933 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265


     LeagueID  PERSON_ID        PLAYER_NAME SEASON  ROUND_NUMBER  ROUND_PICK  \
0          00    1641705  Victor Wembanyama   2023             1           1   
1          00    1641706     Brandon Miller   2023             1           2   
2          00    1630703    Scoot Henderson   2023             1           3   
3          00    1641708      Amen Thompson   2023             1           4   
4          00    1641709     Ausar Thompson   2023             1           5   
...       ...        ...                ...    ...           ...         ...   
8252       00      78344        Jack Tingle   1947             0           0   
8253       00      77684          Fred Nagy   1947             0           0   
8254       00      77613         Wat Misaka   1947             0           0   
8255       00      78402         Gene Vance   1947             0           0   
8256       00      77452        John Mandic   1947             0           0   

      OVERALL_PICK DRAFT_TYPE     TEAM_