## NBA 3 Point Statistical Analysis since 1980

## New Advanced NBA 3 point Stats updated 2021

3NG is short for 3-point Net Gain. It calculates the number of points a team gains per possession when the player makes a 3-pointer, minus the number of points the team loses per possession when the player misses a 3-pointer. The gain is measured against the league's expected points per possession.

This Jupyter Notebook contains Exploratory Data Analysis introducing 3NG. It also introduces EM3A and EM3: Expected Minutes before a 3-point Attempt and Expected Minutes before a 3-pointer. I use these statistics to rank 3-point shooters throughout NBA history. Data wrangling steps are included for those with an interest in pandas.

## References

https://www.kaggle.com/drgilermo/nba-players-stats
    
https://www.basketball-reference.com/

New 3-point statistics created by Corey J Wade, from the 2018 article "The 3-point Statistic to Rule them All": https://towardsdatascience.com/the-3-point-statistic-to-rule-them-all-12ac018a955a

# NBA Player Stats through 2017

In [1]:
# Import pandas
import pandas as pd

# Silence warnings due to chained assignments
pd.options.mode.chained_assignment = None  # default='warn'

# Open file as DataFrame - this data file can be downloaded from Kaggle at 
# https://www.kaggle.com/drgilermo/nba-players-stats

df_2017 = pd.read_csv('Seasons_Stats.csv')

# Display first five rows
df_2017.head()

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0,1950.0,Curly Armstrong,G-F,31.0,FTW,63.0,,,,...,0.705,,,,176.0,,,,217.0,458.0
1,1,1950.0,Cliff Barker,SG,29.0,INO,49.0,,,,...,0.708,,,,109.0,,,,99.0,279.0
2,2,1950.0,Leo Barnhorst,SF,25.0,CHS,67.0,,,,...,0.698,,,,140.0,,,,192.0,438.0
3,3,1950.0,Ed Bartels,F,24.0,TOT,15.0,,,,...,0.559,,,,20.0,,,,29.0,63.0
4,4,1950.0,Ed Bartels,F,24.0,DNN,13.0,,,,...,0.548,,,,20.0,,,,27.0,59.0


The NBA did not have a 3-point shot before the 1979-80 season (Year 1980 in this data), so the analysis starts there.

In [2]:
# Delete unnecessary column
del df_2017['Unnamed: 0']

# Only select years after 1979
df_2017 = df_2017[df_2017['Year']>=1980]

# Display last five rows
df_2017.tail()

Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,TS%,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
24686,2017.0,Cody Zeller,PF,24.0,CHO,62.0,58.0,1725.0,16.7,0.604,...,0.679,135.0,270.0,405.0,99.0,62.0,58.0,65.0,189.0,639.0
24687,2017.0,Tyler Zeller,C,27.0,BOS,51.0,5.0,525.0,13.0,0.508,...,0.564,43.0,81.0,124.0,42.0,7.0,21.0,20.0,61.0,178.0
24688,2017.0,Stephen Zimmerman,C,20.0,ORL,19.0,0.0,108.0,7.3,0.346,...,0.6,11.0,24.0,35.0,4.0,2.0,5.0,3.0,17.0,23.0
24689,2017.0,Paul Zipser,SF,22.0,CHI,44.0,18.0,843.0,6.9,0.503,...,0.775,15.0,110.0,125.0,36.0,15.0,16.0,40.0,78.0,240.0
24690,2017.0,Ivica Zubac,C,19.0,LAL,38.0,11.0,609.0,17.0,0.547,...,0.653,41.0,118.0,159.0,30.0,14.0,33.0,30.0,66.0,284.0


In [3]:
# Display info
df_2017.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18927 entries, 5727 to 24690
Data columns (total 52 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    18927 non-null  float64
 1   Player  18927 non-null  object 
 2   Pos     18927 non-null  object 
 3   Age     18927 non-null  float64
 4   Tm      18927 non-null  object 
 5   G       18927 non-null  float64
 6   GS      18233 non-null  float64
 7   MP      18927 non-null  float64
 8   PER     18922 non-null  float64
 9   TS%     18851 non-null  float64
 10  3PAr    18839 non-null  float64
 11  FTr     18839 non-null  float64
 12  ORB%    18922 non-null  float64
 13  DRB%    18922 non-null  float64
 14  TRB%    18922 non-null  float64
 15  AST%    18922 non-null  float64
 16  STL%    18922 non-null  float64
 17  BLK%    18922 non-null  float64
 18  TOV%    18866 non-null  float64
 19  USG%    18922 non-null  float64
 20  blanl   0 non-null      float64
 21  OWS     18927 non-null  float64


In [5]:
# Read html file
df_2018, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2018_totals.html", header=0)
df_2019, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2019_totals.html", header=0)
df_2020, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2020_totals.html", header=0)
df_2021, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2021_totals.html", header=0)

# Convert to csv file
df_2018.to_csv("df_2018.csv", index=False)
df_2019.to_csv("df_2019.csv", index=False)
df_2020.to_csv("df_2020.csv", index=False)
df_2021.to_csv("df_2021.csv", index=False)

# Display first five rows
df_2019.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Álex Abrines,SG,25,OKC,31,2,588,56,157,...,0.923,5,43,48,20,17,6,14,53,165
1,2,Quincy Acy,PF,28,PHO,10,0,123,4,18,...,0.7,3,22,25,8,1,4,4,24,17
2,3,Jaylen Adams,PG,22,ATL,34,1,428,38,110,...,0.778,11,49,60,65,14,5,28,45,108
3,4,Steven Adams,C,25,OKC,80,80,2669,481,809,...,0.5,391,369,760,124,117,76,135,204,1108
4,5,Bam Adebayo,C,21,MIA,82,28,1913,280,486,...,0.735,165,432,597,184,71,65,121,203,729


In [7]:
df_2021.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Precious Achiuwa,PF,21,MIA,49,2,604,104,186,...,0.505,62,112,174,25,16,25,38,76,258
1,2,Jaylen Adams,PG,24,MIL,7,0,18,1,8,...,,0,3,3,2,0,0,0,1,2
2,3,Steven Adams,C,27,NOP,48,48,1341,165,267,...,0.435,188,233,421,92,47,33,72,99,380
3,4,Bam Adebayo,C,23,MIA,46,46,1541,332,590,...,0.819,109,329,438,249,41,54,120,104,878
4,5,LaMarcus Aldridge,C,35,TOT,26,23,674,140,296,...,0.872,19,99,118,49,11,29,27,47,352


## Add column for year and delete the "Rk" column

In [8]:
# Delete unnecessary column
del df_2018['Rk']
del df_2019['Rk']
del df_2020['Rk']
del df_2021['Rk']

# Add column for year, place at index 0
df_2018.insert(0, 'Year', 2018.0)
df_2019.insert(0, 'Year', 2019.0)
df_2020.insert(0, 'Year', 2020.0)
df_2021.insert(0, 'Year', 2021.0)

#Display last five rows
df_2021.tail()

Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
678,2021.0,Delon Wright,PG,28,SAC,8,0,157,18,45,...,0.923,7,16,23,16,6,0,5,4,54
679,2021.0,Thaddeus Young,PF,32,CHI,47,14,1178,262,450,...,0.623,116,188,304,205,57,28,94,117,578
680,2021.0,Trae Young,PG,22,ATL,50,50,1714,388,891,...,0.867,32,166,198,475,41,11,214,97,1271
681,2021.0,Cody Zeller,C,28,CHO,32,20,693,122,228,...,0.708,88,146,234,66,23,14,30,74,298
682,2021.0,Ivica Zubac,C,23,LAC,54,15,1176,183,285,...,0.821,145,245,390,58,21,48,57,139,467


In [9]:
# Display info
df_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 30 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    690 non-null    float64
 1   Player  690 non-null    object 
 2   Pos     690 non-null    object 
 3   Age     690 non-null    object 
 4   Tm      690 non-null    object 
 5   G       690 non-null    object 
 6   GS      690 non-null    object 
 7   MP      690 non-null    object 
 8   FG      690 non-null    object 
 9   FGA     690 non-null    object 
 10  FG%     686 non-null    object 
 11  3P      690 non-null    object 
 12  3PA     690 non-null    object 
 13  3P%     625 non-null    object 
 14  2P      690 non-null    object 
 15  2PA     690 non-null    object 
 16  2P%     672 non-null    object 
 17  eFG%    686 non-null    object 
 18  FT      690 non-null    object 
 19  FTA     690 non-null    object 
 20  FT%     632 non-null    object 
 21  ORB     690 non-null    object 
 22  DR

## Concatenating DataFrames
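Note that the DataFrames have different numbers of columns (52 for the Kaggle data, 30 for each Basketball Reference season), so only the columns they share are selected before concatenating.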

In [10]:
# Select the same columns from every season's DataFrame
cols = ['Year', 'Tm', 'Player', 'Pos', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%', 'PTS', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', '3P', '3PA', '3P%']
tp_2017 = df_2017[cols]
tp_2018 = df_2018[cols]
tp_2019 = df_2019[cols]
tp_2020 = df_2020[cols]
tp_2021 = df_2021[cols]


In [11]:
# Concatenate dataframes
tp = pd.concat([tp_2017, tp_2018, tp_2019, tp_2020, tp_2021], ignore_index=True)
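The select-then-concatenate step can be sketched on two miniature frames. The player names, values, and the extra `PER` column below are toy stand-ins, not rows from the real data, and `shared_cols` is a hypothetical helper name.

```python
import pandas as pd

# Toy frames: the older one carries an extra column ('PER') the newer one lacks
old = pd.DataFrame({'Year': [2017.0], 'Player': ['Player A'], 'PER': [16.7], '3P': [10.0]})
new = pd.DataFrame({'Year': [2021.0], 'Player': ['Player B'], '3P': [117.0]})

# Keep only the columns present in both frames, in the older frame's order
shared_cols = [col for col in old.columns if col in new.columns]

# Stack the frames and rebuild a clean 0..n-1 index
combined = pd.concat([old[shared_cols], new[shared_cols]], ignore_index=True)
print(combined)
```

Selecting the shared columns first avoids the NaN-filled columns that `pd.concat` would otherwise create for fields missing from one of the frames.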

In [12]:
#View the new df
tp.tail()

Unnamed: 0,Year,Tm,Player,Pos,G,GS,MP,FG,FGA,FG%,...,DRB,TRB,AST,STL,BLK,TOV,PF,3P,3PA,3P%
21706,2021.0,SAC,Delon Wright,PG,8,0,157,18,45,0.4,...,16,23,16,6,0,5,4,6,17,0.353
21707,2021.0,CHI,Thaddeus Young,PF,47,14,1178,262,450,0.582,...,188,304,205,57,28,94,117,6,25,0.24
21708,2021.0,ATL,Trae Young,PG,50,50,1714,388,891,0.435,...,166,198,475,41,11,214,97,117,324,0.361
21709,2021.0,CHO,Cody Zeller,C,32,20,693,122,228,0.535,...,146,234,66,23,14,30,74,3,22,0.136
21710,2021.0,LAC,Ivica Zubac,C,54,15,1176,183,285,0.642,...,245,390,58,21,48,57,139,0,3,0.0


## Adjust the columns

In [13]:
#Display info
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21711 entries, 0 to 21710
Data columns (total 29 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    21711 non-null  float64
 1   Tm      21711 non-null  object 
 2   Player  21711 non-null  object 
 3   Pos     21711 non-null  object 
 4   G       21711 non-null  object 
 5   GS      21017 non-null  object 
 6   MP      21711 non-null  object 
 7   FG      21711 non-null  object 
 8   FGA     21711 non-null  object 
 9   FG%     21606 non-null  object 
 10  PTS     21711 non-null  object 
 11  2P      21711 non-null  object 
 12  2PA     21711 non-null  object 
 13  2P%     21539 non-null  object 
 14  eFG%    21606 non-null  object 
 15  FT      21711 non-null  object 
 16  FTA     21711 non-null  object 
 17  FT%     20790 non-null  object 
 18  ORB     21711 non-null  object 
 19  DRB     21711 non-null  object 
 20  TRB     21711 non-null  object 
 21  AST     21711 non-null  object 
 22

With the exception of 'Year', the statistical columns are stored as objects (strings) rather than numbers. They must be converted to floats for mathematical operations.

In [15]:
# Convert numeric columns to floats; entries that cannot be parsed become NaN
num_cols = ['G', 'GS', 'MP', 'FG', 'FGA', 'FG%', 'PTS', '2P', '2PA', '2P%', 'eFG%',
            'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF',
            '3P', '3PA', '3P%']
tp[num_cols] = tp[num_cols].apply(pd.to_numeric, errors='coerce')

# Check columns
tp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21711 entries, 0 to 21710
Data columns (total 29 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    21711 non-null  float64
 1   Tm      21711 non-null  object 
 2   Player  21711 non-null  object 
 3   Pos     21711 non-null  object 
 4   G       21608 non-null  float64
 5   GS      20914 non-null  float64
 6   MP      21608 non-null  float64
 7   FG      21608 non-null  float64
 8   FGA     21608 non-null  float64
 9   FG%     21503 non-null  float64
 10  PTS     21608 non-null  float64
 11  2P      21608 non-null  float64
 12  2PA     21608 non-null  float64
 13  2P%     21436 non-null  float64
 14  eFG%    21503 non-null  float64
 15  FT      21608 non-null  float64
 16  FTA     21608 non-null  float64
 17  FT%     20687 non-null  float64
 18  ORB     21608 non-null  float64
 19  DRB     21608 non-null  float64
 20  TRB     21608 non-null  float64
 21  AST     21608 non-null  float64
 22
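The non-null counts drop from 21711 to 21608 after conversion, which suggests some rows held text in the numeric columns. A likely cause (an assumption, not verified here) is header rows repeated inside the scraped tables. The sketch below shows how `errors='coerce'` turns such a row into NaN so it can be dropped; the `raw` and `clean` names and the embedded header row are illustrative.

```python
import pandas as pd

# Toy frame: the middle row mimics a header row repeated inside the table
raw = pd.DataFrame({'Player': ['Trae Young', 'Player', 'Ivica Zubac'],
                    '3P': ['117', '3P', '0'],
                    '3PA': ['324', '3PA', '3']})

num_cols = ['3P', '3PA']

# Strings that are not numbers become NaN instead of raising an error
raw[num_cols] = raw[num_cols].apply(pd.to_numeric, errors='coerce')

# Drop the rows that failed to convert and rebuild the index
clean = raw.dropna(subset=num_cols).reset_index(drop=True)
print(clean)
```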

## Points Per Possession
The last piece of data wrangling is points per possession, which is used to compute the expected value of points each time a team has the ball. I obtained the league-average offensive ratings at https://www.basketball-reference.com/leagues/NBA_stats.html.

In [40]:
# Read html file
df_teams, = pd.read_html("https://www.basketball-reference.com/leagues/NBA_stats_per_game.html", header=0)



In [41]:
df_teams.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Per Game,Per Game.1,Per Game.2,...,Per Game.15,Shooting,Shooting.1,Shooting.2,Advanced,Advanced.1,Advanced.2,Advanced.3,Advanced.4,Advanced.5
0,Rk,Season,Lg,Age,Ht,Wt,G,MP,FG,FGA,...,PTS,FG%,3P%,FT%,Pace,eFG%,TOV%,ORB%,FT/FGA,ORtg
1,1,2020-21,NBA,26.2,6-6,217,784,241.5,41.1,88.3,...,111.8,.465,.367,.778,99.2,.537,12.5,22.1,.192,112.1
2,2,2019-20,NBA,26.1,6-6,218,1059,241.8,40.9,88.8,...,111.8,.460,.358,.773,100.3,.529,12.8,22.5,.201,110.6
3,3,2018-19,NBA,26.3,6-6,219,1230,241.6,41.1,89.2,...,111.2,.461,.355,.766,100.0,.524,12.4,22.9,.198,110.4
4,4,2017-18,NBA,26.4,6-7,220,1230,241.4,39.6,86.1,...,106.3,.460,.362,.767,97.3,.521,13.0,22.3,.193,108.6


In [42]:
# Preview the table without the embedded header row (the result is not assigned)
df_teams.drop(0)

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Per Game,Per Game.1,Per Game.2,...,Per Game.15,Shooting,Shooting.1,Shooting.2,Advanced,Advanced.1,Advanced.2,Advanced.3,Advanced.4,Advanced.5
1,1,2020-21,NBA,26.2,6-6,217,784,241.5,41.1,88.3,...,111.8,.465,.367,.778,99.2,.537,12.5,22.1,.192,112.1
2,2,2019-20,NBA,26.1,6-6,218,1059,241.8,40.9,88.8,...,111.8,.460,.358,.773,100.3,.529,12.8,22.5,.201,110.6
3,3,2018-19,NBA,26.3,6-6,219,1230,241.6,41.1,89.2,...,111.2,.461,.355,.766,100.0,.524,12.4,22.9,.198,110.4
4,4,2017-18,NBA,26.4,6-7,220,1230,241.4,39.6,86.1,...,106.3,.460,.362,.767,97.3,.521,13.0,22.3,.193,108.6
5,5,2016-17,NBA,26.6,6-7,221,1230,241.6,39.0,85.4,...,105.6,.457,.358,.772,96.4,.514,12.7,23.3,.209,108.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,71,1950-51,NBA,,,,354,,29.8,83.6,...,84.1,.357,,.733,,.357,,,.293,
78,72,1949-50,NBA,,,,561,,28.2,83.1,...,80.0,.340,,.714,,.340,,,.284,
79,73,1948-49,BAA,,,,360,,29.0,88.7,...,80.0,.327,,.703,,.327,,,.248,
80,74,1947-48,BAA,,,,192,,27.2,96.0,...,72.7,.284,,.675,,.284,,,.190,


In [43]:
# Choose relevant columns
df_PPP = df_teams[['Unnamed: 1','Advanced.5']]

# Rename columns
df_PPP.columns = ['Year', 'PPP']

# Show first five rows
df_PPP.head()

Unnamed: 0,Year,PPP
0,Season,ORtg
1,2020-21,112.1
2,2019-20,110.6
3,2018-19,110.4
4,2017-18,108.6


In [48]:
df_P = df_PPP.iloc[1:]

In [49]:
df_P

Unnamed: 0,Year,PPP
1,2020-21,112.1
2,2019-20,110.6
3,2018-19,110.4
4,2017-18,108.6
5,2016-17,108.8
...,...,...
77,1950-51,
78,1949-50,
79,1948-49,
80,1947-48,


In [50]:
df_P.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 1 to 81
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Year    78 non-null     object
 1   PPP     54 non-null     object
dtypes: object(2)
memory usage: 1.4+ KB


In [51]:
# Convert 'Year' to year listed before hyphen
df_P['Year'] = df_P.loc[:,'Year'].str.split('-').str[0]

# Convert columns to numbers
df_P['Year'] = pd.to_numeric(df_P['Year'], errors='coerce')
df_P['PPP'] = pd.to_numeric(df_P['PPP'], errors='coerce')

# Drop NaN values
df_P = df_P.dropna()

# Add 1 to each year, since NBA seasons are marked by the second year, not the first
df_P['Year'] = df_P['Year'] + 1

# Offensive rating is defined as points per 100 possessions
# Divide by 100 to convert to points per possession
df_P['PPP'] = df_P['PPP']/100

# Only choose years with 3-pointers
df_P = df_P[df_P.Year>=1980]

# View DataFrame
df_P

Unnamed: 0,Year,PPP
1,2021.0,1.121
2,2020.0,1.106
3,2019.0,1.104
4,2018.0,1.086
5,2017.0,1.088
6,2016.0,1.064
7,2015.0,1.056
8,2014.0,1.066
9,2013.0,1.058
10,2012.0,1.046
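The season-label and rating conversions can be checked on a few rows. This is a minimal sketch using three season labels from the table above; the `seasons` frame is a toy stand-in for `df_P`.

```python
import pandas as pd

# Toy stand-in for df_P: season labels and offensive ratings stored as strings
seasons = pd.DataFrame({'Year': ['2020-21', '2019-20', '1950-51'],
                        'PPP': ['112.1', '110.6', None]})

# '2020-21' -> 2020 -> 2021: keep the part before the hyphen, then add 1
seasons['Year'] = pd.to_numeric(seasons['Year'].str.split('-').str[0], errors='coerce') + 1

# Offensive rating is points per 100 possessions, so divide by 100
seasons['PPP'] = pd.to_numeric(seasons['PPP'], errors='coerce') / 100

# Seasons without an offensive rating are dropped
seasons = seasons.dropna()
print(seasons)
```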


In [52]:
# Convert years to ints
tp['Year'] = tp['Year'].astype(int)

## Duplicate Entries
What if a player gets traded? Basketball Reference lists a separate row for each team he played for, plus a 'TOT' row with his season totals. This makes sense. When looking at team statistics, only the stats recorded for that team matter, but when looking at individual statistics, only the season totals should count. We need two separate dataframes to handle the two cases.

In [53]:
# Copy the dataframe
tp_team = tp.copy()

# Drop rows with TOT as 'Tm'
tp_team = tp_team.drop(tp_team[tp_team['Tm']=='TOT'].index)

# Show last rows
tp_team.tail()

Unnamed: 0,Year,Tm,Player,Pos,G,GS,MP,FG,FGA,FG%,...,DRB,TRB,AST,STL,BLK,TOV,PF,3P,3PA,3P%
21706,2021,SAC,Delon Wright,PG,8.0,0.0,157.0,18.0,45.0,0.4,...,16.0,23.0,16.0,6.0,0.0,5.0,4.0,6.0,17.0,0.353
21707,2021,CHI,Thaddeus Young,PF,47.0,14.0,1178.0,262.0,450.0,0.582,...,188.0,304.0,205.0,57.0,28.0,94.0,117.0,6.0,25.0,0.24
21708,2021,ATL,Trae Young,PG,50.0,50.0,1714.0,388.0,891.0,0.435,...,166.0,198.0,475.0,41.0,11.0,214.0,97.0,117.0,324.0,0.361
21709,2021,CHO,Cody Zeller,C,32.0,20.0,693.0,122.0,228.0,0.535,...,146.0,234.0,66.0,23.0,14.0,30.0,74.0,3.0,22.0,0.136
21710,2021,LAC,Ivica Zubac,C,54.0,15.0,1176.0,183.0,285.0,0.642,...,245.0,390.0,58.0,21.0,48.0,57.0,139.0,0.0,3.0,0.0


In [54]:
# Copy original dataframe for individuals
tp_ind = tp.copy()

# Drop all rows that list players more than once per year
tp_no_duplicates = tp_ind.drop_duplicates(['Year','Player'], keep=False)

# Create dataframe that only includes 'TOT' as team
tp_Tot = tp_ind[tp_ind['Tm']=='TOT']

# Combine dataframe with no duplicates with dataframe that has TOT as team
tp_ind = pd.concat([tp_no_duplicates, tp_Tot])

# Sort index
tp_ind = tp_ind.sort_index()

# Show last 5 entries
tp_ind.tail()

Unnamed: 0,Year,Tm,Player,Pos,G,GS,MP,FG,FGA,FG%,...,DRB,TRB,AST,STL,BLK,TOV,PF,3P,3PA,3P%
21704,2021,TOT,Delon Wright,SG-PG,44.0,31.0,1209.0,154.0,338.0,0.456,...,143.0,187.0,197.0,64.0,19.0,53.0,49.0,37.0,106.0,0.349
21707,2021,CHI,Thaddeus Young,PF,47.0,14.0,1178.0,262.0,450.0,0.582,...,188.0,304.0,205.0,57.0,28.0,94.0,117.0,6.0,25.0,0.24
21708,2021,ATL,Trae Young,PG,50.0,50.0,1714.0,388.0,891.0,0.435,...,166.0,198.0,475.0,41.0,11.0,214.0,97.0,117.0,324.0,0.361
21709,2021,CHO,Cody Zeller,C,32.0,20.0,693.0,122.0,228.0,0.535,...,146.0,234.0,66.0,23.0,14.0,30.0,74.0,3.0,22.0,0.136
21710,2021,LAC,Ivica Zubac,C,54.0,15.0,1176.0,183.0,285.0,0.642,...,245.0,390.0,58.0,21.0,48.0,57.0,139.0,0.0,3.0,0.0
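The two views can be illustrated on a toy frame with one traded player and one player who stayed put. The team codes `AAA` and `BBB` and the traded player's per-team split are hypothetical; the logic mirrors the two cells above.

```python
import pandas as pd

# Toy data: one traded player (a TOT row plus one row per team) and one untraded player
toy = pd.DataFrame({
    'Year':   [2021, 2021, 2021, 2021],
    'Tm':     ['TOT', 'AAA', 'BBB', 'ATL'],
    'Player': ['Traded Player', 'Traded Player', 'Traded Player', 'Trae Young'],
    '3P':     [37.0, 31.0, 6.0, 117.0],
})

# Team view: drop the season-total rows so no 3-pointer is counted twice
team_view = toy[toy['Tm'] != 'TOT']

# Individual view: players listed once per year, plus the TOT row of traded players
singles = toy.drop_duplicates(['Year', 'Player'], keep=False)
totals = toy[toy['Tm'] == 'TOT']
ind_view = pd.concat([singles, totals]).sort_index()

print(team_view)
print(ind_view)
```

With `keep=False`, every row of a player who appears more than once in a year is removed, so the traded player re-enters the individual view only through his `TOT` row.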


In [55]:
tp_ind.to_csv('tp_ind.csv', index = False)

In [63]:
tp_ind.sort_values(['3P'])



Unnamed: 0,Year,Tm,Player,Pos,G,GS,MP,FG,FGA,FG%,...,DRB,TRB,AST,STL,BLK,TOV,PF,3P,3PA,3P%
0,1980,LAL,Kareem Abdul-Jabbar*,C,82.0,,3143.0,835.0,1383.0,0.604,...,696.0,886.0,371.0,81.0,280.0,297.0,216.0,0.0,1.0,0.000
9260,2001,TOT,Calvin Booth,C,55.0,29.0,933.0,120.0,252.0,0.476,...,169.0,246.0,42.0,29.0,111.0,53.0,146.0,0.0,0.0,
9259,2001,CLE,Etdrick Bohannon,PF,6.0,0.0,19.0,2.0,4.0,0.500,...,4.0,7.0,0.0,0.0,2.0,1.0,4.0,0.0,0.0,
9258,2001,TOR,Muggsy Bogues,PG,3.0,0.0,34.0,0.0,2.0,0.000,...,3.0,3.0,5.0,2.0,0.0,4.0,3.0,0.0,1.0,0.000
9257,2001,BOS,Mark Blount,C,64.0,50.0,1098.0,101.0,200.0,0.505,...,134.0,231.0,32.0,39.0,76.0,62.0,183.0,0.0,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20617,2020,HOU,James Harden,SG,68.0,68.0,2483.0,672.0,1514.0,0.444,...,376.0,446.0,512.0,125.0,60.0,308.0,227.0,299.0,843.0,0.355
18452,2017,GSW,Stephen Curry,PG,79.0,79.0,2638.0,675.0,1443.0,0.468,...,292.0,353.0,523.0,143.0,17.0,239.0,183.0,324.0,789.0,0.411
19782,2019,GSW,Stephen Curry,PG,69.0,69.0,2331.0,632.0,1340.0,0.472,...,324.0,369.0,361.0,92.0,25.0,192.0,166.0,354.0,810.0,0.437
19894,2019,HOU,James Harden,PG,78.0,78.0,2867.0,843.0,1909.0,0.442,...,452.0,518.0,586.0,158.0,58.0,387.0,244.0,378.0,1028.0,0.368


## Minimum Requirements

It's not necessary to examine data from all players. If a player was never recorded as taking a 3-pointer, he can be excluded. Players who only took a few 3's may also be excluded so as not to skew the data. My minimum requirements for 3NG are less stringent than other "qualified" statistics. See https://stats.nba.com/help/statminimums/.

In [65]:
# Define a function that establishes minimum requirements for qualified individuals
def min_requirements(data, threes, mins, games):

    # Select players who have made more than a certain number of 3s
    data = data[(data['3P'] > threes)]

    # Select players with a certain number of minutes
    data = data[(data['MP'] > mins)]

    # Select players who have played a certain number of games
    data = data[(data['G'] > games)]

    # return dataframe
    return data

In [66]:
# Create dataframe for qualified individuals
tp_ind_qual = min_requirements(tp_ind, 10, 220, 30)

# Create team dataframe with qualified individuals
tp_team_qual = min_requirements(tp_team, 10, 220, 30)
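The same filter can be written as one combined boolean mask. The sketch below is an alternative formulation, not the function used in this notebook; `min_requirements_mask` and the toy rows are illustrative. It also shows that the thresholds are strict: a player with exactly 10 made 3-pointers does not qualify.

```python
import pandas as pd

def min_requirements_mask(data, threes, mins, games):
    # Combine the three strict thresholds into a single boolean mask
    mask = (data['3P'] > threes) & (data['MP'] > mins) & (data['G'] > games)
    return data[mask]

# Toy rows: player B sits exactly on the 3-pointers threshold
toy = pd.DataFrame({'Player': ['A', 'B', 'C'],
                    '3P': [37.0, 10.0, 154.0],
                    'MP': [291.0, 500.0, 1296.0],
                    'G':  [33.0, 40.0, 51.0]})

qualified = min_requirements_mask(toy, 10, 220, 30)
print(qualified)
```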

In [69]:
del tp_team_qual['GS']
del tp_ind_qual['GS']
del tp_team_qual['FT%']
del tp_ind_qual['FT%']

In [74]:
# Check info for qualified individuals
tp_ind_qual.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6878 entries, 7 to 21708
Data columns (total 27 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    6878 non-null   int32  
 1   Tm      6878 non-null   object 
 2   Player  6878 non-null   object 
 3   Pos     6878 non-null   object 
 4   G       6878 non-null   float64
 5   MP      6878 non-null   float64
 6   FG      6878 non-null   float64
 7   FGA     6878 non-null   float64
 8   FG%     6878 non-null   float64
 9   PTS     6878 non-null   float64
 10  2P      6878 non-null   float64
 11  2PA     6878 non-null   float64
 12  2P%     6878 non-null   float64
 13  eFG%    6878 non-null   float64
 14  FT      6878 non-null   float64
 15  FTA     6878 non-null   float64
 16  ORB     6878 non-null   float64
 17  DRB     6878 non-null   float64
 18  TRB     6878 non-null   float64
 19  AST     6878 non-null   float64
 20  STL     6878 non-null   float64
 21  BLK     6878 non-null   float64
 22 

In [75]:
# Check info for qualified teams
tp_team_qual.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6758 entries, 7 to 21708
Data columns (total 27 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    6758 non-null   int32  
 1   Tm      6758 non-null   object 
 2   Player  6758 non-null   object 
 3   Pos     6758 non-null   object 
 4   G       6758 non-null   float64
 5   MP      6758 non-null   float64
 6   FG      6758 non-null   float64
 7   FGA     6758 non-null   float64
 8   FG%     6758 non-null   float64
 9   PTS     6758 non-null   float64
 10  2P      6758 non-null   float64
 11  2PA     6758 non-null   float64
 12  2P%     6758 non-null   float64
 13  eFG%    6758 non-null   float64
 14  FT      6758 non-null   float64
 15  FTA     6758 non-null   float64
 16  ORB     6758 non-null   float64
 17  DRB     6758 non-null   float64
 18  TRB     6758 non-null   float64
 19  AST     6758 non-null   float64
 20  STL     6758 non-null   float64
 21  BLK     6758 non-null   float64
 22 

## New NBA Stats 

## Expected Minutes Before 3's

This first group of statistics computes the number of minutes players are on the court before attempting and making 3's.



## AM3A : Average Minutes per 3-point Attempt


A player's Average Minutes per 3-Point Attempt is total minutes played divided by total 3-pointers attempted.



In [76]:
# Define AM3A, Average Minutes per 3-point Attempt 
tp_ind_qual['AM3A'] = tp_ind_qual['MP'] / tp_ind_qual['3PA']

# Round to 2 decimal places
tp_ind_qual['AM3A']=round(tp_ind_qual['AM3A'], 2)

## EM3A : Expected Minutes before 3-point Attempt

If a moment is chosen uniformly at random within an interval of time, its expected value is the halfway mark. Under that assumption, a player whose Average Minutes per 3-point Attempt is 10.0 is expected to wait five minutes before his next 3-point attempt.

In [77]:
# Define EM3A, Expected Minutes before 3-point Attempt
tp_ind_qual['EM3A'] = tp_ind_qual['AM3A'] / 2

# Round to 2 decimal places
tp_ind_qual['EM3A']=round(tp_ind_qual['EM3A'], 2)

# Sort DataFrame by new category
tp_EM3A = tp_ind_qual.sort_values('EM3A', ascending=True)

# Reset index
tp_EM3A = tp_EM3A.reset_index(drop=True)

# Start index at 1 instead of 0
tp_EM3A.index = tp_EM3A.index + 1
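As a quick check of the two definitions, the sketch below recomputes AM3A and EM3A from the minutes and attempts of two seasons that appear in this notebook's data. The `shooters` frame is a toy stand-in for `tp_ind_qual`.

```python
import pandas as pd

# Minutes played and 3-point attempts for two seasons from the data
shooters = pd.DataFrame({'Player': ['Stephen Curry (2019)', 'Jordan Clarkson (2021)'],
                         'MP': [2331.0, 1296.0],
                         '3PA': [810.0, 441.0]})

# AM3A: total minutes divided by total 3-point attempts, rounded to 2 places
shooters['AM3A'] = (shooters['MP'] / shooters['3PA']).round(2)

# EM3A: half of AM3A, rounded to 2 places
shooters['EM3A'] = (shooters['AM3A'] / 2).round(2)

print(shooters)
```

Because AM3A is rounded before it is halved, EM3A can differ in the last digit from a value computed directly from the unrounded ratio.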

## EM3A Top Twenty


In [78]:
# View players who attempt 3s faster than anyone in NBA history
tp_EM3A.head(20)

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,AST,STL,BLK,TOV,PF,3P,3PA,3P%,AM3A,EM3A
1,2020,HOU,Chris Clemons,SG,33.0,291.0,57.0,142.0,0.401,161.0,...,27.0,9.0,6.0,19.0,26.0,37.0,107.0,0.346,2.72,1.36
2,2019,HOU,James Harden,PG,78.0,2867.0,843.0,1909.0,0.442,2818.0,...,586.0,158.0,58.0,387.0,244.0,378.0,1028.0,0.368,2.79,1.4
3,2019,GSW,Stephen Curry,PG,69.0,2331.0,632.0,1340.0,0.472,1881.0,...,361.0,92.0,25.0,192.0,166.0,354.0,810.0,0.437,2.88,1.44
4,2018,ORL,Marreese Speights,C,52.0,675.0,138.0,349.0,0.395,402.0,...,40.0,8.0,22.0,40.0,106.0,86.0,233.0,0.369,2.9,1.45
5,2021,UTA,Jordan Clarkson,SG,51.0,1296.0,318.0,754.0,0.422,875.0,...,110.0,44.0,8.0,87.0,76.0,154.0,441.0,0.349,2.94,1.47
6,2018,TOR,C.J. Miles,SF,70.0,1337.0,227.0,599.0,0.379,699.0,...,55.0,37.0,21.0,39.0,134.0,164.0,454.0,0.361,2.94,1.47
7,2021,GSW,Stephen Curry,PG,45.0,1527.0,445.0,922.0,0.483,1346.0,...,268.0,59.0,4.0,143.0,79.0,214.0,520.0,0.412,2.94,1.47
8,2020,HOU,James Harden,SG,68.0,2483.0,672.0,1514.0,0.444,2335.0,...,512.0,125.0,60.0,308.0,227.0,299.0,843.0,0.355,2.95,1.48
9,2016,GSW,Stephen Curry,PG,79.0,2700.0,805.0,1598.0,0.504,2375.0,...,527.0,169.0,15.0,262.0,161.0,402.0,886.0,0.454,3.05,1.52
10,2015,DAL,Charlie Villanueva,PF,64.0,678.0,150.0,362.0,0.414,403.0,...,19.0,15.0,22.0,28.0,64.0,83.0,221.0,0.376,3.07,1.54


## AM3 : Average Minutes per 3-Pointer

AM3 computes the average minutes played per 3-pointer made.




In [79]:
# Define AM3, Average Minutes per 3-pointer Made
tp_ind_qual['AM3'] = tp_ind_qual['MP']/tp_ind_qual['3P']

# Round to 2 decimal places
tp_ind_qual['AM3']=round(tp_ind_qual['AM3'], 2)

## EM3 : Expected Minutes Before a 3

EM3 is how long a player is expected to be on the court before making a 3. It is the Average Minutes per 3-pointer Made divided by two.

In [80]:
# Define EM3, Expected Minutes before 3-pointer
tp_ind_qual['EM3'] = tp_ind_qual['AM3'] / 2

# Round to 2 decimal places
tp_ind_qual['EM3']=round(tp_ind_qual['EM3'], 2)

# Sort DataFrame by new category
tp_EM3 = tp_ind_qual.sort_values('EM3', ascending=True)

# Reset index
tp_EM3 = tp_EM3.reset_index(drop=True)

# Start index at 1 instead of 0
tp_EM3.index = tp_EM3.index + 1

## EM3 Top Twenty


In [81]:
# Display top twenty seasons of all-time
tp_EM3.head(20)

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,BLK,TOV,PF,3P,3PA,3P%,AM3A,EM3A,AM3,EM3
1,2019,GSW,Stephen Curry,PG,69.0,2331.0,632.0,1340.0,0.472,1881.0,...,25.0,192.0,166.0,354.0,810.0,0.437,2.88,1.44,6.58,3.29
2,2016,GSW,Stephen Curry,PG,79.0,2700.0,805.0,1598.0,0.504,2375.0,...,15.0,262.0,161.0,402.0,886.0,0.454,3.05,1.52,6.72,3.36
3,2021,GSW,Stephen Curry,PG,45.0,1527.0,445.0,922.0,0.483,1346.0,...,4.0,143.0,79.0,214.0,520.0,0.412,2.94,1.47,7.14,3.57
4,2019,HOU,James Harden,PG,78.0,2867.0,843.0,1909.0,0.442,2818.0,...,58.0,387.0,244.0,378.0,1028.0,0.368,2.79,1.4,7.58,3.79
5,2018,GSW,Stephen Curry,PG,51.0,1631.0,428.0,864.0,0.495,1346.0,...,8.0,153.0,114.0,212.0,501.0,0.423,3.26,1.63,7.69,3.84
6,2012,NYK,Steve Novak,PF,54.0,1020.0,161.0,337.0,0.478,477.0,...,9.0,21.0,59.0,133.0,282.0,0.472,3.62,1.81,7.67,3.84
7,2008,HOU,Steve Novak,SF,35.0,264.0,49.0,102.0,0.48,135.0,...,3.0,4.0,17.0,34.0,71.0,0.479,3.72,1.86,7.76,3.88
8,2018,ORL,Marreese Speights,C,52.0,675.0,138.0,349.0,0.395,402.0,...,22.0,40.0,106.0,86.0,233.0,0.369,2.9,1.45,7.85,3.92
9,2020,HOU,Chris Clemons,SG,33.0,291.0,57.0,142.0,0.401,161.0,...,6.0,19.0,26.0,37.0,107.0,0.346,2.72,1.36,7.86,3.93
10,2020,WAS,Dāvis Bertāns,PF,54.0,1583.0,265.0,610.0,0.434,834.0,...,33.0,59.0,139.0,200.0,472.0,0.424,3.35,1.68,7.92,3.96


EM3 Statistical Notes:

EM3 measures how quickly shooters make 3-pointers upon taking the court.

I prefer EM3A and EM3 to AM3A and AM3. They are shorter, more informative, and have a better ring. Since AM3A and AM3 are just doubles of EM3A and EM3, they can be eliminated without losing any valuable information.

In [82]:
# Delete extraneous columns
del tp_ind_qual['AM3A'] 
del tp_ind_qual['AM3']

I have reindexed twice, and expect to do so again. It's always better to write a function instead of copying and pasting.



In [83]:
# Define reindex function that starts at 1
def reindex_start_1(data):
    
    # Reset index
    data = data.reset_index(drop=True)

    # Start index at 1 instead of 0
    data.index = data.index + 1
    
    # Return new index
    return data.index
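A short usage sketch follows. Note that the function returns the new index rather than a dataframe, so the caller assigns it back to `.index`. The function is repeated so the block runs on its own, and the toy frame and its values are illustrative.

```python
import pandas as pd

def reindex_start_1(data):
    # Work on a reset copy so the caller's frame is not modified here
    data = data.reset_index(drop=True)
    data.index = data.index + 1
    return data.index

# Toy frame with an arbitrary leftover index
toy = pd.DataFrame({'Player': ['A', 'B', 'C'], 'EM3': [3.93, 3.29, 3.57]},
                   index=[7, 42, 99])

# Sort first, then overwrite the index with 1..n so it doubles as a rank
ranked = toy.sort_values('EM3')
ranked.index = reindex_start_1(ranked)

print(ranked)
```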

## 3NG

The 3-point statistics above are compelling, but they do not provide a single statistic to rank all 3-point shooters. This is where 3NG, or 3-point Net Gain, comes in. 3NG adds what the team gains beyond the expected value, and subtracts what the team loses beyond the expected value, for each 3-pointer attempted.



# Points Per Possession
3NG depends on the expected value. Should the expected value be points per possession? Or points per field goal attempt? I have chosen points per possession, since this is what a team is expected to earn each time it has the ball. I use the mean points per possession over the 3-point era (1980 onward), which is the range kept in df_P above. The statistic was first computed in 1974.

In [84]:
# Define ev, expected value, as points per possession
ev = df_P['PPP'].mean()

# Display ev
print('Avg. Points Per Possession:', ev)

Avg. Points Per Possession: 1.067761904761905


## 3NG Formula


When a player makes a 3-pointer, the team gains an extra 3 points minus the expected value. When a player misses a 3-pointer, the team loses the expected value.

In [85]:
# Function that returns a dataframe ordered by a new column, 3NG
def threeNG(data, threesMade, threesAttempted, totalGames, expectedValue, teams=False):

    # Compute 3PG, 3-pointers per Game
    data['3PG']=threesMade/totalGames

    # Round to 2 decimal places
    data['3PG']=round(data['3PG'], 2)

    # Compute 3PAG, 3-point Attempts per Game
    data['3PAG']= threesAttempted/totalGames

    # Round to 2 decimal places
    data['3PAG']=round(data['3PAG'], 2)

    # Compute 3-point Misses per Game
    tp_misses = data['3PAG'] - data['3PG']
    
    # Shorten notation for expectedValue
    ev = expectedValue
                          
    # Compute 3NG, 3-point Net Gain
    data['3NG']=data['3PG'] * (3 - ev) - tp_misses * ev

    # (3 - ev) is what the team gains per 3-pointer made
    # -ev is what the team loses per 3-pointer missed
    
    # Round to 2 decimal places
    data['3NG']=round(data['3NG'], 2)
    
    # Sort dataframe by 3NG
    data = data.sort_values('3NG', ascending=False)

    # Reset index for individuals only (not teams)
    if not teams:
        data.index = reindex_start_1(data)
    
    return data
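
As a quick sanity check of the formula, the sketch below recomputes a single per-game 3NG value by hand. The helper `three_ng_per_game` is a hypothetical standalone function written for illustration, not part of the notebook; the expected value is the league mean printed earlier, and the inputs are the rounded 3PG and 3PAG figures from the rankings table.

```python
# League-mean points per possession, as printed earlier in the notebook
EV = 1.067761904761905


def three_ng_per_game(made_per_game, attempts_per_game, ev=EV):
    """Per-game 3NG for one shooter (illustrative helper, not the notebook's function)."""
    misses_per_game = attempts_per_game - made_per_game
    # (3 - ev) is gained on each make, ev is lost on each miss
    return round(made_per_game * (3 - ev) - misses_per_game * ev, 2)


# Stephen Curry, 2016: 3PG = 5.09, 3PAG = 11.22
curry_2016 = three_ng_per_game(5.09, 11.22)
print(curry_2016)  # 3.29
```

Like `threeNG`, this works from per-game values already rounded to two decimals, so results for players close to the break-even percentage can differ by about 0.01 from a calculation on raw totals.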

## Apply 3NG to Individuals


In [86]:
# Define expected value as the mean points per possession during 3-point era
expected_value = df_P['PPP'].mean()

# Apply 3NG to qualified individual stats
tp_ind_qual = threeNG(tp_ind_qual, tp_ind_qual['3P'], tp_ind_qual['3PA'], tp_ind_qual['G'], expected_value)

# Apply 3NG to individual stats
tp_ind = threeNG(tp_ind, tp_ind['3P'], tp_ind['3PA'], tp_ind['G'], expected_value)

## Apply 3NG to Teams


In [87]:
# Group by Team and Year, and sum columns
tp_teams = tp_team.groupby(['Tm','Year']).sum()

# Give correct 3-point percentage, not sum
tp_teams['3P%'] = tp_teams['3P']/tp_teams['3PA']

# Apply 3NG to team statistics (assumes an 82-game season for every team and year)
tp_teams = threeNG(tp_teams, tp_teams['3P'], tp_teams['3PA'], 82, expected_value, teams=True)
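
The team calculation divides every season's totals by 82 games. A minimal sketch of one way to use season-specific game counts instead is shown below. It assumes `Year` is the season-ending year, as in the tables; the lookup `SHORT_SEASONS`, the helper `games_in_season`, and the team rows are hypothetical and only illustrate the idea. The 2020 season is left out because teams played different numbers of games that year.

```python
import pandas as pd

# Scheduled regular-season games per team for shortened seasons
# (keyed by season-ending year; all other seasons default to 82)
SHORT_SEASONS = {1999: 50, 2012: 66, 2021: 72}


def games_in_season(years, default=82):
    """Map season-ending years to scheduled games per team (illustrative helper)."""
    return pd.Series(years).map(SHORT_SEASONS).fillna(default).astype(int)


# Hypothetical team totals, for illustration only
team_totals = pd.DataFrame({
    'Tm': ['AAA', 'BBB'],
    'Year': [2016, 2012],
    '3P': [820, 660],
    '3PA': [2050, 1650],
})

team_totals['G'] = games_in_season(team_totals['Year']).to_numpy()
team_totals['3PG'] = team_totals['3P'] / team_totals['G']
team_totals['3PAG'] = team_totals['3PA'] / team_totals['G']
print(team_totals)
```

A per-row games column like `G` could then be passed to `threeNG` in place of the constant 82, since the function simply divides by whatever `totalGames` it is given.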

## 3NG Rankings


## The Top 20


In [88]:
# Display top 20 3-point shooting seasons of all-time
tp_ind_qual.head(20)

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2016,GSW,Stephen Curry,PG,79.0,2700.0,805.0,1598.0,0.504,2375.0,...,262.0,161.0,402.0,886.0,0.454,1.52,3.36,5.09,11.22,3.29
2,2019,GSW,Stephen Curry,PG,69.0,2331.0,632.0,1340.0,0.472,1881.0,...,192.0,166.0,354.0,810.0,0.437,1.44,3.29,5.13,11.74,2.85
3,2015,ATL,Kyle Korver,SG,75.0,2418.0,292.0,600.0,0.487,911.0,...,107.0,140.0,221.0,449.0,0.492,2.7,5.47,2.95,5.99,2.45
4,2021,BRK,Joe Harris,SF,53.0,1633.0,273.0,537.0,0.508,746.0,...,40.0,111.0,166.0,347.0,0.478,2.36,4.92,3.13,6.55,2.4
5,2013,GSW,Stephen Curry,PG,78.0,2983.0,626.0,1388.0,0.451,1786.0,...,240.0,198.0,272.0,600.0,0.453,2.48,5.48,3.49,7.69,2.26
6,2020,MIA,Duncan Robinson,SG,73.0,2166.0,323.0,687.0,0.47,983.0,...,70.0,193.0,270.0,606.0,0.446,1.78,4.01,3.7,8.3,2.24
7,2021,UTA,Joe Ingles,SF,49.0,1310.0,197.0,374.0,0.527,580.0,...,75.0,88.0,132.0,268.0,0.493,2.44,4.96,2.69,5.47,2.23
8,2015,GSW,Stephen Curry,PG,80.0,2613.0,653.0,1341.0,0.487,1900.0,...,249.0,158.0,286.0,646.0,0.443,2.02,4.57,3.58,8.07,2.12
9,2016,LAC,J.J. Redick,SG,75.0,2097.0,422.0,880.0,0.48,1226.0,...,78.0,135.0,200.0,421.0,0.475,2.49,5.24,2.67,5.61,2.02
10,2018,GSW,Stephen Curry,PG,51.0,1631.0,428.0,864.0,0.495,1346.0,...,153.0,114.0,212.0,501.0,0.423,1.63,3.84,4.16,9.82,1.99


3NG Statistical Notes:

- Steph Curry's legendary MVP season is head and shoulders above the rest, and he dominates the list as a player.
- 3NG does a nice job of comparing 3-point shooters over the years.
- 3NG has real meaning. It conveys the actual points per game a team gains beyond the average by the player shooting 3-pointers.

## Weighted
Using the same measure, mean points per possession, across all years makes every season directly comparable. But is it justifiable? Teams score more points per possession these days, so it could be argued that 3-pointers were more valuable in years past. The expected value can instead be weighted by using the league's mean points per possession for each given year.

In [89]:
# Function to compute weighted 3NG from dataframe that already contains 3NG
def threeNG_weighted(data, teams=False):

    # Merge df_P, the dataframe with 'Year' and 'PPP', into the current dataframe
    data = data.merge(df_P)

    # Compute 3-point Misses per Game
    tp_misses = data['3PAG'] - data['3PG']
                          
    # Compute 3NG using weighted expected value
    data['3NG/w'] = data['3PG'] * (3 - data['PPP']) - tp_misses * data['PPP']

    # Round to 2 decimal places
    data['3NG/w']=round(data['3NG/w'], 2)
    
    # Sort dataframe by 3NG/w
    data = data.sort_values('3NG/w', ascending=False)
    
    # Reset index for individuals only (not teams)
    if not teams:
        data.index = reindex_start_1(data)
    
    # Keep dataframe tight by eliminating unnecessary columns
    data.drop(['MP','PPP'], axis=1, inplace=True)
    
    # Return dataframe with 3NG/w
    return data
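
The weighting step comes down to joining each player-season to that season's points per possession and reusing the 3NG formula. The sketch below shows this on two rows. The per-game figures are copied from the tables in this notebook, but the values in `ppp_by_year` are placeholders chosen for illustration, not the actual league figures stored in `df_P`.

```python
import pandas as pd

# Per-game shooting lines copied from the rankings tables
shooters = pd.DataFrame({
    'Player': ['Ray Allen', 'Joe Harris'],
    'Year': [2002, 2021],
    '3PG': [3.32, 3.13],
    '3PAG': [7.65, 6.55],
})

# Placeholder points per possession by season (illustrative values only)
ppp_by_year = pd.DataFrame({'Year': [2002, 2021], 'PPP': [1.04, 1.12]})

# Attach each season's expected value to every player-season row
merged = shooters.merge(ppp_by_year, on='Year', how='left')

# Same formula as 3NG, with a per-row expected value
misses = merged['3PAG'] - merged['3PG']
merged['3NG/w'] = (merged['3PG'] * (3 - merged['PPP']) - misses * merged['PPP']).round(2)
print(merged[['Player', 'Year', '3NG/w']])
```

Naming the join key with `on='Year'` and using `how='left'` keeps every player-season row, so a season missing from the points-per-possession table shows up as `NaN` rather than silently disappearing.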

## The Top 20, Weighted


In [90]:
# Create dataframe that includes weighted 3NG
tp_ind_qual_w = threeNG_weighted(tp_ind_qual)

# Show top twenty weighted 3NG
tp_ind_qual_w.head(20)

Unnamed: 0,Year,Tm,Player,Pos,G,FG,FGA,FG%,PTS,2P,...,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG,3NG/w
1,2016,GSW,Stephen Curry,PG,79.0,805.0,1598.0,0.504,2375.0,403.0,...,161.0,402.0,886.0,0.454,1.52,3.36,5.09,11.22,3.29,3.33
2,2015,ATL,Kyle Korver,SG,75.0,292.0,600.0,0.487,911.0,71.0,...,140.0,221.0,449.0,0.492,2.7,5.47,2.95,5.99,2.45,2.52
3,2019,GSW,Stephen Curry,PG,69.0,632.0,1340.0,0.472,1881.0,278.0,...,166.0,354.0,810.0,0.437,1.44,3.29,5.13,11.74,2.85,2.43
4,2013,GSW,Stephen Curry,PG,78.0,626.0,1388.0,0.451,1786.0,354.0,...,198.0,272.0,600.0,0.453,2.48,5.48,3.49,7.69,2.26,2.33
5,2015,GSW,Stephen Curry,PG,80.0,653.0,1341.0,0.487,1900.0,367.0,...,158.0,286.0,646.0,0.443,2.02,4.57,3.58,8.07,2.12,2.22
6,2021,BRK,Joe Harris,SF,53.0,273.0,537.0,0.508,746.0,107.0,...,111.0,166.0,347.0,0.478,2.36,4.92,3.13,6.55,2.4,2.05
7,2016,LAC,J.J. Redick,SG,75.0,422.0,880.0,0.48,1226.0,222.0,...,135.0,200.0,421.0,0.475,2.49,5.24,2.67,5.61,2.02,2.04
8,2002,MIL,Ray Allen,SG,69.0,530.0,1148.0,0.462,1503.0,301.0,...,157.0,229.0,528.0,0.434,2.39,5.52,3.32,7.65,1.79,1.97
9,2014,ATL,Kyle Korver,SG,71.0,289.0,609.0,0.475,850.0,104.0,...,147.0,185.0,392.0,0.472,3.07,6.51,2.61,5.52,1.94,1.95
10,2021,UTA,Joe Ingles,SF,49.0,197.0,374.0,0.527,580.0,65.0,...,88.0,132.0,268.0,0.493,2.44,4.96,2.69,5.47,2.23,1.94


The values are very close. Some players from earlier eras, like Ray Allen, move up the list, but others, like Glen Rice, actually move down. It depends on how many points per possession the league averaged that year.

## League Leaders by Season

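We can check the league leaders for any given year. Within a single year every player is measured against the same expected value, so the weighted and unweighted rankings are nearly always the same. The order can shift only between players with very different attempt volumes, because a different expected value changes the cost of each attempt.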
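
Filtering, re-ranking, and re-indexing for one season can be wrapped in a small helper. The function `leaders_by_year` below is a hypothetical convenience written for illustration (the notebook itself filters by year and calls `reindex_start_1` directly); the sample rows are taken from the tables in this section.

```python
import pandas as pd


def leaders_by_year(data, year, n=10, stat='3NG'):
    """Top-n rows for one season, ranked by `stat`, with a 1-based index."""
    season = data[data['Year'] == year].sort_values(stat, ascending=False).head(n)
    season = season.reset_index(drop=True)
    season.index = season.index + 1  # rank starts at 1 instead of 0
    return season


# A few player-seasons copied from the rankings tables
sample = pd.DataFrame({
    'Year': [2021, 2002, 2021, 2002],
    'Player': ['Joe Ingles', 'Steve Nash', 'Joe Harris', 'Ray Allen'],
    '3NG': [2.23, 1.24, 2.40, 1.79],
})

top_2002 = leaders_by_year(sample, 2002, n=2)
print(top_2002)
```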



In [91]:
# Create 2021 dataframe
tp_2021 = tp_ind_qual[tp_ind_qual['Year']==2021.0]

# Reset index
tp_2021.index = reindex_start_1(tp_2021)

# Show top 10 3NG
tp_2021.head(10)

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2021,BRK,Joe Harris,SF,53.0,1633.0,273.0,537.0,0.508,746.0,...,40.0,111.0,166.0,347.0,0.478,2.36,4.92,3.13,6.55,2.4
2,2021,UTA,Joe Ingles,SF,49.0,1310.0,197.0,374.0,0.527,580.0,...,75.0,88.0,132.0,268.0,0.493,2.44,4.96,2.69,5.47,2.23
3,2021,GSW,Stephen Curry,PG,45.0,1527.0,445.0,922.0,0.483,1346.0,...,143.0,79.0,214.0,520.0,0.412,1.47,3.57,4.76,11.56,1.94
4,2021,LAC,Paul George,SF,40.0,1336.0,321.0,673.0,0.477,911.0,...,125.0,95.0,132.0,299.0,0.441,2.24,5.06,3.3,7.48,1.91
5,2021,LAC,Marcus Morris,PF,42.0,1074.0,187.0,416.0,0.45,521.0,...,37.0,86.0,100.0,218.0,0.459,2.46,5.37,2.38,5.19,1.6
6,2021,ATL,Tony Snell,SG,40.0,839.0,76.0,147.0,0.517,219.0,...,18.0,62.0,56.0,98.0,0.571,4.28,7.49,1.4,2.45,1.58
7,2021,CHI,Zach LaVine,SG,50.0,1756.0,492.0,966.0,0.509,1377.0,...,184.0,116.0,170.0,405.0,0.42,2.17,5.16,3.4,8.1,1.55
8,2021,DEN,Michael Porter Jr.,SF,42.0,1271.0,274.0,507.0,0.54,717.0,...,47.0,89.0,102.0,228.0,0.447,2.78,6.23,2.43,5.43,1.49
9,2021,TOT,Norman Powell,SG-SF,50.0,1533.0,322.0,661.0,0.487,953.0,...,92.0,118.0,134.0,311.0,0.431,2.46,5.72,2.68,6.22,1.4
10,2021,MIL,Bryn Forbes,SG,51.0,969.0,170.0,376.0,0.452,478.0,...,29.0,55.0,109.0,242.0,0.45,2.0,4.44,2.14,4.75,1.35


In [95]:
# Create 2002 dataframe
tp_2002 = tp_ind_qual[tp_ind_qual['Year']==2002.0]

# Reset index
tp_2002.index = reindex_start_1(tp_2002)

# Show top 10 3NG
tp_2002.head(10)

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2002,MIL,Ray Allen,SG,69.0,2525.0,530.0,1148.0,0.462,1503.0,...,159.0,157.0,229.0,528.0,0.434,2.39,5.52,3.32,7.65,1.79
2,2002,DAL,Steve Nash,PG,82.0,2837.0,525.0,1088.0,0.483,1466.0,...,229.0,164.0,156.0,343.0,0.455,4.14,9.1,1.9,4.18,1.24
3,2002,SAS,Steve Smith,SG,77.0,2211.0,310.0,682.0,0.455,895.0,...,108.0,158.0,116.0,246.0,0.472,4.5,9.53,1.51,3.19,1.12
4,2002,LAC,Eric Piatkowski,SG,71.0,1718.0,207.0,471.0,0.439,626.0,...,64.0,110.0,111.0,238.0,0.466,3.61,7.74,1.56,3.35,1.1
5,2002,DET,Jon Barry,SG,82.0,1985.0,255.0,522.0,0.489,739.0,...,111.0,134.0,121.0,258.0,0.469,3.84,8.2,1.48,3.15,1.08
6,2002,CLE,Wesley Person,SG,78.0,2793.0,467.0,944.0,0.495,1176.0,...,74.0,93.0,143.0,322.0,0.444,4.34,9.76,1.83,4.13,1.08
7,2002,ORL,Pat Garrity,PF,80.0,2406.0,327.0,767.0,0.426,884.0,...,68.0,230.0,169.0,396.0,0.427,3.04,7.12,2.11,4.95,1.04
8,2002,SEA,Brent Barry,SG,81.0,3040.0,401.0,790.0,0.508,1164.0,...,165.0,182.0,164.0,387.0,0.424,3.93,9.27,2.02,4.78,0.96
9,2002,BOS,Paul Pierce,SG,82.0,3302.0,707.0,1598.0,0.442,2144.0,...,241.0,237.0,210.0,520.0,0.404,3.18,7.86,2.56,6.34,0.91
10,2002,IND,Reggie Miller*,SG,79.0,2889.0,414.0,913.0,0.453,1304.0,...,120.0,143.0,180.0,443.0,0.406,3.26,8.02,2.28,5.61,0.85


# Best 3-Point Shooting Teams of All-Time

In [96]:
# Convert teams dataframe to top 20 showing 3NG only
pd.DataFrame(tp_teams['3NG'].head(20))

Unnamed: 0_level_0,Unnamed: 1_level_0,3NG
Tm,Year,Unnamed: 2_level_1
GSW,2016,5.64
LAC,2021,4.29
PHO,2010,3.65
CHH,1997,3.64
GSW,2015,3.44
PHO,2006,3.33
PHO,2007,3.13
GSW,2019,3.01
GSW,2018,3.01
CLE,2017,2.84


In [97]:
# Create 2016 and 2010 dataframes
tp_2016 = tp_ind_qual[tp_ind_qual['Year']==2016.0]
tp_2010 = tp_ind_qual[tp_ind_qual['Year']==2010.0]

# Reset index
tp_2016.index = reindex_start_1(tp_2016)
tp_2010.index = reindex_start_1(tp_2010)



## 2016 GSW vs. 2021 LAC vs. 2010 PHO

In [101]:
# Create dataframe for GSW with qualified individuals only
tp_GSW_2016 = tp_2016[(tp_2016['Tm']=='GSW')]

# Display DataFrame with 2016 rankings 
tp_GSW_2016


Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
1,2016,GSW,Stephen Curry,PG,79.0,2700.0,805.0,1598.0,0.504,2375.0,...,262.0,161.0,402.0,886.0,0.454,1.52,3.36,5.09,11.22,3.29
3,2016,GSW,Klay Thompson,SG,80.0,2666.0,651.0,1386.0,0.47,1771.0,...,138.0,152.0,276.0,650.0,0.425,2.05,4.83,3.45,8.12,1.68
42,2016,GSW,Brandon Rush,SG,72.0,1055.0,111.0,260.0,0.427,305.0,...,33.0,57.0,65.0,157.0,0.414,3.36,8.12,0.9,2.18,0.37
55,2016,GSW,Draymond Green,PF,81.0,2808.0,401.0,819.0,0.49,1131.0,...,259.0,240.0,100.0,258.0,0.388,5.44,14.04,1.23,3.19,0.28
57,2016,GSW,Harrison Barnes,SF,66.0,2042.0,295.0,633.0,0.466,774.0,...,57.0,136.0,82.0,214.0,0.383,4.77,12.45,1.24,3.24,0.26
101,2016,GSW,Marreese Speights,C,72.0,832.0,197.0,456.0,0.432,512.0,...,66.0,117.0,24.0,62.0,0.387,6.71,17.34,0.33,0.86,0.07
125,2016,GSW,Ian Clark,SG,66.0,578.0,89.0,202.0,0.441,236.0,...,43.0,58.0,30.0,84.0,0.357,3.44,9.64,0.45,1.27,-0.01
128,2016,GSW,Leandro Barbosa,SG,68.0,1079.0,171.0,370.0,0.462,433.0,...,53.0,107.0,39.0,110.0,0.355,4.9,13.84,0.57,1.62,-0.02
133,2016,GSW,Andre Iguodala,SF,65.0,1732.0,176.0,368.0,0.478,457.0,...,79.0,102.0,54.0,154.0,0.351,5.62,16.04,0.83,2.37,-0.04


In [102]:
# Create dataframe for 2021 LAC with qualified individuals only
tp_LAC_2021 = tp_2021[(tp_2021['Tm']=='LAC')]

# Display DataFrame with 2021 rankings 
tp_LAC_2021


Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
4,2021,LAC,Paul George,SF,40.0,1336.0,321.0,673.0,0.477,911.0,...,125.0,95.0,132.0,299.0,0.441,2.24,5.06,3.3,7.48,1.91
5,2021,LAC,Marcus Morris,PF,42.0,1074.0,187.0,416.0,0.45,521.0,...,37.0,86.0,100.0,218.0,0.459,2.46,5.37,2.38,5.19,1.6
18,2021,LAC,Luke Kennard,SG,48.0,914.0,142.0,295.0,0.481,376.0,...,36.0,72.0,76.0,163.0,0.466,2.8,6.02,1.58,3.4,1.11
30,2021,LAC,Reggie Jackson,SG,51.0,1109.0,189.0,409.0,0.462,509.0,...,54.0,101.0,84.0,195.0,0.431,2.84,6.6,1.65,3.82,0.87
45,2021,LAC,Nicolas Batum,SF,51.0,1447.0,149.0,325.0,0.458,430.0,...,39.0,83.0,89.0,215.0,0.414,3.36,8.13,1.75,4.22,0.74
46,2021,LAC,Patrick Beverley,PG,31.0,726.0,82.0,190.0,0.432,249.0,...,27.0,97.0,52.0,125.0,0.416,2.9,6.98,1.68,4.03,0.74
65,2021,LAC,Kawhi Leonard,SF,45.0,1559.0,425.0,825.0,0.515,1168.0,...,88.0,75.0,88.0,223.0,0.395,3.5,8.86,1.96,4.96,0.58
99,2021,LAC,Amir Coffey,SG,35.0,276.0,36.0,75.0,0.48,101.0,...,11.0,13.0,20.0,45.0,0.444,3.06,6.9,0.57,1.29,0.33
130,2021,LAC,Terance Mann,SG,51.0,918.0,124.0,247.0,0.502,325.0,...,31.0,95.0,24.0,60.0,0.4,7.65,19.12,0.47,1.18,0.15
165,2021,LAC,Serge Ibaka,C,39.0,919.0,176.0,347.0,0.507,427.0,...,44.0,74.0,38.0,108.0,0.352,4.26,12.09,0.97,2.77,-0.05


In [103]:
# Create dataframe for 2010 PHO with qualified individuals only
tp_PHO_2010 = tp_2010[(tp_2010['Tm']=='PHO')]

# Display DataFrame with 2010 rankings 
tp_PHO_2010

Unnamed: 0,Year,Tm,Player,Pos,G,MP,FG,FGA,FG%,PTS,...,TOV,PF,3P,3PA,3P%,EM3A,EM3,3PG,3PAG,3NG
2,2010,PHO,Channing Frye,C,81.0,2190.0,317.0,703.0,0.451,904.0,...,73.0,263.0,172.0,392.0,0.439,2.8,6.36,2.12,4.84,1.19
9,2010,PHO,Jared Dudley,SF,82.0,1991.0,225.0,490.0,0.459,674.0,...,68.0,162.0,120.0,262.0,0.458,3.8,8.3,1.46,3.2,0.96
11,2010,PHO,Steve Nash,PG,81.0,2660.0,499.0,985.0,0.507,1333.0,...,295.0,108.0,124.0,291.0,0.426,4.57,10.72,1.53,3.59,0.76
18,2010,PHO,Jason Richardson,SG,79.0,2485.0,473.0,998.0,0.474,1239.0,...,92.0,169.0,157.0,400.0,0.393,3.1,7.92,1.99,5.06,0.57
49,2010,PHO,Goran Dragic,PG,80.0,1442.0,222.0,491.0,0.452,635.0,...,126.0,131.0,74.0,188.0,0.394,3.84,9.74,0.92,2.35,0.25
55,2010,PHO,Grant Hill,SF,81.0,2430.0,336.0,703.0,0.478,912.0,...,108.0,165.0,35.0,80.0,0.438,15.19,34.72,0.43,0.99,0.23
176,2010,PHO,Leandro Barbosa,PG,44.0,786.0,155.0,365.0,0.425,418.0,...,46.0,71.0,44.0,136.0,0.324,2.89,8.93,1.0,3.09,-0.3


## Career Totals

Which players have gained the most points by shooting 3s over their entire careers?
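
Career totals apply the same idea to season totals rather than per-game averages, then sum over a player's seasons. The sketch below uses three player-seasons copied from the tables, so the results are partial sums for illustration rather than true career figures. It uses the algebraically equivalent form `3 * 3P - EV * 3PA` of the 3NG formula.

```python
import pandas as pd

# League-mean points per possession, as printed earlier in the notebook
EV = 1.067761904761905

# Season totals copied from the rankings tables (not complete careers)
seasons = pd.DataFrame({
    'Player': ['Stephen Curry', 'Stephen Curry', 'Kyle Korver'],
    'Year': [2016, 2019, 2015],
    '3P': [402, 354, 221],
    '3PA': [886, 810, 449],
})

# 3P * (3 - EV) - (3PA - 3P) * EV simplifies to 3 * 3P - EV * 3PA
seasons['3NG/c'] = 3 * seasons['3P'] - EV * seasons['3PA']

# Sum over each player's seasons, then round once at the end
career = seasons.groupby('Player')['3NG/c'].sum().round(2).sort_values(ascending=False)
print(career)
```

Rounding once after summing avoids accumulating small per-season rounding differences.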

In [104]:
# Create 3NG/c using same formula as 3NG, but use totals instead of per game
tp_ind['3NG/c'] = tp_ind['3P'] * (3 - ev) - (tp_ind['3PA'] - tp_ind['3P']) * ev

# Round to 2 decimal places
tp_ind['3NG/c'] = round(tp_ind['3NG/c'], 2)

# Group by player, and sum over their career
tp_player = tp_ind.groupby('Player', as_index=False)['3NG/c'].sum()

# Order from the top
tp_player = tp_player.sort_values('3NG/c', ascending=False)

# Reindex
tp_player.index = reindex_start_1(tp_player)

# Display top 25
tp_player.head(25)

Unnamed: 0,Player,3NG/c
1,Stephen Curry,1443.88
2,Kyle Korver,1247.75
3,Ray Allen,986.59
4,Steve Nash,849.09
5,J.J. Redick,822.82
6,Klay Thompson,812.24
7,Reggie Miller*,754.49
8,Dale Ellis,601.91
9,Mike Miller,595.02
10,Peja Stojakovic,590.39


## Steph Curry - The 3-Point King

## Career Averages
We can also examine career averages by summing a player's 3NG over every qualified season and dividing by the number of those seasons.
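
The same average can be written with named aggregation, which counts seasons and takes the mean in one step. This is a sketch on a handful of player-seasons copied from the tables above, so the output is not a real career average; the column names `seasons` and `avg_3ng` and the constant `MIN_SEASONS` are illustrative choices.

```python
import pandas as pd

# Per-season 3NG values copied from the rankings tables (not complete careers)
qualified = pd.DataFrame({
    'Player': ['Stephen Curry'] * 5 + ['Kyle Korver'] * 2,
    'Year': [2013, 2015, 2016, 2018, 2019, 2014, 2015],
    '3NG': [2.26, 2.12, 3.29, 1.99, 2.85, 1.94, 2.45],
})

MIN_SEASONS = 5  # minimum number of qualified seasons to be ranked

# Count seasons and average 3NG per player in a single aggregation
per_player = qualified.groupby('Player', as_index=False).agg(
    seasons=('Year', 'count'),
    avg_3ng=('3NG', 'mean'),
)

# Keep only players with enough seasons, then round for display
per_player = per_player[per_player['seasons'] >= MIN_SEASONS]
per_player['avg_3ng'] = per_player['avg_3ng'].round(2)
print(per_player)
```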

In [105]:
# Aggregate 3NG sum and total years for each player
tp_3Py = tp_ind_qual.groupby('Player', as_index=False).agg({'3NG':'sum','Year':'count'})

# Only select players with at least 5 qualified seasons
tp_3Py = tp_3Py[tp_3Py['Year']>=5]

# To obtain career averages, divide 3NG sum by total years
tp_3Py['3NG/y'] = tp_3Py['3NG']/tp_3Py['Year']

# Round to 2 decimal places
tp_3Py['3NG/y'] = round(tp_3Py['3NG/y'], 2)

# Sort values in descending order
tp_3Py = tp_3Py[['Player','3NG/y']].sort_values('3NG/y', ascending=False) 

# Reset index
tp_3Py.index = reindex_start_1(tp_3Py)

# Show top 25 career 3NG averages
tp_3Py.head(25)

Unnamed: 0,Player,3NG/y
1,Stephen Curry,2.01
2,Klay Thompson,1.32
3,Joe Harris,1.11
4,Buddy Hield,1.04
5,Kyle Korver,1.02
6,Seth Curry,0.97
7,Joe Ingles,0.86
8,J.J. Redick,0.8
9,Ray Allen,0.76
10,Steve Novak,0.75


## Conclusion

Three new NBA statistics have been presented: EM3A, EM3, and 3NG. EM3A, Expected Minutes before a 3-point Attempt, could be of value to coaches preparing for opponents and working with their own players. EM3, Expected Minutes before a 3, is a fun statistic that could be used for similar reasons. 3NG, 3-point Net Gain, is a powerful statistic that provides a single number to rank 3-point shooters across all seasons.

3NG rewards players for making 3-point shots and penalizes them for missing. Players who make a lot of 3s but shoot a low percentage are exposed as making only slight contributions to their teams. Players who shoot a high percentage need to make a high volume to be competitive. 3NG rankings are statistically verifiable while simultaneously communicating valuable information.

3NG reveals the net gain in points beyond the league average that a player adds to his team by shooting 3-pointers. It can be weighted, summed, or displayed as per game averages. It can be used as a barometer to determine whether a player shooting 3-pointers results in a net gain or net loss for the team. It can also be used to predict playoff success.

3NG can be further used to analyze playoff performers and clutch 3-point shooters. It can be applied to basketball seasons past and future, and to any league (WNBA, college, high school, etc.), provided that an appropriate expected value, like points per possession, is used as a baseline of comparison.