This notebook is an exploratory data analysis of NBA offense in the 2022-2023 season.

The NBA stats API (queried here through the nba_api Python package) breaks down offensive possessions into several play types:

* Cut
* Handoff
* Isolation
* Miscellaneous
* OffScreen
* Postup
* PRBallHandler
* PRRollman
* OffRebound
* Spotup
* Transition

A good starting point for measuring offensive efficiency is points per possession (PPP). To put our analysis into context, let's start by calculating the average points per possession across all players and play types.

The NBA SynergyPlayTypes endpoint doesn't let us request every play type in a single call, so instead we make one API call per play type and concatenate the results into one dataset.

In [22]:
from nba_api.stats import endpoints
import pandas as pd
import time
import numpy as np

# Synergy play types to request; the endpoint returns one play type per call
play_types_list = ['Cut','Handoff','Isolation','Misc','OffScreen','Postup','PRBallHandler','PRRollman','OffRebound','Spotup','Transition']
frames = []

for play_type in play_types_list:

    # The NBA API often times out on calls, so retry in a loop until the call succeeds
    while True:
        try:
            response = endpoints.SynergyPlayTypes(play_type_nullable=play_type, player_or_team_abbreviation='P', type_grouping_nullable='Offensive').get_data_frames()[0]
            print(f'{play_type} called successfully.')
            break # quit the loop if successful
        except Exception as e: # a bare except would also swallow KeyboardInterrupt
            print(f'Error with {play_type} call: {e}')
            time.sleep(5)

    # collect each response, then pause briefly before the next request
    frames.append(response)
    time.sleep(5)

# combine all play types into one dataframe and cache it to disk
df = pd.concat(frames, ignore_index=True)
df.to_csv('synergy_all_offensive_possessions_by_player.csv', index=False) # index=False avoids an 'Unnamed: 0' column on re-read

Cut called successfully.
Handoff called successfully.
Isolation called successfully.
Misc called successfully.
OffScreen called successfully.
Postup called successfully.
PRBallHandler called successfully.
PRRollman called successfully.
OffRebound called successfully.
Spotup called successfully.
Transition called successfully.

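The retry loop in the cell above will spin forever if the endpoint stays down. A minimal sketch of a bounded alternative with exponential backoff follows; `call_with_retries` is a hypothetical helper written for this notebook, not part of nba_api, and the 5-second base delay simply mirrors the original `time.sleep(5)`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 5.0, label: str = "call") -> T:
    """Hypothetical helper: call fn() until it succeeds, waiting longer after each failure.

    Re-raises the last exception once max_attempts is exhausted instead of looping forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:  # catches API errors/timeouts, not KeyboardInterrupt
            if attempt == max_attempts:
                raise
            # exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Error with {label} (attempt {attempt}/{max_attempts}): {e}; retrying in {delay:.0f}s")
            time.sleep(delay)


# Possible use inside the play type loop (same endpoint arguments as the cell above):
# response = call_with_retries(
#     lambda: endpoints.SynergyPlayTypes(play_type_nullable=play_type,
#                                        player_or_team_abbreviation='P',
#                                        type_grouping_nullable='Offensive').get_data_frames()[0],
#     label=play_type)
```

The lambda defers the request so the helper can re-issue it on each attempt.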

The API cell above also saves the combined results to a CSV, so in the future we can read from the CSV instead of re-querying the API, which saves time during analysis. We can always rerun the API cell when we want to update our data.
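The read-from-cache-or-refetch pattern can be wrapped in one function. This is a minimal sketch; `load_or_fetch` and the `fetch` callable (for example, a function wrapping the play type loop) are hypothetical names, not part of the original notebook.

```python
import os
from typing import Callable

import pandas as pd


def load_or_fetch(csv_path: str, fetch: Callable[[], pd.DataFrame],
                  refresh: bool = False) -> pd.DataFrame:
    """Return the cached CSV if it exists; otherwise (or when refresh=True) fetch and cache it."""
    if os.path.exists(csv_path) and not refresh:
        return pd.read_csv(csv_path)
    data = fetch()
    # index=False keeps the round trip clean (no extra 'Unnamed: 0' column on re-read)
    data.to_csv(csv_path, index=False)
    return data


# Hypothetical usage, where fetch_all_play_types() would run the API loop and return the combined frame:
# df = load_or_fetch('synergy_all_offensive_possessions_by_player.csv', fetch_all_play_types)
```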

In [168]:
df = pd.read_csv('synergy_all_offensive_possessions_by_player.csv')
print('Read file from csv.')


The complete CSV contains many extraneous columns, so let's keep just the ones we are interested in:

* PLAYER_NAME and TEAM_NAME identify the players.
* PLAY_TYPE describes the type of offensive possession.
* PPP gives the average number of points that player scored per possession of that play type.
* POSS is the number of possessions each player used for that play type.

In [151]:
df = df[['PLAYER_NAME','TEAM_NAME','PLAY_TYPE','PPP','POSS']]

# sort by player name then play type
df = df.sort_values(['PLAYER_NAME','PLAY_TYPE'])

Next, we compute the league-average PPP as the average of each player's PPP weighted by their number of possessions. This comes out to 1.019, which gives us useful context for the play type analysis.

In [152]:
league_average_ppp = np.average(df['PPP'], weights=df['POSS'])
league_average_ppp

1.0194306169357525
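Weighting by possessions is the same as pooling: total points divided by total possessions. A plain mean would instead give a low-volume player as much influence as a high-volume one. The sketch below uses made-up numbers for two hypothetical players to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up data: one low-volume, efficient player and one high-volume, less efficient player
toy = pd.DataFrame({"PPP": [1.50, 0.90], "POSS": [10, 190]})

# Possession-weighted average, as used for league_average_ppp
weighted = np.average(toy["PPP"], weights=toy["POSS"])

# The same quantity written as total points / total possessions
total_points = (toy["PPP"] * toy["POSS"]).sum()
pooled = total_points / toy["POSS"].sum()

# A plain mean over-weights the 10-possession player
naive = toy["PPP"].mean()

print(f"weighted={weighted:.3f}, pooled={pooled:.3f}, naive={naive:.3f}")
# weighted=0.930, pooled=0.930, naive=1.200
```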

Next, let's calculate PPP for each play type to see which types of plays tend to be more and less efficient.

In [153]:
# create new dataframe for this analysis
df_play_type_PPP = df.sort_values('PLAY_TYPE')

# calculate PPP using grouped weighted average
df_play_type_PPP_weights = df_play_type_PPP.groupby('PLAY_TYPE').apply(lambda x: np.average(x['PPP'], weights=x['POSS'])).reset_index(name='PPP')

# calculate sum of possessions
df_play_type_PPP_possessions = df_play_type_PPP.groupby('PLAY_TYPE')['POSS'].sum().reset_index()
df_play_type_PPP = pd.merge(df_play_type_PPP_weights,df_play_type_PPP_possessions)
df_play_type_PPP.sort_values('PPP')

        PLAY_TYPE       PPP   POSS
3            Misc  0.542987  10269
6   PRBallHandler  0.906870  31514
2       Isolation  0.947907  12065
5       OffScreen  0.951863   6508
1         Handoff  0.960385   8298
8          Postup  0.978828   7238
9          Spotup  1.039979  43021
10     Transition  1.134757  30843
4      OffRebound  1.148954   9044
7       PRRollMan  1.176522   9234

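The grouped weighted average can also be computed without `groupby(...).apply`, which recent pandas versions warn about when the applied function sees the grouping columns. A minimal sketch, assuming the same PPP and POSS columns: multiply to get approximate points per row, sum per group, then divide. `weighted_ppp_by` is a hypothetical helper, and the toy rows are made up.

```python
import pandas as pd


def weighted_ppp_by(df: pd.DataFrame, keys) -> pd.DataFrame:
    """Possession-weighted PPP and total possessions for each group in `keys`.

    Computes sum(PPP * POSS) / sum(POSS) per group, the same quantity as
    np.average(PPP, weights=POSS) inside groupby(...).apply(...).
    """
    out = (
        df.assign(PTS=df["PPP"] * df["POSS"])  # approximate points per row
          .groupby(keys, as_index=False)[["PTS", "POSS"]]
          .sum()
    )
    out["PPP"] = out["PTS"] / out["POSS"]
    return out.drop(columns="PTS")


# Made-up rows in the same long format as the notebook's df
toy = pd.DataFrame({
    "TEAM_NAME": ["A", "A", "B"],
    "PLAY_TYPE": ["Cut", "Cut", "Cut"],
    "PPP": [1.0, 2.0, 0.5],
    "POSS": [30, 10, 20],
})
print(weighted_ppp_by(toy, "PLAY_TYPE"))                  # league level
print(weighted_ppp_by(toy, ["TEAM_NAME", "PLAY_TYPE"]))   # team level
```

One behavioral difference: a group whose possessions sum to zero yields NaN here, whereas `np.average` raises a ZeroDivisionError.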

Now that we have volume and efficiency data for each play type, let's graph our results.

In [154]:
import plotly.express as px
fig = px.scatter(df_play_type_PPP, x='POSS', y='PPP', text='PLAY_TYPE')
fig.update_layout(title_text='Frequency and Efficiency of NBA Offensive Possessions by Play Type (2022-2023)', title_x=0.5)
fig.update_traces(textposition='bottom center')
fig.add_hline(y=league_average_ppp, line_dash='dash', line_width = 0.5)
fig.update_xaxes(title_text='Number of Possessions')
fig.update_yaxes(title_text='Points per Possession (PPP)')
fig.show()

Let's start with efficiency, as measured by PPP.

* Cutting to the basket is by far the most efficient type of possession at 1.31 PPP.
* Misc possessions are extremely inefficient at 0.54 PPP.
* The remaining play types are more tightly clustered, between roughly 0.9 and 1.2 PPP.

Next, let's look at the number of possessions for each play type.
* Spotting up is by far the most common type of possession, with over 43k occurrences in the 2022-23 season.
* Transition and pick-and-roll ball handler possessions are also quite frequent, at around 31k occurrences each.
* All other play types occur significantly less often, between roughly 6k and 12k occurrences.

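A major trend in modern NBA offense has been the rise of faster-paced play that takes advantage of defenses being weaker in transition. The data here is consistent with that trend: transition possessions are both frequent (about 31k) and efficient (1.13 PPP versus the league average of 1.02), although a single season on its own cannot show a change over time.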
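To put these efficiency gaps in more concrete terms, we can express each play type's PPP as points gained or lost per 100 possessions relative to a league-average possession. This is a back-of-the-envelope sketch: the values are copied from the play type table above, and the Cut row does not appear in that output, so it is left out.

```python
league_average_ppp = 1.0194306169357525  # league-wide weighted average computed above

# PPP by play type, copied from the play type table above (Cut row not shown there)
play_type_ppp = {
    "Misc": 0.542987, "PRBallHandler": 0.906870, "Isolation": 0.947907,
    "OffScreen": 0.951863, "Handoff": 0.960385, "Postup": 0.978828,
    "Spotup": 1.039979, "Transition": 1.134757, "OffRebound": 1.148954,
    "PRRollMan": 1.176522,
}

# Points gained (positive) or lost (negative) per 100 possessions vs. a league-average possession
per_100_vs_avg = {pt: 100 * (ppp - league_average_ppp) for pt, ppp in play_type_ppp.items()}

for pt, diff in sorted(per_100_vs_avg.items(), key=lambda kv: kv[1]):
    print(f"{pt:>14}: {diff:+.1f} pts / 100 poss")
```

By this measure, a transition possession is worth roughly 11.5 more points per 100 possessions than an average possession.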

Now that we have looked at offensive possessions by play type at the league level, let's move on to analyzing them at the team level.

In [155]:
# create new dataframe for this analysis
df_team = df.sort_values('PLAY_TYPE')

# group data by team and play type, then calculate PPP with weighted average
df_team_PPP = df_team.groupby(['TEAM_NAME','PLAY_TYPE']).apply(lambda x: np.average(x['PPP'], weights=x['POSS'])).reset_index(name='PPP')

# calculate sum of possessions
df_team_possessions = df_team.groupby(['TEAM_NAME','PLAY_TYPE'])['POSS'].sum().reset_index()

df_team = pd.merge(df_team_PPP,df_team_possessions)
df_team.sort_values(['TEAM_NAME','PLAY_TYPE'])

              TEAM_NAME      PLAY_TYPE       PPP  POSS
0         Atlanta Hawks            Cut  1.388962   342
1         Atlanta Hawks        Handoff  1.019487   154
2         Atlanta Hawks      Isolation  0.977146   481
3         Atlanta Hawks           Misc  0.613968   285
4         Atlanta Hawks     OffRebound  1.114693   322
...                 ...            ...       ...   ...
325  Washington Wizards  PRBallHandler  0.897693   919
326  Washington Wizards      PRRollMan  1.197404   324
327  Washington Wizards         Postup  0.996683   293
328  Washington Wizards         Spotup  1.035888  1423

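The team-level table is in long format (one row per team and play type). For comparing teams, it can help to pivot it wide and rank teams within each play type. A minimal sketch on made-up rows with placeholder team names; in the notebook you would apply the same two calls to df_team.

```python
import pandas as pd

# Made-up rows in the same long format as df_team (TEAM_NAME, PLAY_TYPE, PPP, POSS)
toy_team = pd.DataFrame({
    "TEAM_NAME": ["Team A", "Team A", "Team B", "Team B", "Team C", "Team C"],
    "PLAY_TYPE": ["Cut", "Spotup", "Cut", "Spotup", "Cut", "Spotup"],
    "PPP":       [1.40, 1.00, 1.20, 1.10, 1.30, 0.95],
    "POSS":      [300, 1400, 250, 1500, 280, 1300],
})

# Wide view: one row per team, one column per play type
ppp_wide = toy_team.pivot(index="TEAM_NAME", columns="PLAY_TYPE", values="PPP")

# Rank teams within each play type (1 = most efficient); ties share the lowest rank
toy_team["PPP_RANK"] = (
    toy_team.groupby("PLAY_TYPE")["PPP"].rank(ascending=False, method="min").astype(int)
)

print(ppp_wide)
print(toy_team.sort_values(["PLAY_TYPE", "PPP_RANK"]))
```

Team PPP for low-volume play types rests on few possessions, so ranks there are likely to be noisier than for high-volume types like Spotup.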

In [166]:
fig = px.scatter(df_team, x='POSS', y='PPP', facet_col='PLAY_TYPE', facet_col_wrap=3)
fig.show()