This Jupyter Notebook analyzes NBA shot attempts for the 2022-23 season. It covers:

- **Data Preprocessing**: Filters for made and missed shots from play-by-play data.
- **Image Handling**: Loads player images and excludes players who have none.
- **Data Enrichment**: Adds player positions and maps team colors for visual distinction.
- **Interactive Visualization**: Uses Plotly for dynamic exploration of shot types, efficiency, and frequency.
- **Analysis**: Compares players by shot volume (FGA/G) and efficiency (FG%) for each shot type.

# Importing Necessary Libraries

In this section, we import all the necessary libraries that will be used throughout the notebook. This includes pandas for data manipulation, os for interacting with the operating system, PIL for image processing, and plotly for interactive plotting.

In [10]:
import pandas as pd
import os
from PIL import Image
import plotly.graph_objects as go

# Preparing Play-by-Play Data

Here, we load the play-by-play data from a CSV file and filter it to include only the rows where a shot was made or missed. We also add a new column to indicate whether the shot was a hit or a miss.

In [11]:
play_by_play_data = pd.read_csv("nba-2022-23-adv-boxscores/pbp.csv") # Load play-by-play data
play_by_play_data = play_by_play_data[play_by_play_data["type"].isin(["Made Shot", "Missed Shot"])].copy() # Filter for made and missed shots; copy so the new column is not written to a slice
play_by_play_data["is_hit"] = (play_by_play_data["type"] == "Made Shot").astype(int) # 1 for a made shot, 0 for a missed shot

# Extract unique shot subtypes from the play-by-play data
unique_shot_subtypes = set(play_by_play_data["subtype"])

# Load game data and count games per player, skipping rows without a 'SEC' value
game_data = pd.read_csv("nba-2022-23-adv-boxscores/basic.csv")
game_data = game_data[~game_data["SEC"].isna()].groupby("playerid")["gameid"].count()
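
As a quick sanity check of the labelling step, the sketch below applies the same filter to a tiny hand-made frame. The rows are invented for illustration; only the `type` values `Made Shot` and `Missed Shot` and the column names come from the notebook.

```python
import pandas as pd

# Toy play-by-play rows (invented for illustration, not real data)
toy_pbp = pd.DataFrame({
    "playerid": [1, 1, 2, 2, 2],
    "type": ["Made Shot", "Missed Shot", "Rebound", "Made Shot", "Made Shot"],
})

# Keep only shot events, then label makes as 1 and misses as 0
toy_shots = toy_pbp[toy_pbp["type"].isin(["Made Shot", "Missed Shot"])].copy()
toy_shots["is_hit"] = (toy_shots["type"] == "Made Shot").astype(int)

# Per-player attempts ("count") and hit rate ("mean")
toy_summary = toy_shots.groupby("playerid")["is_hit"].agg(["count", "mean"])
print(toy_summary)
```

The mean of a 0/1 column is the share of made shots, which is why the notebook can later read FG% straight from `is_hit`.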

# Loading Player Images
In this section, we attempt to load images for each player mentioned in the play-by-play data. We keep track of any players for whom the image loading fails.

In [12]:
missing_images = []  # Initialize a list to keep track of players without images
player_images = {}  # Dictionary to store player images
for player_id in set(play_by_play_data["playerid"]): # Iterate through unique player IDs
    try:
        player_images[player_id] = Image.open(f"nba-active-players-images/img/{player_id}.png") # Attempt to load player image
    except FileNotFoundError:
        missing_images.append(player_id) # If the image file is missing, record the player ID in missing_images
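
The same try/except pattern can be wrapped in a small helper that is easy to test without image files. This is only a sketch: `load_images`, its `loader` parameter, and `fake_loader` are hypothetical names that do not appear in the notebook. In the notebook's setting the loader would be a function that calls `Image.open` on the player's file.

```python
def load_images(player_ids, loader):
    """Split IDs into loaded images and IDs whose file is missing.

    `loader` is any callable that returns an image for an ID or raises
    FileNotFoundError (hypothetical helper, not part of the notebook).
    """
    images, missing = {}, []
    for player_id in player_ids:
        try:
            images[player_id] = loader(player_id)
        except FileNotFoundError:
            missing.append(player_id)
    return images, missing


# Stand-in loader so the sketch runs without any image files on disk
def fake_loader(player_id):
    if player_id % 2:  # pretend odd IDs have no image
        raise FileNotFoundError(player_id)
    return f"image-{player_id}"


images, missing = load_images([1, 2, 3, 4], fake_loader)
print(sorted(images), missing)
```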

# Excluding Players Without Photos
To ensure we only work with players who have corresponding photos, we drop from the play-by-play data every player recorded in `missing_images` in the previous step.

In [13]:
play_by_play_data = play_by_play_data[~play_by_play_data["playerid"].isin(missing_images)]

# Assigning Shapes Based on Player Positions
We load another dataset containing player positions and assign shapes to each position. This will be useful for visualizing the data later on.

In [14]:
# Load player information, including positions
player_positions = pd.read_csv("nba-active-players-images/players.csv")

# Simplify player positions: for combined positions such as "A-B", keep only the first one
player_positions["position"] = player_positions["position"].str.split("-").str[0]

# Replace textual position descriptions with shapes for visualization purposes
player_positions["position"] = player_positions["position"].replace({
    "Guard": "circle",
    "Forward": "diamond",
    "Center": "hexagon2"
})

# Set the player ID as the index of the DataFrame for easy lookup
player_positions = player_positions.set_index("playerid")["position"]
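
The position-to-symbol step can be tried on a few invented rows. The sketch below assumes the same `playerid` and `position` column names as the notebook; the player IDs and the combined position string are made up. It uses `map` instead of `replace`, so an unlisted position would become `NaN` rather than pass through unchanged.

```python
import pandas as pd

# Invented rows; the column names mirror the ones used in the notebook
toy_players = pd.DataFrame({
    "playerid": [10, 11, 12],
    "position": ["Guard-Forward", "Center", "Forward"],
})

# Keep the first listed position, then map it to a Plotly marker symbol
symbol_by_position = {"Guard": "circle", "Forward": "diamond", "Center": "hexagon2"}
first_position = toy_players["position"].str.split("-").str[0]
toy_players["position"] = first_position.map(symbol_by_position)

# Index by player ID so a symbol can be looked up (or joined) per player
toy_symbols = toy_players.set_index("playerid")["position"]
print(toy_symbols.to_dict())
```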


# Assigning Colors Based on Team
We define a dictionary mapping each team to its primary and secondary colors. These colors will be used to represent the teams in the visualizations.

In [15]:
team_colors = {'ATL': ['#E03A3E', '#C1D32F'],
          'BOS': ['#007A33', '#BA9653'],
          'BKN': ['#000000', '#FFFFFF'],
          'CHA': ['#1D1160', '#00788C'],
          'CHI': ['#CE1141', '#000000'],
          'CLE': ['#860038', '#FDBB30'],
          'DAL': ['#00538C', '#B8C4CA'],
          'DEN': ['#0E2240', '#FEC524'],
          'DET': ['#C8102E', '#1D42BA'],
          'GSW': ['#1D428A', '#FFC72C'],
          'HOU': ['#CE1141', '#000000'],
          'IND': ['#FDBB30', '#002D62'],
          'LAC': ['#C8102E', '#1D428A'],
          'LAL': ['#552583', '#FDB927'],
          'MEM': ['#5D76A9', '#12173F'],
          'MIA': ['#98002E', '#000000'],
          'MIL': ['#00471B', '#EEE1C6'],
          'MIN': ['#0C2340', '#78BE20'],
          'NOP': ['#0C2340', '#85714D'],
          'NYK': ['#006BB6', '#F58426'],
          'OKC': ['#007AC1', '#EF3B24'],
          'ORL': ['#0077C0', '#C4CED4'],
          'PHI': ['#006BB6', '#ED174C'],
          'PHX': ['#1D1160', '#E56020'],
          'POR': ['#E03A3E', '#FFFFFF'],
          'SAC': ['#5A2D81', '#63727A'],
          'SAS': ['#000000', '#C4CED4'],
          'TOR': ['#CE1141', '#000000'],
          'UTA': ['#ffeb17', '#000000'],
          'WAS': ['#002B5C', '#E31837']
}
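
A hand-typed color table is easy to get subtly wrong, so a quick format check can help. The sketch below is an optional addition, not part of the original notebook: `invalid_colors` is a hypothetical helper, and the `XXX` entry in `sample` is deliberately malformed to show what a failure looks like.

```python
import re

# Matches strings of the form #RRGGBB
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def invalid_colors(colors_by_team):
    """Return (team, value) pairs that are not part of a valid color pair."""
    problems = []
    for team, colors in colors_by_team.items():
        if len(colors) != 2:
            problems.append((team, colors))
            continue
        problems.extend((team, c) for c in colors if not HEX_COLOR.match(c))
    return problems


# 'XXX' is an invented team with a five-digit hex code
sample = {"ATL": ["#E03A3E", "#C1D32F"], "XXX": ["#12345", "#FFFFFF"]}
print(invalid_colors(sample))
```

Calling `invalid_colors(team_colors)` on the full dictionary should return an empty list if every entry is a pair of `#RRGGBB` strings.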

# Function for Generating Data for Each Shot Type
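This function takes a shot type (a string, or a list of strings) and returns a DataFrame of statistics for each player with at least 10 attempts of that type. It selects every shot subtype whose name contains the given string, aggregates attempts (FGA) and accuracy (FG%) per player, adds attempts per game (FGA/G) from the games-played counts, and joins the result with the player positions data. The rest of the cell builds the figure: one layout image per player, one scatter trace per shot type, and the layout updates that each button applies.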
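
To make the aggregation concrete, here is a toy version of the same idea. The subtype strings, player IDs, and games-played counts are invented for illustration, and the 10-attempt cutoff is left out so the tiny sample is not filtered away.

```python
import pandas as pd

# Invented shot rows; real subtype names in the data may differ
toy = pd.DataFrame({
    "playerid": [1, 1, 1, 2, 2],
    "subtype": [
        "Step Back Jump Shot", "Step Back Jump Shot", "Dunk Shot",
        "Step Back Bank Jump Shot", "Layup Shot",
    ],
    "is_hit": [1, 0, 1, 1, 0],
})
# Invented games-played counts, indexed by player ID
games_played = pd.Series({1: 2, 2: 4}, name="gameid")

# Substring match: every subtype whose name contains "Step Back"
matching = [s for s in set(toy["subtype"]) if "Step Back" in s]

stats = (
    toy[toy["subtype"].isin(matching)]
    .groupby("playerid")["is_hit"]
    .agg(["count", "mean"])
    .rename(columns={"count": "FGA", "mean": "FG%"})
    .join(games_played)
)
stats["FGA/G"] = stats["FGA"] / stats["gameid"]
print(stats)
```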

In [16]:
# Function to generate shot data for visualization based on shot type
def generate_shot_data(shot_type):
    if isinstance(shot_type, list):
        shot_names_filtered = [name for name in unique_shot_subtypes if any(item in name for item in shot_type)]
    else:
        shot_names_filtered = [name for name in unique_shot_subtypes if shot_type in name]

    shots = play_by_play_data[play_by_play_data["subtype"].isin(shot_names_filtered)].groupby("playerid").agg({
        "player": "first",
        "playerid": "first",
        "team": "last",
        "is_hit": ["count", "mean"]
    })
    shots.columns = ["player", "playerid", "team", "FGA", "FG%"]
    shots = shots.join(game_data)
    shots = shots[shots["FGA"] >= 10]
    shots["FGA/G"] = shots["FGA"] / shots["gameid"]

    return shots.join(player_positions)

# Initialize Plotly figure for visualization
fig = go.Figure()
default_shot_type = "Step Back"
data_for_default_shot = generate_shot_data(default_shot_type)
player_ids_for_default_shot = data_for_default_shot["playerid"]

# Plot player images on the figure
image_indices = {}
for image_id, image in player_images.items():
    image_indices[image_id] = len(image_indices)
    if image_id in data_for_default_shot.index:  # the frame is indexed by playerid; `in` on a Series would also test its index, not its values
        row = data_for_default_shot[player_ids_for_default_shot == image_id].iloc[0]
        x, y = row["FGA/G"], row["FG%"]
        visible = True
    else:
        x = y = 0
        visible = False
    fig.add_layout_image(
        x=x, y=y, source=image, xref="x", yref="y", sizex=0.2, sizey=0.2, xanchor="center", yanchor="middle", visible=visible
    )

# Define shot types and titles for dropdown menu in visualization
shot_types = ['Alley Oop', 'Bank', 'Cutting', 'Dunk', 'Fadeaway', 'Finger Roll', 'Floating', 'Hook', 'Jump Shot', 'Layup', 'Pullup', 'Putback', 'Reverse', 'Running', 'Step Back', 'Tip', 'Turnaround']

shot_titles = ['Alley Oops', 'Bank Shots', 'Cutting', 'Dunks', 'Fadeaways', 'Finger Rolls', 'Floaters', 'Hooks', 'Jump Shots', 'Layups', 'Pullups', 'Putbacks', 'Reverse', 'Running Shots', 'Step Backs', 'Tips', 'Turnarounds']

# Build a one-hot visibility mask: only the trace at position n is shown
num_shot_types = len(shot_types)
def visl(n):
    return [i == n for i in range(num_shot_types)]

# Reuse the figure created above; creating a new one here would discard the player images already added to its layout

# Layout updates per shot type; filled in by the second loop below
update_args = {}
for index, shot_type in enumerate(shot_types):
    shot_data = generate_shot_data(shot_type)  # Generate shot data for each type

    # Marker fill uses the team's secondary color, the outline its primary color
    inside_color_scale = [team_colors[team][1] for team in shot_data["team"]]
    outside_color_scale = [team_colors[team][0] for team in shot_data["team"]]

    # Add scatter plot for each shot type
    fig.add_trace(go.Scatter(
        x=shot_data["FGA/G"],
        y=shot_data["FG%"],
        mode='markers',
        name=shot_type,
        # The <b> tag opened here is closed inside `text`, right after the player name
        hovertemplate="<b>%{text}<br>FGA/G: %{x:.3f}<br>FG%: %{y:.3f}<extra></extra>",
        text=shot_data["player"] + "</b><br>FGA: " + shot_data["FGA"].astype(str),
        marker=dict(
            size=6,
            color=inside_color_scale,  # Fill: team's secondary color
            symbol=shot_data["position"],
            line=dict(
                color=outside_color_scale,  # Outline: team's primary color
                width=2.5
            )
        )
    ))

# Image index dictionary to map player IDs to their image index in the layout
image_index = {player_id: idx for idx, player_id in enumerate(player_images.keys())}

for index, shot_type in enumerate(shot_types):
    shot_data = generate_shot_data(shot_type)  # Generate shot data for the current shot type

    # Layout updates applied when this shot type is selected.
    # Keys are relative to the layout (as in the relayout button for faces), so they carry no "layout." prefix.
    sub_args = {}
    for player_id in image_index:
        # Show an image only if its player qualifies for this shot type
        sub_args[f"images[{image_index[player_id]}].visible"] = player_id in shot_data.index

    # Scale the images with the data range so they stay visible and proportionate
    size_x = size_y = max(shot_data["FGA/G"].max() * 0.05, shot_data["FG%"].max() * 0.07)
    for player_id, row in shot_data.iterrows():
        idx = image_index[player_id]
        sub_args[f"images[{idx}].x"] = row["FGA/G"]
        sub_args[f"images[{idx}].y"] = row["FG%"]
        sub_args[f"images[{idx}].sizex"] = size_x
        sub_args[f"images[{idx}].sizey"] = size_y

    # Update the title based on the selected shot type
    sub_args["title.text"] = "Players Best At " + shot_type
    # Store the prepared arguments for later use when the corresponding shot type is selected
    update_args[shot_type] = sub_args
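
The button arguments assembled in this cell follow a simple pattern: a one-hot visibility list for the traces plus a dictionary of layout changes. The sketch below shows that pattern in isolation, assuming Plotly's `update` button method takes trace updates first and layout updates second. `one_hot_visibility` and the shortened `types` list are illustrative names, not part of the notebook.

```python
def one_hot_visibility(active, total):
    """Return a list with True at position `active` and False elsewhere."""
    return [i == active for i in range(total)]


types = ["Dunk", "Layup", "Step Back"]  # shortened list for illustration

buttons = [
    {
        "method": "update",
        "label": name,
        # args[0] updates the traces, args[1] updates the layout
        "args": [
            {"visible": one_hot_visibility(i, len(types))},
            {"title.text": f"Players Best At {name}"},
        ],
    }
    for i, name in enumerate(types)
]

print(buttons[1]["label"], buttons[1]["args"][0]["visible"])
```

Because the result is plain dictionaries and lists, the button definitions can be inspected or tested before they are passed to `updatemenus`.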


# Plotting Player Images and Shot Data
Finally, we use Plotly to create an interactive plot displaying player images and shot data. We add buttons to the plot to allow users to filter the visualization based on shot type.

In [17]:
# Update the layout of the figure
def update_figure_layout(figure):
    figure.update_layout(
        width=1050,
        height=600,
        autosize=False,
        margin=dict(t=50, b=130, l=0, r=0),
        xaxis_title="FGA/G",
        yaxis_title="FG%",
        title="Players Best At Step Back"  # Same wording as the titles set by the shot-type buttons
    )

# Create buttons for updating the figure
def create_update_buttons(types, layout_args):
    update_buttons = [
        dict(
            method="update",
            label=shot_type,
            # args[0] switches trace visibility; args[1] applies the layout updates for this shot type
            args=[{"visible": visl(n)}, layout_args[shot_type]]
        ) for n, shot_type in enumerate(types)
    ]
    return update_buttons

# Create buttons for hiding/showing player faces
def create_hide_show_buttons(image_indices):
    hide_buttons = [
        dict(
            method="relayout",
            label="Show Faces",
            args=[
                {f"images[{idx}].opacity": 1 for idx in image_indices.values()}
            ],
            args2=[
                {f"images[{idx}].opacity": 0 for idx in image_indices.values()}
            ]
        )
    ]
    return hide_buttons

# Define the layout for the update menus
def define_update_menus(update_buttons, hide_buttons, default_index):
    update_menus = [
        {
            "buttons": update_buttons,
            "direction": "right",
            "type": "buttons",
            "pad": {"r": 10, "t": 10},
            "showactive": True,
            "x": 0,
            "xanchor": "left",
            "y": -0.12,
            "yanchor": "top",
            "font": {"size": 8.25},
            "active": default_index
        },
        {
            "buttons": hide_buttons,
            "type": "buttons",
            "pad": {"r": 10, "t": 10},
            "showactive": True,
            "x": 0,
            "xanchor": "left",
            "y": -0.2,
            "yanchor": "top",
            "font": {"size": 8.25}
        }
    ]
    return update_menus

# Update the visibility of each trace in the figure
def update_trace_visibility(figure, default_trace):
    figure.for_each_trace(
        lambda trace: trace.update(visible=True) if trace.name == default_trace else trace.update(visible=False)
    )

# Display the figure
def display_figure(figure):
    figure.show()

# Update the layout of the figure
update_figure_layout(fig)

# Create buttons for updating the figure
update_buttons = create_update_buttons(shot_types, update_args)

# Create buttons for hiding/showing player faces
hide_buttons = create_hide_show_buttons(image_index)

DEFAULT = "Step Back"

# Define the layout for the update menus
update_menus = define_update_menus(update_buttons, hide_buttons, shot_types.index(DEFAULT))

# Update the layout with the defined update menus
fig.update_layout(updatemenus=update_menus)

# Update the visibility of each trace in the figure
update_trace_visibility(fig, DEFAULT)

# Display the figure
display_figure(fig)


# Conclusion

- Built an interactive Plotly scatter plot of NBA shot data for the 2022-23 season.
- Each shot type can be explored by volume (FGA/G) and efficiency (FG%), limited to players with at least 10 attempts.
- Player images identify each point, marker shapes encode position, and marker colors encode team.
- A row of buttons switches between shot types, and a toggle button shows or hides player faces.
- Code comments and section notes explain each step.
- The approach can be customized further or integrated into a larger analytics pipeline.