
In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt

In [None]:
awards = pd.read_csv("awards_data.csv")
player_data = pd.read_csv("player_stats.csv")
team_data = pd.read_csv("team_stats.csv")
rebounding_data = pd.read_csv("team_rebounding_data_22.csv")

In [None]:
import warnings
warnings.filterwarnings('ignore')


## Can 3pt and 2pt Field Goal Averages From Past Career Outcomes Predict Future Career Outcomes?

As basketball has evolved, the game has increasingly favored smaller, shooting-dominant players. This study examines whether this shift is reflected in the awards and playing time given to players with higher field goal percentages, and then uses the findings to predict future career outcomes.

#### Terms 
    fgp2 = 2pt Field Goal Percentage
    fgp3 = 3pt Field Goal Percentage

##### Methods
To investigate, I used players drafted in 2015 or earlier. I first categorized players as 'Elite', 'All-Star', 'Starter', 'Rotation', 'Roster', or 'Out of the League' based on their achievements and roles. I then analyzed the average 2pt (fgp2) and 3pt (fgp3) field goal percentages for each category to identify correlations with career status. With this data, I predicted the future career outcomes of the 2018-2021 draft picks by calculating the Euclidean distance between a player's fgp2 and fgp3 and the averages for each status category, and assigning the player to the closest category.
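
The distance-based assignment described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: the names `CATEGORY_AVERAGES` and `nearest_status` are hypothetical, and the averages are copied from the status-average table shown later in this notebook.

```python
import math

# Category averages as (fgp3, fgp2), copied from the status-average table
# in this notebook. The dictionary name is hypothetical.
CATEGORY_AVERAGES = {
    "Elite": (0.311556, 0.530375),
    "All-Star": (0.334817, 0.497994),
    "Starter": (0.310249, 0.496635),
    "Rotation": (0.309344, 0.489456),
    "Roster": (0.274206, 0.455968),
}

def nearest_status(fgp3, fgp2, averages=CATEGORY_AVERAGES):
    """Return the status whose (fgp3, fgp2) average is closest in Euclidean distance."""
    return min(averages, key=lambda s: math.dist((fgp3, fgp2), averages[s]))

# Averages for the two players highlighted at the end of this notebook
print(nearest_status(0.358, 0.519))   # Shai Gilgeous-Alexander
print(nearest_status(0.3615, 0.606))  # Zion Williamson
```

Because only two features are used, a player is assigned to whichever category average sits nearest in the (fgp3, fgp2) plane, regardless of minutes or awards.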

##### Possible problems with my model
While the game has become more shooting-centric, and my model indicates a correlation between higher field goal percentages and better career outcomes, relying solely on this metric does not give a complete picture, particularly because not all positions prioritize shooting. To keep the model simple given time constraints, I also excluded data on defensive players, rookies, and players who left the league. Injuries can also have a large impact on a player's potential.

##### How I would improve my model.
My model was envisioned to be more than a shooter-sided prediction model. With more time, I would include player data from defensive awards and test whether players with higher field goal percentages truly earned more awards and playing time than more defensive players. For example, Dikembe Mutombo has many defensive and rookie awards, but my model considers him a roster player. Incorporating a linear regression or machine learning model would also refine the predictions by offering probabilities of career outcomes rather than only categorical predictions, and it would capture more nuance. Moreover, I am aware that traditional positions, like centers, may not focus as much on shooting, and future refinements would adjust for these variations.
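
To illustrate what "probabilities rather than categories" could look like, here is a rough sketch that converts distances to the category averages into scores that sum to 1 using a softmax over negative distances. This is a stand-in, not a trained regression or machine learning model: the function name `status_probabilities` and the `temperature` value are assumptions chosen for illustration, and the resulting numbers are not calibrated probabilities.

```python
import math

# Category averages as (fgp3, fgp2), copied from the status-average table
# in this notebook. The dictionary name is hypothetical.
CATEGORY_AVERAGES = {
    "Elite": (0.311556, 0.530375),
    "All-Star": (0.334817, 0.497994),
    "Starter": (0.310249, 0.496635),
    "Rotation": (0.309344, 0.489456),
    "Roster": (0.274206, 0.455968),
}

def status_probabilities(fgp3, fgp2, temperature=0.02):
    """Softmax over negative Euclidean distances to each category average.

    `temperature` is a hypothetical tuning knob: smaller values concentrate
    the weight on the nearest category.
    """
    scores = {
        status: math.exp(-math.dist((fgp3, fgp2), avg) / temperature)
        for status, avg in CATEGORY_AVERAGES.items()
    }
    total = sum(scores.values())
    return {status: score / total for status, score in scores.items()}

probs = status_probabilities(0.358, 0.519)  # Shai Gilgeous-Alexander's averages
for status, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{status:<9} {p:.2f}")
```

The category with the highest score is always the nearest one, so the categorical prediction is unchanged. The extra information is how close the runner-up categories are.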

##### Findings 
After running my model, I observed a positive correlation between both 2pt and 3pt shooting percentages and player status, with players of all positions pooled together. Notably, the 'All-Star' category had a higher average 3pt percentage than the 'Elite' category (0.335 vs. 0.312). This discrepancy could be attributed to factors like shot selection, the smaller sample size of elite players shooting 3s, or even age and physical condition.
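
The ordering behind this finding can be checked directly from the category averages. The sketch below uses values copied from the status-average table in this notebook. The names `STATUS_ORDER`, `AVERAGES`, and `strictly_decreasing` are hypothetical helpers for illustration.

```python
# Statuses from highest to lowest, with (fgp3, fgp2) averages copied from
# the status-average table in this notebook.
STATUS_ORDER = ["Elite", "All-Star", "Starter", "Rotation", "Roster"]
AVERAGES = {
    "Elite": (0.311556, 0.530375),
    "All-Star": (0.334817, 0.497994),
    "Starter": (0.310249, 0.496635),
    "Rotation": (0.309344, 0.489456),
    "Roster": (0.274206, 0.455968),
}

def strictly_decreasing(values):
    """True if every value is larger than the one that follows it."""
    return all(a > b for a, b in zip(values, values[1:]))

fgp3_by_rank = [AVERAGES[s][0] for s in STATUS_ORDER]
fgp2_by_rank = [AVERAGES[s][1] for s in STATUS_ORDER]

print("fgp2 falls with every step down in status:", strictly_decreasing(fgp2_by_rank))  # True
print("fgp3 falls with every step down in status:", strictly_decreasing(fgp3_by_rank))  # False
```

The 2pt average drops at every step from 'Elite' to 'Roster', while the 3pt average breaks the pattern at the top because 'All-Star' exceeds 'Elite'.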

#### Conclusion 

My research underscores a link between shooting accuracy and career success in contemporary basketball. Interestingly, "All-Star" players had a slight edge in 3pt shooting over "Elite" players, emphasizing that basketball success is multifaceted. While these insights are useful, the model's focus on offense and its omission of certain player categories leave areas for future exploration. A comprehensive model accounting for both offensive and defensive metrics would provide a more rounded understanding of basketball success.

#### My Predictions For Shai Gilgeous-Alexander, Zion Williamson
Based on my data and model, I concluded that:  
  
Shai Gilgeous-Alexander     -    All-Star  
Zion Williamson             -    Elite  

In [None]:
# Merged dataframes on relevant columns
merged_data = pd.merge(player_data, awards, on=['season', 'nbapersonid'], how='left')

# Filtered for players drafted in 2015 or earlier
merged_data_4 = merged_data[merged_data['draftyear'] <= 2015]

# Selected only the relevant columns
filtered_data_modeling = merged_data_4[[
    'season', 'nbapersonid', 'player','games_start', 'draftyear', 'team', 'games', 
    'mins', 'fgp3', 'fgp2', 'All NBA First Team', 'All NBA Second Team', 'All NBA Third Team',
    'Most Valuable Player_rk', 'all_star_game'
]]

filtered_data_modeling

Unnamed: 0,season,nbapersonid,player,games_start,draftyear,team,games,mins,fgp3,fgp2,All NBA First Team,All NBA Second Team,All NBA Third Team,Most Valuable Player_rk,all_star_game
0,2007,2585,Zaza Pachulia,5,2003,ATL,62,944,0.000,0.442,,,,,
1,2007,200780,Solomon Jones,0,2006,ATL,35,145,0.000,0.429,,,,,
2,2007,2746,Josh Smith,81,2004,ATL,81,2873,0.253,0.477,0.0,0.0,0.0,,
3,2007,201151,Acie Law,6,2007,ATL,56,865,0.206,0.433,,,,,
4,2007,101136,Salim Stoudamire,0,2005,ATL,35,402,0.341,0.379,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8483,2021,204001,Kristaps Porzingis,17,2015,WAS,17,479,0.367,0.522,,,,,
8487,2021,203484,Kentavious Caldwell-Pope,77,2013,WAS,77,2329,0.390,0.480,,,,,
8489,2021,203107,Tomas Satoransky,10,2012,WAS,22,416,0.273,0.548,,,,,
8490,2021,1626149,Montrezl Harrell,3,2015,WAS,46,1117,0.267,0.660,,,,,


In [None]:
# Merging player_data and awards dataframes on the columns 'season' and 'nbapersonid'
merged_data = pd.merge(player_data, awards, on=['season', 'nbapersonid'], how='left')

# Filtering to consider only players who were drafted in the year 2015 or earlier
merged_data_4 = merged_data[merged_data['draftyear'] <= 2015]

# Selecting specific columns of interest from the filtered dataframe
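# .copy() makes this an independent DataFrame, so adding the 'status' column
# later does not write into a slice of merged_data_4
filtered_data = merged_data_4[[
    'season', 'nbapersonid', 'player', 'games_start', 'draftyear', 'team', 'games', 
    'mins', 'fgp3', 'fgp2', 'All NBA First Team', 'All NBA Second Team', 'All NBA Third Team',
    'Most Valuable Player_rk', 'all_star_game'
]].copy()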

# Function to classify a player's status based on awards received and playtime
def player_status(row):
    # Handling missing values: work on a copy of the row with NaN values replaced by 0
    row = row.fillna(0)
    
    # Adjusting game starts and minutes for special seasons (lockout or COVID-affected)
    if row['season'] == 2011:
        factor = 82/66
        row['games_start'] = round(row['games_start'] * factor)
        row['mins'] = round(row['mins'] * factor)
    elif row['season'] in [2019, 2020]:
        factor = 82/72
        row['games_start'] = round(row['games_start'] * factor)
        row['mins'] = round(row['mins'] * factor)
    
    # Classification logic: Determined player's status based on awards and playtime
    if (row['All NBA First Team'] == 1 or row['All NBA Second Team'] == 1 or 
        row['All NBA Third Team'] == 1 or row['Most Valuable Player_rk'] == 1):
        return 'Elite'
    elif row['all_star_game'] == 1:
        return 'All-Star'
    elif row['games_start'] >= 41 or row['mins'] >= 2000:
        return 'Starter'
    elif row['mins'] >= 1000:
        return 'Rotation'
    elif row['mins'] >= 1:
        return 'Roster'
    else:
        return 'Out of the League'

# Applied the player_status function to each row to determine player's status
filtered_data['status'] = filtered_data.apply(player_status, axis=1)

# Selected specific columns to display the final result
# (.copy() so that 'status_rank' can be added later without modifying a slice)
result = filtered_data[['draftyear','nbapersonid', 'player', 'status', 'fgp3', 'fgp2']].copy()


result

Unnamed: 0,draftyear,nbapersonid,player,status,fgp3,fgp2
0,2003,2585,Zaza Pachulia,Roster,0.000,0.442
1,2006,200780,Solomon Jones,Roster,0.000,0.429
2,2004,2746,Josh Smith,Starter,0.253,0.477
3,2007,201151,Acie Law,Roster,0.206,0.433
4,2005,101136,Salim Stoudamire,Roster,0.341,0.379
...,...,...,...,...,...,...
8483,2015,204001,Kristaps Porzingis,Roster,0.367,0.522
8487,2013,203484,Kentavious Caldwell-Pope,Starter,0.390,0.480
8489,2012,203107,Tomas Satoransky,Roster,0.273,0.548
8490,2015,1626149,Montrezl Harrell,Rotation,0.267,0.660
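
The shortened-season adjustment inside `player_status` can be checked by hand. The sketch below uses a hypothetical helper, `prorate`, which is not part of the notebook, and made-up player totals to show how starts and minutes are scaled up to an 82-game season before the thresholds are applied.

```python
def prorate(value, season_games, full_season=82):
    """Scale a season total from a shortened season to an 82-game season."""
    return round(value * full_season / season_games)

# Hypothetical player who started half of the 66-game 2011 season
starts_2011 = prorate(33, 66)    # 33 * 82 / 66 = 41
mins_2011 = prorate(1650, 66)    # 1650 * 82 / 66 = 2050

# Hypothetical player who started half of a 72-game season
starts_72 = prorate(36, 72)      # 36 * 82 / 72 = 41

print(starts_2011, mins_2011, starts_72)
```

Starting half of a shortened season therefore scales to 41 starts, which meets the `games_start >= 41` condition for 'Starter'.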


In [None]:
# Assigning a number rank to each status
status_rank = {
    'Elite': 5,
    'All-Star': 4,
    'Starter': 3,
    'Rotation': 2,
    'Roster': 1,
    'Out of the League': 0
}

# Adding a new column for status rank
result['status_rank'] = result['status'].map(status_rank)

# Grouping by nbapersonid and getting the index of the max status rank
idx = result.groupby('nbapersonid')['status_rank'].idxmax()

# Filtering the dataframe to keep only rows with the highest status for each player
highest_status = result.loc[idx]

# Dropped the status_rank column as it's no longer needed
highest_status = highest_status.drop(columns='status_rank')

highest_status

Unnamed: 0,draftyear,nbapersonid,player,status,fgp3,fgp2
414,1994,15,Eric Piatkowski,Roster,0.423,0.143
191,1991,87,Dikembe Mutombo,Roster,,0.538
463,1992,109,Robert Horry,Roster,0.257,0.408
36,1992,136,P.J. Brown,Roster,0.000,0.350
166,1993,185,Chris Webber,Roster,,0.484
...,...,...,...,...,...,...
6935,2013,1629740,Nicolo Melli,Rotation,0.335,0.573
7114,2015,1629742,Stanton Kidd,Roster,0.000,0.000
7897,2015,1629750,Javonte Green,Starter,0.356,0.625
7320,2013,1630267,Facundo Campazzo,Rotation,0.352,0.444


### The Chart Below Is the Basis for My Predictions for Players Drafted in 2018-2021
#### The chart shows the average field goal percentages of the players in each status category, where status was determined by All-NBA awards, MVP, and playing time.


In [None]:
from IPython.display import display, HTML
status_order = ['Elite', 'All-Star', 'Starter', 'Rotation', 'Roster', 'Out of the League']

# Calculated the mean fgp3 and fgp2 for each status
average_fgp = result.groupby('status')[['fgp3', 'fgp2']].mean()

# Re-indexed the DataFrame to match the custom status order
ordered_avg_fgp = average_fgp.reindex(status_order)

# Displayed the dataframe as an HTML table
display(HTML(ordered_avg_fgp.to_html()))





status,fgp3,fgp2
Elite,0.311556,0.530375
All-Star,0.334817,0.497994
Starter,0.310249,0.496635
Rotation,0.309344,0.489456
Roster,0.274206,0.455968
Out of the League,,


### 2018-2021 Draft Pick Player Predictions for Career Outcome

In [None]:
# Filtering to consider only players who were drafted between 2018 and 2021
merged_data_filtered = merged_data[(merged_data['draftyear'] >= 2018) & (merged_data['draftyear'] <= 2021)]

# Selecting specific columns of interest from the filtered dataframe
filtered_data_2018_2021 = merged_data_filtered[[
    'season', 'nbapersonid', 'player', 'games_start', 'draftyear', 'team', 'games', 
    'mins', 'fgp3', 'fgp2', 'All NBA First Team', 'All NBA Second Team', 'All NBA Third Team',
    'Most Valuable Player_rk', 'all_star_game'
]]





In [None]:
from IPython.display import display, HTML
# First, group by 'nbapersonid' and 'player' to get the average 'fgp3' and 'fgp2' for each player from 2018-2021
player_averages = filtered_data_2018_2021.groupby(['nbapersonid', 'player'])[['fgp3', 'fgp2']].mean().reset_index()

# Defined a function to predict status for individual players based on their average 'fgp3' and 'fgp2' values
def predict_player_status(row):
    # Initialized variables to keep track of minimum distance and the corresponding status
    min_distance = float('inf')
    predicted_status = None

    # Iterate over each status and its associated average 'fgp3' and 'fgp2' values
    for status, values in average_fgp.iterrows():
        # Calculate the squared Euclidean distance between the player's average 'fgp3', 'fgp2' and the status's average 'fgp3', 'fgp2'
        # (skipping the square root does not change which status is closest)
        distance = (row['fgp3'] - values['fgp3'])**2 + (row['fgp2'] - values['fgp2'])**2

        # If this distance is the smallest so far, update the minimum distance and the predicted status
        if distance < min_distance:
            min_distance = distance
            predicted_status = status

    # Returned the predicted status for the player
    return predicted_status

# Applied the function to predict the status of each player based on their average 'fgp3' and 'fgp2' values
player_averages['predicted_status'] = player_averages.apply(predict_player_status, axis=1)

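# Players with a missing 'fgp3' or 'fgp2' never match a status (comparisons with NaN are False),
# so their None values in 'predicted_status' are replaced with 'Out of the League'
player_averages['predicted_status'] = player_averages['predicted_status'].fillna('Out of the League')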

# Displayed the players with their predicted statuses
display(HTML(player_averages.to_html()))

Unnamed: 0,nbapersonid,player,fgp3,fgp2,predicted_status
0,1628238,Paris Bass,0.0,0.5,Roster
1,1628959,Rawle Alkins,0.25,0.37,Roster
2,1628960,Grayson Allen,0.38175,0.498,All-Star
3,1628961,Kostas Antetokounmpo,,0.433333,Out of the League
4,1628962,Udoka Azubuike,,0.5995,Out of the League
5,1628963,Marvin Bagley,0.2618,0.5512,Elite
6,1628964,Mohamed Bamba,0.33725,0.55425,Elite
7,1628966,Keita Bates-Diop,0.3032,0.524,Elite
8,1628968,Brian Bowen II,0.0,0.354,Roster
9,1628969,Mikal Bridges,0.3725,0.60825,Elite
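
Rows with a missing `fgp3` in the table above are labeled 'Out of the League'. The sketch below shows why, under the assumption that the missing value is a floating-point NaN, as pandas uses for missing numeric data. The helper `squared_distance` is hypothetical and mirrors the distance expression in `predict_player_status`.

```python
import math

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

elite_avg = (0.311556, 0.530375)   # (fgp3, fgp2) average for 'Elite'
player = (math.nan, 0.433333)      # fgp3 missing, fgp2 present

d = squared_distance(player, elite_avg)
print(math.isnan(d))        # True: NaN propagates through the arithmetic
print(d < float("inf"))     # False: any comparison with NaN is False
```

Since `distance < min_distance` is never true for such a player, `predicted_status` stays `None` and is later filled with 'Out of the League'. That label therefore reflects missing shooting data rather than a prediction based on shooting.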


### 'Shai Gilgeous-Alexander', 'Zion Williamson' Career Outcome Predictions

In [None]:
# List of players to predict
players_to_predict = ['Shai Gilgeous-Alexander', 'Zion Williamson',]

# Filter the player_averages dataframe for the specified players
predicted_outcomes = player_averages[player_averages['player'].isin(players_to_predict)]

# Display the predictions for the players
predicted_outcomes[['player', 'fgp2', 'fgp3', 'predicted_status']]


Unnamed: 0,player,fgp2,fgp3,predicted_status
22,Shai Gilgeous-Alexander,0.519,0.358,All-Star
138,Zion Williamson,0.606,0.3615,Elite
