## Introduction

In this notebook, we explore how the 3-point shot has evolved over the years and how scoring has increased or decreased with its usage.

We will start with a basic introduction to 'final_data.csv' (or whatever you named the data file). First, let's import all the necessary packages/libraries.

A quick disclaimer: this notebook uses 3 data sets. 'final_data_2022.csv' was downloaded from the Kaggle website; 'final_data_2023.csv' adds the 2023 All-Star data to 'final_data_2022.csv'; 'final_data_2024.csv' adds the 2024 All-Star data to 'final_data_2023.csv'.
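
To make the "adds on" step concrete, here is a minimal sketch of how one season's rows could be stacked onto the previous file. Only the 'year' and 'fg3a' column names come from this notebook; the 'player' column, the toy values, and the variable names are assumptions for illustration, not the actual contents of the CSV files.

```python
import pandas as pd

# Toy stand-in for 'final_data_2022.csv' (values are made up).
stars_2022 = pd.DataFrame({
    "player": ["A", "B", "C"],   # hypothetical column
    "year": [2021, 2022, 2022],
    "fg3a": [4.0, 6.0, 5.0],
})

# Hypothetical 2023 All-Star rows to append.
new_2023_rows = pd.DataFrame({
    "player": ["D", "E"],
    "year": [2023, 2023],
    "fg3a": [7.0, 3.0],
})

# Stack the new season under the existing data and rebuild the index.
stars_2023 = pd.concat([stars_2022, new_2023_rows], ignore_index=True)

# Sanity checks: no rows were lost and the latest year is now 2023.
print(len(stars_2023))
print(stars_2023["year"].max())
```

The same pattern would apply for building the 2024 file from the 2023 one.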

In [None]:
# importing the necessary packages/libraries for the analysis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In this introduction, we will look into all 3 data sets to see if there is any significant change over the past 3 years.

Let's start with the given data set:

In [None]:
nba_all_stars_22 = pd.read_csv("final_data_2022.csv")
nba_all_stars_22.head(10)

In [None]:
nba_all_stars_22.info()

With 2022 out of the way, let's move on to the next year (2023):

In [None]:
# load the 2023 data set; it is used later to compare against the actual 2023 NBA All-Star FG3As
nba_all_stars_23 = pd.read_csv("final_data_2023.csv")
nba_all_stars_23.head(10)

In [None]:
nba_all_stars_23.info()

In [None]:
# calculate the average 3 pointers attempted for 2023
nba_all_stars_23 = nba_all_stars_23[nba_all_stars_23['year'] == 2023]
actual_fg3a_2023 = nba_all_stars_23['fg3a'].mean()

# print the actual average 3-point attempts for 2023
print(f"Actual Average 3-Point Attempts for 2023 NBA All Stars: {actual_fg3a_2023:.2f}")

This marks the end of the basic introduction to 'final_data.csv'. We can now move on to the visualizations of the data set and the predictions!

## Visualization #1 - Evolution of 3 Pointers Attempted

In [None]:
# group by year and calculate the average 3 pointers attempted
avg_3fga_22 = nba_all_stars_22.groupby('year')['fg3a'].mean().reset_index()

# plot the average 3 pointers attempted over the years
plt.figure(figsize=(10, 6))
sns.scatterplot(data=avg_3fga_22, x='year', y='fg3a', color='blue')
sns.regplot(data=avg_3fga_22, x='year', y='fg3a', scatter=False, lowess=True, line_kws={'color': 'black', 'alpha': 0.25}, ci=90)
plt.axvline(x=2009, color='gold')

# labels and title
plt.xlabel('Year')
plt.ylabel('Average 3 Pointers Attempted')
plt.title('Average 3 Pointers Attempted by All Stars Over the Years')

# display the plot
plt.show()

## Visualization #2 - Predicting the Average FG3As for the '23 NBA All Stars

### **2a** - Linear Regression Model

In [None]:
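# replace missing values with 0 (assign back rather than using inplace=True on a column, which newer pandas warns about)
nba_all_stars_22['fg3a'] = nba_all_stars_22['fg3a'].fillna(0)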

# split the data into features and target variable
X = nba_all_stars_22[['year']]
y = nba_all_stars_22['fg3a']

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create a linear regression model
model = LinearRegression()

# train the model
model.fit(X_train, y_train)

# predict the average number of 3-point attempts for 2023
year_2023 = [[2023]]
predicted_fg3a_2023 = model.predict(year_2023)

# print the predicted average 3-point attempts for 2023
print(f"Predicted Average 3 Pointers Attempted for 2023: {predicted_fg3a_2023[0]:.2f}")

# evaluate the model on the test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.3f}\n")

# plot the linear regression line
plt.figure(figsize=(12, 7))
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X_test, y_pred, color='red', label='Linear Regression Line')
plt.scatter(year_2023, predicted_fg3a_2023, color='gold', marker='*', s=200, label='Predicted for 2023')

# add the predicted value slightly to the right of the marker
offset = 0.5
plt.text(year_2023[0][0] + offset, predicted_fg3a_2023[0], f'{predicted_fg3a_2023[0]:.2f}', ha='left', va='center')

# labels and title
plt.xlabel('Year')
plt.ylabel('Average 3 Pointers Attempted')
plt.title('Linear Regression for Average 3 Pointers Attempted Over the Years')
plt.legend()
plt.show()

Now that we have found the actual average FG3As from the 2023 NBA All Stars, let's figure out the MSE (Mean Squared Error) to see how accurate the Linear Regression model is. Note: Our MSE on the test data set was 5.005.
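
One thing worth keeping in mind: when MSE is computed on a single actual/predicted pair, it is simply the squared difference between the two numbers. The sketch below illustrates this with placeholder values (they are not the notebook's results).

```python
from sklearn.metrics import mean_squared_error

# Placeholder numbers, chosen only to illustrate the arithmetic.
actual = 5.7
predicted = 5.2

# MSE over one pair of values...
mse_single = mean_squared_error([actual], [predicted])

# ...is the same as the squared error of that pair.
squared_error = (actual - predicted) ** 2

# The absolute error is in the original units (3-point attempts).
abs_error = abs(actual - predicted)

print(f"MSE: {mse_single:.3f}, squared error: {squared_error:.3f}, absolute error: {abs_error:.2f}")
```

This is also why the 2023 MSE can be much smaller than the test-set MSE: the test set scores individual players, while the 2023 comparison scores one yearly average.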

In [None]:
# convert to lists if needed
actual_3pt_attempts_2023_list = [actual_fg3a_2023]
predicted_3pt_attempts_2023_list = [predicted_fg3a_2023]

# calculate Mean Squared Error
mse_2023 = mean_squared_error(actual_3pt_attempts_2023_list, predicted_3pt_attempts_2023_list)

# print the MSE
print(f"Mean Squared Error for 2023: {mse_2023:.3f}")

### **2b** - Decision Trees Model

The code block below replaces missing values in the 'fg3a' column of the 'nba_all_stars_22' DataFrame with 0, then builds and trains a Decision Tree regression model that uses the 'year' column as its only feature to predict the average number of 3-point attempts in 2023. The model is evaluated on the test set, and its predictions are plotted alongside the actual data in a scatter plot.
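
The choice of fill value matters for the averages the models learn from. As a rough illustration on a made-up Series (not the real 'fg3a' column), filling with 0 and filling with the median give quite different means:

```python
import numpy as np
import pandas as pd

# Toy column with two missing entries (values are made up).
fg3a = pd.Series([2.0, np.nan, 4.0, 6.0, np.nan])

# Option 1: treat missing attempts as zero attempts.
filled_zero = fg3a.fillna(0)

# Option 2: fill with the median of the observed values (4.0 here).
filled_median = fg3a.fillna(fg3a.median())

print(filled_zero.mean())    # 2.4
print(filled_median.mean())  # 4.0
```

Filling with 0 pulls the yearly averages down wherever values are missing, so which option is appropriate depends on what a missing value means in the data.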

In [None]:
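# replace missing values with 0 (assign back rather than using inplace=True on a column, which newer pandas warns about)
nba_all_stars_22['fg3a'] = nba_all_stars_22['fg3a'].fillna(0)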

# split data into feature and target variable
X = nba_all_stars_22[['year']]
y = nba_all_stars_22['fg3a']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create a Decision Tree model
model = DecisionTreeRegressor(random_state=42)

# train the model
model.fit(X_train, y_train)

# predict the average number of 3-point attempts for 2023
year_2023 = [[2023]]
predicted_fg3a_2023 = model.predict(year_2023)

# print the predicted value
print(f"Predicted Average 3-Point Attempts for 2023 NBA All Stars: {predicted_fg3a_2023[0]:.2f}")

# evaluate the model on the test set (optional)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.3f}\n")

# plot a LOWESS-smoothed curve through the decision tree's test-set predictions (lowess=True requires statsmodels)
plt.figure(figsize=(12, 7))
plt.scatter(X, y, color='blue', label='Actual Data')
sns.regplot(data=nba_all_stars_22, x=X_test.squeeze(), y=y_pred, scatter=False, lowess=True, line_kws={'color': 'red', 'alpha': 0.75}, label='Decision Tree Predictions')
plt.scatter(year_2023, predicted_fg3a_2023, color='gold', marker='*', s=200, label='Predicted for 2023')

# add the predicted value slightly to the right of the marker
offset = 0.5
plt.text(year_2023[0][0] + offset, predicted_fg3a_2023[0], f'{predicted_fg3a_2023[0]:.2f}', ha='left', va='center')

# labels and title
plt.xlabel('Year')
plt.ylabel('Average 3 Pointers Attempted')
plt.title('Decision Tree Regression for Average 3 Pointers Attempted Over the Years')
plt.legend()
plt.show()

Now that we have found the actual average FG3As from the 2023 NBA All Stars, let's figure out the MSE (Mean Squared Error) to see how accurate the Decision Tree model is. Note: Our MSE on the test data set was 5.25.

In [None]:
# convert to lists if needed
actual_3pt_attempts_2023_list = [actual_fg3a_2023]
predicted_3pt_attempts_2023_list = [predicted_fg3a_2023]

# calculate Mean Squared Error
mse_2023 = mean_squared_error(actual_3pt_attempts_2023_list, predicted_3pt_attempts_2023_list)

# print the MSE
print(f"Mean Squared Error for 2023: {mse_2023:.3f}")
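
A caveat when reading the Decision Tree result: a tree with 'year' as its only feature cannot extrapolate a trend. Any year beyond the training range falls into the same leaf as the latest training year. The sketch below shows this on a made-up upward trend (the years and values are assumptions, not the notebook's data).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: one value per year, steadily increasing (made up).
years = np.arange(2015, 2023).reshape(-1, 1)   # 2015..2022
fg3a = np.linspace(3.0, 6.5, len(years))

tree = DecisionTreeRegressor(random_state=42).fit(years, fg3a)

# Predictions for the last training year and for years past it.
pred_2022 = tree.predict([[2022]])[0]
pred_2023 = tree.predict([[2023]])[0]
pred_2030 = tree.predict([[2030]])[0]

# All three are identical: the tree just repeats its last leaf.
print(pred_2022, pred_2023, pred_2030)
```

So the tree's 2023 prediction is best read as "roughly what the most recent training years looked like", not as a continuation of the trend.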

### **2c** - Neural Networks

In [None]:
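# replace missing values with 0 (assign back rather than using inplace=True on a column, which newer pandas warns about)
nba_all_stars_22['fg3a'] = nba_all_stars_22['fg3a'].fillna(0)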

# split data into feature and target variable
X = nba_all_stars_22[['year']]
y = nba_all_stars_22['fg3a']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create a StandardScaler object
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# build neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(1, activation='linear'))

# compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# train the model
model.fit(X_train_scaled, y_train, epochs=50, batch_size=32, verbose=0)
# model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, validation_data=(X_test_scaled, y_test))

# predict the average number of 'fg3a' for the year 2023
year_2023 = np.array([[2023]])
year_2023_scaled = scaler.transform(year_2023)
predicted_fg3a_2023 = model.predict(year_2023_scaled)[0, 0]

# print the predicted value
print(f"Predicted Average fg3a for 2023: {predicted_fg3a_2023:.2f}")

# evaluate the model on the test set (optional)
y_pred = model.predict(X_test_scaled).flatten()
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse:.3f}\n")

# plot the neural network predictions
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Actual Data')
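# sort the test points by year so the prediction line is drawn left to right
order = np.argsort(X_test['year'].to_numpy())
plt.plot(X_test['year'].to_numpy()[order], y_pred[order], color='red', label='Neural Network Predictions')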
plt.scatter(year_2023, predicted_fg3a_2023, color='gold', marker='*', s=200, label='Predicted for 2023')

# label and title
plt.xlabel('Year')
plt.ylabel('Average 3 Pointers Attempted')
plt.title('Neural Network Regression for Average 3 Pointers Attempted Over the Years')
plt.legend()
plt.show()

Now that we have found the actual average FG3As from the 2023 NBA All Stars, let's figure out the MSE (Mean Squared Error) to see how accurate the Neural Network model is. Note: Our MSE on the test data set was 4.9.

In [None]:
# convert to lists if needed
actual_3pt_attempts_2023_list = [actual_fg3a_2023]
predicted_3pt_attempts_2023_list = [predicted_fg3a_2023]

# calculate Mean Squared Error
mse_2023 = mean_squared_error(actual_3pt_attempts_2023_list, predicted_3pt_attempts_2023_list)

# print the MSE
print(f"Mean Squared Error for 2023: {mse_2023:.3f}")
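
As a side note on the scaling step used for the network in 2c: `StandardScaler` learns the mean and standard deviation from the training years only, and the 2023 input is transformed with those same statistics. A minimal sketch with made-up training years (an assumption, not the real split):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training years (made up); the real X_train has one row per player.
train_years = np.array([[2018], [2019], [2020], [2021], [2022]])

# Fit on the training data only.
scaler = StandardScaler().fit(train_years)

# Transform the unseen year with the training mean and std.
scaled_2023 = scaler.transform(np.array([[2023]]))[0, 0]

# Same computation by hand: (x - mean) / std, using the population std.
manual = (2023 - train_years.mean()) / train_years.std()

print(f"scaled 2023: {scaled_2023:.4f}, by hand: {manual:.4f}")
```

Because 2023 lies outside the training range, its scaled value is larger than any value the network saw during training.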

## Conclusion

Linear Regression:
- Predicted Value = 5.18
- MSE = 0.264

Decision Trees:
- Predicted Value = 5.58
- MSE = 0.015

Neural Networks:
- Predicted Value = 5.43
- MSE = 0.069
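
To put these numbers side by side, the sketch below takes the square root of each MSE to express the error in 3-point attempts. It assumes the MSE values listed above are the single-prediction errors against the actual 2023 average (not the test-set MSEs); the dictionary layout and names are just for illustration.

```python
import math

# Values copied from the summary above.
results = {
    "Linear Regression": {"predicted": 5.18, "mse": 0.264},
    "Decision Trees": {"predicted": 5.58, "mse": 0.015},
    "Neural Networks": {"predicted": 5.43, "mse": 0.069},
}

# For a single prediction, sqrt(MSE) is the absolute error in attempts.
for name, r in results.items():
    r["abs_error"] = math.sqrt(r["mse"])
    print(f"{name}: predicted {r['predicted']:.2f}, off by about {r['abs_error']:.2f} attempts")

# Model with the smallest error on the 2023 comparison.
best = min(results, key=lambda n: results[n]["mse"])
print(f"Closest to the actual 2023 average: {best}")
```

Under that assumption, the Decision Tree's prediction lands closest to the actual 2023 average, followed by the Neural Network and then Linear Regression.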