# NBA Dream Team (Timothy Manolias)

### The following program predicts the number of points a team will score in a season from its aggregate stats for that season. Next, we construct a 'Dream Team' from players in the league during the 2013-2019 seasons.

In [1]:
from IPython.display import Image
from IPython.core.display import HTML

Image(url='Images/Question.png', width=700)

In [2]:
'''Imports Libraries and Data.'''

from gurobipy import GRB, Model
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

nba_stats = pd.read_csv('Data/nba-stats-2013-2019.csv')
nba_players = pd.read_csv('Data/nba-players-2018-2019-with-pos.csv')

nba_stats.head()

Unnamed: 0,Team,Year,G,X3P,X3PA,X2P,X2PA,FG,FGA,FT,FTA,ORB,DRB,AST,STL,BLK,TOV,PF,PTS,Playoffs
0,Denver Nuggets,2013,82,521,1518,2818,5465,3339,6983,1505,2148,1092,2601,2002,762,533,1253,1682,8704,1
1,Houston Rockets,2013,82,867,2369,2257,4413,3124,6782,1573,2087,909,2652,1902,679,359,1348,1662,8688,1
2,Oklahoma City Thunder,2013,82,598,1588,2528,4916,3126,6504,1819,2196,854,2725,1753,679,624,1253,1654,8669,1
3,San Antonio Spurs,2013,82,663,1764,2547,4911,3210,6675,1365,1725,666,2721,2058,695,446,1206,1427,8448,1
4,Miami Heat,2013,82,717,1809,2431,4539,3148,6348,1423,1887,676,2490,1890,710,441,1143,1533,8436,1


### Part 1: Developing a Simple Linear Regression

**Predicts the Number of Points a Team Will Score in a Season.**

In [3]:
'''Estimates Linear Regression Model.'''

train = nba_stats.loc[nba_stats['Year'] <= 2017]
test = nba_stats.loc[nba_stats['Year'] > 2017]

# Train set
y_train = train['PTS']
X_train = train.drop(['Team', 'Year', 'G', 'Playoffs', 'PTS'], axis=1)

lin_reg = LinearRegression().fit(X_train, y_train)

**Training Set $R^2$:**

In [4]:
lin_reg.score(X_train, y_train)

1.0

**Non-Zero Coefficients:**

In [5]:
coefs = [coef for coef in zip(X_train.columns, lin_reg.coef_) if np.abs(coef[1])>.00001]
coefs

[('X3P', 1.3333333333333317),
 ('X2P', 0.333333333333335),
 ('FG', 1.6666666666666665),
 ('FT', 0.9999999999999998)]

**Regression Equation:**

$\text{PTS} = 1.33 \cdot \text{X3P} + 0.33 \cdot \text{X2P} + 1.67 \cdot \text{FG} + 1.0 \cdot \text{FT}$

These results are not surprising: the model puts weight only on the variables that determine points directly. Points are fully determined by made shots, since $\text{PTS} = 3 \cdot \text{X3P} + 2 \cdot \text{X2P} + \text{FT}$. Because `FG` = `X3P` + `X2P`, the columns are collinear and the fit spreads that identity across them: the `X3P` and `FG` coefficients sum to 3, and the `X2P` and `FG` coefficients sum to 2. The model therefore predicts a team's points exactly, as the $R^2$ of 1.0 indicates.
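One way to see why the coefficients come out as 1.33, 0.33, 1.67 and 1.0 rather than 3, 2 and 1 is to refit the same identity on synthetic data. The sketch below is illustrative only: the numbers are randomly generated rather than taken from the NBA data set, and it assumes the solver returns the minimum-norm least-squares solution when columns are collinear, which is what `np.linalg.lstsq` does.

```python
import numpy as np

# Synthetic season totals (hypothetical numbers, not the NBA data set)
rng = np.random.default_rng(0)
x3p = rng.integers(400, 900, size=50).astype(float)    # made 3-pointers
x2p = rng.integers(2000, 3000, size=50).astype(float)  # made 2-pointers
ft = rng.integers(1200, 1900, size=50).astype(float)   # made free throws
fg = x3p + x2p                                         # made field goals (collinear)
pts = 3 * x3p + 2 * x2p + ft                           # exact scoring identity

# Center the data to absorb the intercept, then take the minimum-norm solution
X = np.column_stack([x3p, x2p, fg, ft])
Xc = X - X.mean(axis=0)
yc = pts - pts.mean()
coef, *_ = np.linalg.lstsq(Xc, yc, rcond=1e-10)

print(dict(zip(["X3P", "X2P", "FG", "FT"], coef.round(2))))
```

Any coefficients of the form $(3 - t,\ 2 - t,\ t,\ 1)$ reproduce the points total exactly; the minimum-norm choice is $t = 5/3$, which gives the split seen in the regression output.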

### Part 2: Building a Better Linear Regression Model

- Excludes variables that contribute directly to points scored (made 3-pointers, made 2-pointers, made field goals, made free throws).

**Obtains Statistically Significant Variables:**

In [6]:
'''Obtains dependent and independent variables for train set.'''

# Train set
y_train = train['PTS']
X_train = train[['X3PA', 'X2PA', 'FGA', 'FTA', 'ORB', 'DRB', 'AST', 'STL', 'BLK', 'TOV', 'PF']]

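# Computes p-values for OLS
# Note: sm.OLS fits no intercept unless one is added with sm.add_constant,
# whereas LinearRegression (used below) fits an intercept by default
model = sm.OLS(y_train, X_train).fit()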
p_values = model.summary2().tables[1]['P>|t|']
significant_vars = [coef for coef in p_values.items() if coef[1]<=0.05]

significant_vars

[('X3PA', 1.8474256120647054e-13),
 ('FGA', 6.630129120236298e-09),
 ('FTA', 9.42386128830506e-15),
 ('DRB', 0.000960016893636361),
 ('AST', 9.67373209225348e-11),
 ('TOV', 0.003568375881840509),
 ('PF', 0.04366868424827717)]

In [7]:
'''Fits Linear Regression on the Part 2 Features and Lists Coefficients.'''

lin_reg = LinearRegression().fit(X_train, y_train)
coefs = [coef for coef in zip(X_train.columns, lin_reg.coef_)]
coefs

[('X3PA', 0.34024769788726267),
 ('X2PA', 0.010794861554335344),
 ('FGA', 0.35104255944159846),
 ('FTA', 0.7812218918260065),
 ('ORB', -0.2027670064693409),
 ('DRB', 0.6106360794077946),
 ('AST', 0.9151002208220944),
 ('STL', 0.5050598541499401),
 ('BLK', 0.07974950388755275),
 ('TOV', -0.6270084081155936),
 ('PF', 0.30420044289917414)]

In [8]:
coef = dict(coefs)  # coefficient lookup by feature name
print(f"Successful 3-Point %: {100 * (coef['X3PA'] + coef['FGA']) / 3:.2f}%")

Successful 3-Point %: 23.04%


Every 3-point attempt is also counted as a field-goal attempt, so one extra 3-point attempt raises both `X3PA` and `FGA` by one. The sum of the two coefficients is therefore the expected number of points per 3-point attempt. Dividing that sum by three, the value of a made 3-pointer, gives the implied probability of a successful attempt: 23.04%.
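Written out with the coefficients from the regression above (rounded to four decimals), the calculation is as follows. The symbols $\hat{p}_{3}$ and $\beta$ are notation introduced here for illustration.

$$
\hat{p}_{3} = \frac{\beta_{\text{X3PA}} + \beta_{\text{FGA}}}{3} = \frac{0.3402 + 0.3510}{3} = \frac{0.6912}{3} \approx 0.2304
$$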

In [9]:
'''Makes Predictions on Test Set.'''

# Test set
y_test = test['PTS']
X_test = test[['X3PA', 'X2PA', 'FGA', 'FTA', 'ORB', 'DRB', 'AST', 'STL', 'BLK', 'TOV', 'PF']]

# Predictions on test set
y_preds = lin_reg.predict(X_test)
r2_score(y_test, y_preds)

0.7618057588674045
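
The value above is the $R^2$ on the held-out seasons (2018 onward): the model explains about 76% of the variation in season point totals it was not trained on. As a reminder of what `r2_score` computes, here is a minimal sketch of the same formula. The function name `r_squared` and the sample numbers are illustrative and not part of the notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical season point totals and predictions, for illustration only
actual = [8700, 8400, 9100, 8900]
predicted = [8650, 8500, 9000, 8950]
print(r_squared(actual, predicted))
```

A perfect prediction scores 1.0, and always predicting the mean of the observed values scores 0.0.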

### Part 3: The Dream Team

In [10]:
Image(url='Images/Dream Team.png', width=700)

In [11]:
'''Predicts Total Points A Player Will Score.'''

# Predicts points (regression intercept excluded; it is added once at the team level)
feature_cols = X_train.columns
nba_players['Predicted_Points'] = nba_players[feature_cols].to_numpy() @ lin_reg.coef_
    
# Sorts by most predicted points
nba_players.sort_values(['Predicted_Points'], ascending=False, inplace=True)
nba_players_arr = nba_players.to_numpy()

nba_players.head()

Unnamed: 0,Player,Pos,Tm,X3PA,X2PA,FGA,FTA,ORB,DRB,AST,STL,BLK,TOV,PF,Salary,Predicted_Points
202,James Harden,PG,HOU,1028,881,1909,858,66,452,586,158,58,387,244,30431854.0,2414.584733
497,Russell Westbrook,PG,OKC,411,1062,1473,451,109,698,784,142,33,325,245,35654150.0,2087.385272
179,Paul George,SF,OKC,757,857,1614,540,105,523,318,170,34,205,214,30560700.0,1869.46905
17,Giannis Antetokounmpo,PF,MIL,203,1044,1247,686,159,739,424,92,110,268,232,24157304.0,1818.805211
300,Damian Lillard,PG,POR,643,890,1533,513,68,303,551,88,34,212,148,27977689.0,1802.009201


In [12]:
'''Find Optimal Set of Players for Dream Team.'''

nPlayers = len(nba_players_arr)

# Creates model
m = Model()


# Suppresses output
m.Params.outputFlag = 0


# Adds decision variable for each available player
x = m.addVars(nPlayers, vtype=GRB.BINARY)


# Adds constraints
# Total salary can't exceed 100,000,000
m.addConstr( sum(x[p] * nba_players_arr[p][-2] for p in range(nPlayers)) <= 100_000_000 )

# Exactly three players for each position
positions = np.unique(nba_players['Pos'].values).tolist()
for pos in positions:
    m.addConstr( sum(x[p] for p in range(nPlayers) if nba_players_arr[p][1] == pos) == 3 )


# Adds objective function to maximize points
m.setObjective( sum(x[p] * nba_players_arr[p][-1] for p in range(nPlayers)), GRB.MAXIMIZE )


m.update()
m.optimize()
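
Written out, the model built above is a binary integer program. The notation is introduced here for illustration: $x_i$ is 1 if player $i$ is selected and 0 otherwise, $\hat{y}_i$ is the player's predicted points, $s_i$ is the player's salary, and $P_k$ is the set of players at position $k$.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i} \hat{y}_i \, x_i \\
\text{s.t.} \quad & \sum_{i} s_i \, x_i \le 100{,}000{,}000 \\
& \sum_{i \in P_k} x_i = 3 \quad \text{for each position } k \\
& x_i \in \{0, 1\}
\end{aligned}
$$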



**Selected Players for the Dream Team:**

In [13]:
'''Prints Dream Team.'''

dream_team = nba_players_arr[[p for p in range(nPlayers) if x[p].x > 0.5]]
sorted_dream_team = dream_team[np.argsort(dream_team[:, 1])]

for player in sorted_dream_team:
    print(f'{player[0]:25} {player[1]:10} ${player[-2]:12,}')
    
print(f'\n\nTotal Salary: ${sum([p[-2] for p in sorted_dream_team]): ,}')
print(f'\nTotal Points: {lin_reg.intercept_ + sum([p[-1] for p in sorted_dream_team]):,.0f}')

Karl-Anthony Towns        C          $ 7,839,435.0
Nikola Vucevic            C          $12,750,000.0
Domantas Sabonis          C          $ 2,659,800.0
Giannis Antetokounmpo     PF         $24,157,304.0
Pascal Siakam             PF         $ 1,544,951.0
Kyle Kuzma                PF         $ 1,689,840.0
Kemba Walker              PG         $12,000,000.0
Trae Young                PG         $ 5,356,440.0
De'Aaron Fox              PG         $ 5,470,920.0
Jayson Tatum              SF         $ 6,700,800.0
Cedi Osman                SF         $ 2,775,000.0
Justise Winslow           SF         $ 3,448,926.0
Luka Doncic               SG         $ 6,560,640.0
Donovan Mitchell          SG         $ 3,111,480.0
Devin Booker              SG         $ 3,314,365.0


Total Salary: $ 99,379,901.0

Total Points: 20,578
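
As a quick sanity check, the printed roster can be tested against the two constraints of the optimization model. The sketch below copies the names, positions and salaries from the output above; the variable names are illustrative and not part of the notebook.

```python
from collections import Counter

SALARY_CAP = 100_000_000

# (player, position, salary) copied from the printed roster above
roster = [
    ("Karl-Anthony Towns", "C", 7_839_435),
    ("Nikola Vucevic", "C", 12_750_000),
    ("Domantas Sabonis", "C", 2_659_800),
    ("Giannis Antetokounmpo", "PF", 24_157_304),
    ("Pascal Siakam", "PF", 1_544_951),
    ("Kyle Kuzma", "PF", 1_689_840),
    ("Kemba Walker", "PG", 12_000_000),
    ("Trae Young", "PG", 5_356_440),
    ("De'Aaron Fox", "PG", 5_470_920),
    ("Jayson Tatum", "SF", 6_700_800),
    ("Cedi Osman", "SF", 2_775_000),
    ("Justise Winslow", "SF", 3_448_926),
    ("Luka Doncic", "SG", 6_560_640),
    ("Donovan Mitchell", "SG", 3_111_480),
    ("Devin Booker", "SG", 3_314_365),
]

total_salary = sum(salary for _, _, salary in roster)
per_position = Counter(pos for _, pos, _ in roster)

print(f"Total salary: ${total_salary:,}")
print(f"Under cap: {total_salary <= SALARY_CAP}")
print(f"Players per position: {dict(per_position)}")
```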


### Analysis

The dream team is predicted to score a total of 20,578 points in the upcoming season, or about 251 points per game over an 82-game season. This is clearly infeasible: the prediction adds up each player's full-season output on his current team, but the selected 'star' players would not have as many opportunities to shoot the ball on a shared roster, because a team has only so many shot attempts available each game. For comparison, the highest team total in the data preview above is 8,704 points (Denver Nuggets, 2013).

In the future, a better approach to building a dream team would account for other aspects of basketball rather than simply maximizing the number of points. For example, the model could include a defensive component that also accounts for blocks, steals, defensive rebounds, etc.