**Operations Research in Action &#x25aa; Fall 2024**

# Project 2 &ndash; Model and Results &ndash; Part 2

How can we create the __adjusted plus-minus__ for each player? 
In other words, how can we measure the relative contribution of each player when they are on the court?

__Read in the data we created in Part 1.__

In [1]:
# Solution
stint_df <- read.csv('stint_data_wide_pm.csv')
head(stint_df)

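| | game_id | stint_id | h_team | a_team | minutes | h_goals | a_goals | USA_p1 | USA_p2 | USA_p3 | ⋯ | Chile_p4 | Chile_p5 | Chile_p6 | Chile_p7 | Chile_p8 | Chile_p9 | Chile_p10 | Chile_p11 | Chile_p12 | pm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 1 | 1 | USA | Japan | 4.252969 | 4 | 9 | 1 | 0 | 1 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1.1756493 |
| 2 | 1 | 2 | USA | Japan | 5.688809 | 6 | 11 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.8789186 |
| 3 | 1 | 3 | USA | Japan | 1.149557 | 0 | 1 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.8699006 |
| 4 | 1 | 4 | USA | Japan | 3.511617 | 7 | 5 | 0 | 1 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5695382 |
| 5 | 1 | 5 | USA | Japan | 2.163139 | 7 | 5 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9245824 |
| 6 | 1 | 6 | USA | Japan | 2.155972 | 0 | 6 | 0 | 1 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2.7829681 |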


Here's the big idea. Let's use linear regression to predict the plus-minus of a stint based on the players on the court during that stint:

$$
\text{PM} = \beta_0 + \sum_{i = 1}^{k} \beta_i \text{Player}_i + \varepsilon, \qquad \varepsilon \sim N(0, \sigma_{\varepsilon}^2)
$$

where $k$ is the total number of players, and 

$$
\text{Player}_i = \begin{cases}
+1 & \text{if player $i$ is on the court for the home team during the stint}\\
-1 & \text{if player $i$ is on the court for the away team during the stint} \\
0 & \text{otherwise} \\
\end{cases}
\quad \text{for } i = 1,\dots,k
$$

We will then use the estimated coefficients of the fitted model as the adjusted plus-minus values of each player.
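
As a minimal sketch of the $+1$/$-1$/$0$ coding, the toy example below fits the same kind of regression on a handful of made-up stints. The player names (`A1`, `A2`, `B1`, `B2`), the data frame `toy_df`, and the plus-minus values are hypothetical and are not taken from the project data.

```r
# Toy stints: A1 and A2 play for the home team, B1 and B2 for the away team.
# +1 = on the court for the home team, -1 = on the court for the away team,
#  0 = not on the court during the stint.
toy_df <- data.frame(
    A1 = c( 1,  1,  0,  1,  0,  1),
    A2 = c( 0,  1,  1,  0,  1,  0),
    B1 = c(-1,  0, -1, -1,  0,  0),
    B2 = c( 0, -1,  0, -1, -1,  0),
    pm = c(0.5, 1.2, -0.3, 0.1, -0.8, 0.9)  # made-up plus-minus values
)

# One coefficient per player, plus an intercept
toy_fit <- lm(pm ~ A1 + A2 + B1 + B2, data = toy_df)
coef(toy_fit)
```

Each estimated coefficient in `coef(toy_fit)` plays the role of one $\beta_i$ in the model above.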

__Fit the model above using R.__

Some R tips:

- In a model formula, you can include all the other variables in a data frame with `.`.
- You can remove variables from a model formula with `-`.
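
The two formula tips can be tried out on a small example first. This is only a sketch: the data frame `formula_df` and its columns (`id`, `x1`, `x2`, `y`) are hypothetical.

```r
# Hypothetical data: `id` is a bookkeeping column we do not want as a predictor
formula_df <- data.frame(
    id = 1:5,
    x1 = c(1, 0, 1, 0, 1),
    x2 = c(0, 1, 1, 0, 0),
    y  = c(2.1, 0.9, 3.2, 0.1, 1.8)
)

# `.` stands for every column other than the response; `- id` drops `id` again
formula_fit <- lm(y ~ . - id, data = formula_df)
names(coef(formula_fit))
```

Only the intercept, `x1`, and `x2` appear among the coefficient names, which confirms that `id` was excluded.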

In [2]:
# Solution
fit <- lm(
    pm
    ~
    .
    - game_id
    - stint_id
    - h_team
    - a_team
    - minutes
    - h_goals
    - a_goals,
    data = stint_df
)

summary(fit)


Call:
lm(formula = pm ~ . - game_id - stint_id - h_team - a_team - 
    minutes - h_goals - a_goals, data = stint_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-30.5347  -0.7407   0.0015   0.7167  30.9460 

Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.09383    0.01670   5.617 2.01e-08 ***
USA_p1             0.35910    0.17953   2.000 0.045512 *  
USA_p2             0.03592    0.18292   0.196 0.844341    
USA_p3             0.44614    0.20868   2.138 0.032560 *  
USA_p4            -0.69949    0.22799  -3.068 0.002163 ** 
USA_p5            -0.28498    0.14463  -1.970 0.048827 *  
USA_p6            -0.13921    0.19510  -0.714 0.475542    
USA_p7             0.27014    0.14779   1.828 0.067615 .  
USA_p8             0.29755    0.15296   1.945 0.051782 .  
USA_p9             0.01953    0.16174   0.121 0.903897    
USA_p10           -0.11514    0.16481  -0.699 0.484824    
USA_p11      

__Looking at the output, which player is the reference player?__

_Write your answer here. Double-click to edit._

_Solution._ Chile_p12.
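
The note "(1 not defined because of singularities)" in the summary means that `lm()` could not estimate one coefficient separately from the others and reported it as `NA`. One plausible cause, assuming every stint has the same number of home and away players on the court, is that the $+1$ and $-1$ entries in each row sum to zero, which makes any one player column a linear combination of the rest. The sketch below reproduces this on made-up data. The players `P1` to `P4` and the plus-minus values are hypothetical.

```r
# Each toy stint has exactly one home player (+1) and one away player (-1),
# so the four player columns sum to zero in every row.
sing_df <- data.frame(
    P1 = c( 1,  1,  1,  0,  0,  0, -1),
    P2 = c(-1,  0,  0,  1,  0, -1,  1),
    P3 = c( 0, -1,  0, -1,  1,  0,  0),
    P4 = c( 0,  0, -1,  0, -1,  1,  0),
    pm = c(0.4, -0.2, 0.9, 0.3, -0.5, 0.1, -0.7)  # made-up plus-minus values
)

sing_fit <- lm(pm ~ ., data = sing_df)

# The coefficient lm() could not estimate is reported as NA
names(which(is.na(coef(sing_fit))))
```

Here `P4` is the column left without an estimate, so it acts as the reference player with an implied coefficient of 0.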

__What is the predicted plus-minus in a stint with USA_p1, USA_p2, USA_p3, USA_p4 as the home team, and Canada_p9, Canada_p10, Canada_p11, Canada_p12 as the away team? 
Use the code cell below as a calculator.__

In [3]:
# Solution
0.09383 + (1 * 0.35910) + (1 * 0.03592) + (1 * 0.44614) + (1 * -0.69949) + (-1 * 0.36332) + (-1 * -0.18106) + (-1 * 0.45068) + (-1 * -0.38305) 
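
The same calculation can be written as the intercept plus a sum of coefficient times indicator, which mirrors the model equation. This is a sketch with the coefficients hard-coded from the calculator cell above. It assumes the four Canada values appear there in the order Canada_p9, Canada_p10, Canada_p11, Canada_p12. The names `intercept`, `beta`, and `player` are illustrative.

```r
# Coefficients copied by hand from the calculator cell (not read from `fit`)
intercept <- 0.09383
beta <- c(
    USA_p1     =  0.35910,
    USA_p2     =  0.03592,
    USA_p3     =  0.44614,
    USA_p4     = -0.69949,
    Canada_p9  =  0.36332,   # assumed order of the Canada coefficients
    Canada_p10 = -0.18106,
    Canada_p11 =  0.45068,
    Canada_p12 = -0.38305
)

# Indicator values: +1 for the home team (USA), -1 for the away team (Canada)
player <- c(
    USA_p1 = 1, USA_p2 = 1, USA_p3 = 1, USA_p4 = 1,
    Canada_p9 = -1, Canada_p10 = -1, Canada_p11 = -1, Canada_p12 = -1
)

# Predicted plus-minus = intercept + sum of beta_i * Player_i
predicted_pm <- intercept + sum(beta * player)
predicted_pm
```

The product `beta * player` also shows each player's signed contribution to the home team's predicted plus-minus.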

__From Team USA's perspective, did USA_p4 have a positive or negative effect on the predicted plus-minus value of this stint?__

_Write your answer here. Double-click to edit._

_Solution._
USA_p4 contributes -0.69949 to the plus-minus value of the stint. Since USA is the home team, USA_p4 has a negative effect. 

__From Team Canada's perspective, did Canada_p12 have a positive or negative effect on the predicted plus-minus value of this stint?__

_Write your answer here. Double-click to edit._

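_Solution._ Canada_p12's coefficient is -0.38305. Since Canada is the away team, $\text{Player}_i = -1$, so Canada_p12 contributes $(-1)(-0.38305) = 0.38305$ to the plus-minus value of the stint. Plus-minus is measured from the home team's perspective, so this positive contribution favors USA, and Canada_p12 also has a negative effect from Team Canada's perspective.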

__Based on your answers above, explain why it makes sense to use the estimated coefficients of the fitted model as the adjusted plus-minus values for each player.__

_Write your answer here. Double-click to edit._

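_Solution._ The estimated coefficients measure how much each player contributes to the plus-minus of a stint, relative to the reference player, who is Chile_p12 in this case.
Because of the $+1$/$-1$ coding, a positive coefficient always helps the player's own team and a negative coefficient always hurts it, whether that team is home or away, so the coefficients are comparable measures of each player's relative contribution.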

__Output the estimated coefficients to a CSV file.__

Some R tips:
- You can create a data frame of the coefficients like this: `coefs_df <- data.frame(coef(fit))`.
- You can write a CSV file in R using `write.csv()` - Google for the documentation.

In [4]:
# Solution
coefs_df <- data.frame(coef(fit))
write.csv(coefs_df, 'coefs.csv')
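
If the coefficient names should end up in their own column rather than as row names, one possible variation is sketched below. It uses a small made-up fit, a temporary file, and illustrative names (`csv_df`, `csv_fit`, `tidy_coefs_df`, `out_path`) so that it runs on its own.

```r
# Made-up data and fit, only so that the example is self-contained
csv_df <- data.frame(
    x = c(1, -1, 0, 1, -1),
    y = c(0.4, -0.6, 0.1, 0.7, -0.2)
)
csv_fit <- lm(y ~ x, data = csv_df)

# Put the coefficient names in an explicit column next to the estimates
tidy_coefs_df <- data.frame(
    term     = names(coef(csv_fit)),
    estimate = unname(coef(csv_fit))
)

# row.names = FALSE keeps R from writing an extra unnamed index column
out_path <- tempfile(fileext = ".csv")
write.csv(tidy_coefs_df, out_path, row.names = FALSE)
read.csv(out_path)
```

Reading the file back shows two columns, `term` and `estimate`, with one row per coefficient.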