**SM339 &#x25aa; Applied Statistics &#x25aa; Spring 2024 &#x25aa; Uhan**

# Lesson 17. Assessing Multiple Linear Regression Models, cont. 

## The coefficient of (multiple) determination $R^2$

- Recall the multiple linear regression model:

$$ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon \quad \text{where} \quad \varepsilon \sim \text{iid } N(0, \sigma_{\varepsilon}^2) $$ 

- Question: __How much of the variation in $Y$ is explained by the model?__

- Formula:

$$ R^2 = \frac{\text{variability explained by model}}{\text{total variability in $Y$}} = \frac{\mathit{SSModel}}{\mathit{SSTotal}} = 1 - \frac{\mathit{SSE}}{\mathit{SSTotal}} $$

- Interpretation:

    - Suppose we fit a multiple linear regression model and the $R^2$ value is 0.75
    
    - Then we would say
        > The linear regression model with explanatory variables <mark>list of explanatory variables</mark> explains 75% of the variability in the response variable <mark>response variable</mark>.

- Things to note:

    - We call $R^2$ the coefficient of _multiple_ determination because we now have multiple predictors
    
    - For simple linear regression, we use $r^2$ (with lowercase $r$) to denote the coefficient of determination because it equals the square of the sample correlation $r$ between $X$ and $Y$
    
    - That interpretation does not translate directly to the case with multiple predictors, since each predictor has its own correlation with the response variable
    
    - However, in the multiple linear regression setting, if we calculate the correlation between $y$ (observed) and $\hat{y}$ (fitted) and then square that correlation, we would obtain $R^2$
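
The last point can be checked numerically. The sketch below is only an illustration: it uses R's built-in `mtcars` data (not `RailsTrails`), with `mpg` as the response and `wt` and `hp` as arbitrarily chosen predictors, and computes $R^2$ three ways. All three values should agree.

```r
# Illustration only: built-in mtcars data, predictors chosen arbitrarily
fit <- lm(mpg ~ wt + hp, data = mtcars)

# 1. R^2 as reported by summary()
r2_summary <- summary(fit)$r.squared

# 2. R^2 from the sums of squares: 1 - SSE / SSTotal
sse <- sum(residuals(fit)^2)
sstotal <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_ss <- 1 - sse / sstotal

# 3. R^2 as the squared correlation between observed y and fitted y-hat
r2_cor <- cor(mtcars$mpg, fitted(fit))^2

c(summary = r2_summary, sums_of_squares = r2_ss, correlation = r2_cor)
```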

### Example 1

Continuing with the `RailsTrails` example from the previous lesson...

In [None]:
library(Stat2Data)
data(RailsTrails)

#### a.
How much of the variability in $\mathit{Price2014}$ is explained by the _simple_ linear regression model with _only_ $\mathit{SquareFeet}$ as the predictor?

*Write your notes here. Double-click to edit.*

#### b.
How much of the variability in $\mathit{Price2014}$ is explained by the _simple_ linear regression model with _only_ $\mathit{Distance}$ as the predictor?

*Write your notes here. Double-click to edit.*

#### c.
How much of the variability in  $\mathit{Price2014}$ is explained by the _multiple_ linear regression model with both $\mathit{SquareFeet}$ and $\mathit{Distance}$ as predictors?

*Write your notes here. Double-click to edit.*

## The adjusted coefficient of determination (adjusted $R^2$) 

- Adding a predictor to a multiple linear regression model can _never_ decrease the percentage of variability explained by that model, assuming we are using the same data

- __But... is the increase in explained variability due to important new information in the new predictor, or is it just due to chance?__

- One way to answer this question is to do a $t$-test for the new predictor's coefficient, like we did in the previous lesson

- Another less formal option is to look at the __adjusted $R^2$__, denoted by $R_{adj}^2$

- The __adjusted coefficient of determination $R_{adj}^2$__ accounts for
    
    1. the amount of variability explained by the model, __and__
    
    2. the number of predictors
    
- It includes a _penalty_ for additional predictors, so an extra predictor has to help _enough_ in order to raise $R_{adj}^2$

- $R_{adj}^2$ can go _up or down_ when a new predictor is included

- Therefore, $R_{adj}^2$ is a good way to compare models with different numbers of predictors
    - It's most useful to quickly compare lots of models

- Formula:
    $$R_{adj}^2 = 1 - \frac{\hat{\sigma}_{\varepsilon}^2}{s_{Y}^2} = 1 - \frac{ \mathit{SSE} \,/\, (n - (k + 1)) }{ \mathit{SSTotal} \,/\, (n - 1) } $$
    
    - $s_{Y}^2$ is the sample variance of $Y$
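
As a rough sketch of these ideas, the code below again uses the built-in `mtcars` data rather than `RailsTrails`. The `noise` column is a made-up predictor of random numbers, added only for illustration. $R^2$ for the larger model should be at least as large as for the smaller one, while $R_{adj}^2$ may move in either direction. The formula above should reproduce the value that `summary()` reports.

```r
# Illustration only: built-in mtcars data plus a made-up noise predictor
set.seed(339)
cars <- mtcars
cars$noise <- rnorm(nrow(cars))  # hypothetical predictor unrelated to mpg

fit_small <- lm(mpg ~ wt + hp, data = cars)
fit_big <- lm(mpg ~ wt + hp + noise, data = cars)

# R^2 cannot decrease when a predictor is added
r2_small <- summary(fit_small)$r.squared
r2_big <- summary(fit_big)$r.squared

# Adjusted R^2 for the larger model, computed from the formula
n <- nrow(cars)
k <- 3  # number of predictors in fit_big
sse <- sum(residuals(fit_big)^2)
sstotal <- sum((cars$mpg - mean(cars$mpg))^2)
adj_manual <- 1 - (sse / (n - (k + 1))) / (sstotal / (n - 1))

c(r2_small = r2_small, r2_big = r2_big)
c(adj_small = summary(fit_small)$adj.r.squared,
  adj_big = summary(fit_big)$adj.r.squared,
  adj_big_manual = adj_manual)
```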

### Example 2

Continuing with the `RailsTrails` example...

#### a.
For our two predictor model of house prices, what is $R^2_{adj}$? Look for "Adjusted R-squared" in the Example 1 output.

*Write your answer here. Double-click to edit.*

#### b.
What is $R_{adj}^2$ for the simple linear regression model using only $\mathit{SquareFeet}$ as a predictor of $\mathit{Price2014}$?

*Write your answer here. Double-click to edit.*

#### c.
Based on your answers above, which model seems to be better?

*Write your answer here. Double-click to edit.*

## Confidence and prediction intervals for response

- Like with simple linear regression, we often would like to use our multiple linear regression model to make predictions

- We may want to predict:

    - The __mean response__ for a particular set of predictor values
    
    - A future __individual response__ for a particular set of predictor values

### Confidence interval for response

- To estimate the mean of $Y$ when $X_1 = x_1^*, \dots, X_k = x_k^*$, we use a __$100(1 - \alpha)$% confidence interval__ for $\mu_Y$:

$$\hat{y} \pm t_{\alpha / 2, n - (k + 1)} \mathit{SE}_{\hat{\mu}}$$

- Interpretation:
    > We are <mark>$100(1 - \alpha)$%</mark> confident that the average <mark>response</mark> for all <mark>observational units</mark> with <mark>predictor values $x_1^*, \dots, x_k^*$</mark> is between <mark>lower endpoint of CI</mark> and <mark>upper endpoint of CI</mark> <mark>units</mark>

    - Rephrase the highlighted parts so that they match the context of the problem

- This means that, with repeated construction and use, the procedure of forming a CI will capture the true $\mu_Y$ for the predictor values $x_1^*, \dots, x_k^*$ $100(1 - \alpha)$% of the time
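
In R, one way to obtain this interval is `predict()` with `interval = "confidence"`. The sketch below is an illustration on the built-in `mtcars` data, not `RailsTrails`, and the predictor values in `new_car` are hypothetical. It also rebuilds the interval from the formula above, using the standard error that `predict()` returns when `se.fit = TRUE`.

```r
# Illustration only: built-in mtcars data, hypothetical predictor values
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 3, hp = 150)

# 90% confidence interval for the mean response at these predictor values
ci <- predict(fit, newdata = new_car, interval = "confidence", level = 0.90)
ci

# The same interval from the formula: y-hat +/- t * SE
p <- predict(fit, newdata = new_car, se.fit = TRUE)
t_star <- qt(1 - 0.10 / 2, df = p$df)  # df = n - (k + 1)
ci_manual <- c(p$fit - t_star * p$se.fit, p$fit + t_star * p$se.fit)
ci_manual
```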

### Example 3

Continuing with the `RailsTrails` example...

#### a.
Use the fitted model equation directly to predict the price (in 2014) of a home that is 1800 square feet and 1.2 miles from a bike trail.

*Write your notes here. Double-click to edit.*

#### b.
Construct and interpret a 90% confidence interval for the __average__ price of all 1800 square foot houses that are 1.2 miles from a bike trail.

*Write your notes here. Double-click to edit.*

### Prediction interval for response

- To estimate a future individual response $y$ when $X_1 = x_1^*, \dots, X_k = x_k^*$, we use a __$100(1 - \alpha)$% prediction interval__ for $y$:

$$ \hat{y} \pm t_{\alpha/2, n - (k + 1)} \mathit{SE}_{\hat{y}} $$

- Interpretation:
    > We are <mark>$100(1 - \alpha)$%</mark> confident that the <mark>response</mark> of a particular <mark>observational unit</mark> with <mark>predictor values $x_1^*, \dots, x_k^*$</mark> is between <mark>lower endpoint of PI</mark> and <mark>upper endpoint of PI</mark> <mark>units</mark>

    - Rephrase the highlighted parts so that they match the context of the problem

- This means that, with repeated construction and use, the procedure of forming a PI will capture the actual $y$ for the predictor values $x_1^*, \dots, x_k^*$ $100(1 - \alpha)$% of the time
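
In R, changing the `interval` argument of `predict()` to `"prediction"` gives a prediction interval instead. As before, this is only a sketch on the built-in `mtcars` data with hypothetical predictor values, not the `RailsTrails` model.

```r
# Illustration only: built-in mtcars data, hypothetical predictor values
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 3, hp = 150)

# 90% prediction interval for one individual response
pi90 <- predict(fit, newdata = new_car, interval = "prediction", level = 0.90)
pi90  # columns: fit (y-hat), lwr, upr
```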

### Example 4

Continuing with the `RailsTrails` example...

#### a.
Construct and interpret a 90% prediction interval for the price of one particular 1800 square foot house that is 1.2 miles from a bike trail.

*Write your notes here. Double-click to edit.*

#### b.
Which is wider, the 90% CI or the 90% PI?

*Write your notes here. Double-click to edit.*

## Notes about confidence intervals vs. prediction intervals for response

- The point estimate anchoring both intervals is the same: $\hat{y}$

- The prediction interval is always wider than the confidence interval, because the prediction interval uses a larger standard error $\mathit{SE}_{\hat{y}}$

- Intuitively: 
    - The PI captures more uncertainty 
    
    - In addition to uncertainty due to sampling, the PI also captures the inherent uncertainty in the response of an _individual_ data point
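
To see where the extra width comes from, the sketch below assumes the usual relationship $\mathit{SE}_{\hat{y}} = \sqrt{\mathit{SE}_{\hat{\mu}}^2 + \hat{\sigma}_{\varepsilon}^2}$, which is how `predict()` in R builds its prediction intervals by default. It is an illustration on the built-in `mtcars` data with hypothetical predictor values, not the `RailsTrails` model.

```r
# Illustration only: built-in mtcars data, hypothetical predictor values
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 3, hp = 150)

p <- predict(fit, newdata = new_car, se.fit = TRUE)
se_mean <- p$se.fit             # SE for the mean response (used by the CI)
sigma_hat <- p$residual.scale   # estimated sd of the error term
se_indiv <- sqrt(se_mean^2 + sigma_hat^2)  # SE for an individual response (used by the PI)

# Widths of the 90% intervals
t_star <- qt(0.95, df = p$df)
ci_width <- 2 * t_star * se_mean
pi_width <- 2 * t_star * se_indiv
c(ci_width = unname(ci_width), pi_width = unname(pi_width))

# Compare with the prediction interval that predict() reports
pi_builtin <- predict(fit, newdata = new_car, interval = "prediction", level = 0.90)
pi_builtin[1, "upr"] - pi_builtin[1, "lwr"]
```

Because $\hat{\sigma}_{\varepsilon}^2 > 0$ whenever the fit is not perfect, `se_indiv` exceeds `se_mean`, so the PI is wider than the CI at the same confidence level.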