**SM339 &#x25aa; Applied Statistics &#x25aa; Spring 2024 &#x25aa; Uhan**

# Lesson 16. Assessing Multiple Linear Regression Models &mdash; Part 2

## Example

After accounting for the size of a house, is its price related to its proximity to bike trails?

Use the `RailsTrails` data in the `Stat2Data` package to fit a multiple linear regression model predicting $\mathit{Price2014}$ (price in thousands of dollars) from $\mathit{SquareFeet}$ (size of house, in thousands of $\text{ft}^2$) and $\mathit{Distance}$ (miles to nearest bike trail).
Assume that the regression conditions are met.

In [1]:
library(Stat2Data)
data(RailsTrails)

We fit a multiple linear regression model with $\mathit{Price2014}$ as the response variable, and $\mathit{SquareFeet}$ and $\mathit{Distance}$ as the explanatory variables:

In [2]:
fit <- lm(Price2014 ~ SquareFeet + Distance, data = RailsTrails)
summary(fit)


Call:
lm(formula = Price2014 ~ SquareFeet + Distance, data = RailsTrails)

Residuals:
    Min      1Q  Median      3Q     Max 
-152.15  -30.27   -4.14   25.75  337.93 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   78.985     25.607   3.085  0.00263 ** 
SquareFeet   147.920     12.765  11.588  < 2e-16 ***
Distance     -15.788      7.586  -2.081  0.03994 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.55 on 101 degrees of freedom
Multiple R-squared:  0.6574,	Adjusted R-squared:  0.6506 
F-statistic: 96.89 on 2 and 101 DF,  p-value: < 2.2e-16
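
The $\mathit{Distance}$ row of this output addresses the question posed at the start of the example. As a minimal sketch, the t value and p-value in that row can be reproduced from the estimate and standard error alone. The variable names below are illustrative, and the inputs are the rounded values printed above, so the results should agree with the output only up to rounding.

```r
# Values copied from the Distance row of the summary() output above
estimate <- -15.788
std_error <- 7.586
df_resid <- 101   # residual degrees of freedom

# Test statistic for the null hypothesis that the Distance coefficient is 0
t_stat <- estimate / std_error

# Two-sided p-value from the t-distribution with 101 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

t_stat
p_value
```

If we take 0.05 as the significance level (an assumption, since the example does not state one), the p-value of about 0.04 suggests that, after accounting for the size of a house, its price is related to its distance from the nearest bike trail.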


We can compute a 95\% confidence interval for the coefficient of $\mathit{Distance}$ using information from the `summary()` output:

In [3]:
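df <- 101      # residual degrees of freedom, from the summary() output
alpha <- 0.05  # 1 - confidence level

# Critical value from the t-distribution
t <- qt(1 - alpha / 2, df = df)

# Estimate -/+ critical value * standard error (Distance row of summary())
ci_lower <- -15.788 - t * 7.586
ci_upper <- -15.788 + t * 7.586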

ci_lower
ci_upper
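
Written out, the cell above applies the usual t-based interval for a single coefficient. As a rough check by hand, using the rounded values from the `summary()` output and taking $t^* \approx 1.984$ as the critical value for 101 degrees of freedom (an approximation, and the subscript notation is ours, not part of the R output):

$$
\hat{\beta}_{\mathit{Distance}} \pm t^* \cdot SE_{\hat{\beta}_{\mathit{Distance}}} \approx -15.788 \pm (1.984)(7.586) \approx -15.788 \pm 15.05
$$

This gives an interval of roughly $(-30.84, -0.74)$. The interval lies entirely below 0, which is consistent with the p-value for $\mathit{Distance}$ being less than 0.05.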

We can also use the R function `confint()` to accomplish the same thing. 

(In fact, we get confidence intervals for the other coefficients too!)

In [4]:
confint(fit, level = 0.95)

| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 28.188 | 129.7824868 |
| SquareFeet | 122.59754 | 173.2421396 |
| Distance | -30.83709 | -0.7397968 |


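Unfortunately, the R function `anova()` does not produce the kind of ANOVA table we want: for a model with several explanatory variables, it reports a separate sum of squares for each variable instead of a single Model row. However, we can compute the individual parts of the ANOVA table with the code below.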

Note that `predict(fit)` predicts the response variable values of the observations in the data set used to fit the model; in other words, it computes the $\hat{y}_i$ values.

In [5]:
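y <- RailsTrails$Price2014
n <- 104  # number of observations
k <- 2    # number of explanatory variables

# Sums of squares
SSModel <- sum((predict(fit) - mean(y))^2)
SSE <- sum((y - predict(fit))^2)
SSTotal <- SSModel + SSE

# Mean squares: each sum of squares divided by its degrees of freedom
MSModel <- SSModel / k
MSE <- SSE / (n - (k + 1))

# F-statistic
F <- MSModel / MSE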
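
In symbols, the cell above computes the following quantities, where $\hat{y}_i$ is the predicted value for observation $i$ and $\bar{y}$ is the mean of the observed responses (the notation $y_i$ and $\bar{y}$ is assumed here to match the code):

$$
\begin{aligned}
SSModel &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, & SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, & SSTotal &= SSModel + SSE, \\
MSModel &= \frac{SSModel}{k}, & MSE &= \frac{SSE}{n - (k + 1)}, & F &= \frac{MSModel}{MSE}.
\end{aligned}
$$

Here $n = 104$ and $k = 2$, so the degrees of freedom are 2 for the model and 101 for the error, matching the last line of the `summary()` output.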

In [6]:
SSModel
SSE
SSTotal

MSModel
MSE

F
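
As a sanity check on these values, the F-statistic can also be recovered from $R^2$, since $R^2 = SSModel / SSTotal$. The sketch below uses only the rounded `Multiple R-squared` value from the `summary()` output, with illustrative variable names, so it should match the reported F-statistic of 96.89 only approximately.

```r
# Values taken from the example: rounded R-squared, sample size, number of predictors
r_squared <- 0.6574
n <- 104
k <- 2

# F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))
f_from_r2 <- (r_squared / k) / ((1 - r_squared) / (n - (k + 1)))

# Upper-tail p-value from the F-distribution with k and n - (k + 1) degrees of freedom
p_value_f <- pf(f_from_r2, df1 = k, df2 = n - (k + 1), lower.tail = FALSE)

f_from_r2
p_value_f
```

The name `f_from_r2` is used here instead of `F` because `F` is also R's built-in shorthand for `FALSE`.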