<!-- TOC -->

- [5 Resampling Methods](#5-resampling-methods)
    - [5.1 Cross-Validation](#51-cross-validation)
        - [5.1.1 The Validation Set Approach](#511-the-validation-set-approach)
        - [5.1.2 Leave-One-Out Cross-Validation](#512-leave-one-out-cross-validation)
        - [5.1.3 k-Fold Cross-Validation](#513-k-fold-cross-validation)
        - [5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation](#514-bias-variance-trade-off-for-k-fold-cross-validation)
    - [5.2 The Bootstrap](#52-the-bootstrap)
    - [5.3 Lab: Cross-Validation and the Bootstrap](#53-lab-cross-validation-and-the-bootstrap)
        - [5.3.1 The Validation Set Approach](#531-the-validation-set-approach)
        - [5.3.2 Leave-One-Out Cross-Validation](#532-leave-one-out-cross-validation)
        - [5.3.3 k-Fold Cross-Validation](#533-k-fold-cross-validation)
        - [5.3.4 The Bootstrap](#534-the-bootstrap)

<!-- /TOC -->

# 5 Resampling Methods
## 5.1 Cross-Validation
### 5.1.1 The Validation Set Approach
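The validation set approach, displayed in Figure 5.1, is a very simple strategy for estimating the test error of a statistical learning method. It involves randomly dividing the available set of observations into two parts, a **training set** and a **validation set** (or **hold-out set**). The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set. The resulting validation set error rate, typically assessed using MSE for a quantitative response, provides an estimate of the test error rate.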

![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/9297548.jpg)

The validation set approach is conceptually simple and is easy to implement. But it has two potential drawbacks:

1. As is shown in the right-hand panel of Figure 5.2, the validation estimate of the test error rate can be highly variable.
1. In the validation approach, only a subset of the observations—those that are included in the training set rather than in the validation set—are used to fit the model. Since statistical methods tend to perform worse when trained on fewer observations, this suggests that the validation set error rate may tend to **overestimate** the test error rate for the model fit on the entire data set.
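
Both drawbacks can be seen in a small simulation. The sketch below is only an illustration: the simulated data, the sample size, and the helper name `validation_mse` are assumptions and do not come from the original text.

```r
# Simulated data (an assumption for illustration): quadratic signal plus noise.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 2 * x - x^2 + rnorm(n)
sim <- data.frame(x = x, y = y)

# Hypothetical helper: fit on a random half, report the MSE on the other half.
validation_mse <- function(data, degree) {
  train <- sample(nrow(data), nrow(data) %/% 2)
  fit <- lm(y ~ poly(x, degree), data = data, subset = train)
  mean((data$y - predict(fit, data))[-train]^2)
}

# Ten different random splits give ten different estimates for the same model.
mse_by_split <- replicate(10, validation_mse(sim, degree = 2))
print(range(mse_by_split))
```

The spread between the smallest and largest value reflects the first drawback. Each fit also uses only half of the observations, which is the source of the second.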

![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/58592017.jpg)

### 5.1.2 Leave-One-Out Cross-Validation
![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/71374411.jpg)

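LOOCV also splits the observations into two parts, but instead of creating two subsets of comparable size, a single observation $(x_i, y_i)$ is held out for validation and the model is fit on the remaining n − 1 observations, giving $MSE_i=(y_i-\hat{y}_i)^2$. Repeating this for every observation yields n squared errors. The **leave-one-out cross-validation** (LOOCV) estimate for the test MSE is the average of these n test error estimates: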

$CV_{(n)}=\frac{1}{n}\sum_{i=1}^n MSE_i.\ (5.1)$
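
Formula (5.1) can be computed directly with a loop that leaves out one observation at a time. This is a minimal sketch on simulated data; the data-generating model and all variable names are assumptions made for illustration.

```r
# Simulated data (an assumption): a simple linear relationship with noise.
set.seed(1)
n <- 50
sim <- data.frame(x = rnorm(n))
sim$y <- 3 + 2 * sim$x + rnorm(n)

mse_i <- numeric(n)
for (i in seq_len(n)) {
  fit <- lm(y ~ x, data = sim[-i, ])                      # fit without observation i
  pred <- predict(fit, newdata = sim[i, , drop = FALSE])  # predict the held-out point
  mse_i[i] <- (sim$y[i] - pred)^2                         # MSE_i for that point
}

cv_n <- mean(mse_i)  # CV_(n) as in (5.1)
print(cv_n)
```

The loop refits the model n times, which is why LOOCV can be expensive for models that are slow to fit.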

### 5.1.3 k-Fold Cross-Validation
![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/85478205.jpg)

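k-fold CV randomly divides the set of observations into k groups, or folds, of approximately equal size. Each fold in turn is treated as a validation set while the method is fit on the remaining k − 1 folds, giving k estimates of the test error, $MSE_1, MSE_2, \cdots, MSE_k$. The k-fold CV estimate is computed by averaging these values:

$CV_{(k)}=\frac{1}{k}\sum_{i=1}^k MSE_i.\ (5.3)$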
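
The average in (5.3) can be sketched by hand as follows. The simulated data, the choice of 5 folds, and the variable names are assumptions for illustration only.

```r
# Simulated data (an assumption): a simple linear relationship with noise.
set.seed(1)
n <- 100
k <- 5
sim <- data.frame(x = rnorm(n))
sim$y <- 3 + 2 * sim$x + rnorm(n)

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

mse_fold <- numeric(k)
for (j in seq_len(k)) {
  held_out <- fold == j
  fit <- lm(y ~ x, data = sim[!held_out, ])        # fit on the other k - 1 folds
  pred <- predict(fit, newdata = sim[held_out, ])  # predict the held-out fold
  mse_fold[j] <- mean((sim$y[held_out] - pred)^2)  # MSE for fold j
}

cv_k <- mean(mse_fold)  # the k-fold CV estimate as in (5.3)
print(cv_k)
```

Only k model fits are needed here, compared with n fits for LOOCV.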

![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/31781928.jpg)

### 5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
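There is a bias-variance trade-off associated with the choice of k in k-fold cross-validation. LOOCV (k = n) gives approximately unbiased estimates of the test error, since each training set contains n − 1 observations, but the n fitted models are trained on almost identical data, so their outputs are highly correlated and their average has higher variance. With k < n the training sets overlap less, which lowers the variance at the cost of some bias. Given these considerations, one typically performs k-fold cross-validation using k = 5 or k = 10, as these values have been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance.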

## 5.2 The Bootstrap
Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of X and Y, respectively, where X and Y are random quantities. We will invest a fraction α of our money in X, and will invest the remaining 1 − α in Y. Since there is variability associated with the returns on these two assets, we wish to choose α to minimize the total risk, or variance, of our investment. In other words, we want to minimize Var(αX + (1 − α)Y). One can show that the value that minimizes the risk is given by

$\alpha=\frac{\sigma_Y^2-\sigma_{XY}}{\sigma_X^2+\sigma_Y^2-2\sigma_{XY}},\ (5.6)$

- $\sigma_X^2 = \mathrm{Var}(X)$
- $\sigma_Y^2 = \mathrm{Var}(Y)$
- $\sigma_{XY} = \mathrm{Cov}(X,Y)$
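
The "one can show" step behind (5.6) is a short calculus exercise. The following is a sketch of one standard way to derive it, using only the variance and covariance definitions listed above. Expanding the variance of the portfolio gives

$\mathrm{Var}(\alpha X+(1-\alpha)Y)=\alpha^2\sigma_X^2+(1-\alpha)^2\sigma_Y^2+2\alpha(1-\alpha)\sigma_{XY}.$

Setting the derivative with respect to α to zero,

$2\alpha\sigma_X^2-2(1-\alpha)\sigma_Y^2+2(1-2\alpha)\sigma_{XY}=0,$

and collecting the terms in α,

$\alpha(\sigma_X^2+\sigma_Y^2-2\sigma_{XY})=\sigma_Y^2-\sigma_{XY},$

which is (5.6) after dividing by $\sigma_X^2+\sigma_Y^2-2\sigma_{XY}$. The second derivative is $2(\sigma_X^2+\sigma_Y^2-2\sigma_{XY})=2\,\mathrm{Var}(X-Y)$, which is positive whenever X − Y is not constant, so the stationary point is a minimum.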

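In reality, the quantities $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ are unknown. We can compute estimates $\hat{\sigma}_X^2$, $\hat{\sigma}_Y^2$, and $\hat{\sigma}_{XY}$ from a data set of past measurements of X and Y, and then estimate the value of α that minimizes the variance of our investment using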

$\hat{\alpha}=\frac{\hat{\sigma}_Y^2-\hat{\sigma}_{XY}}{\hat{\sigma}_X^2+\hat{\sigma}_Y^2-2\hat{\sigma}_{XY}},\ (5.7)$

![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/2601333.jpg)

To estimate the standard deviation of $\hat{\alpha}$, we repeated the process of simulating 100 paired observations of X and Y, and estimating α using (5.7), 1,000 times. We thereby obtained 1,000 estimates for α, which we can call $\hat{\alpha}_1, \hat{\alpha}_2, \cdots, \hat{\alpha}_{1000}$. The mean over all 1,000 estimates for α is

$\bar{\alpha}=\frac{1}{1000}\sum_{r=1}^{1000} \hat{\alpha}_r=0.5996,$

very close to α = 0.6, and the standard deviation of the estimates is

$\sqrt{\frac{1}{1000-1}\sum_{r=1}^{1000}(\hat{\alpha}_r-\bar{\alpha})^2}=0.083.$

This gives a good idea of the accuracy of $\hat{\alpha}$:

$SE(\hat{\alpha}) \approx 0.083.$

So roughly speaking, for a random sample from the population, we would expect $\hat{\alpha}$ to differ from α by approximately 0.08, on average.
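
The repeated-simulation experiment can be sketched in base R. The population parameters below ($\sigma_X^2=1$, $\sigma_Y^2=1.25$, $\sigma_{XY}=0.5$, Gaussian returns) are assumptions chosen so that the true α equals 0.6. The helper names `alpha_hat` and `simulate_returns` are hypothetical, and the printed numbers should land near the values quoted above without reproducing them exactly.

```r
set.seed(1)

# Assumed population covariance matrix of (X, Y); it gives a true alpha of 0.6.
sigma <- matrix(c(1, 0.5, 0.5, 1.25), nrow = 2)
true_alpha <- (sigma[2, 2] - sigma[1, 2]) /
  (sigma[1, 1] + sigma[2, 2] - 2 * sigma[1, 2])

# Plug-in estimate (5.7) computed from paired returns.
alpha_hat <- function(x, y) {
  (var(y) - cov(x, y)) / (var(x) + var(y) - 2 * cov(x, y))
}

# Draw n pairs with covariance sigma via a Cholesky factor (Gaussian assumed).
simulate_returns <- function(n) {
  matrix(rnorm(2 * n), ncol = 2) %*% chol(sigma)
}

# Repeat 1,000 times: simulate 100 pairs, then estimate alpha.
estimates <- replicate(1000, {
  xy <- simulate_returns(100)
  alpha_hat(xy[, 1], xy[, 2])
})

print(c(mean = mean(estimates), sd = sd(estimates)))
```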

![](http://ou8qjsj0m.bkt.clouddn.com//17-12-16/19263120.jpg)

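In practice we cannot generate new samples from the original population. The bootstrap instead obtains distinct data sets by repeatedly sampling n observations from the original data set Z, producing bootstrap data sets $Z^{*1}, \cdots, Z^{*B}$ and corresponding estimates $\hat{\alpha}^{*1}, \cdots, \hat{\alpha}^{*B}$. The sampling is performed with **replacement**, which means that the same observation can occur more than once in the bootstrap data set. The standard error of these bootstrap estimates is computed using

$SE_B(\hat{\alpha})=\sqrt{\frac{1}{B-1}\sum_{r=1}^B\left(\hat{\alpha}^{*r}-\frac{1}{B}\sum_{r'=1}^B \hat{\alpha}^{*r'}\right)^2}.\ (5.8)$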
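
Formula (5.8) can be evaluated by hand with a resampling loop. This is a minimal sketch: the single observed data set is simulated here as an assumption, and the helper name `alpha_hat` is hypothetical.

```r
set.seed(1)

# One observed data set of n = 100 pairs (simulated here as an assumption).
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Plug-in estimate of alpha from paired returns.
alpha_hat <- function(x, y) {
  (var(y) - cov(x, y)) / (var(x) + var(y) - 2 * cov(x, y))
}

B <- 1000
alpha_star <- numeric(B)
for (r in seq_len(B)) {
  idx <- sample(n, n, replace = TRUE)         # indices of one bootstrap data set
  alpha_star[r] <- alpha_hat(x[idx], y[idx])  # estimate on that bootstrap data set
}

# Formula (5.8): the sample standard deviation of the B bootstrap estimates.
se_boot <- sqrt(sum((alpha_star - mean(alpha_star))^2) / (B - 1))
print(se_boot)
```

Because (5.8) is the usual sample standard deviation, `sd(alpha_star)` returns the same value.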

## 5.3 Lab: Cross-Validation and the Bootstrap
### 5.3.1 The Validation Set Approach
We begin by using the sample() function to split the set of observations into two halves, by selecting a random subset of 196 observations out of the original 392 observations.

In [1]:
library(ISLR)
set.seed(1)
train=sample(392,196)

In [2]:
lm.fit=lm(mpg~horsepower ,data=Auto,subset=train)

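We now use the **predict()** function to estimate the response for all 392 observations, and the **mean()** function to calculate the MSE of the 196 observations in the validation set. The -train index below selects only the observations that are not in the training set.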

In [3]:
attach(Auto)
mean((mpg-predict(lm.fit, Auto))[-train]^2)

In [4]:
lm.fit2=lm(mpg~poly(horsepower ,2),data=Auto,subset=train)
mean((mpg-predict(lm.fit2,Auto))[-train]^2)

In [5]:
lm.fit3=lm(mpg~poly(horsepower ,3),data=Auto,subset=train)
mean((mpg-predict(lm.fit3,Auto))[-train]^2)

If we choose a different training set instead, then we will obtain somewhat different errors on the validation set.

In [6]:
set.seed(2)
train=sample(392,196)
lm.fit=lm(mpg~horsepower,data=Auto,subset=train)
mean((mpg-predict(lm.fit,Auto))[-train]^2)

In [7]:
lm.fit2=lm(mpg~poly(horsepower ,2),data=Auto,subset=train)
mean((mpg-predict(lm.fit2,Auto))[-train]^2)

In [8]:
lm.fit3=lm(mpg~poly(horsepower ,3),data=Auto,subset=train)
mean((mpg-predict(lm.fit3,Auto))[-train]^2)

### 5.3.2 Leave-One-Out Cross-Validation
The LOOCV estimate can be computed automatically for any generalized linear model using the glm() and cv.glm() functions. If glm() is called without a family argument, it performs linear regression just like lm(), so the two fits below yield identical coefficient estimates.

In [9]:
glm.fit=glm(mpg~horsepower ,data=Auto)
coef(glm.fit)

In [10]:
lm.fit=lm(mpg~horsepower ,data=Auto)
coef(lm.fit)

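The **cv.glm()** function is part of the **boot** library. The two numbers in the delta vector it returns contain the cross-validation results: the first is the standard CV estimate and the second is a bias-corrected version.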

In [13]:
library(boot)
glm.fit=glm(mpg~horsepower ,data=Auto)
cv.err=cv.glm(Auto,glm.fit)
cv.err$delta

In [14]:
cv.error=rep(0,5)
for (i in 1:5) {
    glm.fit=glm(mpg~poly(horsepower ,i),data=Auto)
    cv.error[i]=cv.glm(Auto,glm.fit)$delta[1]
}
cv.error

### 5.3.3 k-Fold Cross-Validation
The **cv.glm()** function can also be used to implement k-fold CV.

In [15]:
set.seed(17)
cv.error.10=rep(0,10)
for (i in 1:10) {
    glm.fit=glm(mpg~poly(horsepower ,i),data=Auto)
    cv.error.10[i]=cv.glm(Auto,glm.fit,K=10)$delta[1]
}
cv.error.10

### 5.3.4 The Bootstrap
#### Estimating the Accuracy of a Statistic of Interest
To illustrate the use of the bootstrap on the **Portfolio** data set in the ISLR package, we must first create a function, **alpha.fn()**, which takes as input the (X,Y) data as well as a vector indicating which observations should be used to estimate α. The function then outputs the estimate for α based on the selected observations.

In [16]:
alpha.fn=function(data,index) {
    X=data$X[index]
    Y=data$Y[index]
    return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
}

In [17]:
alpha.fn(Portfolio,1:100)

The next command uses the sample() function to randomly select 100 observations from the range 1 to 100, with replacement.

In [18]:
set.seed(1)
alpha.fn(Portfolio,sample(100,100,replace=TRUE))

The boot() function automates this approach. Below we produce R = 1,000 bootstrap estimates for α.

In [19]:
boot(Portfolio,alpha.fn,R=1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Portfolio, statistic = alpha.fn, R = 1000)


Bootstrap Statistics :
     original        bias    std. error
t1* 0.5758321 -7.315422e-05  0.08861826

The final output shows that using the original data, $\hat{\alpha} = 0.5758$, and that the bootstrap estimate for $SE(\hat{\alpha})$ is 0.0886.

#### Estimating the Accuracy of a Linear Regression Model

In [20]:
boot.fn=function(data,index) {
    return(coef(lm(mpg~horsepower ,data=data,subset=index)))
}
boot.fn(Auto,1:392)

The **boot.fn()** function can also be used in order to create bootstrap estimates for the intercept and slope terms by randomly sampling from among the observations with replacement. 

In [21]:
set.seed(1)
boot.fn(Auto,sample(392,392,replace=TRUE))

In [22]:
boot.fn(Auto,sample(392,392,replace=TRUE))

Next, we use the boot() function to compute the standard errors of 1,000 bootstrap estimates for the intercept and slope terms.

In [23]:
boot(Auto, boot.fn, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
      original      bias    std. error
t1* 39.9358610  0.02972191 0.860007896
t2* -0.1578447 -0.00030823 0.007404467

In [24]:
summary(lm(mpg~horsepower ,data=Auto))$coef

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 39.935861 | 0.717498656 | 55.65984 | 1.220362e-187 |
| horsepower | -0.1578447 | 0.006445501 | -24.48914 | 7.031989e-81 |


Below we compute the bootstrap standard error estimates and the standard linear regression estimates that result from fitting the quadratic model to the data.

In [25]:
boot.fn=function(data,index) {
    coefficients(lm(mpg~horsepower+I(horsepower^2),data=data,subset=index))
}
set.seed(1)
boot(Auto, boot.fn, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
        original        bias     std. error
t1* 56.900099702  6.098115e-03 2.0944855842
t2* -0.466189630 -1.777108e-04 0.0334123802
t3*  0.001230536  1.324315e-06 0.0001208339

In [26]:
summary(lm(mpg~horsepower+I(horsepower^2),data=Auto))$coef

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 56.900099702 | 1.8004268063 | 31.60367 | 1.740911e-109 |
| horsepower | -0.46618963 | 0.0311246171 | -14.97816 | 2.289429e-40 |
| I(horsepower^2) | 0.001230536 | 0.0001220759 | 10.08009 | 2.19634e-21 |
