# 5 Resampling Methods

- Resampling methods are an `absolutely necessary tool` in `modern statistics`. 
- They `involve repeatedly drawing samples` from a `training set and refitting a model` of `interest` on `each sample` in `order` to `obtain additional information` about the `fitted model`. 
>__For example__, in order to `estimate the variability of a linear regression fit`, we can `repeatedly draw different samples` from the `training data`, `fit a linear regression` to `each new sample`, and then `examine` the extent to which the `resulting fits differ`. 
- Such an approach may allow us to `obtain information` that would not be `available from fitting the model only once using the original training sample`.
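
The linear regression example above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are synthetic, the samples are drawn with replacement, and the names `fit_line` and `n_resamples` are made up for this sketch rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (an assumption for illustration): y = 2 + 3x + noise
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 2.0, size=n)


def fit_line(x, y):
    """Least-squares straight-line fit; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Repeatedly draw a new sample from the training data and refit the model
n_resamples = 500
slopes = np.empty(n_resamples)
for b in range(n_resamples):
    idx = rng.integers(0, n, size=n)  # n row indices, drawn with replacement
    slopes[b], _ = fit_line(x[idx], y[idx])

# Compare the single fit on the original sample with the spread across refits
slope_full, _ = fit_line(x, y)
print(f"slope on the original sample: {slope_full:.3f}")
print(f"std. dev. of resampled slopes: {slopes.std(ddof=1):.3f}")
```

The spread of `slopes` is information that a single fit on the original sample cannot provide on its own.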

***

- __Resampling approaches can be `computationally expensive`__, because they `involve fitting the same statistical method multiple times using different subsets of the training data`. 

- However, due to recent `advances in computing power`, __the `computational requirements` of resampling methods generally are not prohibitive__. 

- In this chapter, we discuss `two of the most commonly used resampling methods`:

    1. cross-validation
    2. the bootstrap


- Both methods are important tools in the practical application of many statistical learning procedures. 
>__For example__, `cross-validation` can be used to `estimate` the __`test error`__ associated with a given `statistical learning method` in order to `evaluate its performance, or to select the appropriate level of flexibility`.

- The `process of evaluating a model’s performance` is known as ___`model assessment`___, whereas the `process of selecting the proper level of flexibility for a model` is known as ___`model selection`___. 
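
As a rough sketch of how cross-validation serves both purposes, the code below estimates the test MSE of polynomial fits of increasing degree and picks the degree with the lowest estimate. The synthetic data, the choice of 5 folds, and the helper name `cv_mse` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (an assumption): quadratic signal plus noise
n = 200
x = rng.uniform(-3, 3, size=n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0, 1.0, size=n)


def cv_mse(x, y, degree, k=5, seed=0):
    """Estimate the test MSE of a polynomial fit of the given degree by k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    fold_mse = []
    for i in range(k):
        val = folds[i]  # observations held out in this round
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[val])
        fold_mse.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(fold_mse))


# Model assessment: one estimated test error per candidate level of flexibility
cv_errors = {d: cv_mse(x, y, d) for d in range(1, 6)}

# Model selection: choose the flexibility with the lowest estimated test error
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("selected degree:", best_degree)
```

Each value in `cv_errors` is an assessment of one model, and comparing those values across degrees is the selection step.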


- The ___`bootstrap`___ is used in `several contexts`, `most commonly` to `provide a measure of accuracy` of a `parameter estimate` or of a `given statistical learning method`.

# 5.1 Cross-Validation

In <a href="http://localhost:8888/notebooks/islr-book/Chapter%202/Chapter%202.ipynb">Chapter 2</a> we discussed the `distinction between` __the test error rate and the training error rate__. 

- The ___`test error`___ is the average error that results from using a statistical learning method to predict the response on a new observation, that is, a measurement that was not used in training the method. 

Given a `data set`, the use of a `particular statistical learning method` is warranted if it results in a `low test error`. 

- The `test error` can be `easily calculated` if a `designated test set is available`. 

    Unfortunately, this is usually not the case.

In contrast, the ___`training error`___ can be `easily calculated` by applying the `statistical learning method` to the `observations used in its training`. 

- But as we saw in <a href="http://localhost:8888/notebooks/islr-book/Chapter%202/Chapter%202.ipynb">Chapter 2</a>, the __training error rate__ often is `quite different` from the __test error rate__, and in `particular` the former can `dramatically underestimate the latter`.
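
A small simulation can make this gap concrete. The sketch below assumes synthetic data from a sine curve with noise and a deliberately flexible degree-10 polynomial fit to only 20 training points. A large fresh sample stands in for the designated test set, which is possible here only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)


def simulate(n):
    """Draw n observations from an assumed model: y = sin(3x) + noise."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(0, 0.5, size=n)
    return x, y


x_train, y_train = simulate(20)      # small training set
x_test, y_test = simulate(10_000)    # large stand-in for a designated test set

# A very flexible model relative to the amount of training data
coefs = np.polyfit(x_train, y_train, deg=10)

train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)

print(f"training MSE: {train_mse:.3f}")
print(f"test MSE:     {test_mse:.3f}")
```

The training MSE is computed on the same observations that shaped the fit, which is why it tends to come out lower than the MSE on fresh observations.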

In the absence of a `very large designated test set` that can be used to `directly estimate the test error rate`, a `number of techniques can be used` to `estimate this quantity using the available training data`. 

Some methods make a `mathematical adjustment` to the `training error rate` in order to `estimate the test error rate`. 

In this section, we instead consider a class of methods that estimate the test error rate by holding out a subset of the training observations from the fitting process, and then applying the statistical learning method to those held out observations.
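
The hold-out idea can be sketched as follows, assuming synthetic data, a simple linear fit, and an even split. The names `kept` and `held_out` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training data (an assumption): y = 1 + 2x + noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, size=n)

# Randomly hold out half of the training observations from the fitting process
perm = rng.permutation(n)
held_out, kept = perm[: n // 2], perm[n // 2:]

# Fit using only the observations that were kept
coefs = np.polyfit(x[kept], y[kept], deg=1)

# Apply the fitted model to the held-out observations to estimate the test MSE
pred = np.polyval(coefs, x[held_out])
holdout_mse = np.mean((y[held_out] - pred) ** 2)
print(f"estimated test MSE: {holdout_mse:.3f}")
```

Because the held-out observations play no part in the fit, the resulting error behaves like an error on new observations.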

In <a href="#5.1.1-The-Validation-Set-Approach">Sections 5.1.1</a>–<a href="#5.1.4-Bias-Variance-Trade-Off-for-k-Fold-Cross-Validation">5.1.4</a>, for simplicity we assume that we are `interested in performing regression with a quantitative response`. 

In <a href="#5.1.5-Cross-Validation-on-Classification-Problems">Section 5.1.5</a> we consider the case of `classification` with a `qualitative response`. As we will see, the `key concepts remain the same regardless of whether the response` is ___`quantitative or qualitative`___.


## 5.1.1 The Validation Set Approach

## 5.1.2 Leave-One-Out Cross-Validation

## 5.1.3 k-Fold Cross-Validation

## 5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation

## 5.1.5 Cross-Validation on Classification Problems

# 5.2 The Bootstrap

# 5.3 Lab: Cross-Validation and the Bootstrap

## 5.3.1 The Validation Set Approach

## 5.3.2 Leave-One-Out Cross-Validation

## 5.3.3 k-Fold Cross-Validation

## 5.3.4 The Bootstrap

# 5.4 Exercises