### **Bagging**

Bootstrap aggregation, or bagging, is a general procedure for reducing the variance of a statistical learning model. Given $n$ independent observations $Z_1,Z_2,\dots,Z_n$, each with variance $\sigma^2$, recall that the variance of their mean $\bar{Z}$ is

$$Var(\bar{Z})=Var\left(\frac{1}{n}\sum^n_{i=1}Z_i\right)=\frac{1}{n^2}\sum^n_{i=1}Var(Z_i)=\frac{\sigma^2}{n}$$
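The $\sigma^2/n$ result can be checked numerically. The following is a minimal sketch, assuming Gaussian observations with $\sigma = 2$; the distribution, the number of trials, the seed, and the helper name `variance_of_mean` are illustrative choices rather than part of the notes.

```python
import random
import statistics

def variance_of_mean(n, sigma=2.0, trials=20000, seed=0):
    """Empirical variance of the mean of n independent N(0, sigma^2) draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Mean of one set of n independent observations.
        means.append(sum(rng.gauss(0.0, sigma) for _ in range(n)) / n)
    return statistics.pvariance(means)

# The empirical variance should shrink roughly like sigma^2 / n.
for n in (1, 5, 25):
    print(n, round(variance_of_mean(n), 3), "theory:", 2.0 ** 2 / n)
```

The printed empirical values should sit close to the theoretical ones, with the gap shrinking as `trials` grows.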

Averaging a set of observations reduces variance. **Within the context of machine learning, bagging is the repeated sampling with replacement from the training data to generate $B$ bootstrapped training sets.** The model is trained on each bootstrapped set $b = 1, \dots, B$ to yield predictions $\hat {f}^{*b}(x)$. Then, the $B$ resulting predictions are averaged to yield

$$\hat{f}_{bag}(x)=\frac{1}{B}\sum^B_{b=1}\hat{f}^{*b}(x)$$
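The averaging formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the base learner `fit_1nn` (1-nearest-neighbour regression) is a hypothetical stand-in for any high-variance model, and the toy data, seeds, and function names are made up for the example.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_1nn(train):
    """Hypothetical high-variance base learner: 1-nearest-neighbour
    regression on (x, y) pairs. Returns a prediction function."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def bagged_predictor(data, fit, B=100, seed=0):
    """Fit one model per bootstrapped set and average the B predictions."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(data, rng)) for _ in range(B)]
    def predict(x):
        return sum(m(x) for m in models) / len(models)
    return predict

# Toy data (assumed): y = 2x plus Gaussian noise.
rng = random.Random(1)
data = [(x / 10, 2 * x / 10 + rng.gauss(0, 0.5)) for x in range(50)]

f_bag = bagged_predictor(data, fit_1nn, B=200)
print(f_bag(2.5))
```

Any function that takes a training set and returns a predictor can be passed as `fit`, which mirrors the point that bagging is a general procedure rather than something specific to one model.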

**Bagged Regression Trees**

$B$ trees are constructed from $B$ bootstrapped training sets and the resulting predictions are averaged. The trees are grown deep without pruning, such that each individual tree has high variance but low bias. The variance is reduced by taking the average of the $B$ trees.

**Bagged Classification Trees**

For a given test observation, the class predicted by each of the $B$ trees is recorded and the **majority vote** is taken: the prediction is the most common class among the $B$ predictions. In general, although the test error rate is a function of $B$, **the size of $B$ is not a critical parameter in bagging**; a large $B$ doesn't lead to overfitting. $B$ only needs to be large enough for the error to "settle down."
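The majority vote itself is a one-liner. A minimal sketch, assuming the $B$ tree predictions for one test observation are already collected in a list; the labels and the helper name `majority_vote` are illustrative.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the B tree predictions.
    Ties go to the label that appears first in the list."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from B = 5 trees for a single test observation.
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))  # spam
```

Using an odd $B$ for two-class problems avoids ties altogether.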

### **Out-of-Bag Estimation**

On average, each bagged tree uses only around two-thirds of the distinct training observations. The remaining third are known as the out-of-bag ($OOB$) observations for that tree.
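The two-thirds figure follows from the sampling scheme: a given observation is left out of a bootstrap sample of size $n$ with probability $(1-1/n)^n$, which tends to $e^{-1} \approx 0.368$ as $n$ grows. The sketch below compares one simulated bootstrap sample against that value; the sample size, the seed, and the helper name `in_bag_fraction` are assumptions made for the example.

```python
import math
import random

def in_bag_fraction(n, seed=0):
    """Fraction of distinct observations that appear in one
    bootstrap sample of size n drawn from n observations."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return len(drawn) / n

n = 10000
print("simulated:", in_bag_fraction(n))
print("exact:    ", 1 - (1 - 1 / n) ** n)
print("limit:    ", 1 - math.exp(-1))
```

All three numbers should land near $0.63$, which is the "around two-thirds" quoted above.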

Thus, the $i$th observation can be predicted using only the bagged trees for which it was $OOB$, which yields roughly $B/3$ predictions for that observation. To obtain a single prediction for the $i$th observation, these predictions can be averaged (for regression trees) or the majority vote can be taken (for classification trees). By taking $OOB$ predictions for all the observations, an $MSE$ or classification error can be computed.
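The $OOB$ procedure for regression can be sketched as follows. This is an illustration only: `fit_1nn` is a hypothetical stand-in for a deep regression tree, and the toy data, seeds, and function names are assumptions, not part of the notes.

```python
import random

def fit_1nn(train):
    """Hypothetical high-variance base learner: 1-nearest-neighbour
    regression on (x, y) pairs. Returns a prediction function."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def oob_mse(data, fit, B=200, seed=0):
    """Return (OOB mean squared error, per-observation OOB counts)."""
    rng = random.Random(seed)
    n = len(data)
    sums = [0.0] * n   # running sum of OOB predictions per observation
    counts = [0] * n   # number of models for which observation i was OOB
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        in_bag = set(idx)
        model = fit([data[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # predict i only with models that never saw it
                sums[i] += model(data[i][0])
                counts[i] += 1
    # Average the OOB predictions; skip observations that were never OOB.
    errors = [
        (data[i][1] - sums[i] / counts[i]) ** 2
        for i in range(n)
        if counts[i] > 0
    ]
    return sum(errors) / len(errors), counts

# Toy data (assumed): y = 2x plus Gaussian noise.
rng = random.Random(1)
data = [(x / 10, 2 * x / 10 + rng.gauss(0, 0.5)) for x in range(50)]

mse, counts = oob_mse(data, fit_1nn, B=200)
print("OOB MSE:", mse)
print("mean OOB share:", sum(counts) / (len(counts) * 200))
```

The mean $OOB$ share should come out near one-third, matching the roughly $B/3$ predictions per observation. For classification, the averaging step would be replaced by a majority vote over the $OOB$ predictions.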

**For sufficiently large $B$, the $OOB$ error is approximately equivalent to the leave-one-out cross-validation ($LOOCV$) error.**