# Introduction
Bagging stands for Bootstrap Aggregation. It is an ensemble learning technique that aims to improve the performance and stability of machine learning models. The core idea is to train multiple models on different subsets of the original data and then aggregate their predictions to arrive at the final prediction.

### How does bagging work?
1. Bootstrap sampling: Multiple subsets of the original data (called bootstrap samples) are created by sampling with replacement. This means a data point can be chosen more than once in a single sample.
2. Model training: Each base model is trained independently on its respective bootstrap sample.
3. Aggregation:
    - After training, the predictions from all the base models are combined using a specific technique.
    - For example, averaging calculates the mean of the individual predictions (82%, 90%, and 78%) to get the final prediction (83.33%).
    - Other common aggregation techniques include voting (for classification) and weighted averaging.
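The sampling and averaging steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a full bagging implementation: the helper names `bootstrap_sample` and `aggregate_mean` are made up for this example, and the base models are skipped so that only the two bagging-specific steps are shown.

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def aggregate_mean(predictions):
    """Combine base-model predictions by simple averaging."""
    return mean(predictions)

rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is reproducible
data = list(range(10))
sample = bootstrap_sample(data, rng)
# The sample has the same size as the data, but some points may repeat
# and others may be missing.
print(len(sample), len(set(sample)))

# The three predictions from the text: 82%, 90% and 78%.
final = aggregate_mean([82, 90, 78])
print(round(final, 2))  # 83.33
```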

### Benefits of bagging
- Reduced variance: By averaging predictions from different models (trained on slightly different data), bagging helps in reducing the variance of the overall model. This makes the final predictions less sensitive to random fluctuations in the training data and leads to better generalization.
- Improved accuracy: Bagging can often achieve higher accuracy than a single model, especially when the base model has high variance (for example, a deep Decision Tree).

### Random Forest
- Random Forest is a popular ensemble learning technique that follows the bagging principle.
- It uses decision trees as base models and incorporates additional randomness during training by randomly selecting features at each split node in the tree.
- Random Forests are known for their effectiveness in various machine learning tasks due to their ability to reduce variance and improve accuracy.

# What Is Random Forest?
- Random Forest is an ensemble technique that relies on a multitude of decision trees for prediction.
- To improve overall performance, the trees within the forest need to differ from one another. This diversity helps keep the ensemble from overfitting to the training data.

### Random sampling for diversity
- Row sampling (RS): During tree creation, a random subset of data points (rows) is selected from the original dataset with replacement. This means that a single data point can be chosen multiple times for a single tree.
- Column sampling (CS): At each node in a tree, a random subset of features (columns) is selected from the full set of features. The tree then splits the data based on the best split among these chosen features. This injects randomness into the tree structure.

### Building the Random Forest
The core components of a Random Forest are DTs + RS + CS + Aggregation. Each of them means the following:
- DTs: The Decision Trees that make up the forest.
- RS: Row sampling for training data selection.
- CS: Column sampling for feature selection at each node.
- Aggregation: Combining predictions from all trees (usually majority vote for classification, and averaging for regression).

### Hyperparameters in Random Forest
- `n_estimators`: This hyperparameter controls the number of trees to be created in the forest. More trees generally lead to better performance but also increase the training time.
- `max_features`: This parameter specifies the number of features randomly chosen at each node for splitting. A lower value introduces more randomness and helps prevent overfitting.

### What does "Random" in Random Forest refer to?
- The "Random" in Random Forest refers to the random sampling of rows and columns during tree creation, leading to diverse trees.
- "Forest" signifies the collection of multiple Decision Trees.

### Example
- With 10 features and 1000 rows, a 20% random column sample (2 features) and a 40% random row sample (400 rows) would be used to train one Tree.
- This process is repeated for the specified number of estimators (e.g., 100 trees).
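The arithmetic in this example can be checked with a short sketch. It assumes the same numbers as above (1000 rows, 10 features, 40% and 20% fractions) and, for simplicity, draws the columns once per Tree rather than at every node. The variable names are illustrative only.

```python
import random

rng = random.Random(42)  # fixed seed, chosen arbitrarily

n_rows, n_features = 1000, 10
row_fraction, col_fraction = 0.40, 0.20

n_sampled_rows = round(n_rows * row_fraction)      # 400
n_sampled_cols = round(n_features * col_fraction)  # 2

# Row sampling: with replacement, so the same row index may appear twice.
rows = [rng.randrange(n_rows) for _ in range(n_sampled_rows)]
# Column sampling: without replacement, so the feature indices are distinct.
cols = rng.sample(range(n_features), n_sampled_cols)

print(n_sampled_rows, n_sampled_cols)
```

Repeating the two sampling lines once per estimator (for example, 100 times) gives each Tree its own rows and columns.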

### Aggregation and underfitting
- Random Forests typically use majority voting for classification (the most frequent class predicted by the Trees wins). Averaging is used for regression.
- Individual Trees in a Random Forest might be underfit due to limited data and random features at each split. However, the ensemble combines these weaker learners to create a more robust model.
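As a small illustration of the two aggregation rules, the sketch below combines made-up per-Tree outputs. The function names and the example votes are assumptions for this example, not part of any library.

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def average(values):
    """Return the mean of the per-Tree regression outputs."""
    return mean(values)

# Hypothetical outputs from five classification Trees and three regression Trees.
tree_votes = ["spam", "ham", "spam", "spam", "ham"]
tree_values = [2.0, 3.0, 4.0]

print(majority_vote(tree_votes))  # spam
print(average(tree_values))       # 3.0
```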

# Random Forest Algorithm
Random Forest is a powerful ensemble learning technique widely used for classification and regression tasks. It leverages the strengths of multiple Decision Trees to create a robust and accurate model.

### Core idea
- Random Forests build a collection of Decision Trees, each trained on a random subset of the data. This creates diversity among the trees, preventing them from overfitting to the training data.
- When making a prediction, the forest combines the predictions from all the individual trees using a voting mechanism for classification or averaging for regression. This reduces the variance of the model and improves its generalization ability.

### Key components
1. Decision Trees: These are the building blocks of the Forest. Each Tree learns a set of rules for classifying or predicting based on the features in the data.
2. Random sampling:
    - Row sampling (Bootstrap sampling): During Tree creation, a random subset of data points (rows) is selected with replacement from the original dataset. This means a data point can be included in a Tree's training sample multiple times.
    - Column sampling (Feature subsetting): At each node in a Tree, a random subset of features (columns) is chosen from the full set. The Tree then splits the data based on the best split among these chosen features. This injects randomness into the Tree structure.
3. Aggregation: This refers to how the predictions from all the Trees are combined to make a final prediction.
    - Classification: Majority voting is typically used. The class predicted by most Trees becomes the final prediction.
    - Regression: Averaging the predicted values from all Trees is common.

### Building the Random Forest
1. Define the parameters: Specify the number of Trees (`n_estimators`) to create and the number of features (`max_features`) to randomly choose at each node for splitting.
2. Create Trees:
    - Draw a bootstrap sample of data points.
    - Build a Decision Tree using the sampled data and chosen features (`max_features`) at each node.
    - Repeat the above 2 steps for the specified number of Trees (`n_estimators`).
3. Prediction:
    - For a new data point, make predictions using each Tree in the forest.
    - Combine the individual predictions using the chosen aggregation method (voting or averaging).
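The three steps above can be sketched end to end in plain Python. To keep the example short, it makes two simplifying assumptions that are not in the text: each "Tree" is a one-split decision stump, and the features are sampled once per stump (a stump has only one node). All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) among feature_ids with the fewest errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_estimators, max_features, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap rows
        feats = rng.sample(range(n_features), max_features)  # random columns
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [predict_stump(s, row) for s in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data: the label is 1 when the first feature exceeds 5;
# the second feature is unrelated noise.
rows = [[x, (x * 7) % 10] for x in range(10)]
labels = [int(r[0] > 5) for r in rows]
forest = fit_forest(rows, labels, n_estimators=25, max_features=1)
print(predict_forest(forest, [9, 0]), predict_forest(forest, [0, 9]))
```

A real Random Forest grows deeper Trees and redraws the feature subset at every node, but the loop structure (sample rows, sample features, fit, repeat, then vote) is the same.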

### Advantages of Random Forests
- Improved accuracy and generalizability: By combining multiple diverse Trees, Random Forests often achieve higher accuracy than a single Decision Tree and are less prone to overfitting.
- Can handle missing data: Some Decision Tree implementations handle missing values natively (for example, through surrogate splits), although others require the missing values to be imputed first.
- Robust to outliers: Ensemble methods like Random Forests are less sensitive to outliers in the data compared to single models.
- Provides feature importance: Random Forests can be used to understand the relative importance of features in the model.

### Disadvantages of Random Forests
- Can be a black box: The inner workings of individual Trees and how they contribute to the final prediction can be difficult to interpret.
- Computationally expensive: Training a Random Forest with many Trees can be computationally expensive compared to simpler models.

# Out-Of-Bag (OOB) Score
### Core idea
- Random Forests use bootstrap aggregation (bagging) to train Decision Trees. Each Tree is trained on a random subset (m rows) of the original data (n rows).
- The remaining data points (n - m rows) that are not used to train a particular Tree are called Out-Of-Bag (OOB) samples for that Tree.
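A minimal sketch of how the OOB rows fall out of one bootstrap draw is shown below. It assumes the bootstrap sample has the same size as the dataset, so m is the number of distinct rows that were drawn. The function name `bootstrap_with_oob` is hypothetical.

```python
import random

def bootstrap_with_oob(n, rng):
    """Return (in-bag row indices, OOB row indices) for one Tree."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # sampled with replacement
    oob = sorted(set(range(n)) - set(in_bag))      # rows never drawn
    return in_bag, oob

rng = random.Random(1)  # fixed seed, chosen arbitrarily
n = 20
in_bag, oob = bootstrap_with_oob(n, rng)
m = len(set(in_bag))    # distinct rows seen by this Tree
print(m, len(oob))      # the two counts always add up to n
```

For large n, the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e, so roughly 37% of the rows are expected to be OOB for each Tree.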

### OOB for evaluation
Individual Tree evaluation: OOB samples can be used to evaluate the performance of each Tree in the Forest.
- The trained Tree predicts the class/value for each OOB sample.
- These predictions are compared with the actual values to calculate the error (e.g., a 0/1 misclassification indicator for classification, squared difference for regression).

### Overall OOB score
- Average error: The OOB score is the average of the errors calculated for all OOB samples across all Trees in the Forest.
- Cross-validation alternative: This provides an estimate of the model's generalization performance without needing a separate cross-validation process.

### Benefits of OOB score
- Efficiency: Since OOB samples are readily available during forest training, calculating the OOB score is computationally efficient.
- Reduced bias: Each data point is evaluated only by Trees that did not see it during training, and with enough Trees almost every data point is OOB for at least one of them, so no data has to be held back in a separate validation set.

### Example
- Consider a dataset with 6 data points (A-F) and 3 Trees ($m_1$, $m_2$, $m_3$).
- The OOB data for each model might be,
    - $m_1$: B, C
    - $m_2$: A, E, F
    - $m_3$: C, D
- For data point C (present in OOB for $m_1$ and $m_3$),
    - Predictions can be made using both the models ($m_1(C)$ and $m_3(C)$).
    - Average these predictions ($P(C) = \frac{m_1(C) + m_3(C)}{2}$).
    - Compare this average prediction with the actual value of $C$ ($y_{actual}(C)$) to calculate the error ($error_C = y_{actual}(C) - P(C)$).
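The example for data point C can be traced in code. The OOB sets below are the ones listed above, while the two predictions and the actual value are made-up numbers used only for illustration.

```python
# OOB data points for each Tree, as listed in the example.
oob_sets = {"m1": {"B", "C"}, "m2": {"A", "E", "F"}, "m3": {"C", "D"}}

def oob_models(point):
    """Return the Trees for which this data point is out-of-bag."""
    return sorted(m for m, pts in oob_sets.items() if point in pts)

# Hypothetical predictions for C from the Trees that did not train on it.
preds_C = {"m1": 4.0, "m3": 6.0}
y_actual_C = 5.5  # hypothetical true value

models = oob_models("C")
P_C = sum(preds_C[m] for m in models) / len(models)
error_C = y_actual_C - P_C

print(models)        # ['m1', 'm3']
print(P_C, error_C)  # 5.0 0.5
```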

### Overall OOB error
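We repeat the above for every data point that is OOB for at least one Tree. The total OOB error is the sum of these individual errors across all data points, with each $error_i$ measured as described earlier (for example, a squared difference for regression). Dividing the total by $n$ gives the average error.
- $error_{OOB} = \sum_{i = 1}^n error_i$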
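As a rough numerical illustration, the sketch below computes the total and the per-point average for six data points. The actual values and OOB predictions are hypothetical, and the squared difference is assumed as the per-point error so that positive and negative errors do not cancel.

```python
# Hypothetical actual values and averaged OOB predictions for points A-F.
y_actual = {"A": 3.0, "B": 5.0, "C": 5.5, "D": 1.0, "E": 4.0, "F": 2.0}
oob_pred = {"A": 2.5, "B": 5.0, "C": 5.0, "D": 2.0, "E": 4.0, "F": 1.5}

# Squared difference per data point (the regression case).
errors = {k: (y_actual[k] - oob_pred[k]) ** 2 for k in y_actual}

total_error = sum(errors.values())        # sum over all data points
average_error = total_error / len(errors) # per-point average

print(total_error)  # 1.75
print(round(average_error, 4))
```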

### OOB score v. actual score
While the OOB score provides a good estimate of performance, it might not exactly match an explicitly calculated cross-validation score because of random sampling variation.