# Lesson 2: Deep Dive into Random Forest: From Concepts to Real-World Application


## Introduction
Welcome to our journey into the heart of ensemble machine learning with the Random Forest algorithm. As an extension of decision trees, a Random Forest trains a multitude of trees and combines their predictions, hence the "forest." This lesson will equip you to understand and implement a basic Random Forest in Python, focusing on how individual trees are constructed and how their predictions are aggregated. Let's get started!

## Understanding the Random Forest
Random Forest is a robust ensemble method that combines many decision trees to solve classification and regression tasks. In classification, each tree 'votes' for a class, and the class with the most votes becomes the final prediction of our model. In regression, the trees' outputs are averaged instead. This lesson focuses on classification.
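
To make the voting idea concrete, here is a minimal sketch. The prediction matrix and the `majority_vote` helper are hypothetical and exist only for illustration; they are not part of the class built later in this lesson.

```python
import numpy as np

# Hypothetical predictions: 3 trees (rows) voting on 4 samples (columns).
tree_preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
])

def majority_vote(preds):
    """Return the most frequent class label in each column."""
    # np.bincount counts how often each non-negative integer label occurs;
    # argmax picks the most frequent one (the lowest label wins a tie).
    return np.array([np.bincount(col).argmax() for col in preds.T])

votes = majority_vote(tree_preds)
print(votes)  # [0 1 2 1]
```

Each column is decided independently, so the trees may disagree on one sample and still agree on the rest.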

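Our Random Forest relies on three core hyperparameters:
- **`n_trees`**: The number of trees in the forest. Increasing `n_trees` generally improves and stabilizes performance, with diminishing returns and added computational cost.
- **`max_depth`**: The maximum depth (number of levels) of each tree. Shallow trees are simpler and less prone to overfitting, while `None` places no limit on depth.
- **`random_state`**: Seeds the random choices made while building each tree, such as bootstrapping and split selection, so that results can be reproduced. It does not add randomness itself; it makes the existing randomness repeatable.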
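
As a rough illustration of what a fixed seed buys you, the sketch below uses scikit-learn's own `RandomForestClassifier` rather than the class built in this lesson. Note that scikit-learn calls the number of trees `n_estimators`; the helper `fit_predict` is a hypothetical name introduced here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

def fit_predict(seed):
    # n_estimators is scikit-learn's name for what this lesson calls n_trees.
    model = RandomForestClassifier(n_estimators=20, max_depth=2, random_state=seed)
    return model.fit(X, y).predict(X)

# Two forests built with the same seed make identical predictions.
same_seed_matches = (fit_predict(0) == fit_predict(0)).all()
print(same_seed_matches)  # True
```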

## Building Trees: Fostering Uniqueness
A decision tree, the foundational building block of a Random Forest, is structured like a flowchart: internal nodes test a feature, branches represent the outcomes of those tests, and leaves hold the predicted class. A Random Forest's strength lies in the diversity of its trees. Each tree is built on a different random sample of the data, so the trees make different mistakes that tend to cancel out when their votes are combined.
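
To see the flowchart structure of a single tree, one option is scikit-learn's `export_text`, which prints a fitted tree as nested rules. This is only a sketch; the depth of 2 and the seed of 0 are arbitrary choices made for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed flowchart short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented line is a decision on a feature; "class:" lines are leaves.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```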

## Implementing the Random Forest in Python
Implementing our Random Forest begins by importing the necessary libraries:

```python
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
```

We initialize our `RandomForest` class with `__init__`, storing `n_trees` and `max_depth`, an empty list of trees, and `random_states`, an array holding one seed per tree that is derived from `random_state`:

```python
class RandomForest:
    def __init__(self, n_trees=100, max_depth=None, random_state=None):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.random_states = np.random.RandomState(random_state).randint(0, 10000, size=n_trees)
        self.trees = []
```
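
The per-tree seeds can be inspected in isolation. The sketch below copies the seed expression from `__init__` into a hypothetical helper, `make_tree_seeds`, to show that the same `random_state` always yields the same seeds.

```python
import numpy as np

def make_tree_seeds(random_state, n_trees):
    # Same expression as in __init__: one integer seed in [0, 10000) per tree.
    return np.random.RandomState(random_state).randint(0, 10000, size=n_trees)

seeds_a = make_tree_seeds(42, 5)
seeds_b = make_tree_seeds(42, 5)
seeds_c = make_tree_seeds(7, 5)

print(np.array_equal(seeds_a, seeds_b))  # True: same random_state, same seeds
print(seeds_a, seeds_c)                  # a different random_state gives other seeds
```

Because the seeds are drawn from a limited range, two trees can occasionally receive the same seed; the seeds are reproducible, not guaranteed to be distinct.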

## Bootstrapping: Creating Variety
Bootstrapping is a statistical resampling method: we draw samples with replacement from the original dataset, usually as many as the dataset contains, so some rows appear several times and others not at all. In statistics, it is used to estimate how much a sample estimate varies. In a Random Forest, each tree is built on its own bootstrapped dataset, which provides the necessary randomness and variety.
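
A quick experiment shows what resampling with replacement does to a dataset. This is a sketch under assumptions: it uses NumPy's `default_rng` with an arbitrary seed and a made-up dataset size of 1000 rows. In expectation, a bootstrap sample contains about 63.2% of the distinct original rows, since 1 - (1 - 1/n)^n approaches 1 - 1/e.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the experiment is repeatable
n_samples = 1000                # hypothetical dataset size

# Draw n_samples row indices with replacement, as a bootstrap sample does.
idxs = rng.choice(n_samples, size=n_samples, replace=True)

unique_fraction = np.unique(idxs).size / n_samples
# Rows that were never drawn are often called "out-of-bag" rows.
oob_idxs = np.setdiff1d(np.arange(n_samples), idxs)

print(f"distinct rows drawn: {unique_fraction:.3f}")
print(f"rows left out: {oob_idxs.size}")
```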

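Here's how bootstrapping is incorporated into our Random Forest. The method below, like `fit` and `predict` after it, is defined inside the `RandomForest` class:

```python
def bootstrapping(self, X, y, rng=np.random):
    # Draw n_samples row indices with replacement. Passing a seeded
    # RandomState as rng makes the draw repeatable.
    n_samples = X.shape[0]
    idxs = rng.choice(n_samples, n_samples, replace=True)
    return X[idxs], y[idxs]
```

To 'fit' the model, we generate a bootstrapped dataset on each iteration, fit a new decision tree to it, and append the tree to our list. The bootstrap draws come from a generator seeded with `random_states`, so a fixed `random_state` reproduces the same forest:

```python
def fit(self, X, y):
    self.trees = []  # reset so that calling fit again does not accumulate trees
    # One generator, seeded from the per-tree seeds, drives all bootstrap draws.
    rng = np.random.RandomState(self.random_states)
    for i in range(self.n_trees):
        X_, y_ = self.bootstrapping(X, y, rng)
        tree = DecisionTreeClassifier(max_depth=self.max_depth, random_state=self.random_states[i])
        tree.fit(X_, y_)
        self.trees.append(tree)
```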

Finally, the `predict` component of the `RandomForest` collects predictions from each tree, returning the class with the majority votes:

```python
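def predict(self, X):
    # Shape (n_trees, n_samples): one row of predictions per tree.
    tree_preds = np.array([tree.predict(X) for tree in self.trees])
    # Majority vote per sample (column); ravel yields a 1-D array across SciPy versions.
    return np.ravel(stats.mode(tree_preds, axis=0)[0])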
```

## RandomForest in Action
To validate our RandomForest's proficiency, let's use the widely employed Iris dataset as our testing ground:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

rf = RandomForest(n_trees=100, max_depth=2, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
```

Here, we load the Iris dataset and split it into training and testing datasets. We train (or 'fit') the model using the training dataset and then predict the classes for the test dataset. The `accuracy_score` summarizes how well our model's predictions match the actual classes in the test data.
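
For intuition about the metric itself: `accuracy_score` is the fraction of predictions that equal the true labels. The tiny label lists below are made up for illustration.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: three of the four predictions match the truth.
y_true = [0, 1, 2, 2]
y_hat = [0, 1, 1, 2]

acc = accuracy_score(y_true, y_hat)
print(acc)  # 0.75
```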

## Lesson Summary and Practice
Congratulations! We've delved deep into the heart of Random Forests, looked at the tree generation process, and engineered a basic Random Forest classifier from scratch using Python. Now it's time for practice to consolidate these concepts. After all, practice is the fuel for mastery! Happy coding!

