<a href="https://colab.research.google.com/github/yoseforaz0990/ML-templates/blob/main/regression/random_forest_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

| Step                                                      | Description                                                                                                          |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| **Training the Random Forest Regression model on the whole dataset** | Build a Random Forest Regression model using the entire dataset.                                                     |
|                                                           | The `RandomForestRegressor` class from scikit-learn is used to create the model.                                      |
|                                                           | The `n_estimators` parameter is set to `10`, indicating the number of decision trees that will be used to construct the random forest. |
|                                                           | The `random_state` parameter is set to `0` for reproducibility in model training.                                     |
|                                                           | Train the Random Forest Regression model on the dataset to learn the relationships between the independent variable (Position level `X`) and the dependent variable (Salary `y`).   |
| **Predicting a new result**                               | Predict the salary for a new position level (e.g., `6.5`) using the trained Random Forest Regression model.       |
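|                                                           | The `predict()` method is called with the new position level wrapped in a 2D array (`[[6.5]]`), because scikit-learn expects input of shape `(n_samples, n_features)`. It returns the predicted salary in a one-element array. |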
| **Visualising the Random Forest Regression results**      | Create a fine grid of position levels (`X_grid`) in steps of `0.01`, from the minimum of the original position levels `X` up to (but not including) the maximum. |
|                                                           | The `RandomForestRegressor` model is then used to predict salaries for the position levels in `X_grid`.                |
|                                                           | Draw a scatter plot of the actual salaries (`y`) against the position levels (`X`) as red points.                     |
|                                                           | Plot the model's predicted salaries over `X_grid` as a blue line.                                                     |
|                                                           | The blue line is stepped rather than smooth, because each tree predicts a constant value within each of its segments and the forest averages those constants. |
|                                                           | Because the model was trained on the whole dataset, the plot shows how closely it fits the training data. It does not measure performance on unseen data. |
| **Difference between Decision Tree Regression and Random Forest Regression** | **Decision Tree Regression:** Builds a single tree that recursively splits the data on the independent variables so that each resulting segment (leaf) holds similar values of the dependent variable. The prediction for a leaf is typically the mean of its training targets. A tree can model non-linear relationships, but a single deep tree tends to overfit. |
|                                                           | **Random Forest Regression:** An ensemble method that combines multiple decision trees. Each tree is trained on a bootstrap sample of the data, and the features considered at each split can additionally be restricted to a random subset (controlled by `max_features` in scikit-learn). The forest averages the predictions of all trees to produce the final prediction. This averaging typically reduces overfitting and improves generalization compared with a single decision tree, which makes the method well suited to complex, non-linear relationships. |
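
The averaging step described in the table can be checked directly. The sketch below is a minimal illustration, not the notebook's own code: `X_demo` and `y_demo` are made-up stand-ins for the position levels and salaries, and it relies on the fitted trees being exposed through the `estimators_` attribute of `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: ten position levels and made-up salaries.
X_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = 45000.0 * 1.4 ** np.arange(10)

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X_demo, y_demo)

# Ask every individual tree for its own prediction at position level 6.5.
query = np.array([[6.5]])
per_tree = np.array([tree.predict(query)[0] for tree in forest.estimators_])

# The forest's prediction should match the mean of the per-tree predictions.
forest_prediction = forest.predict(query)[0]

print(per_tree)
print(forest_prediction, per_tree.mean())
```

The individual trees usually disagree with one another because each one saw a different bootstrap sample. Averaging smooths out those differences.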


In [None]:
# Training the Random Forest Regression model on the whole dataset
# X (position levels, 2D array) and y (salaries) are assumed to come from an earlier cell
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)

# Predicting a new result
new_position_level = 6.5
predicted_salary = regressor.predict([[new_position_level]])  # predict() expects a 2D array
print(predicted_salary)

# Visualising the Random Forest Regression results
import numpy as np
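# np.min/np.max return scalars; the built-in min/max on a 2D array return one-element arrays
X_grid = np.arange(np.min(X), np.max(X), 0.01)
X_grid = X_grid.reshape(-1, 1)  # one column, as scikit-learn expects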

import matplotlib.pyplot as plt
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Truth or Bluff (Random Forest Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
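
The stepped shape of the plotted line follows from how trees predict. As a rough sketch on made-up data (`X_demo`, `y_demo`, and `grid` are hypothetical stand-ins, not the notebook's dataset), the code below counts how many distinct values a single decision tree and a random forest produce over a fine grid. It also checks that an unpruned single tree reproduces its training targets exactly when every position level is unique.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data, not the notebook's dataset.
X_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = 45000.0 * 1.4 ** np.arange(10)

tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Fine grid of position levels, built the same way as X_grid above.
grid = np.arange(X_demo.min(), X_demo.max(), 0.01).reshape(-1, 1)

# Both models are piecewise constant, so they output far fewer
# distinct values than there are grid points.
tree_levels = np.unique(tree.predict(grid))
forest_levels = np.unique(forest.predict(grid))
print(len(grid), len(tree_levels), len(forest_levels))

# A fully grown single tree memorises the training targets.
print(np.allclose(tree.predict(X_demo), y_demo))
```

The forest generally has more steps than the single tree, since the split points of its trees differ, but each step is still flat.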


