# **ML Lab - Random Forest**
Urlana Suresh Kumar - 22071A6662

In this notebook, we apply **Random Forest** models to both classification and regression tasks using Python's scikit-learn library. Random Forest is an ensemble learning method: it trains many decision trees on bootstrap samples of the training data and combines their predictions (voting for classification, averaging for regression), which usually generalizes better than a single tree. We will demonstrate the following:

1. **Random Forest for Classification**: We will classify the Iris dataset, a classic benchmark of 150 flowers, into three species (setosa, versicolor, virginica) using four features: sepal length, sepal width, petal length, and petal width.

2. **Random Forest for Regression**: We will predict a continuous target using the California housing dataset, where the goal is to estimate the median house value of a census block group (expressed in units of $100,000) from eight features such as median income, house age, average number of rooms, and latitude/longitude.

We evaluate each model with metrics suited to its task: accuracy, the classification report, and the confusion matrix for classification; mean absolute error (MAE), mean squared error (MSE), and the R² score for regression.
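To make the metrics concrete before using them, here is a minimal sketch that computes accuracy, MAE, MSE, and R² by hand with NumPy and checks them against scikit-learn. The arrays (`labels_true`, `labels_pred`, `y_true`, `y_hat`) are toy values invented for illustration and are not taken from either dataset.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# Toy classification labels (illustrative only, not from Iris)
labels_true = np.array([0, 1, 2, 2, 1])
labels_pred = np.array([0, 1, 2, 1, 1])

# Accuracy is the fraction of predictions that match the true label
acc_manual = np.mean(labels_true == labels_pred)

# Toy regression values (illustrative only, not from the housing data)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_true - y_hat
mae_manual = np.mean(np.abs(errors))   # average absolute error
mse_manual = np.mean(errors ** 2)      # average squared error

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The hand-computed values should agree with scikit-learn's functions
print("accuracy:", acc_manual, accuracy_score(labels_true, labels_pred))
print("MAE:", mae_manual, mean_absolute_error(y_true, y_hat))
print("MSE:", mse_manual, mean_squared_error(y_true, y_hat))
print("R2:", r2_manual, r2_score(y_true, y_hat))
```

MAE is in the same units as the target, while MSE is in squared units and penalizes large errors more heavily; R² compares the model against a baseline that always predicts the mean.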

# **Random Forest for Classification**

In [None]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
data = load_iris()
X = data.data # Features
y = data.target # Target variable

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Random Forest Classifier
# Set number of trees and other hyperparameters as desired
rf_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)

# Train the model on the training data
rf_model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = rf_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Random Forest Classification Accuracy:", accuracy)
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Random Forest Classification Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
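A perfect score on a single 45-sample test split can depend on how the split happened to fall. One way to check this, sketched below, is stratified 5-fold cross-validation with the same hyperparameters as above. The names `iris`, `clf`, `cv`, and `cv_scores` are introduced here for illustration, and the choice of 5 folds is an assumption rather than part of the original experiment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()

# Same hyperparameters as the classifier trained above
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)

# Stratified folds keep the three species balanced in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy value per fold; each fold is used once as the held-out set
cv_scores = cross_val_score(clf, iris.data, iris.target, cv=cv, scoring="accuracy")

print("Fold accuracies:", cv_scores)
print("Mean accuracy: %.3f (std %.3f)" % (cv_scores.mean(), cv_scores.std()))
```

If the fold accuracies vary noticeably, the mean and standard deviation give a more honest summary than the single-split accuracy.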


# **Random Forest for Regression**

In [None]:
# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load California housing data
data = fetch_california_housing()
X = data.data # Features
y = data.target # Target variable

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)

# Train the model on the training data
rf_regressor.fit(X_train, y_train)

# Make predictions on the test data
y_pred = rf_regressor.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Random Forest Regression Results")
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)

Random Forest Regression Results
Mean Absolute Error (MAE): 0.36774064650742816
Mean Squared Error (MSE): 0.29449903404773414
R² Score: 0.7756266400588361
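The MSE above is in squared target units, which makes it hard to read directly. A small sketch of the conversion: taking the square root gives the root mean squared error (RMSE) in the same units as the target. The dollar conversion assumes the target is expressed in units of $100,000, as described in scikit-learn's documentation for this dataset; `UNIT_DOLLARS`, `mae_reported`, and `mse_reported` are names introduced here for illustration.

```python
import math

# Values copied from the regression output above
mae_reported = 0.36774064650742816
mse_reported = 0.29449903404773414

# Assumption: one target unit corresponds to $100,000
UNIT_DOLLARS = 100_000

# RMSE is the square root of MSE, so it shares the target's units
rmse = math.sqrt(mse_reported)

print(f"RMSE: {rmse:.4f} (about ${rmse * UNIT_DOLLARS:,.0f})")
print(f"MAE:  {mae_reported:.4f} (about ${mae_reported * UNIT_DOLLARS:,.0f})")
```

RMSE is larger than MAE here, which is expected whenever some errors are much bigger than others, because squaring weights large errors more heavily.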


# **Conclusion**

In this notebook, we have implemented **Random Forest** models for both classification and regression tasks:

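1. **Random Forest for Classification**: The model reached an accuracy of 1.0 on the 45-sample Iris test set, and the classification report and confusion matrix show no misclassifications for any of the three species. Iris is a small, well-separated dataset, so a perfect score here should not be expected to carry over to harder problems.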

2. **Random Forest for Regression**: The model predicted median house values from the California housing dataset with an R² of about 0.78 on the test set, meaning it explains roughly 78% of the variance in the target. The mean absolute error was about 0.37, or roughly $36,800 given that the target is in units of $100,000.

Random Forest models are easy to use, need little tuning to give a reasonable baseline, and handle both classification and regression. Because each tree is trained on a different bootstrap sample and the forest averages their outputs, the variance of the individual trees is reduced, which helps limit overfitting.
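The effect of the ensemble on overfitting can be sketched by comparing a single fully grown decision tree with a forest on the same split. The example below uses synthetic data from `make_classification` with 10% label noise; the dataset, its parameters, and the names `X_syn`, `y_syn`, `models`, and `results` are assumptions made for illustration and are not part of the experiments above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (illustrative only): flip_y randomly flips 10% of labels
X_syn, y_syn = make_classification(n_samples=1000, n_features=20,
                                   n_informative=5, flip_y=0.1,
                                   random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3,
                                          random_state=42)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Store (train accuracy, test accuracy) for each model
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"{name}: train={results[name][0]:.3f}, test={results[name][1]:.3f}")
```

An unrestricted tree memorizes the training set, noise included, so the gap between its training and test accuracy is a rough measure of overfitting. On noisy data like this, a forest typically shows a smaller gap, although the exact numbers depend on the data and the random seed.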