# RANDOM FOREST ALGORITHM:
* Random Forest is a popular supervised machine learning algorithm.
* It can be used for both classification and regression problems.
* It is based on ensemble learning, which combines multiple models to solve a complex problem and improve overall performance.
* As the name suggests, a Random Forest contains a number of decision trees, each trained on a different subset of the given dataset, and combines their outputs to improve predictive accuracy.
* Instead of relying on a single decision tree, the random forest collects the prediction from each tree and outputs the class that wins the majority of votes (for regression, the average of the tree outputs).
* Adding more trees generally makes the predictions more stable and reduces the risk of overfitting compared with a single tree, although the gain levels off and training cost grows with the number of trees.
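
The aggregation described above can be sketched in a few lines. This is a minimal illustration rather than the library's implementation; the helper names `majority_vote` and `average_output` and the sample predictions are made up for the example.

```python
from collections import Counter
from statistics import mean

def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_output(tree_predictions):
    """Regression: return the mean of the per-tree outputs."""
    return mean(tree_predictions)

# Hypothetical predictions from five trees for one patient
class_votes = ["disease", "healthy", "disease", "disease", "healthy"]
print(majority_vote(class_votes))

# Hypothetical numeric outputs from three regression trees
regression_outputs = [2.0, 3.0, 4.0]
print(average_output(regression_outputs))
```

Three of the five hypothetical trees vote for the same class, so that class becomes the forest's answer even though two trees disagree.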



The diagram below illustrates how the Random Forest algorithm works:
<img src="Random Forest.jpg">

# Working of Random Forest Algorithm
### Random Forest works in two phases: the first builds the forest by combining N decision trees, and the second makes a prediction with each tree created in the first phase.

* Step-1: Select K random data points from the training set (typically sampled with replacement).

* Step-2: Build a decision tree on the selected data points (the subset).

* Step-3: Choose the number N of decision trees that you want to build.

* Step-4: Repeat Steps 1 and 2 until N trees have been built.

* Step-5: For a new data point, collect the prediction of each decision tree and assign the point to the category that wins the majority of votes.
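
The steps above can be sketched by hand with scikit-learn's `DecisionTreeClassifier`. This is a simplified illustration under stated assumptions, not how `RandomForestClassifier` is implemented internally: the helper names `build_forest` and `forest_predict`, the synthetic two-cluster data, and the default of K equal to the training set size are all choices made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=25, sample_size=None, seed=0):
    """Train n_trees decision trees, each on its own random sample of the data."""
    rng = np.random.default_rng(seed)
    k = sample_size or len(X)                      # K: points drawn per tree
    trees = []
    for _ in range(n_trees):                       # Steps 3-4: repeat until N trees exist
        idx = rng.integers(0, len(X), size=k)      # Step 1: K random points, with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",                   # random feature subset at each split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))     # Step 2: build a tree on the subset
    return trees

def forest_predict(trees, X_new):
    """Step 5: every tree votes, and the most frequent class wins."""
    votes = np.stack([t.predict(X_new) for t in trees]).astype(int)
    return np.array([np.bincount(column).argmax() for column in votes.T])

# Synthetic, well-separated data: class 0 around (0, 0), class 1 around (8, 8)
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0, 1, (50, 2)), data_rng.normal(8, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

forest = build_forest(X, y, n_trees=25)
new_points = np.array([[0.0, 0.0], [8.0, 8.0]])
predictions = forest_predict(forest, new_points)
print(predictions)
```

Sampling with replacement means each tree sees a slightly different view of the data, which is what lets the trees disagree and the vote smooth out their individual errors.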

# Advantages of Random Forest:
 
* Random Forest is capable of performing both Classification and Regression tasks.
* It is capable of handling large datasets with high dimensionality.
* It usually improves accuracy over a single decision tree and reduces the risk of overfitting.

# Disadvantages of Random Forest Algorithm:
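* Although random forest can be used for both classification and regression tasks, it is less well suited to regression, because its predictions are averages of training targets and therefore cannot go beyond the range seen during training.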
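
One concrete reason regression can be a weaker fit: a forest's numeric prediction is an average of target values seen in training, so it cannot extrapolate past them. The sketch below illustrates this on made-up data following y = 2x; the data and variable names are assumptions for the example only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training data: x = 0..9 and y = 2x, so the largest training target is 18
X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X_train, y_train)

# Ask for a prediction far outside the training range (true value would be 200)
pred_far = reg.predict(np.array([[100.0]]))[0]
print("Prediction at x=100:", pred_far)
print("Largest training target:", y_train.max())
```

The prediction stays at or below the largest training target, whereas a linear model fitted to the same data would follow the trend.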

# Implementation of Random Forest Algorithm

In [8]:
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

#Loading the dataset
data = pd.read_csv('New heart.csv')
np.shape(data)


(918, 12)

In [9]:
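# Handle missing values (replace the placeholder 'M' with NaN).
# Caution: this replaces 'M' in every column. If the Sex column encodes male as 'M',
# those entries become NaN as well; in that case restrict the replacement to the affected columns.
data = data.replace('M', np.nan)

# Drop rows with missing target values (assuming 'HeartDisease' is the column name)
data = data.dropna(subset=['HeartDisease'])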
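
A frame-wide `replace` touches every column, so a placeholder such as 'M' is also replaced wherever it is a legitimate value. The sketch below shows one way to limit the replacement to chosen columns. The toy frame, its values, and the choice of `Cholesterol` as the column holding the placeholder are assumptions for illustration, not facts about 'New heart.csv'.

```python
import numpy as np
import pandas as pd

# Toy frame: 'M' means male in Sex but marks a missing reading in Cholesterol (assumed)
toy = pd.DataFrame({
    "Sex": ["M", "F", "M"],
    "Cholesterol": ["230", "M", "180"],
})

# Replace the placeholder only in the columns where it means "missing"
placeholder_cols = ["Cholesterol"]
toy[placeholder_cols] = toy[placeholder_cols].replace("M", np.nan)

# The column can now be converted to a numeric dtype
toy["Cholesterol"] = pd.to_numeric(toy["Cholesterol"])
print(toy)
```

The `Sex` column keeps its 'M' entries, while the placeholder in `Cholesterol` becomes a proper missing value.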



In [10]:
# Define features (X) and target (y)
X = data.drop('HeartDisease', axis=1)
y = data['HeartDisease']

# Convert non-numeric columns to numeric using pandas get_dummies
X = pd.get_dummies(X, columns=['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope'])

#data
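
To see what `pd.get_dummies` does to a categorical column, here is a small sketch on a made-up three-row frame. The values are invented, and `dtype=int` is used only to make the output easier to read.

```python
import pandas as pd

# Made-up rows with one numeric and one categorical column
toy = pd.DataFrame({
    "Age": [40, 49, 37],
    "ST_Slope": ["Up", "Flat", "Up"],
})

# Each category becomes its own 0/1 indicator column; numeric columns pass through
encoded = pd.get_dummies(toy, columns=["ST_Slope"], dtype=int)
print(encoded)
```

The single `ST_Slope` column is replaced by `ST_Slope_Flat` and `ST_Slope_Up`, giving the model purely numeric inputs.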



In [11]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



In [12]:
# Build the Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
model.fit(X_train, y_train)



In [13]:
# Make predictions on the test set
y_pred = model.predict(X_test)



In [14]:
# Calculate evaluation metrics
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print("Precision:", precision*100)
print("Recall:", recall*100)
print("Accuracy:", accuracy*100)
print("F1 Score:", f1*100)


Precision: 90.38461538461539
Recall: 87.85046728971963
Accuracy: 87.5
F1 Score: 89.0995260663507
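
To show how the four metrics relate, the sketch below recomputes them from confusion-matrix counts. The counts are not printed by the notebook; they are inferred here from the reported percentages under the assumption that the 20% test split holds 184 of the 918 rows, so treat them as illustrative.

```python
# Confusion-matrix counts inferred from the reported metrics (assumption, see above)
tp, fp, fn, tn = 94, 10, 13, 67

precision = tp / (tp + fp)                    # share of predicted positives that are correct
recall = tp / (tp + fn)                       # share of actual positives that are found
accuracy = (tp + tn) / (tp + fp + fn + tn)    # share of all predictions that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"Precision: {precision * 100:.2f}")
print(f"Recall: {recall * 100:.2f}")
print(f"Accuracy: {accuracy * 100:.2f}")
print(f"F1 Score: {f1 * 100:.2f}")
```

Under these counts the model misses 13 true cases and raises 10 false alarms, which is why recall comes out slightly lower than precision.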
