# __Applying Random Forest__

Let's examine how to build a random forest regression model step by step.

## Step 1: Import Required Libraries and Read the Dataset

- Import the pandas and NumPy libraries
- Read the dataset and display its first few rows
- Check the dataset information (column types and missing values)


In [None]:
import pandas as pd
import numpy as np

In [None]:
dataset = pd.read_csv('petrol_consumption.csv')

In [None]:
dataset.head()

__Observation__
- Here, you can see the first few rows of the dataset.

We will predict petrol consumption from the other attributes shown above.

In [None]:
dataset.info()

__Observation__
- All columns are numeric, and there are no missing values.
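
To confirm this observation in code instead of reading the `info()` output by eye, a minimal sketch like the one below counts missing values per column with `isnull().sum()` and checks that every column has a numeric dtype. It uses a small synthetic DataFrame with hypothetical column names as a stand-in for `petrol_consumption.csv`, so it runs without the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for petrol_consumption.csv: four numeric features
# plus a numeric target column (column names are illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(10, 5)),
    columns=["feat_1", "feat_2", "feat_3", "feat_4", "target"],
)

# Number of missing values in each column (0 everywhere means no gaps).
missing_per_column = df.isnull().sum()

# True when every column has a numeric dtype.
all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)

print(missing_per_column)
print("All columns numeric:", all_numeric)
```

On the real data, the same two checks would be `dataset.isnull().sum()` and the dtype test applied to `dataset.dtypes`.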

## Step 2: Prepare the Data

- Create the feature matrix `X` (the first four columns) and the target vector `y` (the fifth column, petrol consumption).


In [None]:
X = dataset.iloc[:, 0:4].values  # features: columns 0 to 3
y = dataset.iloc[:, 4].values    # target: column 4 (petrol consumption)
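
The slicing above is position based: `0:4` selects columns 0 through 3 (the end index is excluded), and `4` selects the single target column as a 1-D array. The sketch below shows the resulting shapes on a small hypothetical DataFrame with the same five-column layout, and an equivalent name-based selection that keeps working if the columns are reordered; the column names are assumptions, not the real ones.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with the same layout as the petrol dataset:
# columns 0-3 are features, column 4 is the target.
data = pd.DataFrame(
    np.arange(30, dtype=float).reshape(6, 5),
    columns=["feat_1", "feat_2", "feat_3", "feat_4", "target"],
)

# Position-based selection, as in the notebook cell above.
X = data.iloc[:, 0:4].values   # every row, columns 0..3
y = data.iloc[:, 4].values     # every row, column 4 only

# Equivalent name-based selection, which survives column reordering.
X_by_name = data.drop(columns="target").values
y_by_name = data["target"].values

print(X.shape, y.shape)  # (6, 4) (6,)
```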

## Step 3: Split the Data into Training and Testing Sets

- Use `train_test_split` from `sklearn.model_selection` to hold out 20% of the data for testing


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
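
With `test_size=0.2`, one fifth of the rows go to the test set, and `random_state=0` fixes the shuffle so the split is reproducible. The sketch below illustrates both points on 50 rows of synthetic data (an assumption, not the petrol dataset): 10 rows are held out, and repeating the call with the same seed returns the same training rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 50 samples with 4 features (not the petrol dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)  # (40, 4) (10, 4)

# Re-running with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
```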

## Step 4: Standardize the Data

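- Standardize the features with `StandardScaler` from `sklearn.preprocessing`, fitting it on the training set only and applying the same transformation to the test set
- Random forests split on feature thresholds, so they do not require scaling; this step is optional for this model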


In [None]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
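
The sketch below, on synthetic data (an assumption, not the petrol dataset), shows two things: after `fit_transform` the training columns have mean 0 and standard deviation 1, and a random forest with a fixed `random_state` gives the same test predictions whether it is trained on raw or standardized features. Standardization is a monotonic linear map, so each tree finds the same partitions of the data; the comparison uses `np.allclose` to tolerate floating-point rounding.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption; not the petrol dataset).
rng = np.random.default_rng(42)
X_train = rng.normal(loc=50.0, scale=10.0, size=(80, 4))
y_train = X_train[:, 0] * 2.0 + rng.normal(size=80)
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 4))

# Fit the scaler on the training set only, then reuse its statistics for the test set.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# Training columns now have mean ~0 and standard deviation ~1.
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))

# Trees split on the ordering of feature values, so the same forest trained on
# raw and on scaled features should produce the same predictions.
raw_model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_train, y_train)
scaled_model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_train_scaled, y_train)
raw_pred = raw_model.predict(X_test)
scaled_pred = scaled_model.predict(X_test_scaled)
print("Same predictions:", np.allclose(raw_pred, scaled_pred))
```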

## Step 5: Train the RandomForestRegressor

- Import RandomForestRegressor from sklearn.ensemble
- Create a regressor object, fit it with the training data, and make predictions on the test data


In [None]:
from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
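
A random forest regressor makes its prediction by averaging the predictions of its individual trees, which are stored in the fitted model's `estimators_` attribute. The sketch below checks this on synthetic data (an assumption standing in for the scaled petrol features) by averaging the 20 tree predictions by hand and comparing the result with `regressor.predict`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data as a stand-in for the scaled petrol features (an assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=60)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X, y)

# Each fitted tree is stored in estimators_; the forest's prediction is the
# mean of the individual tree predictions.
per_tree = np.array([tree.predict(X[:5]) for tree in regressor.estimators_])
manual_average = per_tree.mean(axis=0)
forest_pred = regressor.predict(X[:5])

print(per_tree.shape)  # (20, 5): one row per tree, one column per sample
print("Matches forest prediction:", np.allclose(manual_average, forest_pred))
```

Averaging many trees, each trained on a bootstrap sample, is what reduces the variance of a single deep decision tree; `n_estimators` controls how many trees are averaged.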

## Step 6: Evaluate the Performance of the RandomForestRegressor

- Calculate the Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error using metrics from sklearn
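
For reference, the three metrics compare the true values $y_i$ with the predictions $\hat{y}_i$ over $n$ samples. MAE and RMSE are in the same units as the target, while MSE is in squared units:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$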


In [None]:
from sklearn import metrics

# Training-set error: predict once and reuse the result for all three metrics
y_train_pred = regressor.predict(X_train)
print('Train MAE:', metrics.mean_absolute_error(y_train, y_train_pred))
print('Train MSE:', metrics.mean_squared_error(y_train, y_train_pred))
print('Train RMSE:', np.sqrt(metrics.mean_squared_error(y_train, y_train_pred)))

In [None]:
# Test-set error, using the predictions y_pred computed in Step 5
print('Test MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('Test MSE:', metrics.mean_squared_error(y_test, y_pred))
print('Test RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
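
Since the train and test cells compute the same three metrics, they can share one function. The sketch below uses a hypothetical helper, `regression_report`, and checks it on toy values chosen so the results are easy to verify by hand: the errors are 1, 0 and -2, so MAE = 3/3 = 1, MSE = (1 + 0 + 4)/3 = 5/3, and RMSE = sqrt(5/3) ≈ 1.291.

```python
import numpy as np
from sklearn import metrics


def regression_report(y_true, y_pred, label):
    """Print and return MAE, MSE and RMSE for one set of predictions (hypothetical helper)."""
    mae = metrics.mean_absolute_error(y_true, y_pred)
    mse = metrics.mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)  # RMSE is the square root of MSE
    print(f"{label} MAE: {mae:.4f}  {label} MSE: {mse:.4f}  {label} RMSE: {rmse:.4f}")
    return mae, mse, rmse


# Toy values: the prediction errors are 1, 0 and -2.
mae, mse, rmse = regression_report(np.array([3.0, 5.0, 2.0]), np.array([2.0, 5.0, 4.0]), "Test")
```

In the notebook, the calls would be `regression_report(y_train, regressor.predict(X_train), "Train")` and `regression_report(y_test, y_pred, "Test")`.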

__Observation__

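- There is a large gap between the training and test metrics: the forest fits the training data much more closely than unseen data, which is a sign of overfitting. This comparison comes from a single train/test split; cross-validation would give a more reliable estimate of the test error.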
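
A single 80/20 split on a small dataset gives a noisy estimate of test error. A minimal sketch of 5-fold cross-validation with `cross_val_score` is shown below on synthetic data (an assumption; on the real data, the petrol `X` and `y` would be passed instead). scikit-learn reports error metrics as negated scores so that greater is always better, hence the sign flip.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; substitute the petrol X and y).
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + X[:, 2] + rng.normal(scale=0.5, size=100)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)

# 5-fold cross-validation; "neg_mean_absolute_error" returns -MAE per fold,
# so flip the sign to get the MAE of each held-out fold.
scores = cross_val_score(regressor, X, y, cv=5, scoring="neg_mean_absolute_error")
fold_mae = -scores

print("MAE per fold:", fold_mae.round(3))
print("Mean CV MAE:", fold_mae.mean().round(3), "+/-", fold_mae.std().round(3))
```

The spread across folds shows how much a single split's test MAE could vary by chance.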

For a classification problem, replace `RandomForestRegressor` with `RandomForestClassifier` (also in `sklearn.ensemble`) and evaluate the model with classification metrics such as accuracy instead of MAE, MSE, and RMSE; the rest of the workflow stays the same.
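
A minimal classification sketch, assuming synthetic binary data from `make_classification` in place of a real labeled dataset, follows the same split, fit, and predict steps with `RandomForestClassifier` and scores the result with `accuracy_score`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (not from the petrol dataset).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Same workflow as the regressor: only the estimator class and the metric change.
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print("Predicted classes:", np.unique(y_pred))
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

For classification, each tree votes for a class (scikit-learn averages the trees' predicted class probabilities) instead of averaging numeric predictions.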