# DS106-03-05-ML - Random Forest in Python
---

## Import Packages

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

## Load in Data

In [2]:
iris = sns.load_dataset('iris')

## Data Wrangling

In [3]:
# `y` is target, `x` is predictor(s)
x = iris.drop('species', axis=1)
y = iris['species']

---
## Train Test Split

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=76)

---
## Initial Random Forest Model
At last, you are ready to kick it into high gear by creating your random forest model.

You'll use the function `RandomForestClassifier()` with the argument `n_estimators=` to specify how many decision trees the random forest is built from, and `random_state=` to fix the random seed so that your results match this content:

In [5]:
forest = RandomForestClassifier(n_estimators=500, random_state=76)
forest.fit(x_train, y_train)

RandomForestClassifier(n_estimators=500, random_state=76)
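
To see what the two arguments do, here is a minimal sketch rather than part of the lesson's own workflow. It assumes scikit-learn's bundled copy of the iris data (`load_iris`) as a stand-in for `sns.load_dataset('iris')`, and it uses 50 trees instead of 500 just to run quickly. The names `forest_a` and `forest_b` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=76
)

# Two forests built with the same settings and the same seed
# (50 trees instead of 500, only to keep the sketch fast)
forest_a = RandomForestClassifier(n_estimators=50, random_state=76).fit(x_train, y_train)
forest_b = RandomForestClassifier(n_estimators=50, random_state=76).fit(x_train, y_train)

# n_estimators controls how many fitted trees the forest holds
n_trees = len(forest_a.estimators_)

# With an identical random_state, both forests make identical predictions
same_predictions = bool((forest_a.predict(x_test) == forest_b.predict(x_test)).all())

print(n_trees, same_predictions)
```

Changing `random_state` in one of the two forests may or may not change its predictions on a dataset this small, but the results are no longer guaranteed to match.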

---
## Evaluate Model Fit

In [6]:
forestPredictions = forest.predict(x_test)
print(confusion_matrix(y_test, forestPredictions))
print(classification_report(y_test, forestPredictions))

[[19  0  0]
 [ 0 11  2]
 [ 0  0 13]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.85      0.92        13
   virginica       0.87      1.00      0.93        13

    accuracy                           0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
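
The per-class numbers in the report can be recomputed by hand from the confusion matrix printed above. The sketch below hard-codes that matrix and assumes scikit-learn's convention that rows are the true species and columns are the predicted species; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true species, columns = predicted species)
cm = np.array([[19, 0, 0],
               [0, 11, 2],
               [0, 0, 13]])
labels = ["setosa", "versicolor", "virginica"]

correct = np.diag(cm)                  # correctly classified flowers per species
precision = correct / cm.sum(axis=0)   # correct / everything predicted as that species
recall = correct / cm.sum(axis=1)      # correct / everything that truly is that species
accuracy = correct.sum() / cm.sum()    # all correct / all test flowers

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The two _versicolor_ flowers predicted as _virginica_ account for both imperfect scores: they lower _versicolor_ recall to 11/13 and _virginica_ precision to 13/15.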



---
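## Conclusion
The random forest model is 96% accurate on the test set (43 of 45 flowers classified correctly). In general, accuracy tends to improve as the training set grows, although more data does not guarantee a better model.
- 100% precision for both _setosa_ and _versicolor_ irises: every flower predicted as one of these species really was that species
- only 87% precision for _virginica_, because two _versicolor_ flowers were misclassified as _virginica_ (which also lowers _versicolor_ recall to 85%)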