# DS106-03-04-ML - Decision Trees in Python
---

## Import Packages

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

## Load in Data

In [2]:
iris = sns.load_dataset('iris')

## Data Wrangling

In [3]:
# `y` is what you are predicting
# `x` is everything you're using to predict it.
x = iris.drop('species', axis=1)
y = iris['species']

---
## Train Test Split
- Split the data into training and testing sets.
The training variables are used to build your initial model, and the test variables are used to assess how well that model fits. Setting `random_state` to 76 is optional, but it gives you the same split as this example so you can follow along.

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3, random_state=76)
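As a quick sanity check on the split, the sketch below prints the size of each piece. It assumes scikit-learn's bundled copy of iris (`load_iris`) as a stand-in for `sns.load_dataset('iris')`, so it runs without a download; the `target` column name and the `iris_frame` variable come from that stand-in, not from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
iris_frame = load_iris(as_frame=True).frame
x = iris_frame.drop('target', axis=1)   # the four measurement columns
y = iris_frame['target']                # the species, encoded as 0/1/2

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=76
)

# 30% of 150 rows go to the test set, the rest to training.
print(x_train.shape, x_test.shape)

# Re-running with the same random_state selects the same rows again.
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.3, random_state=76)
print(x_test.index.equals(x_test_again.index))
```

With 150 rows and `test_size=0.3`, 45 rows land in the test set, which matches the total support in the classification report later in this lesson.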

---
## Create Initial Decision Tree
Before you jump into the Random Forest, try a single decision tree. To do this, create a `DecisionTreeClassifier()` and then call `fit()` on it with the training data. Once more, to keep everyone on the same page, the `random_state` is 76.

In [5]:
decisionTree = DecisionTreeClassifier(random_state=76)
decisionTree.fit(x_train, y_train)

DecisionTreeClassifier(random_state=76)
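Once the tree is fit, you can look at what it learned. The sketch below is one possible way to do that, using the fitted estimator's `get_depth()`, `get_n_leaves()`, and `feature_importances_`. It again assumes scikit-learn's bundled iris data as a stand-in for the seaborn copy, so the exact numbers may not match your notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled iris data instead of sns.load_dataset('iris').
data = load_iris(as_frame=True)
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=76
)

decisionTree = DecisionTreeClassifier(random_state=76)
decisionTree.fit(x_train, y_train)

# How large did the tree grow?
print("depth: ", decisionTree.get_depth())
print("leaves:", decisionTree.get_n_leaves())

# How much did each column contribute to the splits? The values sum to 1.
for name, importance in zip(x_train.columns, decisionTree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```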

---
## Assess the Model
Now that the model is fit, the next step is to create a set of predictions and interpret the results. Start by calling `predict()` on the test data, then use the same confusion matrix and classification report code as in the last lesson.

In [6]:
treePredictions = decisionTree.predict(x_test)
print(confusion_matrix(y_test, treePredictions))
print(classification_report(y_test, treePredictions))

[[19  0  0]
 [ 0 10  3]
 [ 0  2 11]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       0.83      0.77      0.80        13
   virginica       0.79      0.85      0.81        13

    accuracy                           0.89        45
   macro avg       0.87      0.87      0.87        45
weighted avg       0.89      0.89      0.89        45



### Reading the Confusion Matrix

In [7]:
# (the confusion matrix by itself)
print(confusion_matrix(y_test, treePredictions))

[[19  0  0]
 [ 0 10  3]
 [ 0  2 11]]
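To read the matrix, remember that scikit-learn puts the true classes on the rows and the predicted classes on the columns, so the diagonal holds the correct predictions. The sketch below copies the matrix printed above into a NumPy array (the `cm` and `labels` names are only for illustration) and works out recall, precision, and accuracy by hand.

```python
import numpy as np

# The confusion matrix printed above: rows are true classes, columns are predictions.
cm = np.array([[19,  0,  0],
               [ 0, 10,  3],
               [ 0,  2, 11]])
labels = ['setosa', 'versicolor', 'virginica']

correct = np.diag(cm)                 # correct predictions per class
recall = correct / cm.sum(axis=1)     # correct / all rows that truly are the class
precision = correct / cm.sum(axis=0)  # correct / all rows predicted as the class
accuracy = correct.sum() / cm.sum()   # all correct / all test rows

for label, p, r in zip(labels, precision, recall):
    print(f"{label:<12} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

For example, 3 true _versicolor_ flowers were predicted as _virginica_, and 2 true _virginica_ flowers were predicted as _versicolor_; _setosa_ was never confused with either.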


### How Well Does Your Model Fit?

In [8]:
# (the classification report by itself)
print(classification_report(y_test, treePredictions))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       0.83      0.77      0.80        13
   virginica       0.79      0.85      0.81        13

    accuracy                           0.89        45
   macro avg       0.87      0.87      0.87        45
weighted avg       0.89      0.89      0.89        45
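The remaining columns of the report follow from precision and recall. As a rough sketch, the code below rebuilds the f1-score, macro average, and weighted average from the confusion matrix shown earlier; the variable names are illustrative and not part of the lesson's code.

```python
import numpy as np

# Confusion matrix from this lesson: rows are true classes, columns are predictions.
cm = np.array([[19,  0,  0],
               [ 0, 10,  3],
               [ 0,  2, 11]])

support = cm.sum(axis=1)                  # number of true rows per class
precision = np.diag(cm) / cm.sum(axis=0)
recall = np.diag(cm) / support

# f1-score is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# The macro average treats every class equally;
# the weighted average weights each class by its support.
macro_f1 = f1.mean()
weighted_f1 = np.average(f1, weights=support)

print(np.round(f1, 2))
print(round(float(macro_f1), 2), round(float(weighted_f1), 2))
```

Because the three classes have similar support here (19, 13, and 13), the macro and weighted averages end up close to each other.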



### Conclusion
- _setosa_ was predicted with 100% precision and 100% recall
- _versicolor_ was predicted with 83% precision and 77% recall
- _virginica_ was predicted with 79% precision and 85% recall
- Overall accuracy across the 45 test rows was 89%