# KNN Algorithm Applied to the iris.csv Dataset

Iris is a genus of flowering plants. Researchers measured several characteristics of flowers from different Iris species and recorded them in a dataset. The recorded attributes are:
 - Petal length;
 - Petal width;
 - Sepal length;
 - Sepal width;
 - Species.

This project practices the concepts behind the KNN (k-nearest neighbors) machine learning algorithm on this classic dataset.
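
As a reminder of what KNN does, the following is a minimal from-scratch sketch, not the scikit-learn implementation used in this notebook. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` is hypothetical, and the six training rows are copied from the table outputs shown in this notebook.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# (SepalLength, SepalWidth, PetalLength, PetalWidth) for six labeled flowers
train_points = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (5.7, 3.0, 4.2, 1.2),
    (5.6, 2.7, 4.2, 1.3),
    (6.9, 3.1, 5.4, 2.1),
    (6.1, 3.0, 4.9, 1.8),
]
train_labels = [
    "Iris-setosa",
    "Iris-setosa",
    "Iris-versicolor",
    "Iris-versicolor",
    "Iris-virginica",
    "Iris-virginica",
]

print(knn_predict(train_points, train_labels, (4.6, 3.1, 1.5, 0.2), k=3))
```

With only six training rows this is purely illustrative; the rest of the notebook relies on `KNeighborsClassifier` for the actual classification.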

In [4]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [5]:
iris = pd.read_csv("iris.csv")

In [18]:
iris.head(5)

| | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [7]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   SepalLength  150 non-null    float64
 1   SepalWidth   150 non-null    float64
 2   PetalLength  150 non-null    float64
 3   PetalWidth   150 non-null    float64
 4   Species      150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [20]:
iris.describe()

| | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


# Training data vs. test data

In [9]:
from sklearn.model_selection import train_test_split

In [26]:
X_train, X_test, y_train, y_test = train_test_split(iris.drop('Species', axis=1),iris['Species'],test_size=0.3)

In [27]:
X_train.head(5)

| | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 139 | 6.9 | 3.1 | 5.4 | 2.1 |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 |
| 95 | 5.7 | 3.0 | 4.2 | 1.2 |
| 37 | 4.9 | 3.1 | 1.5 | 0.1 |
| 127 | 6.1 | 3.0 | 4.9 | 1.8 |


In [32]:
X_train.shape

(105, 4)

In [28]:
y_train.head(5)

139     Iris-virginica
47         Iris-setosa
95     Iris-versicolor
37         Iris-setosa
127     Iris-virginica
Name: Species, dtype: object

In [29]:
X_test.head(5)

| | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 94 | 5.6 | 2.7 | 4.2 | 1.3 |
| 97 | 6.2 | 2.9 | 4.3 | 1.3 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 |


In [30]:
y_test.head(5)

94     Iris-versicolor
97     Iris-versicolor
3          Iris-setosa
137     Iris-virginica
131     Iris-virginica
Name: Species, dtype: object
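
The split above sets neither `random_state` nor `stratify`, so the rows that land in each subset, and the number of flowers per species in the test set, change from run to run. The sketch below shows one possible way to make the split reproducible and class-balanced. It assumes the copy of Iris bundled with scikit-learn (`load_iris`) as a stand-in for iris.csv, and the seed value 42 is arbitrary.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,    # same 70/30 proportion used in this notebook
    random_state=42,  # arbitrary seed so that reruns give the same split
    stratify=y,       # keep the species proportions equal in both subsets
)

print(X_train.shape, X_test.shape)
print(Counter(y_test.tolist()))  # number of test samples per class
```

With stratification, each species contributes the same share of rows to the training and test sets, which makes per-class metrics easier to compare between runs.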

# Instantiating the KNN algorithm

In [35]:
from sklearn.neighbors import KNeighborsClassifier

In [38]:
knn = KNeighborsClassifier(n_neighbors=3)

# Training the algorithm

In [39]:
knn.fit(X_train,y_train)

KNeighborsClassifier(n_neighbors=3)
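
The value `n_neighbors=3` is chosen by hand in this notebook. One common way to compare several values of k is cross-validation. The sketch below is an illustration under assumptions, not part of the original analysis: it uses scikit-learn's bundled Iris data instead of iris.csv, and both the candidate list and the 5-fold setting are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv
X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9]  # arbitrary odd values of k to compare
mean_accuracy = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; the default score for a classifier is accuracy
    scores = cross_val_score(model, X, y, cv=5)
    mean_accuracy[k] = scores.mean()

best_k = max(mean_accuracy, key=mean_accuracy.get)
for k, acc in mean_accuracy.items():
    print(f"k={k}: mean accuracy {acc:.3f}")
print("best k:", best_k)
```

On a dataset this small the differences between neighboring values of k tend to be slight, so the result is a guide rather than a definitive choice.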

# Running KNN on the test set

In [51]:
resultado = knn.predict(X_test)

In [52]:
resultado

array(['Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-versicolor',
       'Iris-virginica', 'Iris-virginica', 'Iris-versicolor',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa'], dtype=object)

# Validation techniques

# Confusion matrix

In [53]:
print(pd.crosstab(y_test,resultado,rownames=['real'],colnames=['         Predito'], margins=True))  # rows: actual class ('real'); columns: predicted class ('Predito')

         Predito  Iris-setosa  Iris-versicolor  Iris-virginica  All
real                                                               
Iris-setosa                15                0               0   15
Iris-versicolor             0               11               1   12
Iris-virginica              0                0              18   18
All                        15               11              19   45
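
The same kind of table can also be produced with scikit-learn's `confusion_matrix`, which returns a plain array with actual classes in rows and predicted classes in columns. The sketch below uses a small hypothetical set of labels rather than this notebook's `y_test` and `resultado`, only to show the call and how overall accuracy follows from the diagonal.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, used only to illustrate the call
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
y_true = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
          "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
y_pred = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
          "Iris-virginica", "Iris-virginica", "Iris-virginica"]

# Rows follow the actual class, columns the predicted class, in `labels` order
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = diagonal sum / total
accuracy = cm.trace() / cm.sum()
print(f"accuracy: {accuracy:.2f}")
```

Passing `labels` explicitly fixes the row and column order, which avoids misreading the matrix when a class happens to be missing from the predictions.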


# Classification metrics

In [54]:
from sklearn import metrics

In [58]:
#print(metrics.classification_report(y_test,resultado,target_names=sorted(iris['Species'].unique())))  # target_names takes one name per class, not the whole column

In [56]:
print(metrics.classification_report(y_test,resultado))

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        15
Iris-versicolor       1.00      0.92      0.96        12
 Iris-virginica       0.95      1.00      0.97        18

       accuracy                           0.98        45
      macro avg       0.98      0.97      0.98        45
   weighted avg       0.98      0.98      0.98        45
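
The figures in this report can be checked by hand from the confusion matrix. The sketch below recomputes precision, recall, F1, and accuracy using the counts copied from the crosstab output in this notebook; the variable names are illustrative.

```python
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Rows = actual class, columns = predicted class (counts from the crosstab)
matrix = [
    [15, 0, 0],
    [0, 11, 1],
    [0, 0, 18],
]

report = {}
for i, label in enumerate(labels):
    true_positives = matrix[i][i]
    predicted_as_label = sum(row[i] for row in matrix)  # column total
    actually_label = sum(matrix[i])                      # row total (support)
    precision = true_positives / predicted_as_label
    recall = true_positives / actually_label
    f1 = 2 * precision * recall / (precision + recall)
    report[label] = (precision, recall, f1)

total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total

for label, (precision, recall, f1) in report.items():
    print(f"{label:>15}  {precision:.2f}  {recall:.2f}  {f1:.2f}")
print(f"accuracy: {accuracy:.2f}")
```

For example, Iris-versicolor recall is 11/12 because one of the 12 actual versicolor flowers was predicted as Iris-virginica, and Iris-virginica precision is 18/19 because that same flower counts as a false positive for virginica.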



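# Conclusion

- KNN with k = 3 performed well on this classification problem: it reached about 98% accuracy (44 of 45 test samples correct), and its only error was one Iris-versicolor flower classified as Iris-virginica.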