**Importing Data Sets**

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_palette('husl')
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn import preprocessing
from sklearn.svm import SVC

In [2]:
iris = pd.read_csv('../input/iris/Iris.csv',index_col=0)

In [3]:
iris.head()

In [4]:
iris.shape

In [5]:
iris.info()

In [6]:
iris.describe()

In [7]:
iris['Species'].value_counts()

**Pair plot
Plotting multiple pairwise bivariate distributions in a dataset using pairplot:**

In [8]:
sns.pairplot(iris, hue='Species', markers='+')
plt.show()

**Heatmap
Plotting the heatmap to check the correlation.
dataset.corr() is used to find the pairwise correlation of all columns in the dataframe.**

In [9]:
plt.figure(figsize=(7,5))
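# Species is still a string column here, so correlate the numeric columns only.
sns.heatmap(iris.drop(columns='Species').corr(), annot=True, cmap='cubehelix_r')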
plt.show()

**As we can see, PetalLengthCm and PetalWidthCm are highly correlated with each other. Species has not been numerically encoded yet, so it does not appear in this heatmap.**
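To make the correlation step concrete, here is a minimal sketch that prints the pairwise Pearson correlations as a table instead of a plot. It assumes scikit-learn's bundled copy of the Iris data as a stand-in for `Iris.csv`, so the column names (`petal length (cm)` and so on) differ from the ones used in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
data = load_iris(as_frame=True)
features = data.data  # the four numeric measurement columns

# Pairwise Pearson correlation between the numeric columns.
corr = features.corr()
print(corr.round(2))

# Pull out a single pair from the correlation matrix.
petal_corr = corr.loc['petal length (cm)', 'petal width (cm)']
print(f'petal length vs petal width: {petal_corr:.2f}')
```

The matrix is symmetric with ones on the diagonal, which is why the heatmap looks mirrored across its diagonal.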

In [10]:
# For classification, encode the string labels in Species as integers.
label_encoder = preprocessing.LabelEncoder()
iris['Species']= label_encoder.fit_transform(iris['Species']) 
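A small sketch of what the encoder does, using a hand-written list of labels. The label strings are an assumption about how the species are spelled in `Iris.csv`; the variable names are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Assumption: these strings mirror the species labels in Iris.csv.
species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the sorted unique labels; each code is an index into it.
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps the integer codes back to the original strings.
decoded = list(encoder.inverse_transform(codes))
print(decoded)
```

Keeping the fitted encoder around is useful later, since `inverse_transform` turns numeric predictions back into species names.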

**X holds the independent variables (the four measurement features).
y holds the dependent variable (here, 'Species' is the target we want to predict).**


In [11]:
X=iris.iloc[:,0:4]
y=iris['Species']

**Train test split
We split the dataset with train_test_split(): 80% of the data is used to train the model, and the remaining 20% is held back as a test set for the final evaluation:**

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
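As a quick sanity check on the split sizes, the sketch below runs the same kind of split on scikit-learn's bundled Iris data (an assumption, standing in for `Iris.csv`). It also passes `stratify`, which the notebook does not use, to show how class balance in the test set can be preserved.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify is an addition for illustration; it keeps class proportions equal
# in the train and test portions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=1, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # 120 training rows and 30 test rows
test_counts = np.bincount(y_te)  # number of test samples per class
print(test_counts)
```

Without `stratify`, the 30 test samples are drawn at random, so the per-class counts in the test set can be uneven.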

**Model testing: we will try several classification models**

In [13]:
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))
# evaluate each model in turn
results = []
model_names = []
for name, model in models:
    # 10-fold stratified cross-validation on the training set only
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    model_names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
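To show what `cross_val_score` is doing under the hood, here is a hedged sketch that reproduces it with an explicit loop over the folds. It uses scikit-learn's bundled Iris data and a single model (LDA) as assumptions for illustration; the helper names are not part of the notebook.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: scikit-learn's bundled Iris data stands in for X_train, y_train.
X_demo, y_demo = load_iris(return_X_y=True)
kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
estimator = LinearDiscriminantAnalysis()

# Manual version: fit a fresh copy of the model on each training fold
# and score it (accuracy) on the matching held-out fold.
manual_scores = []
for train_idx, val_idx in kfold.split(X_demo, y_demo):
    fold_model = clone(estimator)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[val_idx], y_demo[val_idx]))
manual_scores = np.array(manual_scores)

# Library version, as used in the loop above.
library_scores = cross_val_score(estimator, X_demo, y_demo, cv=kfold, scoring='accuracy')

print(manual_scores.mean(), library_scores.mean())
```

Because the splitter has a fixed `random_state`, both versions see the same folds, so the two score arrays should agree.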

**Support Vector Classifier (SVC) is performing better than other algorithms.
Let’s train SVC model on our training set and predict on test set in the next step.**

**Model Building**

**1. We define our SVC model and pass gamma='auto'.
2. We then fit (train) the model on X_train and y_train using the .fit() method.
3. Finally, we predict on X_test using the .predict() method.**

In [14]:
model = SVC(gamma='auto')
model.fit(X_train, y_train)
prediction = model.predict(X_test)
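In scikit-learn, `gamma='auto'` sets the RBF kernel coefficient to `1 / n_features`. The sketch below checks this by comparing it with an explicitly set value. It assumes scikit-learn's bundled Iris data, and the model names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data stands in for the training set.
X_demo, y_demo = load_iris(return_X_y=True)
n_features = X_demo.shape[1]  # 4 measurement columns

# gamma='auto' versus the same value written out explicitly.
auto_model = SVC(gamma='auto').fit(X_demo, y_demo)
explicit_model = SVC(gamma=1.0 / n_features).fit(X_demo, y_demo)

# Both models should produce the same decision values.
same = np.allclose(
    auto_model.decision_function(X_demo),
    explicit_model.decision_function(X_demo),
)
print(same)
```

With four features, this means the notebook's model effectively uses `gamma=0.25`.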

**1. We check the accuracy of our model using
accuracy_score(y_test, prediction)
y_test: actual labels for X_test
prediction: predicted labels for X_test.
2. We print the classification report using
classification_report(y_test, prediction).**

In [15]:
print(f'Test Accuracy: {accuracy_score(y_test, prediction)}')
print(f'Classification Report: \n {classification_report(y_test, prediction)}')
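As a small illustration of what `accuracy_score` measures, the sketch below computes accuracy by hand on a toy set of labels and compares it with the library call. The label arrays are made up for the example and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels (hypothetical): one sample of class 1 is misclassified as class 2.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Accuracy is the fraction of predictions that match the true labels.
manual_accuracy = (y_true == y_pred).mean()
library_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, library_accuracy)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The confusion matrix shows where the errors fall, which the single accuracy number hides; the per-class precision and recall in the classification report are derived from the same counts.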

**So we finally have a test accuracy of 96.67% on a support of 30 samples (29 of the 30 classified correctly), with good f1-scores.**