# Binary classification of the Iris dataset

🔗 [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris)

## Disclaimer

The Iris dataset is a multiclass classification dataset, but our model only supports binary classification. The original labels are therefore collapsed into a single binary target: whether or not the flower is an Iris-setosa.

In [1]:
import pandas as pd
import numpy as np
from src.main import NeuralNetwork
from src.layers import Dense, Sigmoid
from sklearn.model_selection import train_test_split

## Manipulating the dataset

In [2]:
iris = pd.read_csv('data/IRIS.csv')

In [3]:
iris.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [4]:
iris.describe()

|       | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [5]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [6]:
iris['species'] = iris['species'].astype('category')

In [7]:
X = iris.drop(['species'], axis=1)
Y = pd.get_dummies(iris['species'])

### Extracting the input data and target data

Here, y=1 means the flower is an Iris-setosa and y=0 means it is not.

In [8]:
X_numpy = X.to_numpy()
Y_numpy = Y['Iris-setosa'].to_numpy().reshape(-1, 1)
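
The one-hot encoding followed by selecting the `Iris-setosa` column can also be written as a direct comparison, which makes the 0/1 integer encoding explicit. A minimal sketch on a toy `species` series (the values are illustrative and are not read from `data/IRIS.csv`):

```python
import pandas as pd

# Toy stand-in for the species column (not the real dataset).
species = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
)

# 1 where the flower is an Iris-setosa, 0 otherwise.
is_setosa = (species == "Iris-setosa").astype(int)

# Column vector of shape (n_samples, 1), like the reshape(-1, 1) above.
y = is_setosa.to_numpy().reshape(-1, 1)
print(y.ravel())  # [1 0 0 1]
```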

### Splitting into train and test datasets

60% of the samples go to the train dataset and 40% to the test dataset.

In [9]:
x_train, x_test, y_train, y_test = train_test_split(X_numpy, Y_numpy, test_size=0.4)
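
The split above is random, so the samples in each set change between runs. The sketch below shows an optional variant, not what this notebook does: `random_state` fixes the shuffle and `stratify` keeps the setosa/non-setosa ratio equal in both sets. The arrays are synthetic stand-ins with the same shape and class balance as the Iris data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 150 samples, 4 features, 50 positives and 100 negatives.
X = np.arange(600, dtype=float).reshape(150, 4)
y = np.array([1] * 50 + [0] * 100).reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y.ravel()
)

print(x_train.shape, x_test.shape)            # (90, 4) (60, 4)
print(int(y_train.sum()), int(y_test.sum()))  # 30 20
```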

In [10]:
x_train = x_train.T
y_train = y_train.T
x_test = x_test.T
y_test = y_test.T
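
The transposes switch from scikit-learn's `(n_samples, n_features)` layout to `(n_features, n_samples)`, so that each column is one sample. That `NeuralNetwork` expects this layout is an assumption inferred from this cell, not from the `src` code. A small shape check on toy arrays:

```python
import numpy as np

# Toy batch: 3 samples with 4 features each, laid out like the split output.
x = np.zeros((3, 4))
y = np.zeros((3, 1))

# After transposing, each column is one sample.
print(x.T.shape)  # (4, 3): (n_features, n_samples)
print(y.T.shape)  # (1, 3): (1, n_samples)
```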

## Creating the neural network

The network has one hidden layer with 4 hidden units:

Input -> (Linear -> ReLU) ->  (Linear -> Sigmoid) -> Output

In [11]:
layers = [Dense(4, 4), Sigmoid(4)]
model = NeuralNetwork(layers)
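
To make the architecture concrete, here is a minimal NumPy sketch of the forward pass Input -> (Linear -> ReLU) -> (Linear -> Sigmoid) -> Output. It is not the `src` implementation: the function names, the weight shapes, and the random initialisation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """x has shape (4, n_samples); returns probabilities of shape (1, n_samples)."""
    a1 = relu(W1 @ x + b1)        # hidden layer: Linear -> ReLU, 4 units
    return sigmoid(W2 @ a1 + b2)  # output layer: Linear -> Sigmoid, 1 unit

# Hypothetical small random initialisation; not the scheme used by `src`.
W1, b1 = rng.normal(scale=0.1, size=(4, 4)), np.zeros((4, 1))
W2, b2 = rng.normal(scale=0.1, size=(1, 4)), np.zeros((1, 1))

x = rng.normal(size=(4, 5))  # 5 toy samples, one per column
probs = forward(x, W1, b1, W2, b2)
print(probs.shape)  # (1, 5)
```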

In [27]:
model.fit(x_train, y_train, epochs=100, learning_rate=0.05, print_step=10)

Epoch: 0, Cost: 6.7471894057358135
Epoch: 10, Cost: 0.08335541763288343
Epoch: 20, Cost: 0.061090988810185536
Epoch: 30, Cost: 0.048522529307031076
Epoch: 40, Cost: 0.04020028989382433
Epoch: 50, Cost: 0.03425929013309293
Epoch: 60, Cost: 0.029805635700321728
Epoch: 70, Cost: 0.02634473552938524
Epoch: 80, Cost: 0.02357712238662845
Epoch: 90, Cost: 0.02131488371895673
Epoch: 99, Cost: 0.01960631569139375
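
The notebook does not say which cost function `fit` reports. For a sigmoid output, binary cross-entropy is a common choice, so the sketch below assumes it. The function name and the clipping constant `eps` are illustrative and are not taken from `src`.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over all samples (assumed cost, see above)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(losses.mean())

y_true = np.array([[1, 0, 1, 0]])
confident = np.array([[0.99, 0.01, 0.98, 0.02]])  # close to the labels
unsure = np.array([[0.60, 0.40, 0.55, 0.45]])     # close to 0.5

# Predictions closer to the labels give a lower cost.
print(binary_cross_entropy(y_true, confident) < binary_cross_entropy(y_true, unsure))  # True
```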



In [28]:
model.evaluate(x_test, y_test)

1.0
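
The value 1.0 reads like an accuracy, meaning every test sample was classified correctly. That `evaluate` returns accuracy is an assumption, since the notebook does not show its definition. A perfect score is plausible here because Iris-setosa is linearly separable from the other two species. A minimal sketch of such a metric, with a hypothetical 0.5 decision threshold:

```python
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    y_pred = (y_prob >= threshold).astype(int)
    return float((y_pred == y_true).mean())

y_true = np.array([[1, 0, 0, 1]])
y_prob = np.array([[0.97, 0.02, 0.10, 0.88]])  # toy sigmoid outputs
print(accuracy(y_true, y_prob))  # 1.0
```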