## Iris Logistic Regression

The objective is to build a classifier that predicts whether an iris belongs to the 'Iris-setosa' class. This gives two classes: 'Iris-setosa' and not-'Iris-setosa' (which covers 'Iris-versicolor' and 'Iris-virginica').

In [17]:
# importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, precision_score, recall_score

Read the Iris dataset.

In [18]:

# Read in the data set and display its first 5 rows
iris = pd.read_csv("Iris.csv")
iris.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


Identify the independent variables X.

In [19]:
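# Assign the four measurement columns (positions 1-4; position 0 is Id) to X
X = iris.iloc[:,[1,2,3,4]].values
# Ensure X is 2-D with one column per feature (a no-op here, since X already has 4 columns)
X = X.reshape(-1, 4)
# Standardize each feature to zero mean and unit variance
X = preprocessing.scale(X)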
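
To make the scaling step concrete, the sketch below standardizes a small array by hand with NumPy. It assumes that `preprocessing.scale` centers each column on its mean and divides by the population standard deviation (`ddof=0`). The array `demo` holds only the sepal length and width of the five rows displayed earlier, and the names `demo` and `scaled` are illustrative.

```python
import numpy as np

# Illustrative sample: sepal length and width of the five rows shown by iris.head()
demo = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [4.7, 3.2],
    [4.6, 3.1],
    [5.0, 3.6],
])

# Standardize column-wise: subtract the column mean, divide by the column standard deviation
col_mean = demo.mean(axis=0)
col_std = demo.std(axis=0)  # population standard deviation (ddof=0)
scaled = (demo - col_mean) / col_std

# After scaling, every column has mean close to 0 and standard deviation close to 1
print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```

Standardizing puts all features on a comparable scale, which generally helps the solver behind `LogisticRegression` converge.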

Encode your dependent variable y such that 'Iris-setosa' is encoded as 0, and 'Iris-versicolor' and 'Iris-virginica' are both encoded as 1. (0 corresponds to the 'Iris-setosa' class, and 1 corresponds to the not-'Iris-setosa' class.)

In [20]:
# Encode the dependent variable (Species, column position 5) and assign it to y
y = iris.iloc[:,5].apply(lambda val: 0 if val == 'Iris-setosa' else 1).values
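
As an illustration of the same encoding, the sketch below uses a vectorized comparison instead of `apply` with a lambda. The `species` Series is a made-up four-element sample, not the real column, and `y_demo` is a hypothetical name.

```python
import pandas as pd

# Illustrative labels only, not the full Species column
species = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
)

# The comparison yields booleans; casting to int gives 0 for setosa and 1 otherwise
y_demo = (species != "Iris-setosa").astype(int).to_numpy()
print(y_demo)  # [0 1 1 0]
```

Both forms should produce the same array; the vectorized comparison simply avoids a Python-level call per row.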

Split the data into a training and test set.

In [21]:
# Split data into 75% training and 25% test sets
X_train, X_test, y_train, y_test = train_test_split(X, 
                y, test_size=0.25, random_state=0)
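
To check what a 75/25 split yields, the sketch below runs `train_test_split` on placeholder arrays. It assumes the standard 150-row Iris dataset (50 setosa, 100 others), which is consistent with the 38 test samples in the confusion matrix reported later in this notebook. The arrays `X_demo` and `labels_demo` are placeholders, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the assumed shape of the Iris data: 150 rows, 4 features
X_demo = np.zeros((150, 4))
labels_demo = np.array([0] * 50 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, labels_demo, test_size=0.25, random_state=0
)

# 25% of 150 is 37.5, so the test set is rounded up to 38 rows
print(X_tr.shape, X_te.shape)  # (112, 4) (38, 4)
```

Because the classes are imbalanced (one third setosa), passing `stratify=y` to `train_test_split` is an option when the class proportions should be preserved in both sets; the notebook itself does not do this.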

Use sklearn’s logistic regression function to fit a model and make predictions on the test set.

In [22]:
# Create a logistic regression model
log_reg = LogisticRegression()
# Fit a logistic regression model to the training data
log_reg.fit(X_train, y_train)
# Make predictions on test data (predict already returns a 1-D array of labels)
y_pred = log_reg.predict(X_test)
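
For intuition about what the fitted model does when it predicts, the sketch below applies the logistic (sigmoid) function to a linear score and labels a sample 1 when the probability exceeds 0.5. The weights `w`, intercept `b`, and the two sample rows are hypothetical values chosen for illustration; they are not the coefficients learned by `log_reg`.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and intercept (not taken from the fitted model)
w = np.array([0.5, -0.3, 1.2, 1.0])
b = 0.8

# Two made-up standardized samples, one row per flower
samples = np.array([
    [-1.0, 1.0, -1.3, -1.3],
    [0.5, -0.5, 0.6, 0.4],
])

scores = samples @ w + b               # linear score for each sample
probs = sigmoid(scores)                # estimated probability of class 1 (not-setosa)
labels = (probs > 0.5).astype(int)     # equivalent to checking scores > 0
print(labels)  # [0 1]
```

With a real fitted model, `log_reg.predict_proba(X_test)` returns these class probabilities directly.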

### Measuring Model Performance


Analyse the confusion matrix and predict, in a comment, whether the model is likely to have higher precision, higher recall, or similar precision and recall.

In [23]:
# Create a confusion matrix comparing actual and predicted values
conf_mat = confusion_matrix(y_test, y_pred)
# Convert the confusion matrix into a DataFrame
cm_iris = pd.DataFrame(conf_mat)
# Display the confusion matrix
print('Confusion Matrix: \n', cm_iris)

Confusion Matrix: 
     0   1
0  13   0
1   0  25


The model made no classification errors: all 13 'Iris-setosa' samples and all 25 not-'Iris-setosa' samples in the test set were predicted correctly. With no false positives and no false negatives, accuracy, precision, and recall should all be 1.0, so precision and recall will be equal.
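
The reasoning above can be checked by hand from the four cells of the confusion matrix. The sketch below hard-codes the matrix shown above and assumes scikit-learn's convention that rows are actual classes and columns are predicted classes, with class 1 (not-'Iris-setosa') treated as the positive class.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
conf = np.array([
    [13, 0],
    [0, 25],
])

# For a 2x2 matrix, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = conf.ravel()

accuracy = (tp + tn) / conf.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that were found
print(accuracy, precision, recall)  # 1.0 1.0 1.0
```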

Calculate the accuracy, precision, and recall, and check whether your prediction was right.

In [24]:
# Determine the proportion of correct predictions
acc = accuracy_score(y_test, y_pred)
# Determine the proportion of predicted positives (class 1, not-setosa) that are actually positive
prec = precision_score(y_test, y_pred)
# Determine the proportion of actual positives (class 1, not-setosa) that were correctly predicted
rec = recall_score(y_test, y_pred)
# Display accuracy
print('Accuracy:', acc)
# Display precision
print('Precision:', prec)
# Display recall
print('Recall:', rec)

Accuracy: 1.0
Precision: 1.0
Recall: 1.0


Accuracy, precision, and recall are all 1.0, so the prediction was right: precision and recall are equal. On this test set, the model classified perfectly whether an iris belongs to the 'Iris-setosa' class or not.