## Iris Logistic Regression

The objective is to create a classifier that predicts which of three classes an iris belongs to: 'Iris-setosa', 'Iris-versicolor', or 'Iris-virginica'. This extends the binary task of predicting whether an iris belongs to the 'Iris-setosa' class or not, where 'Iris-versicolor' and 'Iris-virginica' were grouped into a single not-'Iris-setosa' class.

In [9]:
# importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, precision_score, recall_score

Read the Iris dataset.

In [10]:

# Read in the data set
# Import, read and display the first 5 rows of the dataset
iris = pd.read_csv("Iris.csv")
iris.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


Identify independent variables x.

In [11]:
# Assign the four measurement columns (independent variables) to X
X = iris.iloc[:,[1,2,3,4]].values
# Ensure X has shape (n_samples, 4)
X = X.reshape(-1, 4)
# Standardize each feature to zero mean and unit variance
X = preprocessing.scale(X)
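
As a rough illustration of what `preprocessing.scale` computes, the sketch below standardizes a single column by hand: subtract the column mean, then divide by the population standard deviation. The five values are the sepal lengths from the `head()` output above. The helper name `standardize` is made up for this example and is not part of scikit-learn.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return values rescaled to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (divides by n)
    return [(v - mean) / std for v in values]

# Sepal lengths of the first five rows shown by iris.head()
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
scaled = standardize(sepal_length)

print([round(v, 3) for v in scaled])
print("mean:", round(fmean(scaled), 6), "std:", round(pstdev(scaled), 6))
```

In the notebook the same operation runs over every row of each of the four feature columns. Because the scaling happens before the split, the test rows contribute to the mean and standard deviation; fitting the scaler on the training rows only is a common alternative.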

Encode your dependent variable y such that the three categories 'Iris-setosa', 'Iris-versicolor', and 'Iris-virginica' correspond to the numeric values 0, 1, and 2, respectively.

In [12]:
# Assign the dependent variable column to y and encode it
y = iris.iloc[:,5].map({'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}).values

Split the data into a training and test set.

In [13]:
# Split data into 75% training and 25% test sets
X_train, X_test, y_train, y_test = train_test_split(X, 
                y, test_size=0.25, random_state=0)
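
The split sizes can be worked out by hand. The sketch below assumes the file holds the standard 150-row Iris dataset, which this notebook does not state explicitly; the resulting 38 test rows do agree with the total of the confusion matrix reported further down.

```python
import math

n_samples = 150   # assumption: standard Iris dataset size, not stated in this notebook
test_size = 0.25

# train_test_split rounds the test share up and gives the remainder to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 112 38
```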

Use sklearn’s logistic regression function to fit a model and make predictions on the test set.

In [14]:
# Create a logistic regression model
log_reg = LogisticRegression()
# Fit a logistic regression model to the training data
log_reg.fit(X_train, y_train)
# Make predictions on test data
y_pred = log_reg.predict(X_test)
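
To give a feel for how a three-class logistic regression turns scores into a prediction, here is a minimal sketch. It assumes a recent scikit-learn release, where the default settings fit a multinomial (softmax) model when there are more than two classes. The score values are invented for illustration and do not come from the fitted `log_reg` model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores (w_k . x + b_k) for one flower, one score per class
scores = [-2.0, 1.5, 0.5]
probs = softmax(scores)

# The predicted class is the one with the highest probability
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs], predicted_class)
```

For the fitted model, `log_reg.predict_proba(X_test)` returns the corresponding per-class probabilities, and `log_reg.predict` returns the class with the largest one.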

### Measuring Model Performance


Analyse the confusion matrix and predict, in a comment, whether the model is likely to have higher precision, higher recall, or similar precision and recall.

In [15]:
# Create a confusion matrix comparing actual and predicted values
conf_mat = confusion_matrix(y_test, y_pred)
# Convert the confusion matrix into a DataFrame
cm_iris = pd.DataFrame(conf_mat)
# Display the confusion matrix
print('Confusion Matrix: \n', cm_iris)

Confusion Matrix: 
     0   1  2
0  13   0  0
1   0  15  1
2   0   0  9
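
When reading this output, note that `confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns. The sketch below rebuilds that layout by hand on six made-up labels; the helper `confusion` is written for this example only.

```python
def confusion(actual, predicted, n_classes):
    """Count samples per (actual, predicted) pair; rows are actual, columns are predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Made-up labels for six flowers (0 = setosa, 1 = versicolor, 2 = virginica)
actual    = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

for row in confusion(actual, predicted, 3):
    print(row)
```

Here the single off-diagonal count sits in row 1, column 2: a flower that is actually class 1 but was predicted as class 2.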


The model misclassifies one 'Iris-versicolor' as 'Iris-virginica' and predicts all the other test samples correctly. The model will therefore have less than 100% accuracy, less than 100% precision for Iris-virginica (one of its predicted samples is wrong), and less than 100% recall for Iris-versicolor (one of its actual samples is missed). Prediction: higher precision than recall for Iris-versicolor, and higher recall than precision for Iris-virginica.

Calculate the accuracy, precision, and recall, and check whether your prediction was right.

In [16]:
# Determine the proportion of correct predictions
acc = accuracy_score(y_test, y_pred)
# Display accuracy
print("Accuracy:", acc)
# Precision per class: share of predictions for a class that are correct
prec = precision_score(y_test, y_pred, average=None)
# Recall per class: share of actual members of a class that are predicted correctly
rec = recall_score(y_test, y_pred, average=None)
# Display precision and recall per class
class_labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
for i, label in enumerate(class_labels):
    print(f"\nClass: {label}")
    print(f"  Precision: {prec[i]}")
    print(f"  Recall:    {rec[i]}")

Accuracy: 0.9736842105263158

Class: Iris-setosa
  Precision: 1.0
  Recall:    1.0

Class: Iris-versicolor
  Precision: 1.0
  Recall:    0.9375

Class: Iris-virginica
  Precision: 0.9
  Recall:    1.0
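
These figures can be cross-checked against the confusion matrix with plain arithmetic. The sketch below copies the matrix from the earlier output and recomputes each metric without scikit-learn; the variable names are chosen for this example.

```python
# Confusion matrix from the output above: rows are actual, columns are predicted
conf_mat = [
    [13, 0, 0],   # actual Iris-setosa
    [0, 15, 1],   # actual Iris-versicolor
    [0, 0, 9],    # actual Iris-virginica
]
n = len(conf_mat)

# Accuracy: correct predictions (the diagonal) over all test samples
total = sum(sum(row) for row in conf_mat)
correct = sum(conf_mat[k][k] for k in range(n))
accuracy = correct / total

# Precision: diagonal entry divided by its column sum (everything predicted as class k)
precision = [conf_mat[k][k] / sum(conf_mat[r][k] for r in range(n)) for k in range(n)]
# Recall: diagonal entry divided by its row sum (everything that actually is class k)
recall = [conf_mat[k][k] / sum(conf_mat[k]) for k in range(n)]

print(accuracy)
print(precision)
print(recall)
```

The Iris-virginica precision of 0.9 is 9 correct out of 10 predicted, and the Iris-versicolor recall of 0.9375 is 15 found out of 16 actual.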


Accuracy, precision, and recall are all close to 100%, so the model performs well. The prediction was right: Iris-versicolor has higher precision (1.0) than recall (0.9375), and Iris-virginica has higher recall (1.0) than precision (0.9). Overall performance is below that achieved with the model for classifying whether an iris belongs to the ‘Iris-setosa’ class or not.