## Evaluation Metrics for Classification


### Confusion Matrix

A confusion matrix is a table that summarizes a classifier's performance by counting its true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. It helps reveal the strengths and weaknesses of a classifier and shows how its errors are distributed between the two kinds of mistakes (false positives and false negatives).
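As a minimal sketch of where the four counts come from, the function below tallies them from two label lists, assuming binary labels with `1` as the positive class. The name `confusion_counts` is hypothetical, not a library API. The lists are the same `y_true` and `y_pred` used in the scikit-learn example later in this section.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


y_true = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 2, 2)
```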


#### Accuracy

Accuracy is the ratio of correctly predicted instances to the total number of instances in the dataset; it measures how often the classifier is correct. On imbalanced datasets it can be misleading: a classifier can achieve a high accuracy score while still failing to predict the minority class correctly. Using the counts from the confusion matrix, accuracy is computed as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$



```python
# Counts from the confusion matrix
TP, TN, FP, FN = 4, 91, 1, 4
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)
```

```
0.95
```
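To make the imbalance caveat concrete, the sketch below reuses the counts above (8 actual positives out of 100 instances) and compares them with a hypothetical classifier that always predicts the negative class. The helper name `accuracy_from_counts` is chosen for illustration only.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the cell above: TP + FN = 8 actual positives out of 100 instances
TP, TN, FP, FN = 4, 91, 1, 4
print(accuracy_from_counts(TP, TN, FP, FN))  # 0.95
print(TP / (TP + FN))  # 0.5: only half of the actual positives are detected

# Hypothetical classifier that always predicts "negative" on the same data:
# all 8 positives become FN and all 92 negatives become TN
print(accuracy_from_counts(0, 92, 0, 8))  # 0.92, with zero positives detected
```

Both classifiers score above 90% accuracy, yet the second one never finds a single positive instance.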


#### Precision

Precision is the ratio of correctly predicted positive instances to all instances predicted as positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

It measures how reliable the classifier's positive predictions are: high precision means few false positives.



```python
# Counts for a new example (different from the accuracy cell above)
TP = 114
FP = 14
precision = TP / (TP + FP)
print(f"precision: {precision:4.2f}")
```

```
precision: 0.89
```
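The same ratio can be computed straight from label lists. This is a minimal sketch with a hypothetical helper `precision_from_labels`; when nothing is predicted positive it returns 0.0, which mirrors scikit-learn's default `zero_division="warn"` behaviour of returning 0 (scikit-learn also emits a warning, which this sketch does not).

```python
def precision_from_labels(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # no positive predictions, so precision is undefined; report 0.0
    return tp / (tp + fp)


y_true = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(precision_from_labels(y_true, y_pred))  # 0.6
```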


#### Recall (or Sensitivity)

Recall is the ratio of correctly predicted positive instances to all actual positive instances:

$$\text{Recall} = \frac{TP}{TP + FN}$$

It measures the classifier's ability to find all positive instances: high recall means few false negatives.



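```python
# TP from the precision cell above; FN = 4 is reused from the accuracy cell
TP, FN = 114, 4
recall = TP / (TP + FN)
print(f"recall: {recall:4.2f}")
```

```
recall: 0.97
```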
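Recall on its own can be gamed: a classifier that labels every instance as positive never misses a positive, so its recall is 1.0, while its precision collapses. The sketch below shows this on hypothetical counts (10 actual positives, 90 actual negatives); the helper name `precision_recall` is illustrative only.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data set: 10 actual positives and 90 actual negatives.
# Predicting "positive" for everything finds every positive (FN = 0)
# but turns all 90 negatives into false positives.
print(precision_recall(tp=10, fp=90, fn=0))  # (0.1, 1.0)
```

This is why precision and recall are usually reported together, or combined into a single score as below.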


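#### F1 Score

The F1 score is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

It combines the two into a single score; because it is a harmonic mean, it stays low whenever either precision or recall is low.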



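```python
# Counts reused from the precision and recall cells above
TP, FP, FN = 114, 14, 4
precision = TP / (TP + FP)
recall = TP / (TP + FN)
# Named f1 (not f1_score) so it does not shadow sklearn.metrics.f1_score imported below
f1 = 2 * precision * recall / (precision + recall)
f1
```

```
0.9268292682926829
```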
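Substituting the definitions of precision and recall into the harmonic mean gives an equivalent closed form, $F_1 = \frac{2TP}{2TP + FP + FN}$, which also makes clear that F1 ignores true negatives. The sketch below checks this identity on the counts above and shows how the harmonic mean is pulled toward the smaller value; the function names are hypothetical.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent closed form: F1 = 2TP / (2TP + FP + FN); TN does not appear."""
    return 2 * tp / (2 * tp + fp + fn)


TP, FP, FN = 114, 14, 4
print(f1_from_counts(TP, FP, FN))  # 228 / 246, about 0.9268, matching the cell above

# The harmonic mean is dominated by the smaller of the two values
p, r = 1.0, 0.1
print(round(f1_from_pr(p, r), 3))  # 0.182
print(round((p + r) / 2, 3))  # 0.55: the arithmetic mean would hide the low recall
```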

The same metrics can be computed directly from label lists with the functions in `sklearn.metrics`:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# True labels of the data
y_true = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]

# Predicted labels of the data
y_pred = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy: ", accuracy)

# Calculate precision (label 1 is the positive class by default)
precision = precision_score(y_true, y_pred)
print("Precision: ", precision)

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall: ", recall)

# Calculate F1 score
f1 = f1_score(y_true, y_pred)
print("F1 Score: ", f1)

# Calculate the confusion matrix (rows: true labels, columns: predicted labels)
conf_mat = confusion_matrix(y_true, y_pred)
print("Confusion Matrix: \n", conf_mat)
```

```
Accuracy:  0.6
Precision:  0.6
Recall:  0.6
F1 Score:  0.6
Confusion Matrix: 
 [[3 2]
 [2 3]]
```

For binary labels 0 and 1, `confusion_matrix` places true labels on the rows and predicted labels on the columns, so the layout is `[[TN, FP], [FN, TP]]`: here TP = 3, TN = 3, FP = 2, and FN = 2.
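Because the binary layout is fixed, every metric above can be recovered from the matrix alone. With scikit-learn itself, `confusion_matrix(y_true, y_pred).ravel()` returns the counts in the order `tn, fp, fn, tp`; the dependency-free sketch below does the same unpacking on a plain nested list copied from the output above.

```python
# The matrix printed above, in scikit-learn's binary layout:
# rows = true label (0, 1), columns = predicted label (0, 1)
conf_mat = [[3, 2],
            [2, 3]]

# Unpack row by row: [[TN, FP], [FN, TP]]
(tn, fp), (fn, tp) = conf_mat

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, round(f1, 6))  # 0.6 0.6 0.6 0.6
```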


### Logistic Regression Precision

Fit a logistic regression model using the training inputs `X_train` and labels `y_train` given below. Once it is fitted, compute its precision on the test set (`X_test`, `y_test`).



```python
X_train = [[4,2,1],[3,4,6],[5,6,7],[8,9,7]]
y_train = [1,2,1,2]
X_test = [[4,3,1],[2,4,3],[5,6,1],[5,9,9]]
y_test = [1,2,2,2]
```


```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

X_train = [[4,2,1],[3,4,6],[5,6,7],[8,9,7]]
y_train = [1,2,1,2]
X_test = [[4,3,1],[2,4,3],[5,6,1],[5,9,9]]
y_test = [1,2,2,2]

# Train the logistic regression model
log_reg = LogisticRegression(random_state=0)
log_reg.fit(X_train, y_train)

# Make predictions on the test set
y_pred_log_reg = log_reg.predict(X_test)

# Weighted average: per-class precision averaged with weights equal to
# each class's support (number of true instances) in y_test
prec_log_reg = precision_score(y_test, y_pred_log_reg, average="weighted")

# Print the evaluation metric for logistic regression
print("Precision: ", prec_log_reg)
```

```
Precision:  0.8333333333333334
```
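The `average` argument decides how per-class scores are combined when there are several labels: scikit-learn documents `"weighted"` as averaging per-label scores weighted by support (the number of true instances of each label), and `"macro"` as an unweighted mean. The sketch below reimplements both by hand; the helper names and the prediction vector `y_pred_hypothetical` are hypothetical and are not the fitted model's actual output.

```python
def per_class_precision(y_true, y_pred, label):
    """Precision for one class, treating `label` as positive (0.0 if it is never predicted)."""
    predicted_as_label = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted_as_label:
        return 0.0
    return sum(1 for t in predicted_as_label if t == label) / len(predicted_as_label)


def averaged_precision(y_true, y_pred, average):
    """Combine per-class precision with a 'macro' or 'weighted' average."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_precision(y_true, y_pred, c) for c in labels]
    if average == "macro":
        return sum(scores) / len(scores)  # every class counts equally
    if average == "weighted":
        supports = [y_true.count(c) for c in labels]  # true instances per class
        return sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    raise ValueError(f"unsupported average: {average}")


y_test = [1, 2, 2, 2]
y_pred_hypothetical = [1, 1, 1, 2]  # hypothetical predictions, for illustration only
print(round(averaged_precision(y_test, y_pred_hypothetical, "weighted"), 4))  # 0.8333
print(round(averaged_precision(y_test, y_pred_hypothetical, "macro"), 4))  # 0.6667
```

With only one true instance of class 1, its low precision (1/3) barely moves the weighted average, while the macro average counts it as half of the score.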


### Decision Tree Recall

Fit a decision tree model using the training inputs `X_train` and labels `y_train` given below. After fitting, compute its recall on the test set (`X_test`, `y_test`).

```python
X_train = [[4,2,1],[3,4,6],[5,6,7],[8,9,7]]
y_train = [1,2,1,2]
X_test = [[4,3,1],[2,4,3],[5,6,1],[5,9,9]]
y_test = [1,2,2,2]
```


```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

X_train = [[4,2,1],[3,4,6],[5,6,7],[8,9,7]]
y_train = [1,2,1,2]
X_test = [[4,3,1],[2,4,3],[5,6,1],[5,9,9]]
y_test = [1,2,2,2]

# Train the decision tree model (random_state makes the fitted tree reproducible)
dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)

# Make predictions on the test set
y_pred_dt = dt.predict(X_test)

# Weighted average: per-class recall averaged with weights equal to
# each class's support (number of true instances) in y_test
rec_dt = recall_score(y_test, y_pred_dt, average="weighted")

# Print the evaluation metric for the decision tree
print("Recall: ", rec_dt)
```

```
Recall:  0.75
```
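A useful property of the weighted average: for single-label classification, weighted recall always equals accuracy. Class k has recall TP_k / n_k and weight n_k / N, so the weighted sum reduces to (number of correct predictions) / N; the 0.75 above is therefore also the decision tree's test accuracy. The sketch below checks this by hand; the helper names and `y_pred_hypothetical` are hypothetical and not the tree's actual predictions.

```python
def per_class_recall(y_true, y_pred, label):
    """Recall for one class: fraction of true `label` instances predicted as `label`."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in predictions_for_label if p == label) / len(predictions_for_label)


def weighted_recall(y_true, y_pred):
    """Per-class recall weighted by support; only classes present in y_true have weight."""
    total = len(y_true)
    return sum(
        y_true.count(c) / total * per_class_recall(y_true, y_pred, c)
        for c in sorted(set(y_true))
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


y_test = [1, 2, 2, 2]
y_pred_hypothetical = [1, 2, 1, 2]  # hypothetical predictions, for illustration only
print(round(weighted_recall(y_test, y_pred_hypothetical), 4))  # 0.75
print(accuracy(y_test, y_pred_hypothetical))  # 0.75
```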
