
**Task 1 — Metrics Foundations**

This Task:

*   Implements a trivial baseline classifier that predicts all transactions as legitimate (Class = 0).

*   Computes key evaluation metrics (Accuracy, Precision, Recall, F1-Score) to establish a starting performance benchmark.

*   Demonstrates why accuracy alone is misleading for fraud detection by showing that the fraud-class metrics (precision, recall, F1-score) remain zero when no fraud cases are detected.

*   Highlights the necessity of using precision, recall, and F1-score to properly assess model performance on imbalanced classification tasks.

*   Provides a reference point against which all subsequent machine-learning models are compared.
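The all-legitimate baseline listed above can also be written with scikit-learn's `DummyClassifier`, which some readers may find convenient because it plugs into the same `fit`/`predict` interface as later models. This is only a sketch on toy labels: `X_toy`, `y_toy`, and `baseline` are hypothetical names and the data is made up, not taken from `creditcard_2023.csv`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy labels (made up for illustration): 4 legitimate, 4 fraud
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_toy = np.zeros((len(y_toy), 1))  # DummyClassifier ignores the feature values

# strategy="constant" always predicts the value passed as `constant`
baseline = DummyClassifier(strategy="constant", constant=0)
baseline.fit(X_toy, y_toy)

y_pred_toy = baseline.predict(X_toy)
toy_accuracy = baseline.score(X_toy, y_toy)  # mean accuracy on the toy data

print(y_pred_toy)
print("Accuracy:", toy_accuracy)
```

Using `np.zeros_like(y_true)` directly, as this notebook does, gives the same predictions; the wrapper is mainly useful when the baseline needs to go through the same evaluation code as a fitted model.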

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score

In [None]:
# 1. Load and inspect the data
df = pd.read_csv("creditcard_2023.csv")

print(df.head())


   id        V1        V2        V3        V4        V5        V6        V7  \
0   0 -0.260648 -0.469648  2.496266 -0.083724  0.129681  0.732898  0.519014   
1   1  0.985100 -0.356045  0.558056 -0.429654  0.277140  0.428605  0.406466   
2   2 -0.260272 -0.949385  1.728538 -0.457986  0.074062  1.419481  0.743511   
3   3 -0.152152 -0.508959  1.746840 -1.090178  0.249486  1.143312  0.518269   
4   4 -0.206820 -0.165280  1.527053 -0.448293  0.106125  0.530549  0.658849   

         V8        V9  ...       V21       V22       V23       V24       V25  \
0 -0.130006  0.727159  ... -0.110552  0.217606 -0.134794  0.165959  0.126280   
1 -0.133118  0.347452  ... -0.194936 -0.605761  0.079469 -0.577395  0.190090   
2 -0.095576 -0.261297  ... -0.005020  0.702906  0.945045 -1.154666 -0.605564   
3 -0.065130 -0.205698  ... -0.146927 -0.038212 -0.214048 -1.893131  1.003963   
4 -0.212660  1.049921  ... -0.106984  0.729727 -0.161666  0.312561 -0.414116   

        V26       V27       V28    Amount  C

In [None]:
print(df.columns)


Index(['id', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class'],
      dtype='object')


In [None]:
# 2. Clean and prepare the Class column (target variable)
df = df.dropna(subset=["Class"])        # Drop any rows where the label (Class) is missing
df["Class"] = df["Class"].astype(int)   # Ensure the label is stored as integer 0/1

# Check distribution of the labels
print(df["Class"].value_counts())
print("\nClass distribution (proportion):")
print(df["Class"].value_counts(normalize=True))

Class
0    284315
1    284315
Name: count, dtype: int64

Class distribution (proportion):
Class
0    0.5
1    0.5
Name: proportion, dtype: float64


In [None]:
# 3. Create baseline predictions
y_true = df["Class"].values
y_pred = np.zeros_like(y_true)

# 4. Compute evaluation metrics
accuracy = (y_pred == y_true).mean()
precision = precision_score(y_true, y_pred, pos_label=1, zero_division=0)
recall = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
f1 = f1_score(y_true, y_pred, pos_label=1, zero_division=0)

print("==== BASELINE MODEL RESULTS ====")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

==== BASELINE MODEL RESULTS ====
Accuracy: 0.5
Precision: 0.0
Recall: 0.0
F1 Score: 0.0


**Explanation of Zero Precision, Recall, and F1-Score**

The baseline classifier was designed to predict all transactions as legitimate (Class = 0). As a result, the model never produced any predictions for the fraud class (Class = 1). Because no transactions were predicted as fraudulent, the number of true positives was zero.
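A confusion matrix makes the paragraph above concrete: the "predicted fraud" column is empty, so there are no true positives and no false positives. The sketch below uses a small made-up label vector (`y_demo` is a hypothetical name) rather than the real dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 3 legitimate, 2 fraud
y_demo = np.array([0, 0, 0, 1, 1])
y_demo_pred = np.zeros_like(y_demo)  # baseline: everything predicted legitimate

# With labels=[0, 1], rows are actual classes and columns are predicted classes
cm = confusion_matrix(y_demo, y_demo_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Every fraud case lands in the false-negative cell, which is the failure mode the fraud-class metrics are meant to expose.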

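Precision measures the proportion of predicted fraud cases that are actually fraud, TP / (TP + FP). Since the baseline predicted zero fraud cases, this ratio is 0/0 and therefore undefined; scikit-learn reports it as 0 because the metric functions were called with `zero_division=0`. Recall measures the proportion of actual fraud cases that were correctly identified, TP / (TP + FN); because no fraud cases were predicted, all 284,315 fraud transactions are false negatives and recall is exactly zero. The F1-score, which is the harmonic mean of precision and recall, is likewise reported as zero because both values are zero.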
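The same numbers can be reproduced by hand from the confusion-matrix counts. The sketch below assumes the counts implied by the class distribution printed earlier (284,315 rows per class, all predicted as 0); `safe_ratio` and the `*_manual` names are hypothetical helpers, with `safe_ratio` mimicking the `zero_division=0` behaviour used in the metric calls.

```python
# Counts for the all-legitimate baseline, derived from the printed class
# counts (284315 per class); they are not re-read from the CSV here.
tp, fp = 0, 0            # nothing is ever predicted as fraud
fn, tn = 284315, 284315  # every fraud case is missed; every legitimate case is correct

def safe_ratio(numerator, denominator):
    """Hypothetical helper: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)
precision_manual = safe_ratio(tp, tp + fp)  # 0/0, reported as 0.0
recall_manual = safe_ratio(tp, tp + fn)     # 0/284315 = 0.0
f1_manual = safe_ratio(2 * precision_manual * recall_manual,
                       precision_manual + recall_manual)

print(accuracy_manual, precision_manual, recall_manual, f1_manual)
# prints: 0.5 0.0 0.0 0.0
```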

These results demonstrate that the baseline classifier's accuracy depends entirely on the class distribution: on this balanced dataset (284,315 transactions per class) it is exactly 0.5, whereas on a heavily imbalanced dataset the same rule would achieve high accuracy. In either case it completely fails to perform the primary task of interest, detecting fraudulent transactions, which highlights the need for more advanced machine learning models and more appropriate evaluation metrics.
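To illustrate how the baseline's accuracy moves with class distribution while recall stays at zero, the sketch below scores the same all-legitimate rule on synthetic label vectors. The class counts are made up for illustration, and `all_legit_scores` is a hypothetical helper, not part of this notebook's pipeline.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

def all_legit_scores(n_legit, n_fraud):
    """Score the 'predict everything as 0' rule on synthetic labels."""
    y_syn = np.concatenate([np.zeros(n_legit, dtype=int),
                            np.ones(n_fraud, dtype=int)])
    y_syn_pred = np.zeros_like(y_syn)
    acc = accuracy_score(y_syn, y_syn_pred)
    rec = recall_score(y_syn, y_syn_pred, pos_label=1, zero_division=0)
    return acc, rec

# Made-up class counts: balanced, moderately imbalanced, heavily imbalanced
for n_legit, n_fraud in [(500, 500), (900, 100), (998, 2)]:
    acc, rec = all_legit_scores(n_legit, n_fraud)
    print(f"legit={n_legit}, fraud={n_fraud}: accuracy={acc:.3f}, recall={rec:.3f}")
```

Accuracy rises as fraud becomes rarer even though the rule never catches a single fraud case, which is why accuracy is not a reliable headline metric for this task.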