**#Fraud Detection Model Description**

The fraud detection model developed consists of two main classifiers:
1.   Logistic Regression
2.   Random Forest


---


**Logistic Regression:** This model is a statistical method for binary classification. It uses the logistic function to model the probability of a binary outcome, in this case, whether a transaction is fraudulent or not. The model is trained with class weights balanced to handle the imbalanced nature of the dataset.

**Random Forest:** This is an ensemble learning method that constructs multiple decision trees and merges their results. It is particularly effective for classification problems where the classes are imbalanced. The model uses 100 trees and balanced class weights to manage the imbalance between fraudulent and non-fraudulent transactions.

**#Variable Selection Process:**

Non-informative columns such as **nameOrig**, **nameDest**, and step were removed as they contain information like **customer who started the transaction** and **customer who is the recipient of the transaction**.


---

The categorical variable **type** was encoded using **LabelEncoder** to convert text labels into **numerical values**

Outliers in numerical features **(amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest)** were removed to ensure the model training was not Affected by Outliers.

In [39]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

In [2]:
df = pd.read_csv('Fraud.csv') # Reading CSV  file

In [3]:
df_cleaned = df.drop(columns=['nameOrig', 'nameDest', 'step']) # Removing Unwanted Columns

In [5]:
total_nulls = df_cleaned.isnull().sum().sum()
print(f"Total number of null values in the dataset: {total_nulls}")  # total no_of null values

Total number of null values in the dataset: 0


In [79]:
X = df_cleaned.drop(columns=['isFraud', 'isFlaggedFraud'])
y = df_cleaned['isFraud']  # defining target variables

In [80]:
le = LabelEncoder()
X['type'] = le.fit_transform(X['type']) # transform the categorical values into numerical labels

In [81]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Split the data into training and testing sets

In [82]:
logreg = LogisticRegression(max_iter=1000, class_weight='balanced')
logreg.fit(X_train, y_train)                                          # training the logistic regression model with balanced class weights
y_pred_logreg = logreg.predict(X_test)                                # Predicting class labels
y_proba_logreg = logreg.predict_proba(X_test)[:, 1]

In [83]:
threshold = 0.85     # probability threshold for classification
y_pred_threshold = (y_proba_logreg  >= threshold).astype(int)   # Convert predicted probabilities to class labels

In [84]:
accuracy = accuracy_score(y_test, y_pred_logreg)
precision = precision_score(y_test, y_pred_threshold)
recall = recall_score(y_test, y_pred_threshold)                # Calculating the model's performance metrics
f1 = f1_score(y_test, y_pred_threshold)
conf_matrix = confusion_matrix(y_test, y_pred_threshold)

In [85]:
print("Logistic Regression Model:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")
print("Confusion Matrix:")
print(conf_matrix)

Logistic Regression Model:
Accuracy: 0.9999
Precision: 0.7742
Recall: 1.0000
F1-Score: 0.8727
Confusion Matrix:
[[182112     14]
 [     0     48]]


**#The models were evaluated using accuracy, precision, recall, F1-score, and confusion matrix:**

**Logistic Regression Model:**


*   Accuracy: 0.9999
*   Precision: 0.7742
*   Recall: 1.0000
*   F1-Score: 0.8727
*   Confusion Matrix  
[[182112     14]
 [     0     48]]








# Key Factors

**amount:** Larger transaction amounts might be indicative of fraudulent activities.

**oldbalanceOrg and newbalanceOrig:** Significant changes in the account balance before and after the transaction could signal suspicious behavior.

**type:** The type of transaction might reveal patterns associated with fraud.

In [71]:
from sklearn.ensemble import RandomForestClassifier   # import random forest classifier

In [30]:
rf_model = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42)
rf_model.fit(X_train, y_train)    # Train the model

In [31]:
y_pred_rf = rf_model.predict(X_test)  # Predict the target variable

In [32]:
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred_rf)
recall = recall_score(y_test, y_pred_rf)                            # Calculating the model's performance metrics
f1 = f1_score(y_test, y_pred_rf)
conf_matrix = confusion_matrix(y_test, y_pred_rf)

In [86]:
print("Random Forest Model:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")
print("Confusion Matrix:")
print(conf_matrix)

Random Forest Model:
Accuracy: 0.9999
Precision: 0.7742
Recall: 1.0000
F1-Score: 0.8727
Confusion Matrix:
[[182112     14]
 [     0     48]]


# Random Forest Model:
1.   Accuracy: 0.9999
2.   Precision: 0.7742
3.   Recall: 1.0000
4.   F1-Score: 0.8727
5.   Confusion Matrix:
    [[182112     14]
    [     0     48]]

#Prevention Measures:



1.   **Continuous Monitoring** : Implement real-time fraud detection systems
2.   **Enhanced Authentication** : Introduce multi-factor authentication
3.   **Fraud Detection Algorithms** : Regularly update and retrain fraud detection models
4.   **User Education** : Educate customers about safe transaction





# Evaluation of Measures:



1.   Performance Metrics
2.   Incident Tracking
3.   Customer Feedback

