# Credit Card Fraud Detection

> Credit card fraud detection is the process of identifying fraudulent purchase attempts and rejecting them instead of processing the order. Credit card companies need to recognize fraudulent transactions so that customers are not charged for items they did not purchase. In this notebook, we use unsupervised machine learning techniques, Isolation Forest and Local Outlier Factor, to detect the outliers.

![](https://www2.deloitte.com/content/dam/Deloitte/fi/Images/header_images/machine%20learning%20in%20payment%20fraud%20detection_banner.jpg)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score
from tabulate import tabulate

# Data Exploration

In [None]:
df=pd.read_csv("/kaggle/input/creditcardfraud/creditcard.csv")
df.head()

In [None]:
df.describe(include="all")

In [None]:
df.groupby(['Class']).size() # Class 0: Normal Transaction, Class 1: Fraud Transaction

In [None]:
# Set the figure size when the figure is created; changing rcParams afterwards only affects later figures.
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True, sharey=True, figsize=(24, 6))
f.suptitle('Amount per transaction by class')
ax1.hist(df[df['Class']==1].Amount, bins=20)
ax1.set_title('Fraud Transactions')
ax2.hist(df[df['Class']==0].Amount, bins=20)
ax2.set_title('Normal Transactions')
plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
plt.yscale('log')
plt.show()

# Model Prediction

Here we will use the following algorithms to detect the outliers:
* Isolation Forest
* Local Outlier Factor

In [None]:
columnNames=list(df.columns)[:-1] # Dropping the Class column, as it's the target column.
contamination=len(df[df['Class']==1])/float(len(df)) # Proportion of fraud rows in the whole data set
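
scikit-learn documents `contamination` as the proportion of outliers in the data set, so the denominator matters: fraud rows divided by all rows is not the same number as fraud rows divided by normal rows. A minimal sketch of the difference, using a hypothetical toy label column (`toy` is made up for illustration and is not the credit card data):

```python
import pandas as pd

# Hypothetical toy labels: 2 fraud rows (Class 1) among 10 transactions.
toy = pd.DataFrame({"Class": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

n_fraud = int((toy["Class"] == 1).sum())
n_normal = int((toy["Class"] == 0).sum())

ratio_to_normal = n_fraud / n_normal  # fraud count divided by normal count
share_of_all = n_fraud / len(toy)     # fraud count divided by all rows

print("fraud / normal:", ratio_to_normal)
print("fraud / total: ", share_of_all)
```

When fraud is very rare the two values are close, but only the second one matches the documented meaning of the parameter.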

# Isolation Forest Algorithm

The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

**Logic:** Anomalous observations are easier to isolate than normal observations, because only a few random splits are needed to separate them from the rest of the data. They therefore sit on shorter paths (fewer branches) in the trees. In scikit-learn's convention the score grows with the path length, so anomalous observations receive lower scores than normal ones.
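
To see this behavior in isolation, here is a small sketch on synthetic data (an assumption for illustration, not the credit card data): one point is placed far from a Gaussian cluster, and its `score_samples` value is compared with the cluster's median score. In scikit-learn, a lower `score_samples` value means a more abnormal point.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Synthetic data: a cluster of 200 points plus one far-away point.
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = forest.score_samples(X)  # lower score = more abnormal

outlier_score = scores[-1]
median_inlier_score = np.median(scores[:-1])

print("score of the far-away point:", outlier_score)
print("median score of the cluster:", median_inlier_score)
```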

**Important Parameters:**
* ***n_estimators***: The number of base estimators in the ensemble.
* ***max_samples***: The number of samples to draw from X to train each base estimator.
* ***contamination***: The amount of contamination of the data set, i.e. the proportion of outliers in the data set.
* ***random_state***: Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

In [None]:
cIF=IsolationForest(n_estimators=1000, max_samples="auto", contamination=contamination, random_state=19)
df['Class_IF']=cIF.fit_predict(df[columnNames].values) # Class 1: Normal Transaction, Class -1: Fraud Transaction
df['Class_IF']=abs(df['Class_IF']-1)//2                # Mapping 1 -> 0 and -1 -> 1
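
The arithmetic mapping from the `fit_predict` convention (1 for inliers, -1 for outliers) to the 0/1 convention of the `Class` column can be checked on a tiny hand-made array. The sketch below also shows an equivalent, arguably more readable, form with `np.where`; `raw` is a hypothetical example input.

```python
import numpy as np

# Hypothetical fit_predict output: 1 = inlier, -1 = outlier.
raw = np.array([1, -1, 1, 1, -1])

mapped_arith = np.abs(raw - 1) // 2       # 1 -> 0, -1 -> 1
mapped_where = np.where(raw == -1, 1, 0)  # same mapping, spelled out

print(mapped_arith.tolist())
print(mapped_where.tolist())
```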

In [None]:
print("Accuracy (IF): ",accuracy_score(df['Class'],df['Class_IF']))
print(df.groupby(['Class_IF']).size())

# Local Outlier Factor Algorithm

Local Outlier Factor measures the local deviation of the density of a given sample with respect to its neighbors. Here, Euclidean distance is used to measure the distance between samples.

**Logic:** Anomalous observations lie in regions of much lower density than their neighbors, whereas normal observations have a density similar to that of their neighbors.

**Important Parameters:**
* ***n_neighbors***: Number of neighbors to use by default for kneighbors queries.
* ***leaf_size***: Leaf size passed to BallTree or KDTree. The optimal value depends on the nature of the problem.
* ***contamination***: The amount of contamination of the data set, i.e. the proportion of outliers in the data set.
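
As a rough illustration of the density idea, the sketch below fits Local Outlier Factor on synthetic data (an assumption for illustration, not the credit card data): a dense cluster plus one isolated point. After fitting, the `negative_outlier_factor_` attribute holds the negated LOF of each training sample, so more negative values indicate stronger outliers. The parameter values here are chosen for the toy data only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Synthetic data: a dense cluster of 100 points plus one isolated point.
dense = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
isolated = np.array([[6.0, 6.0]])
X = np.vstack([dense, isolated])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # 1 = inlier, -1 = outlier
factors = lof.negative_outlier_factor_  # more negative = more outlying

print("label of the isolated point:", labels[-1])
print("its negative outlier factor:", factors[-1])
print("median factor of the cluster:", np.median(factors[:-1]))
```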

In [None]:
cLOF=LocalOutlierFactor(n_neighbors=50,leaf_size=10, contamination=contamination)
df['Class_LOF']=cLOF.fit_predict(df[columnNames].values) # Class 1: Normal Transaction, Class -1: Fraud Transaction
df['Class_LOF']=abs(df['Class_LOF']-1)//2                # Mapping 1 -> 0 and -1 -> 1

In [None]:
print("Accuracy (LOF): ",accuracy_score(df['Class'],df['Class_LOF']))
print(df.groupby(['Class_LOF']).size())

# Model Evaluation

In [None]:
df.head()

In [None]:
data=[
     ["Isolation Forest",accuracy_score(df['Class'],df['Class_IF'])],
     ["Local Outlier Factor",accuracy_score(df['Class'],df['Class_LOF'])]
     ]
columns=["Algorithm","Accuracy"]

print(tabulate(data, headers=columns, tablefmt="fancy_grid"))
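
One caveat when reading the accuracy column: with heavily imbalanced classes, accuracy alone can look excellent even for a detector that never flags fraud. The sketch below uses hypothetical labels (2 fraud cases in 1000 transactions, made up for illustration) to show how precision and recall expose this; applying the same metrics to `Class_IF` and `Class_LOF` would be a natural extension.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 2 fraud cases among 1000 transactions.
y_true = np.zeros(1000, dtype=int)
y_true[:2] = 1

# A detector that never flags anything as fraud.
y_pred = np.zeros(1000, dtype=int)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)                        # share of fraud cases caught
prec = precision_score(y_true, y_pred, zero_division=0)  # no positive predictions at all

print("accuracy: ", acc)
print("recall:   ", rec)
print("precision:", prec)
```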