# Fraud Detection

This notebook shows how to build a fraud detection model with ThirdAI's Universal Deep Transformer (UDT), our all-purpose classifier for tabular datasets. In this demo, we train and evaluate the model on a fraud detection dataset from Kaggle, but you can easily substitute your own dataset.

You can immediately run a version of this notebook in your browser on Google Colab at the following link:

https://githubtocolab.com/ThirdAILabs/Demos/blob/main/universal_deep_transformer/classification/FraudDetection.ipynb

This notebook uses an activation key that will only work with this demo. If you want to try us out on your own dataset, you can obtain a free trial license at the following link: https://www.thirdai.com/try-bolt/

In [3]:

# Requires Google Drive to be mounted at /content/drive.
# -p makes mkdir succeed even if the directory already exists.
!mkdir -p /root/.kaggle
!cp /content/drive/MyDrive/kaggle/kaggle.json /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json


In [None]:
!pip3 install 'kaggle>1.6'
!pip3 install thirdai --upgrade

# This activates the ThirdAI package with a key that is only good for this demo
import thirdai
thirdai.licensing.activate("L3PV-EW79-EK9K-CMCV-A3X9-AY9T-APJ7-4VEK")



# Dataset Download
Here we use the Kaggle API to download the fraud detection dataset found here: https://www.kaggle.com/datasets/ealaxi/paysim1

Downloading this dataset requires authentication with a Kaggle account. Using the Kaggle API as we do below requires a valid kaggle.json file containing your credentials. See https://github.com/Kaggle/kaggle-api#api-credentials for more documentation on the Kaggle API.
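
For readers who prefer to place the credentials file from Python rather than with shell commands, the following is a minimal sketch. The helper name `install_kaggle_credentials` and the temporary paths are hypothetical and used only for illustration; the sketch simply copies a `kaggle.json` into a configuration directory and restricts it to owner read/write, which is what `chmod 600` does.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_credentials(source: Path, config_dir: Path) -> Path:
    """Copy kaggle.json into config_dir and restrict it to owner read/write."""
    # exist_ok=True means no error is raised if the directory already exists.
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(source, target)
    # Same permissions as `chmod 600` on POSIX systems.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Demonstration with throwaway paths and a placeholder credentials file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "downloaded" / "kaggle.json"
    source.parent.mkdir()
    source.write_text('{"username": "example", "key": "placeholder"}')
    installed = install_kaggle_credentials(source, Path(tmp) / ".kaggle")
    print("Installed:", installed.name)
```

In Colab, the real call would presumably use the Drive copy of `kaggle.json` as `source` and `/root/.kaggle` as `config_dir`.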

You may also choose to download the dataset directly from the source, in which case you should provide the path to the dataset in the prep_fraud_dataset() call later on.
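
If you download the CSV manually, a quick header check can catch a wrong or truncated file before any training starts. The sketch below is an illustration only: `missing_columns` is a hypothetical helper, and the expected column names are copied from the `data_types` mapping used in the UDT Initialization section of this notebook.

```python
import csv
import tempfile
from pathlib import Path

# Column names copied from the data_types mapping used in this notebook.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]


def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in expected if name not in header]


# Demonstration on a tiny throwaway file whose header lacks one column.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"
    path.write_text(",".join(EXPECTED_COLUMNS[:-1]) + "\n")
    print(missing_columns(path))  # ['isFlaggedFraud']
```

An empty list means every expected column is present, so the path can be passed on to `prep_fraud_dataset`.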

In [1]:
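import os

# Point the Kaggle client at the directory that contains kaggle.json.
# This must be set before the kaggle package is imported, because the
# package reads this variable and authenticates at import time.
os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/kaggle/'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

# Download and unzip the dataset into ./fraud_detection
api.dataset_download_files('ealaxi/paysim1', path='./fraud_detection', unzip=True)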

Dataset URL: https://www.kaggle.com/datasets/ealaxi/paysim1


In [1]:
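# Mount Google Drive so that kaggle.json is reachable.
# Run this cell before any cell that reads from /content/drive.
from google.colab import drive
drive.mount('/content/drive')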

Mounted at /content/drive


We will use the demos module in the thirdai package to prepare the data for training.

In [None]:
from thirdai.demos import prep_fraud_dataset

# Must match the directory passed as `path` when the dataset was downloaded.
dataset_filename = "./fraud_detection/PS_20174392719_1491204439457_log.csv"

train_filename, test_filename, inference_batch = prep_fraud_dataset(dataset_filename)

# UDT Initialization
We can now create a UDT model by passing in the types of each column in the dataset and the target column we want to be able to predict.

In [None]:
from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "step": bolt.types.categorical(),
        "type": bolt.types.categorical(),
        "amount": bolt.types.numerical(range=(0, 10000001)),
        "nameOrig": bolt.types.categorical(),
        "oldbalanceOrg": bolt.types.numerical(range=(0, 59585041)),
        "newbalanceOrig": bolt.types.numerical(range=(0, 49585041)),
        "nameDest": bolt.types.categorical(),
        "oldbalanceDest": bolt.types.numerical(range=(0, 356015890)),
        "newbalanceDest": bolt.types.numerical(range=(0, 356179279)),
        "isFraud": bolt.types.categorical(n_classes=2),
        "isFlaggedFraud": bolt.types.categorical(),
    },
    target="isFraud",
)
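
The `range=` arguments above need to cover the values that appear in each numerical column. When swapping in your own dataset, one way to obtain such bounds is to scan the file once and record each column's minimum and maximum. The sketch below assumes that is how the bounds are meant to be chosen; `numeric_ranges` is a hypothetical helper and the sample rows are made-up values for illustration.

```python
import csv
import io


def numeric_ranges(rows, columns):
    """Return {column: (min, max)} over an iterable of dict rows."""
    ranges = {}
    for row in rows:
        for col in columns:
            value = float(row[col])
            low, high = ranges.get(col, (value, value))
            ranges[col] = (min(low, value), max(high, value))
    return ranges


# Made-up sample rows standing in for a real CSV file.
sample = io.StringIO(
    "amount,oldbalanceOrg\n"
    "9839.64,170136.0\n"
    "181.0,181.0\n"
    "1864.28,21249.0\n"
)
print(numeric_ranges(csv.DictReader(sample), ["amount", "oldbalanceOrg"]))
# {'amount': (181.0, 9839.64), 'oldbalanceOrg': (181.0, 170136.0)}
```

For a real file, pass `csv.DictReader(open(path, newline=""))` as `rows`. Rounding the upper bound up slightly leaves headroom for unseen values.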

# Training
We can now train our UDT model with just one line! Feel free to customize the number of epochs and the learning rate; we have chosen values that give good convergence.

In [None]:
# Train on the training split produced by prep_fraud_dataset above.
model.train(
    train_filename,
    epochs=5,
    learning_rate=0.01,
    max_in_memory_batches=12,
)

# Evaluation
Evaluating the performance of the UDT model is also just one line!

In [None]:
# Evaluate on the held-out test split returned by prep_fraud_dataset,
# not on the data the model was trained on.
model.evaluate(test_filename, metrics=["categorical_accuracy"])
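
Categorical accuracy is commonly defined as the fraction of rows whose highest-scoring class matches the label. For fraud data it can help to look at precision and recall for the fraud class as well, since a model can score high accuracy while missing most fraud when fraud is rare. The sketch below is plain Python, not ThirdAI's implementation. It assumes per-class scores where the column index equals the class id and class 1 means fraud; `classification_summary` and the sample values are hypothetical.

```python
def classification_summary(labels, scores, positive_class=1):
    """Compute accuracy plus precision/recall for one class from class scores."""
    # Index of the highest score per row (same result as np.argmax(..., axis=1)).
    predicted = [max(range(len(row)), key=lambda i: row[i]) for row in scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    true_pos = sum(
        p == positive_class and y == positive_class
        for p, y in zip(predicted, labels)
    )
    pred_pos = sum(p == positive_class for p in predicted)
    actual_pos = sum(y == positive_class for y in labels)
    return {
        "accuracy": correct / len(labels),
        "precision": true_pos / pred_pos if pred_pos else 0.0,
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }


# Made-up labels and scores for six transactions.
labels = [0, 0, 0, 0, 1, 1]
scores = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
    [0.4, 0.6], [0.2, 0.8], [0.6, 0.4],
]
print(classification_summary(labels, scores))
```

In this toy example four of six rows are classified correctly, yet only one of the two fraud rows is caught, which accuracy alone does not reveal.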


# Saving and Loading
Saving and loading a trained UDT model to and from disk is also straightforward.

In [None]:
from thirdai import bolt
# Save next to the downloaded dataset; this directory already exists.
save_location = "./fraud_detection/fraud_detection.model"

# Saving
model.save(save_location)

# Loading
model = bolt.UniversalDeepTransformer.load(save_location)

# Testing Predictions
The evaluation method is great for testing, but it requires labels, which don't exist in a production setting. We also have a predict method that can take in an in-memory batch of rows or a single row (without the target column), allowing easy integration into production pipelines.

In [None]:
from thirdai import bolt
import numpy as np

print("Inference batch:", inference_batch, "\n")

prediction = model.predict(inference_batch[0])
class_name = model.class_name(np.argmax(prediction))
print("Input:", inference_batch[0], "Prediction:", class_name, "\n")

prediction_batch = model.predict_batch(inference_batch)
class_names = [
    model.class_name(class_id) for class_id in np.argmax(prediction_batch, axis=1)
]
print("Batch Prediction Results")
for input_sample, class_name in zip(inference_batch, class_names):
    print("Input:", input_sample, "Prediction:", class_name)
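
In production, rows usually arrive from a file or a stream rather than from `prep_fraud_dataset`. The sketch below shows one way to build prediction inputs from CSV text by dropping the target column. It assumes that each input is a dictionary mapping column names to string values, which is an assumption about the shape of `inference_batch`, not something confirmed here; `rows_to_samples` is a hypothetical helper.

```python
import csv
import io


def rows_to_samples(csv_text, target="isFraud"):
    """Turn CSV rows into dicts of strings with the target column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if name != target}
        for row in reader
    ]


# Made-up two-column example; real rows would carry every feature column.
example = "type,amount,isFraud\nTRANSFER,181.0,1\n"
print(rows_to_samples(example))  # [{'type': 'TRANSFER', 'amount': '181.0'}]
```

Each resulting dictionary could then be passed to `model.predict`, or the whole list to `model.predict_batch`, under the assumption stated above.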