<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

<h1 style="text-align: center;">Performance Monitoring: Fraud Detection</h1>

**Overview**: In this example, we use UpTrain to monitor the performance of a fraud classification task. We train a binary classifier for cyber-attack classification on a popular network traffic dataset, the [NSL-KDD dataset](https://www.unb.ca/cic/datasets/nsl.html), using the [XGBoost classifier](https://xgboost.readthedocs.io/en/stable/).

**Dataset**: The NSL-KDD dataset includes a variety of network attack types, including denial-of-service (DoS) attacks, user-to-root (U2R) attacks, and probe attacks. The dataset contains a total of around 25,000 instances and 41 different features that describe the behavior of network connections, such as the number of failed login attempts and the size of packets transmitted.

**Why is monitoring needed**: Once our fraud detection model has been trained, it may initially perform well in detecting malicious activity. However, over time, attackers may adapt their tactics and evolve their methods, leading to a mismatch between the type of attacks seen during training and those seen in production. This can result in decreased accuracy in our model's predictions.

**Solution**: We use the UpTrain framework, which provides an easy-to-configure way to log model predictions and attach ground truth in order to monitor the model's performance. We apply a drift detection method on top of the model's performance to raise alerts whenever accuracy dips, a phenomenon commonly called **Concept Drift**.

### Install required packages for this example [XGBoost]

In [None]:
!pip install xgboost joblib

#### Let's first import all the required packages

In [None]:
import uptrain
import time
import numpy as np
import yaml 
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

from helper_funcs import download_dataset, pretty

## Step 1: Let's download and prepare the NSL-KDD dataset

In [None]:
data_file = "NSL_KDD_binary.csv"
download_dataset(data_file)

#### Let's read the data and see how it looks

In [None]:
df = pd.read_csv(data_file)
print("Labels for first few rows:")
print(list(df['label'].head()), "\n")
print("Input features for first few rows:")
df.drop("label", axis=1).head()

#### Divide the data into training and test sets
We use the first 10% of the data to train the model and the remaining 90% to evaluate it as if it were running in production. The split is not shuffled, so the original row order is preserved.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :-1].values, df.iloc[:, -1].values,
                                                    test_size = 0.9, 
                                                    random_state = 0,
                                                    shuffle=False)

print("Num Training samples: ", str(len(X_train)) + ",", " Num Testing samples: ", len(X_test))
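To make the effect of `shuffle=False` concrete, here is a plain-Python sketch of what an unshuffled 10/90 split amounts to: the training set is the leading slice and the "production" set is everything after it. The helper name `chronological_split` is hypothetical, and the rounding rule (test size rounded up) is an assumption about how the library sizes the split, not something taken from this notebook.

```python
import math


def chronological_split(rows, test_fraction=0.9):
    """Split rows in their original order: head for training, tail for testing.

    Assumption: the test size is rounded up and the training set gets the rest.
    """
    n_test = math.ceil(test_fraction * len(rows))
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


# Toy "dataset" of 20 rows, identified by their position in the stream.
rows = list(range(20))
train, test = chronological_split(rows)
print("train rows:", train)
print("first test row:", test[0], "| number of test rows:", len(test))
```

Keeping the order intact matters here because the monitoring steps replay the test rows as a stream, so any change in attack patterns shows up at a specific point in time rather than being averaged away.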

## Step 2: Train our XGBoost Classifier

In [None]:
# Train the XGBoost classifier with training data
classifier = XGBClassifier()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_train)
print("Training accuracy: " + str(100*accuracy_score(y_train, y_pred)))

Woah! 😲🔥 The training accuracy is 100%. Let's see how long the model lasts in production. 

## Step 3: Monitoring model performance using UpTrain

In [None]:
cfg = {
    # Checks to identify concept drift
    "checks": [{
        'type': uptrain.Anomaly.CONCEPT_DRIFT,
        'algorithm': uptrain.DataDriftAlgo.DDM
    }],
    
    # Folder that stores the drifted data-points identified by UpTrain
    "retraining_folder": 'uptrain_smart_data',
    
    # Enable streamlit logging to visualize model's performance
    "st_logging": True,
}
pretty(cfg)

In [None]:
# Initialize the UpTrain framework
framework = uptrain.Framework(cfg)

batch_size = 10000
for i in range(int(len(X_test)/batch_size)):
    
    # Do model prediction
    inputs = {"feats": X_test[i*batch_size:(i+1)*batch_size]}
    preds = classifier.predict(inputs["feats"])
    
    # Log model inputs and outputs to monitor concept drift
    ids = framework.log(inputs=inputs, outputs=preds)
    
    # Attach ground truth to corresponding predictions 
    # in UpTrain framework and identify concept drift
    ground_truth = y_test[i*batch_size:(i+1)*batch_size] 
    framework.log(identifiers=ids, gts=ground_truth)
    
    # Pausing between batches to monitor progress in the dashboard
    time.sleep(0.5)

As the dashboard shows, the model's accuracy starts to dip sharply around the 111k timestamp.

<img width="629" alt="concept_drift_avg_acc" src="https://user-images.githubusercontent.com/5287871/216795937-7e3e0609-6053-4256-956d-c07de3b7d73e.png">

In this example, we used a popular drift detection algorithm called the [Drift Detection Method (DDM)](https://riverml.xyz/0.11.1/api/drift/DDM/), which is already implemented as part of the UpTrain package. However, the dashboard shows the model accuracy dropping from 99.7% to 96.9%, which is still a gradual decline and might not raise many eyebrows.
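For intuition about what DDM tracks, the following is a minimal, self-contained sketch of the general idea: keep a running error rate `p` and its standard deviation `s`, remember the lowest `p + s` seen so far, and flag drift once the current `p + s` exceeds that minimum by a few standard deviations. The class `SimpleDDM`, its parameter names, and its default values are illustrative assumptions; this is not the UpTrain or River implementation.

```python
import math


class SimpleDDM:
    """Illustrative Drift Detection Method sketch (not the UpTrain/River code)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples  # samples to observe before testing
        self.warn_level = warn_level    # std-dev multiplier for a warning
        self.drift_level = drift_level  # std-dev multiplier for drift
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Consume one prediction outcome and return the detector state."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level observed so far.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


def first_drift_index(error_stream):
    """Return the index of the first sample flagged as drift, or None."""
    detector = SimpleDDM()
    for idx, is_error in enumerate(error_stream):
        if detector.update(is_error) == "drift":
            return idx
    return None


# Synthetic stream: 10% error rate for 500 samples, then 50% error rate.
stable = [i % 10 == 9 for i in range(500)]
degraded = [i % 2 == 1 for i in range(500)]
drift_at = first_drift_index(stable + degraded)
print("Drift flagged at sample index:", drift_at)
```

In the notebook, the error stream would correspond to `preds != ground_truth` for each logged prediction. Because the statistic is a cumulative error rate, a long history of correct predictions dilutes a later degradation, which is one reason the drop can look gradual.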

To detect the issue sooner and understand its severity, we may want to define a custom metric and monitor the model with it. Let's see how to do that in UpTrain.

## Step 4: Define a Custom Monitor in UpTrain (for better monitoring)

We now define a custom drift metric that monitors the difference between the model's accuracy on the first 200 predictions and its accuracy on the most recent 200 predictions. This way, we can quickly identify a sudden degradation in model performance.
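Before wiring this into UpTrain, the metric itself can be sketched in plain Python, independent of any framework. The function name `windowed_accuracy_drop` and the toy data are hypothetical; the window size of 200 and the threshold of 0.02 mirror the values used in this example.

```python
def windowed_accuracy_drop(correct_flags, window_size=200, threshold=0.02):
    """Compare accuracy on the first window with each later window.

    correct_flags: sequence of booleans, True where prediction == ground truth.
    Returns (initial_acc, history, first_alert), where history is a list of
    (window_start, window_accuracy) pairs and first_alert is the start index
    of the first window whose accuracy falls more than `threshold` below the
    initial accuracy (or None if that never happens).
    """
    if len(correct_flags) < window_size:
        return None, [], None

    initial_acc = sum(correct_flags[:window_size]) / window_size
    history = []
    first_alert = None

    # Walk over non-overlapping, complete windows only.
    for start in range(0, len(correct_flags) - window_size + 1, window_size):
        window = correct_flags[start:start + window_size]
        recent_acc = sum(window) / window_size
        history.append((start, recent_acc))
        if first_alert is None and initial_acc - recent_acc > threshold:
            first_alert = start

    return initial_acc, history, first_alert


# Toy stream: 1000 predictions at 99% accuracy, then 400 at 75% accuracy.
good = [i % 100 != 0 for i in range(1000)]
bad = [i % 4 != 0 for i in range(400)]
initial_acc, history, first_alert = windowed_accuracy_drop(good + bad)
print("Initial accuracy:", initial_acc)
print("First window that triggers an alert starts at:", first_alert)
```

Because each window is scored on its own, a sudden drop is not diluted by the long run of earlier correct predictions, which is what makes this metric react faster than a cumulative error rate.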

Let's define our custom check and an UpTrain config that registers it as a "Custom Monitor", as shown below:

In [None]:
"""
Defining a custom drift metric to check if accuracy drops beyond a threshold.
"""

def custom_initialize_func(self):
    self.initial_acc = None       
    self.acc_arr = []
    self.count = 0       
    self.thres = 0.02
    self.window_size = 200
    self.is_drift_detected = False

def custom_check_func(self, inputs, outputs, gts=None, extra_args={}):
    batch_size = len(extra_args["id"])
    self.count += batch_size
    self.acc_arr.extend(list(np.equal(gts, outputs)))
    
    # Calculate initial performance of the model on first 200 points
    if (self.count >= self.window_size) and (self.initial_acc is None):
        self.initial_acc = sum(self.acc_arr[0:self.window_size])/self.window_size
        
    # Calculate the most recent accuracy and log it to dashboard.
    if (self.initial_acc is not None):
        for i in range(self.count - batch_size, self.count, self.window_size):
            
            # Calculate the most recent accuracy
            recent_acc = sum(self.acc_arr[i:i+self.window_size])/self.window_size
            
            # Logging to UpTrain dashboard
            self.log_handler.add_scalars('custom_metrics', 
                    {'y_acc': self.initial_acc},
                i, self.dashboard_name, file_name='initial_acc')
            self.log_handler.add_scalars('custom_metrics', 
                    {'y_acc': recent_acc, },
                i, self.dashboard_name, file_name='recent_acc')
            
            # Send an alert when recent model performance goes down 
            if (self.initial_acc - recent_acc > self.thres) and (not self.is_drift_detected):
                alert = f"Concept drift detected with custom metric at time: {i}!!!" 
                print(alert)
                self.log_handler.add_alert(
                    "Model Performance Degradation Alert 🚨",
                    alert,
                    self.dashboard_name
                )
                self.is_drift_detected = True

cfg = {
    "checks": [
        {
            # Check for our custom monitor
            'type': uptrain.Anomaly.CUSTOM_MONITOR,
            'initialize_func': custom_initialize_func,
            'check_func': custom_check_func,
            'need_gt': True,
        },
        {
            # Additionally check for our concept drift from above
            'type': uptrain.Anomaly.CONCEPT_DRIFT,
            'algorithm': uptrain.DataDriftAlgo.DDM
        }
    ],
    
    # Folder that stores the drifted data-points identified by UpTrain
    "retraining_folder": 'uptrain_smart_data',
    
    # Enable streamlit logging to visualize model's performance
    "st_logging": True,
}
pretty(cfg)

In [None]:
# Initialize the UpTrain framework
framework = uptrain.Framework(cfg)

batch_size = 10000
for i in range(int(len(X_test)/batch_size)):
    
    # Do model prediction
    inputs = {"feats": X_test[i*batch_size:(i+1)*batch_size]}
    preds = classifier.predict(inputs["feats"])
    
    # Log model inputs and outputs to monitor concept drift
    ids = framework.log(inputs=inputs, outputs=preds)
    
    # Attach ground truth to corresponding predictions 
    # in UpTrain framework and identify concept drift
    ground_truth = y_test[i*batch_size:(i+1)*batch_size] 
    framework.log(identifiers=ids, gts=ground_truth)
    
    # Pausing between batches to monitor progress in the dashboard
    time.sleep(0.5)

With our custom monitor, we see a sudden (and more alarming) drop. The model accuracy clearly falls from 99.7% to 77%, enabling us to send better alerts and take more urgent measures (e.g., model retraining) to address the degradation.

<img width="624" alt="concept_drift_custom" src="https://user-images.githubusercontent.com/5287871/216795956-a35bcd9f-8b60-439d-9ea2-8e19854390bb.png">

## Conclusion

Model monitoring is crucial for tasks such as fraud detection and cyber-attack detection, where malicious agents continuously improve their attack vectors and, over time, learn to evade detection. Real-time model observability enables one to proactively address any performance degradation before it leads to serious consequences, such as hacks or financial loss.

In this example, we saw two ways to detect performance degradation: concept drift via DDM and a custom monitor. The UpTrain framework has many other statistical tools, such as data drift, integrity checks, shift in model outputs, and outlier detection, that can be used to identify model issues, even in cases where ground truth is not available. You can explore them [here](https://github.com/uptrain-ai/uptrain/tree/main/examples):

- Automatically detecting edge-cases and out-of-distribution samples - [Link](https://github.com/uptrain-ai/uptrain/blob/main/examples/human_orientation_classification/run.ipynb)
- Defining custom signals to identify edge-cases - [Link](https://github.com/uptrain-ai/uptrain/blob/main/examples/human_orientation_classification/deepdive_examples/uptrain_edge_cases_torch.ipynb)
- Using Data-Drift (i.e. shifts in input distribution) to identify dips in model performance - Coming soon
- Monitoring bias in recommendation systems - [Link](https://github.com/uptrain-ai/uptrain/blob/main/examples/shopping_cart_recommendation/run.ipynb)
