In this notebook, we will see how to use the UpTrain package to monitor model performance, data distributions and data integrity, and to identify edge cases for retraining an orientation classification model to improve its accuracy. The task is as follows: given a human pose (i.e. the locations of key-points such as the nose, shoulders, wrists, hips and ankles), the model predicts whether the person is in a vertical (i.e. standing) or a horizontal (i.e. lying) position.

In [1]:
import sys
import os
import subprocess
import zipfile
import numpy as np
import uptrain
from contextlib import redirect_stdout

from model_files import input_to_dataset_transformation, read_json, write_json, KpsDataset
from model_files import body_length_signal, pushup_signal, plot_all_cluster

import json
import torch

First, let's download the training and testing datasets.

In [2]:
data_dir = "data"
remote_url = "https://oodles-dev-training-data.s3.amazonaws.com/data.zip"
orig_training_file = 'data/training_data.json'
if not os.path.exists(data_dir):
    # Most Linux distributions ship with wget; on macOS, install it via Homebrew.
    # subprocess.call returns the exit code rather than raising, so check it explicitly.
    wget_installed_ok = subprocess.call("brew install wget", shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    if wget_installed_ok == 0:
        print("Successfully installed wget")
    try:
        if not os.path.exists("data.zip"):
            file_downloaded_ok = subprocess.call("wget " + remote_url, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
            if file_downloaded_ok == 0:
                print("Data downloaded")
        with zipfile.ZipFile("data.zip", 'r') as zip_ref:
            zip_ref.extractall("./")
        full_training_data = read_json(orig_training_file)
        np.random.seed(1)
        np.random.shuffle(full_training_data)
        reduced_training_data = full_training_data[0:1000]
        write_json(orig_training_file, reduced_training_data)
        print("Prepared Example Dataset")
        os.remove("data.zip")
    except Exception as e:
        print(e)
        print("Could not load training data")

Let's define the data files:

1. The real-world test cases file contains the data-points which the model sees in production.
2. The golden testing file is a testing dataset which we will use to compare the performance of the retrained model against the originally deployed model.
3. We log the collected data-points to a local folder named by data_save_fold (this could also be a SQL table, a data warehouse etc.).
4. To annotate the collected data-points, we extract the ground truth from the master annotation file (this step could instead schedule an annotation job on Mechanical Turk or integrate with your other annotation pipelines).
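
As a minimal sketch of the annotation step in item 4, ground truth can be looked up by data-point id. The record layout (a list of dicts with `id` and `gt` keys) follows how the master file is read later in this notebook; the helper name `build_gt_lookup` and the sample records are hypothetical.

```python
def build_gt_lookup(master_records):
    """Map each data-point id to its ground-truth label (hypothetical helper)."""
    return {record["id"]: record["gt"] for record in master_records}


# Made-up records standing in for the contents of the master annotation file.
sample_master = [
    {"id": 101, "gt": 1},
    {"id": 102, "gt": 0},
    {"id": 103, "gt": 1},
]

gt_lookup = build_gt_lookup(sample_master)
batch_ids = [102, 103]
batch_gts = [gt_lookup[i] for i in batch_ids]
print(batch_gts)  # [0, 1]
```

A dictionary gives constant-time lookups, whereas calling `list.index` for every id scans the whole list each time.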

In [3]:
real_world_test_cases = 'data/real_world_testing_data.json'
golden_testing_file = 'data/golden_testing_data.json'
data_save_fold = "uptrain_smart_data"

inference_batch_size = 16
annotation_args = {'master_file': 'data/master_annotation_data.json'}

Next, we train our model, a deep neural network, on the training data.

In [4]:
from model_files import get_accuracy_torch, train_model_torch, BinaryClassification
train_model_torch('data/training_data.json', 'version_0')

Training on:  data/training_data.json  which has  1000  data-points
Trained model exists. Skipping training again.


Next, we evaluate the model on our golden testing dataset to see its accuracy.

In [5]:
get_accuracy_torch(golden_testing_file, 'version_0')

Evaluating on  15731  data-points


0.8841777382238891
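
For reference, accuracy for a binary classifier that outputs a single logit can be computed roughly as in the sketch below: apply a sigmoid, threshold at 0.5, and compare with the labels. This mirrors the prediction rule used in the inference loop of this notebook, but it is not the `get_accuracy_torch` implementation, which is not shown here; the function names and sample values are hypothetical.

```python
import math


def sigmoid(logit):
    """Squash a raw logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def binary_accuracy(logits, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    preds = [1 if sigmoid(z) > 0.5 else 0 for z in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up logits and labels; the third prediction is wrong.
sample_logits = [2.3, -1.1, 0.4, -3.0]
sample_labels = [1, 0, 0, 0]
print(binary_accuracy(sample_logits, sample_labels))  # 0.75
```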

Let's define the UpTrain config. We also pass our training and evaluation arguments to facilitate automated retraining if a significant number of edge cases are detected.

The config includes the following checks:

1. Data Drift for input features (keypoints): The keypoints form a 34-dimensional vector (x, y for 17 body joints). We use embedding-based clustering and the Earth Mover's Distance to identify whether the production data distribution differs significantly from the reference dataset (i.e. the original training file). This check also collects the edge data-points.

2. Data Integrity: We check whether body length (a custom-defined metric) is greater than 100.

3. Edge cases: We define a Pushup signal which, based on the locations of the wrist, ankle and shoulder keypoints, estimates whether the person is in a pushup position. We use this signal to collect edge cases because, during manual testing, we saw that our model's predictions were unreliable when we were lying upside down.

4. Concept Drift: We want to monitor degradation in the model's performance, and use DDM (Drift Detection Method) to do so.
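
To make the integrity check in item 2 more concrete, here is a minimal sketch of what a body-length style signal could look like. It is not the `body_length_signal` from `model_files`, whose implementation is not shown here. It assumes the 34 values are laid out as `[x0, y0, x1, y1, ...]` in COCO keypoint order (nose at index 0, ankles at indices 15 and 16); the function names and the sample pose are hypothetical.

```python
import math

# Assumed COCO keypoint indices (an assumption about this dataset).
NOSE, LEFT_ANKLE, RIGHT_ANKLE = 0, 15, 16


def keypoint(kps, index):
    """Return the (x, y) pair of one joint from a flat 34-value vector."""
    return kps[2 * index], kps[2 * index + 1]


def approx_body_length(kps):
    """Distance from the nose to the midpoint of the two ankles."""
    nose_x, nose_y = keypoint(kps, NOSE)
    la_x, la_y = keypoint(kps, LEFT_ANKLE)
    ra_x, ra_y = keypoint(kps, RIGHT_ANKLE)
    mid_x, mid_y = (la_x + ra_x) / 2, (la_y + ra_y) / 2
    return math.hypot(nose_x - mid_x, nose_y - mid_y)


def passes_integrity_check(kps, threshold=100):
    """Mirror a 'greater_than' check: the signal must exceed the threshold."""
    return approx_body_length(kps) > threshold


# Made-up standing pose: nose at (50, 10), ankles at (45, 310) and (55, 310).
pose = [0.0] * 34
pose[0:2] = [50.0, 10.0]
pose[30:32] = [45.0, 310.0]
pose[32:34] = [55.0, 310.0]

print(approx_body_length(pose))      # 300.0
print(passes_integrity_check(pose))  # True
```

Data-points that fail such a check (for example, poses where most keypoints collapse to the same location) would be flagged as integrity violations.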

In [6]:
cfg = {
    "checks": [
    {
        'type': uptrain.Anomaly.DATA_DRIFT,
        'reference_dataset': orig_training_file,
        'is_embedding': True,
        'cluster_plot_func': plot_all_cluster,
    },
    {
        'type': uptrain.Anomaly.DATA_INTEGRITY,
        "integrity_type": "greater_than",
        "threshold": 100,
        "measurable_args": {
            'type': uptrain.MeasurableType.CUSTOM,
            'signal_formulae': uptrain.Signal("Body Length", body_length_signal),
        }
    },
    {
        'type': uptrain.Anomaly.EDGE_CASE, 
        "signal_formulae": uptrain.Signal("Pushup", pushup_signal)

    },
    {
        'type': uptrain.Anomaly.CONCEPT_DRIFT,
        'algorithm': uptrain.DataDriftAlgo.DDM
    }],

    "data_identifier": "id",
    "feat_name_list": ["kps"],

    # Connect training pipeline to annotate data and retrain the model
    "training_args": {
        "data_transformation_func": input_to_dataset_transformation,  
        "annotation_method": {"method": uptrain.AnnotationMethod.MASTER_FILE, "args": annotation_args}, 
        "training_func": train_model_torch, 
        "fold_name": data_save_fold,
        "orig_training_file": orig_training_file,
        "cluster_plot_func": plot_all_cluster
    },

    # Retrain once 250 edge cases are collected
    "retrain_after": 250,

    # Connect evaluation pipeline to test retrained model against original model
    "evaluation_args": {
        "inference_func": get_accuracy_torch,
        "golden_testing_dataset": golden_testing_file,
        "metrics_to_check": ['accuracy']
    }
}
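
The data drift check in the config above relies on the Earth Mover's Distance between a reference and a production distribution. As a rough illustration only, the sketch below computes that distance for two equal-sized one-dimensional samples, where it reduces to the mean absolute difference between sorted values. UpTrain's embedding-based clustering over the 34-dimensional keypoints is more involved and is not reproduced here; the function name and the sample values are hypothetical.

```python
def earth_movers_distance_1d(reference, production):
    """1-D Earth Mover's Distance for two samples of equal size."""
    if len(reference) != len(production):
        raise ValueError("this sketch only handles samples of equal size")
    if not reference:
        raise ValueError("samples must not be empty")
    # Optimal 1-D transport pairs the i-th smallest values of both samples.
    pairs = zip(sorted(reference), sorted(production))
    return sum(abs(r - p) for r, p in pairs) / len(reference)


reference_sample = [0.0, 1.0, 2.0, 3.0]
shifted_sample = [4.0, 2.0, 1.0, 3.0]  # same shape, shifted right by 1

print(earth_movers_distance_1d(reference_sample, reference_sample))  # 0.0
print(earth_movers_distance_1d(reference_sample, shifted_sample))    # 1.0
```

A distance near zero suggests production data resembles the reference data, while larger values indicate that the distribution has moved.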

To integrate UpTrain, we just need to initialise a Framework object with the config defined above and log model inputs and outputs in our inference function. To monitor concept drift, we also extract the ground truth (GT) from the annotation file and log it.
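
Concept drift detection needs a stream of correct/incorrect outcomes, which is why ground truth is logged. The sketch below follows the published Drift Detection Method (Gama et al., 2004) in simplified form: it tracks the running error rate p and its standard deviation s, remembers the lowest p + s seen so far, and signals drift once p + s exceeds p_min + 3 * s_min. It is not UpTrain's implementation; the class name, the 30-sample warm-up and the synthetic stream are assumptions for illustration.

```python
import math


class SimpleDDM:
    """Simplified Drift Detection Method over a stream of 0/1 errors."""

    def __init__(self, warmup=30, drift_factor=3.0):
        self.warmup = warmup
        self.drift_factor = drift_factor
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Add one outcome; return True if drift is signalled."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.warmup:
            return False  # too few samples for a stable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        return p + s > self.p_min + self.drift_factor * self.s_min


# Synthetic stream: 10% errors for 100 samples, then every prediction is wrong.
stream = [i % 10 == 9 for i in range(100)] + [True] * 100

detector = SimpleDDM()
detected_at = None
for position, is_error in enumerate(stream, start=1):
    if detector.update(is_error):
        detected_at = position
        break

print("Drift signalled at sample:", detected_at)
```

In this synthetic stream the detector stays quiet during the stable phase and fires shortly after the error rate jumps.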

To mimic real-world settings, we take a real-world testing dataset, load data-points batch by batch and run the model inference on them.

In [7]:
framework_torch = uptrain.Framework(cfg)

model_dir = 'trained_models_torch/'
model_save_name = 'version_0'
real_world_dataset = KpsDataset(
    real_world_test_cases, batch_size=inference_batch_size, is_test=True
)
model = BinaryClassification()
model.load_state_dict(torch.load(model_dir + model_save_name))
model.eval()
gt_data = read_json(annotation_args['master_file'])
all_gt_ids = [x['id'] for x in gt_data]

for i,elem in enumerate(real_world_dataset):

    # Do model prediction
    inputs = {"data": {"kps": elem[0]["kps"]}, "id": elem[0]["id"]}
    x_test = torch.tensor(inputs["data"]["kps"]).type(torch.float)
    test_logits = model(x_test).squeeze() 
    preds = torch.round(torch.sigmoid(test_logits)).detach().numpy()

    # Log model inputs and outputs to the uptrain Framework to monitor input and output data related checks
    idens = framework_torch.log(inputs=inputs, outputs=preds)

    # Attach ground truth to monitor model performance and concept drift
    this_elem_gt = [gt_data[all_gt_ids.index(x)]['gt'] for x in elem[0]['id']]
    framework_torch.log(identifiers=idens, gts=this_elem_gt)

    # Retrain only once
    if framework_torch.version > 1:
        break

Deleting the folder:  uptrain_smart_data
Deleting the folder:  uptrain_logs
55  edge-cases collected out of  208  inferred samples
100  edge-cases collected out of  416  inferred samples
151  edge-cases collected out of  624  inferred samples
206  edge-cases collected out of  832  inferred samples
250  edge-cases collected out of  992  inferred samples
Kicking off re-training
255 data-points selected out of 1008
Training on:  uptrain_smart_data/1/training_dataset.json  which has  2275  data-points
Trained model exists. Skipping training again.
Model retraining done...
Generating comparison report...
Training on:  data/training_data.json  which has  1000  data-points
Trained model exists. Skipping training again.
Evaluating on  15731  data-points
Evaluating on  15731  data-points
---------------------------------------------
---------------------------------------------
Old model accuracy:  0.8841777382238891
Retrained model accuracy (ie 255 smartly collected data-points added):  0.9924

In [8]:
!tensorboard --logdir uptrain_logs

TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.11.0 at http://localhost:6006/ (Press CTRL+C to quit)
^C
