In this notebook, we will see how to use the UpTrain Framework to identify edge cases and retrain an orientation classification model to improve its accuracy. The task is as follows: given a human pose (i.e. the locations of key-points such as the nose, shoulders, wrists, hips and ankles), the model predicts whether the person is in a vertical (i.e. standing) or a horizontal (i.e. lying) position.
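
To make the task concrete, here is a minimal sketch of what one pose sample and a naive geometric baseline could look like. The key-point names, the `(x, y)` layout and the `naive_orientation` helper are assumptions made for illustration; they are not the format used by `KpsDataset` or by the models trained in this notebook.

```python
import numpy as np

# Hypothetical samples: each key-point is an (x, y) pair in image coordinates.
standing = {
    "nose": (0.50, 0.10),
    "shoulder": (0.50, 0.25),
    "hip": (0.50, 0.55),
    "ankle": (0.50, 0.95),
}
lying = {
    "nose": (0.10, 0.80),
    "shoulder": (0.25, 0.80),
    "hip": (0.55, 0.82),
    "ankle": (0.95, 0.85),
}

def naive_orientation(sample):
    """Label a pose by comparing the vertical and horizontal extent of its key-points."""
    pts = np.array(list(sample.values()), dtype=float)
    width, height = pts.max(axis=0) - pts.min(axis=0)
    return "vertical" if height >= width else "horizontal"

print(naive_orientation(standing))
print(naive_orientation(lying))
```

A learned classifier replaces this hand-written rule with weights fitted to labeled data, which is what the training cells in this notebook do.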

In [1]:
import sys
import os
import subprocess
import zipfile
import numpy as np
import uptrain
from contextlib import redirect_stdout

from dataset import input_to_dataset_transformation, read_json, write_json, KpsDataset
from pushup_signal import pushup_signal

import joblib
import json

First, let's download the training and testing datasets.

In [2]:
data_dir = "data"
remote_url = "https://oodles-dev-training-data.s3.amazonaws.com/data.zip"
orig_training_file = 'data/training_data.json'
if not os.path.exists(data_dir):
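    try:
        # check_output raises CalledProcessError if wget exits with a non-zero status
        file_downloaded_ok = subprocess.check_output("wget " + remote_url, shell=True)
    except subprocess.CalledProcessError as err:
        # Stop here: without data.zip the extraction below cannot succeed
        raise RuntimeError("Could not load training data") from err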
    with zipfile.ZipFile("data.zip", 'r') as zip_ref:
        zip_ref.extractall("./")

    full_training_data = read_json(orig_training_file)
    np.random.seed(1)
    np.random.shuffle(full_training_data)
    reduced_training_data = full_training_data[0:1000]
    write_json(orig_training_file, reduced_training_data)
    
training_file = 'data/training_data.json'
golden_testing_file = 'data/golden_testing_data.json'
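
The download step above shells out to `wget`, which is not installed on every system. As a hedged alternative, the sketch below does the same fetch-and-unzip using only the Python standard library. The helper name `download_and_extract` and its parameters are hypothetical and not part of the notebook's helper modules.

```python
import urllib.request
import zipfile
from pathlib import Path

def download_and_extract(url, archive_path="data.zip", extract_to="."):
    """Fetch a zip archive with the standard library and unpack it."""
    archive_path = Path(archive_path)
    # urlretrieve raises an exception (e.g. urllib.error.URLError) if the fetch fails
    urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(extract_to)
    return archive_path

# Example (needs network access):
# download_and_extract(remote_url)
```

Because a failed fetch raises an exception, the extraction is never attempted on a missing archive.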


### Training with Logistic Regression (LR)

In [3]:
from model_logistic_regression import get_accuracy_lr, train_model_lr
train_model_lr(training_file, 'version_0')

Training on:  data/training_data.json  which has  1000  data-points
Model saved at:  trained_models_lr/version_0


Next, we evaluate the model on our golden testing dataset to see its accuracy.

In [4]:
get_accuracy_lr(golden_testing_file, 'version_0')

Evaluating on  15731  data-points


0.8586231008836056

We observe that the testing accuracy of the model is quite low (about 86%). During manual testing, the model's outputs were unreliable when the person was in a pushup position. Next, we define the UpTrain config with an edge-case check for Pushup signals. We also pass our training and evaluation arguments to enable automated retraining once a significant number of edge cases has been collected.

Let's define the data files: 

1. Real world test cases contains the data-points which the model sees in production.
2. Golden testing file is a testing dataset which we use to compare the performance of the retrained model against the originally deployed model.
3. We log the collected data-points to a local folder defined in `data_save_fold` (this could also be a SQL table, a data warehouse etc.).
4. To annotate the collected data-points, we extract the ground truth from the master annotation file (this step could instead schedule an annotation job on Mechanical Turk or integrate with your other annotation pipelines).
5. Finally, we define a Pushup signal which, based on the locations of the wrist, ankle and shoulder key-points, estimates whether the person is in a pushup position. We use this signal to collect edge cases because, in manual testing, the model's predictions were unreliable in this position.
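
To give a feel for what such a signal might compute, here is a rough sketch of a pushup heuristic built from shoulder, wrist and ankle key-points. It is not the implementation in `pushup_signal.py`: the function name `looks_like_pushup`, the key-point layout and both thresholds are assumptions chosen for illustration.

```python
import numpy as np

def looks_like_pushup(kps, max_tilt=0.35, min_arm_ratio=0.2):
    """Hypothetical heuristic, not the notebook's `pushup_signal`.

    `kps` maps a key-point name to an (x, y) pair in image
    coordinates, with y growing downward.
    """
    shoulder = np.asarray(kps["shoulder"], dtype=float)
    wrist = np.asarray(kps["wrist"], dtype=float)
    ankle = np.asarray(kps["ankle"], dtype=float)

    body = ankle - shoulder
    body_len = np.linalg.norm(body)
    if body_len == 0:
        return False

    # Body roughly horizontal: small vertical change from shoulder to ankle.
    body_is_flat = abs(body[1]) / body_len < max_tilt
    # Wrists clearly below the shoulders, i.e. the arms hold the torso up.
    wrist_drop = wrist[1] - shoulder[1]
    arms_support = wrist_drop / body_len > min_arm_ratio
    # Wrists roughly under the shoulders rather than stretched along the body.
    wrist_under_shoulder = abs(wrist[0] - shoulder[0]) < 0.5 * abs(wrist_drop)
    return bool(body_is_flat and arms_support and wrist_under_shoulder)

pushup = {"shoulder": (0.30, 0.50), "wrist": (0.32, 0.70), "ankle": (0.90, 0.68)}
standing = {"shoulder": (0.50, 0.25), "wrist": (0.55, 0.55), "ankle": (0.50, 0.95)}
print(looks_like_pushup(pushup), looks_like_pushup(standing))
```

A signal of this kind only needs the model inputs, so it can flag suspicious samples in production before any ground truth is available.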

In [5]:
real_world_test_cases = 'data/real_world_testing_data.json'
annotation_args = {'master_file': 'data/master_annotation_data.json'}
data_save_fold = 'uptrain_smart_data__edge_cases'

# Defining the edge-case signal
pushup_edge_case = uptrain.Signal("Pushup", pushup_signal)

cfg = {
    # Define your signal to identify edge cases
    "checks": [{
        'type': uptrain.Anomaly.EDGE_CASE, 
        "signal_formulae": pushup_edge_case
    }],
    
    # Will use this as the primary key to reference individual data-points
    "data_identifier": "id",

    # Connect training pipeline to annotate data and retrain the model
    "training_args": {
        "data_transformation_func": input_to_dataset_transformation,  
        "annotation_method": {"method": uptrain.AnnotationMethod.MASTER_FILE, "args": annotation_args}, 
        "training_func": train_model_lr, 
        "fold_name": data_save_fold,
        "orig_training_file": orig_training_file,  
    },

    # Retrain once 250 edge cases are collected
    "retrain_after": 250,

    # Connect evaluation pipeline to test retrained model against original model
    "evaluation_args": {
        "inference_func": get_accuracy_lr,
        "golden_testing_dataset": golden_testing_file,
        "metrics_to_check": ['accuracy']
    }
}
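
Conceptually, the `checks` and `retrain_after` entries describe a collect-then-retrain loop: flag samples with a signal, store them, and trigger training once enough have accumulated. The toy class below sketches that idea in plain Python. It is an illustration under my own assumptions (class name, starting version of 1, check after each batch) and does not reflect how UpTrain is implemented internally.

```python
class EdgeCaseCollector:
    """Toy stand-in for a collect-then-retrain loop; not UpTrain's implementation."""

    def __init__(self, is_edge_case, retrain_after, retrain_func):
        self.is_edge_case = is_edge_case
        self.retrain_after = retrain_after
        self.retrain_func = retrain_func
        self.collected = []
        self.version = 1

    def log(self, batch):
        # Keep only the samples flagged by the edge-case signal.
        self.collected.extend(x for x in batch if self.is_edge_case(x))
        # The threshold is checked once per logged batch.
        if len(self.collected) >= self.retrain_after:
            self.retrain_func(list(self.collected))
            self.collected.clear()
            self.version += 1


retrained_on = []
collector = EdgeCaseCollector(
    is_edge_case=lambda x: x % 2 == 0,   # hypothetical signal: even numbers
    retrain_after=5,
    retrain_func=retrained_on.append,
)
for start in range(0, 40, 4):            # batches of 4 samples
    collector.log(range(start, start + 4))
    if collector.version > 1:            # retrain only once
        break

print(collector.version, retrained_on)
```

In this toy version the threshold is checked after each batch, so the retraining set can overshoot `retrain_after` (six samples for a threshold of five here).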

To integrate UpTrain, we just need to initialise a Framework object with the config defined above and log the model inputs and outputs in our inference function.

To mimic real-world settings, we take a real-world testing dataset, load data-points batch by batch and run the model inference on them.

In [6]:
framework_lr = uptrain.Framework(cfg)

testing_dataset = KpsDataset(real_world_test_cases, normalization=True)
X_test, y_test, all_ids = testing_dataset.load_x_y_from_data()  # avoid shadowing the built-in `id`
inference_batch_size = 256
pred_classes = []
model = joblib.load("trained_models_lr/" + 'version_0')
for i in range(int(np.ceil(len(X_test)/inference_batch_size))): 
    # Do model prediction
    start, end = i*inference_batch_size, (i+1)*inference_batch_size
    elem = X_test[start:end]  # slicing clips automatically at the end of the data
    ids = all_ids[start:end]
    inputs = {"data": {"kps": elem}, "id": ids}
    preds = model.predict(inputs['data']['kps'])

    # Log model inputs and outputs to the uptrain Framework
    idens = framework_lr.log(inputs=inputs, outputs=preds)

    # Retrain only once
    if framework_lr.version > 1:
        break

Deleting the folder:  uptrain_smart_data__edge_cases
Deleting the folder:  uptrain_logs
132  edge-cases collected out of  256  inferred samples
246  edge-cases collected out of  512  inferred samples
368  edge-cases collected out of  768  inferred samples
Kicking off re-training
368 data-points selected out of 768
Training on:  uptrain_smart_data__edge_cases/1/training_dataset.json  which has  2840  data-points
Model saved at:  trained_models_lr/version_1
Model retraining done...
Generating comparison report...
Training on:  data/training_data.json  which has  1000  data-points
Trained model exists. Skipping training again.
Evaluating on  15731  data-points
Evaluating on  15731  data-points
---------------------------------------------
---------------------------------------------
Old model accuracy:  0.8586231008836056
Retrained model accuracy (ie 368 smartly collected data-points added):  0.9691055876930901
---------------------------------------------
-------------------------------

In the comparison report above, we can see how UpTrain improved the model performance (accuracy on the golden testing dataset rose from about 85.9% to about 96.9%) by detecting edge cases and retraining the model under the hood. Further, UpTrain is agnostic to the model type and training functions. To illustrate this, we train our orientation classification model again, this time with a deep neural network.
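
The figures in the logistic regression report can be unpacked with a little arithmetic. The sketch below uses only numbers printed in the logs above. The ratio in the last line is just an observation about the logged dataset sizes; the notebook does not state how the retraining set is assembled, so treat any interpretation of it as an assumption.

```python
n_test = 15731
acc_old, acc_new = 0.8586231008836056, 0.9691055876930901

# Convert accuracies back into counts of misclassified test samples.
errors_old = n_test - round(acc_old * n_test)
errors_new = n_test - round(acc_new * n_test)
print(f"misclassified before: {errors_old}, after: {errors_new}")
print(f"relative error reduction: {1 - errors_new / errors_old:.1%}")

# Sizes reported in the retraining log: original set, collected edge cases, retraining set.
n_orig, n_edge, n_retrain = 1000, 368, 2840
print((n_retrain - n_orig) / n_edge)
```

Looking at error counts rather than accuracy makes the effect of the collected edge cases easier to judge than the raw percentage change.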

### Training using Deep Neural Network (with PyTorch)

In [7]:
import torch

from model_torch import get_accuracy_torch, train_model_torch, BinaryClassification
train_model_torch('data/training_data.json', 'version_0')

Training on:  data/training_data.json  which has  1000  data-points
Epoch 0: Loss 4.843758450400445
Epoch 1: Loss 1.3056830919407834
Epoch 2: Loss 0.963068675014696
Epoch 3: Loss 0.9446813049870534
Epoch 4: Loss 0.8719852773281044
Epoch 5: Loss 0.5867635330158067
Epoch 6: Loss 0.48831840431328954
Epoch 7: Loss 0.5206127582173541
Epoch 8: Loss 0.3507806431908619
Epoch 9: Loss 0.29904839011273465
Model saved at:  trained_models_torch/version_0


Next, we get the model accuracy on the testing dataset, which is again low due to misclassification of Pushup signals.

In [8]:
get_accuracy_torch(golden_testing_file, 'version_0')

Evaluating on  15731  data-points


0.946792956582544

Update the UpTrain config with new training workflows and checks. Let's also add a check for edge-cases when model confidence is low (because why not!). For binary entropy confidence, we can directly use one of the pre-defined model signals and adjust the confidence threshold according to our model.

In [9]:
# Whenever model confidence is <0.9, identify it as an edge-case 
low_conf_edge_case = uptrain.Signal(uptrain.ModelSignal.BINARY_ENTROPY_CONFIDENCE, 
                is_model_signal=True) < 0.9

cfg['checks'][0].update({"signal_formulae": (pushup_edge_case | low_conf_edge_case)})
cfg['training_args'].update({'training_func': train_model_torch})
cfg['evaluation_args'].update({'inference_func': get_accuracy_torch})
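
As a rough illustration of an entropy-based confidence score, the sketch below defines confidence as one minus the binary entropy (in bits) of the predicted probability, then applies the 0.9 threshold. This definition is an assumption chosen for illustration; the exact formula behind `uptrain.ModelSignal.BINARY_ENTROPY_CONFIDENCE` may differ.

```python
import numpy as np

def binary_entropy_confidence(p, eps=1e-12):
    """Return 1 - H(p), where H is the binary entropy in bits.

    Illustrative definition only; UpTrain's built-in signal may be computed differently.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return 1.0 - entropy

probs = np.array([0.50, 0.80, 0.99, 0.01])
conf = binary_entropy_confidence(probs)
low_confidence = conf < 0.9   # same threshold value as in the config
print(np.round(conf, 3), low_confidence)
```

Under this definition a prediction of 0.5 has zero confidence, and only probabilities very close to 0 or 1 pass a 0.9 threshold, so the threshold should be tuned to how sharp the model's outputs are.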

In [10]:
framework_torch = uptrain.Framework(cfg)

inference_batch_size = 16
model_dir = 'trained_models_torch/'
model_save_name = 'version_0'
real_world_dataset = KpsDataset(
    real_world_test_cases, batch_size=inference_batch_size, shuffle=False, augmentations=False, is_test=True
)
model = BinaryClassification()
model.load_state_dict(torch.load(model_dir + model_save_name))
model.eval()
gt_data = read_json(annotation_args['master_file'])
all_gt_ids = [x['id'] for x in gt_data]

for i,elem in enumerate(real_world_dataset):

    # Do model prediction
    inputs = {"data": {"kps": elem[0]["kps"]}, "id": elem[0]["id"]}
    x_test = torch.tensor(inputs["data"]["kps"]).type(torch.float)
    test_logits = model(x_test).squeeze() 
    preds = torch.round(torch.sigmoid(test_logits)).detach().numpy()
    idens = framework_torch.log(inputs=inputs, outputs=preds)

    # Attach ground truth
    this_elem_gt = [gt_data[all_gt_ids.index(x)]['gt'] for x in elem[0]['id']]
    framework_torch.log(identifiers=idens, gts=this_elem_gt)

    # Retrain only once
    if framework_torch.version > 1:
        break

Deleting the folder:  uptrain_smart_data__edge_cases
Deleting the folder:  uptrain_logs
55  edge-cases collected out of  208  inferred samples
100  edge-cases collected out of  416  inferred samples
151  edge-cases collected out of  624  inferred samples
206  edge-cases collected out of  832  inferred samples
250  edge-cases collected out of  992  inferred samples
Kicking off re-training
255 data-points selected out of 1008
Training on:  uptrain_smart_data__edge_cases/1/training_dataset.json  which has  2275  data-points
Epoch 0: Loss 4.456094716983834
Epoch 1: Loss 1.5041649351908155
Epoch 2: Loss 0.7094659108056898
Epoch 3: Loss 0.5479656164785824
Epoch 4: Loss 0.44732532045410295
Epoch 5: Loss 0.4169701081771625
Epoch 6: Loss 0.38897821375919295
Epoch 7: Loss 0.2797425045332354
Epoch 8: Loss 0.23910698301602706
Epoch 9: Loss 0.21545567348820582
Model saved at:  trained_models_torch/version_1
Model retraining done...
Generating comparison report...
Training on:  data/training_data.js
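
In the PyTorch loop above, ground truth is looked up with `all_gt_ids.index(x)`, which scans the whole list for every sample. A possible alternative, sketched here on made-up records that reuse the `id` and `gt` keys, is to build a dictionary once and look labels up directly. The sample ids and labels are hypothetical.

```python
# Hypothetical annotation records with the same 'id' and 'gt' keys as the master file.
gt_data = [{"id": f"sample_{i}", "gt": i % 2} for i in range(10_000)]

# Build the index once: id -> ground-truth label.
gt_by_id = {row["id"]: row["gt"] for row in gt_data}

batch_ids = ["sample_3", "sample_9998", "sample_42"]
batch_gt = [gt_by_id[sample_id] for sample_id in batch_ids]
print(batch_gt)
```

One behavioural difference to keep in mind: if an id appears more than once, `list.index` returns the first match, whereas the dictionary keeps the last one.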

### Training using Deep Neural Network (with TensorFlow)

Note: Requires TensorFlow to be installed. We ran the following code successfully with TensorFlow version 2.11.0.

In [11]:
import tensorflow as tf
# Hide all GPUs so that TensorFlow runs on the CPU
tf.config.set_visible_devices([], 'GPU')

from model_tf import get_accuracy_tf, train_model_tf
train_model_tf('data/training_data.json', 'version_0')

Training on:  data/training_data.json  which has  1000  data-points


2023-01-15 22:32:39.163172: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
INFO:tensorflow:Assets written to: trained_models_tf/version_0/assets
Model saved at:  trained_models_tf/version_0


Next, we get the model accuracy on the testing dataset, which is again low due to misclassification of Pushup signals.

In [12]:
get_accuracy_tf(golden_testing_file, 'version_0')

Evaluating on  15731  data-points


0.9422795753607527

Update the UpTrain config with new training workflows and checks. Let's also add a check for edge-cases when model confidence is low (because why not!). For binary entropy confidence, we can directly use one of the pre-defined model signals and adjust the confidence threshold according to our model.

In [13]:
# Whenever model confidence is <0.9, identify it as an edge-case 
low_conf_edge_case = uptrain.Signal(uptrain.ModelSignal.BINARY_ENTROPY_CONFIDENCE, 
                is_model_signal=True) < 0.9

cfg['checks'][0].update({"signal_formulae": (pushup_edge_case | low_conf_edge_case)})
cfg['training_args'].update({'training_func': train_model_tf})
cfg['evaluation_args'].update({'inference_func': get_accuracy_tf})
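
The config combines signals with `<` and `|`. One way such a composable interface can be built in Python is through operator overloading, as the toy class below shows. `ToySignal` is a hypothetical name and this is only a sketch of the general pattern, not UpTrain's `Signal` class.

```python
class ToySignal:
    """Minimal composable predicate; a sketch, not UpTrain's `Signal` class."""

    def __init__(self, func):
        self.func = func            # maps a sample to a value (bool or number)

    def __call__(self, sample):
        return self.func(sample)

    def __lt__(self, threshold):
        # `signal < t` builds a new boolean signal instead of comparing immediately.
        return ToySignal(lambda s: self.func(s) < threshold)

    def __or__(self, other):
        return ToySignal(lambda s: bool(self(s)) or bool(other(s)))

    def __and__(self, other):
        return ToySignal(lambda s: bool(self(s)) and bool(other(s)))


is_pushup = ToySignal(lambda s: s["pose"] == "pushup")
confidence = ToySignal(lambda s: s["confidence"])
edge_case = is_pushup | (confidence < 0.9)

samples = [
    {"pose": "pushup", "confidence": 0.99},
    {"pose": "standing", "confidence": 0.60},
    {"pose": "standing", "confidence": 0.97},
]
print([edge_case(s) for s in samples])
```

Note the parentheses around the comparison: in Python `|` binds more tightly than `<`, so building the thresholded signal first (as the notebook does with `low_conf_edge_case`) avoids a precedence surprise.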

In [14]:
framework_tf = uptrain.Framework(cfg)

model_dir = 'trained_models_tf/'
model_save_name = 'version_0'
inference_batch_size = 16
real_world_dataset = KpsDataset(
    real_world_test_cases, batch_size=inference_batch_size, shuffle=False, augmentations=False, is_test=True
)
model = tf.keras.models.load_model(model_dir + model_save_name)
gt_data = read_json(annotation_args['master_file'])
all_gt_ids = [x['id'] for x in gt_data]

for i,elem in enumerate(real_world_dataset):

    # Do model prediction
    inputs = {"data": {"kps": elem[0]["kps"]}, "id": elem[0]["id"]}
    with open('evaluation_logs.txt', 'w') as f:
        with redirect_stdout(f):
            preds = model.predict(inputs['data']['kps'])

    # Log model inputs and outputs to the uptrain Framework
    idens = framework_tf.log(inputs=inputs, outputs=preds)

    # Retrain only once
    if framework_tf.version > 1:
        break

Deleting the folder:  uptrain_smart_data__edge_cases
Deleting the folder:  uptrain_logs
50  edge-cases collected out of  192  inferred samples
101  edge-cases collected out of  400  inferred samples
150  edge-cases collected out of  608  inferred samples
202  edge-cases collected out of  816  inferred samples
252  edge-cases collected out of  976  inferred samples
Kicking off re-training
252 data-points selected out of 976
Training on:  uptrain_smart_data__edge_cases/1/training_dataset.json  which has  2260  data-points
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
INFO:tensorflow:Assets written to: trained_models_tf/version_1/assets
Model saved at:  trained_models_tf/version_1
Model retraining done...
Generating comparison report...
Training on:  data/training_data.json  which has  1000  data-points
Trained model exists. Skipping training again.
Evaluating on  15731  data-points
Evaluating on  15731  data-points
--------