# Guided Exercise: Drift
This is a continuation of part 1. If you missed it: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/truera-examples/blob/release/rc-1.37/starter-examples/starter-drift-part-1.ipynb)

#### Goals 🎯
In this tutorial, you will learn how to:
1. View the results of stability tests set up in part 1.
2. Debug the true cause of stability issues.
3. Retest the new model and confirm the effectiveness of the mitigation strategy.

### First, set the credentials for your TruEra deployment.
If you don't have credentials yet, get them instantly by signing up for free at: https://www.truera.com

In [2]:
# connection details
TRUERA_URL = "https://app.truera.net/"
AUTH_TOKEN = "..."

### Install the required packages for running in Colab

In [3]:
! pip install --upgrade truera

### From here, run the rest of the notebook and follow the analysis.

### Import the libraries and connect to TruEra. In part 1, you trained the model on your beachhead market, San Francisco, and added data for Seattle and Austin, your target markets.

In [4]:
import logging
import pandas as pd
import xgboost as xgb
from sklearn import preprocessing
from sklearn.utils import resample

from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import TokenAuthentication
from truera.client.ingestion import ModelOutputContext, ColumnSpec

auth = TokenAuthentication(AUTH_TOKEN)
tru = TrueraWorkspace(TRUERA_URL, auth)

# note: we'll periodically toggle between local and remote so we can interact with our remote deployment as well.

INFO:truera.client.remote_truera_workspace:Connecting to 'http://localhost:8000'


### Test for stability in Seattle and Austin.

In [5]:
# set the project and data collection created in part 1
project_name = "Starter Example Companion - Drift"
tru.set_project(project_name)
tru.set_data_collection("Data Collection v1")
tru.get_models()

INFO:truera.client.remote_truera_workspace:Data collection in remote environment is now set to "Data Collection v1". 


['model_1']

In [6]:
# add a performance test comparing MAE on the Seattle split against the San Francisco reference split
tru.tester.add_performance_test(
    test_name="MAE Test",
    all_data_collections=True,
    data_split_name_regex="Seattle",
    metric="MAE",
    reference_split_name="San Francisco",
    fail_if_greater_than=40,
    fail_threshold_type="RELATIVE"
)

In [7]:
tru.tester.get_model_leaderboard(sort_by="performance")

| Model Name | Train Split Name | Train Parameters | Performance Tests (Failed/Warning/Total) | Fairness Tests (Failed/Warning/Total) | Stability Tests (Failed/Warning/Total) | Feature Importance Tests (Failed/Warning/Total) |
|---|---|---|---|---|---|---|
| model_1 | San Francisco | eta: 0.2, max_depth: 4.0, model_type: xgb.XGBRegressor | 0 ❌ / 0 ⚠️ / 1 | 0 ❌ / 0 ⚠️ / 0 | 2 ❌ / 0 ⚠️ / 2 | 0 ❌ / 0 ⚠️ / 0 |


In [8]:
tru.set_model("model_1")
tru.tester.get_model_test_results(test_types=["stability"])

INFO:truera.client.remote_truera_workspace:Setting remote model context to "model_1".


| Result | Name | Comparison Split | Base Split | Segment | Metric | Score |
|---|---|---|---|---|---|---|
| ❌ | Stability Test - Seattle | Seattle | San Francisco | ALL POINTS | DIFFERENCE_OF_MEAN | -4.5582 |
| ❌ | Stability Test - Austin | Austin | San Francisco | ALL POINTS | DIFFERENCE_OF_MEAN | 64.0755 |


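Both tests fail: for each new city, the difference between its mean score and the San Francisco mean score (`DIFFERENCE_OF_MEAN`) falls outside the range configured in part 1, so the model's scores have drifted too far in the new cities.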
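Each stability test passes only when the difference of means lies inside a configured range. A minimal sketch of that check, assuming `DIFFERENCE_OF_MEAN` is the comparison split's mean score minus the base split's, and assuming model_1's tests use the same ranges that are listed for model_2's tests later in this notebook (both models live in "Data Collection v1"):

```python
# Fail ranges copied from the stability tests listed for model_2 in this notebook.
# Assumption: model_1 is evaluated against the same ranges.
FAIL_RANGES = {
    "Seattle": (-142.44841, -12.44841),
    "Austin": (-18.244545, 61.755455),
}


def stability_test_passes(difference_of_mean, lower, upper):
    """Pass only if the difference of means lies inside [lower, upper]."""
    return lower <= difference_of_mean <= upper


# DIFFERENCE_OF_MEAN scores reported for model_1 in the table above
model_1_scores = {"Seattle": -4.5582, "Austin": 64.0755}

results = {
    city: stability_test_passes(score, *FAIL_RANGES[city])
    for city, score in model_1_scores.items()
}
print(results)
```

Seattle's score sits above the upper bound of its range and Austin's sits above the upper bound of its own, so both checks fail, matching the ❌ results.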

In [9]:
explainer = tru.get_explainer("Austin", comparison_data_splits=["San Francisco"])
explainer.find_hotspots(max_num_responses=5)

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.local.local_truera_workspace:Data collection in local environment is now set to "Data Collection v1". 
INFO:truera.client.truera_workspace:Syncing data split "San Francisco" to local.
INFO:truera.client.local.local_truera_workspace:Data split "San Francisco" is added to local data collection "Data Collection v1", and set as the data split for the workspace context.
INFO:truera.client.truera_workspace:Syncing data split "Austin" to local.
INFO:truera.client.local.local_truera_workspace:Data split "Austin" is added to local data collection "Data Collection v1", and set as the data split for the workspace context.
INFO:truera.client.truera_workspace:Syncing model model_1 to local.
INFO:truera.client.local.local_truera_workspace:Model "model_1" is added and associated

| | segment_definition | MAE | size | size (%) |
|---|---|---|---|---|
| 0 | `_DATA_GROUND_TRUTH in_range [829.0, 999.0]` | 381.233584 | 185 | 2.023185 |
| 2 | `bathrooms in_range [2.5, 7.0]` | 174.443792 | 1055 | 11.53762 |
| 1 | `accommodates in_range [6.0, 16.0]` | 161.83305 | 2619 | 28.641732 |
| 4 | `beds in_range [3.0, 16.0]` | 155.644171 | 2636 | 28.827647 |
| 3 | `bedrooms in_range [2.0, 10.0]` | 155.521023 | 3835 | 41.94007 |


The hotspots show that MAE is highest on larger, more expensive listings: high ground-truth prices, more bathrooms, and listings that accommodate more guests or have more beds and bedrooms. Bedroom count is a common proxy for listing size and price, and it correlates with all of these hotspots. Let's resample the San Francisco training data so that it contains roughly the same proportion of larger listings (two or more bedrooms) as Austin.
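Each hotspot row reports the MAE restricted to one segment, such as `bedrooms in_range [2.0, 10.0]`, together with the segment's size. A minimal pandas sketch of that computation on a small made-up frame; the values and the `prediction` column name are illustrative, and we assume the `in_range` bounds are inclusive:

```python
import pandas as pd

# Hypothetical listings with labels and model predictions (illustrative values only)
df = pd.DataFrame({
    "bedrooms": [1, 1, 2, 3, 4],
    "price": [100.0, 150.0, 300.0, 500.0, 900.0],
    "prediction": [110.0, 140.0, 250.0, 400.0, 600.0],
})


def segment_mae(frame, feature, low, high):
    """Return MAE, row count, and percentage of rows whose `feature` is in [low, high]."""
    mask = frame[feature].between(low, high)  # inclusive on both ends by default
    seg = frame[mask]
    mae = (seg["prediction"] - seg["price"]).abs().mean()
    return mae, len(seg), 100 * len(seg) / len(frame)


print(segment_mae(df, "bedrooms", 2.0, 10.0))  # larger listings
print(segment_mae(df, "bedrooms", 0.0, 1.0))   # smaller listings
```

In this toy frame the larger listings carry far larger absolute errors than the small ones, which is the kind of pattern the hotspot table surfaces.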

In [10]:
# load data
san_francisco = pd.read_csv("https://truera-examples.s3.us-west-2.amazonaws.com/data/starter-stability/San_Francisco_for_stability.csv")
seattle = pd.read_csv("https://truera-examples.s3.us-west-2.amazonaws.com/data/starter-stability/Seattle_for_stability.csv")
austin = pd.read_csv("https://truera-examples.s3.us-west-2.amazonaws.com/data/starter-stability/Austin_for_stability.csv")

# cast all columns to float and move the row index into an "id" column
san_francisco = san_francisco.astype(float).reset_index(names="id")
seattle = seattle.astype(float).reset_index(names="id")
austin = austin.astype(float).reset_index(names="id")
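`DataFrame.reset_index(names=...)` (available in pandas 1.5 and later) turns the default 0..n-1 row index into a regular column with the given name; this notebook uses that column as `id_col_name="id"` in `ColumnSpec`. A small illustration on made-up data:

```python
import pandas as pd

raw = pd.DataFrame({"bedrooms": [1, 2], "price": [120, 310]})

# Cast the feature/label columns to float first, then move the RangeIndex
# into a column named "id" (so the id column itself stays integer)
prepared = raw.astype(float).reset_index(names="id")

print(prepared)
print(prepared.columns.tolist())
```

Because each city is indexed from 0, ids repeat across cities; they only need to be unique within a split.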

In [11]:
large_listings = san_francisco[san_francisco["bedrooms"] >= 2]
small_listings = san_francisco[san_francisco["bedrooms"] < 2]

austin_large_listings = austin[austin["bedrooms"] >= 2]
num_samples = int(round((len(austin_large_listings)/len(austin)) * len(san_francisco), 0))

large_listings_resampled = resample(
    large_listings, 
    replace=True,
    n_samples=num_samples,
    random_state=1  # fixed random seed so the resampling is reproducible
)

san_francisco_resampled = pd.concat([small_listings, large_listings_resampled])
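The cell above sets the number of large listings to Austin's large-listing share multiplied by the size of the *whole* San Francisco set. Because the small listings are then kept as they are, the resampled set's large-listing share ends up close to Austin's but not identical. If an exact match is wanted, solving n_large / (n_small + n_large) = p for n_large gives n_large = p · n_small / (1 − p). A minimal sketch comparing the two counts; the helper name and the toy counts are hypothetical:

```python
def large_count_for_target_share(n_small, target_share):
    """Number of large rows so that large / (small + large) equals target_share."""
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    return int(round(target_share * n_small / (1 - target_share)))


n_small, n_large_original = 600, 400  # toy San Francisco counts
austin_share = 0.5                    # toy share of large listings in Austin

# Approach used above: target share times the full San Francisco size
approx = int(round(austin_share * (n_small + n_large_original)))
# Exact approach: solve for the large count given the small count
exact = large_count_for_target_share(n_small, austin_share)

print(approx, approx / (n_small + approx))
print(exact, exact / (n_small + exact))
```

To apply this in the cell above, one could pass `n_samples=large_count_for_target_share(len(small_listings), len(austin_large_listings) / len(austin))` to `resample`; note that this would change the model_2 results shown below.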

In [12]:
# train new model on resampled sf data
xgb_reg = xgb.XGBRegressor(eta=0.2, max_depth=4)
xgb_reg.fit(san_francisco_resampled.drop(["id", "price"], axis=1), san_francisco_resampled.price)

tru.set_project(project_name)
tru.set_data_collection("Data Collection v1")

# register the model
tru.add_python_model(
    "model_2",
    xgb_reg,
    train_parameters={"model_type": "xgb.XGBRegressor", "eta": 0.2, "max_depth": 4},
    compute_predictions=False
)

INFO:truera.client.remote_truera_workspace:Uploading xgboost model: XGBRegressor
INFO:truera.client.remote_truera_workspace:Verifying model...
INFO:truera.client.remote_truera_workspace:✔️ Verified packaged model format.
INFO:truera.client.remote_truera_workspace:✔️ Loaded model in current environment.
INFO:truera.client.remote_truera_workspace:✔️ Called predict on model.
INFO:truera.client.remote_truera_workspace:✔️ Verified model output.
INFO:truera.client.remote_truera_workspace:Verification succeeded!


Uploading MLmodel (218.0B) -- ### -- file upload complete.
Uploading conda.yaml (208.0B) -- ### -- file upload complete.
Uploading tmpo3k1iwaw.json (174.2KiB) -- ### -- file upload complete.
Uploading xgboost_regression_predict_wrapper.py (459.0B) -- ### -- file upload complete.
Uploading xgboost_regression_predict_wrapper.cpython-310.pyc (1.1KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Model "model_2" is added and associated with remote data collection "Data Collection v1". "model_2" is set as the model for the workspace context.


Model uploaded to: http://localhost:8000/home/p/Starter%20Example%20Companion%20-%20Drift/m/model_2/


In [13]:
# compute model_2 predictions for each split and upload them
tru.set_project(project_name)
tru.set_data_collection("Data Collection v1")
tru.set_model("model_2")
tru.set_influences_background_data_split("San Francisco")
tru.set_data_split("San Francisco")
sf_preds = tru.get_ys_pred().reset_index(names="id")
tru.set_data_split("Seattle")
se_preds = tru.get_ys_pred().reset_index(names="id")
tru.set_data_split("Austin")
au_preds = tru.get_ys_pred().reset_index(names="id")

tru.add_data(
    data=sf_preds,
    data_split_name="San Francisco",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

tru.add_data(
    data=se_preds,
    data_split_name="Seattle",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

tru.add_data(
    data=au_preds,
    data_split_name="Austin",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing model model_2 to local.
INFO:truera.client.local.local_truera_workspace:Model "model_2" is added and associated with local data collection "Data Collection v1". "model_2" is set as the model for the workspace context.
ERROR:truera.client.truera_workspace:Failed to sync explanation cache for model "model_2" and data_split "Seattle" to local: No such data split "Seattle" in data collection "Data Collection v1"!
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.
INFO:truera.client.local.local_truera_workspace:Setting local model context to "model_1".
INFO:truera.client.local.local_truera_workspace:Setting local model context to "model_2".
INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8

Uploading tmpg2du0td7.parquet (70.7KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:`model_output_context` will be inferred as it was not provided.
INFO:truera.client.remote_truera_workspace:Inferred ModelOutputContext: ModelOutputContext(model_name='model_2', score_type='regression', background_split_name='', influence_type='')


Uploading tmp19qhd4cp.parquet (44.4KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:`model_output_context` will be inferred as it was not provided.
INFO:truera.client.remote_truera_workspace:Inferred ModelOutputContext: ModelOutputContext(model_name='model_2', score_type='regression', background_split_name='', influence_type='')


Uploading tmpisq0h48x.parquet (111.2KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...

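We assume `add_data` attaches the uploaded predictions to existing rows by matching on `id_col_name`, so the `id` values in `sf_preds`, `se_preds` and `au_preds` should match the ids used when the splits were first ingested. A minimal pandas sanity check that could be run before uploading; the helper `missing_ids` is hypothetical and not part of the TruEra client:

```python
import pandas as pd


def missing_ids(reference, candidate, id_col="id"):
    """Return (ids only in reference, ids only in candidate), each sorted."""
    ref_ids = set(reference[id_col])
    cand_ids = set(candidate[id_col])
    return sorted(ref_ids - cand_ids), sorted(cand_ids - ref_ids)


# Toy example: the prediction frame is missing id 2 and has an unexpected id 5
data = pd.DataFrame({"id": [0, 1, 2], "bedrooms": [1.0, 2.0, 3.0]})
preds = pd.DataFrame({"id": [0, 1, 5], "__truera_prediction__": [101.0, 250.0, 80.0]})

only_in_data, only_in_preds = missing_ids(data, preds)
print(only_in_data, only_in_preds)
```

Two empty lists mean every prediction row lines up with a data row and vice versa.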

In [14]:
# set model and background split
tru.set_model("model_2")
tru.set_influences_background_data_split("San Francisco")

# influence type
tru.set_influence_type("shap")

# reduce settings for speed
tru.set_num_internal_qii_samples(100)
tru.set_num_default_influences(100)

se_explainer = tru.get_explainer("Seattle")
se_infs = se_explainer.get_feature_influences().reset_index(names="id")

sf_explainer = tru.get_explainer("San Francisco")
sf_infs = sf_explainer.get_feature_influences().reset_index(names="id")

au_explainer = tru.get_explainer("Austin")
au_infs = au_explainer.get_feature_influences().reset_index(names="id")

model_output_context = ModelOutputContext(model_name="model_2", score_type="regression", background_split_name="San Francisco", influence_type="kernel-shap")

tru.add_data(
    data=sf_infs,
    data_split_name="San Francisco",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(sf_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

tru.add_data(
    data=se_infs,
    data_split_name="Seattle",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(se_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

tru.add_data(
    data=au_infs,
    data_split_name="Austin",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(au_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.
INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.
INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.


Uploading tmph9_2hcos.parquet (37.8KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmpz7u5b4lb.parquet (37.5KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmpnj1dr14_.parquet (37.8KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


In [15]:
# error influences
model_output_context = ModelOutputContext(model_name="model_2", score_type="mean_absolute_error_for_regression", background_split_name="San Francisco", influence_type="kernel-shap")

tru.set_data_split("San Francisco")
sf_error_infs = tru.get_feature_influences(score_type="mean_absolute_error_for_regression").reset_index(names="id")

tru.set_data_split("Seattle")
se_error_infs = tru.get_feature_influences(score_type="mean_absolute_error_for_regression").reset_index(names="id")

tru.set_data_split("Austin")
au_error_infs = tru.get_feature_influences(score_type="mean_absolute_error_for_regression").reset_index(names="id")

tru.add_data(
    data=sf_error_infs,
    data_split_name="San Francisco",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(sf_error_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

tru.add_data(
    data=se_error_infs,
    data_split_name="Seattle",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(se_error_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

tru.add_data(
    data=au_error_infs,
    data_split_name="Austin",
    column_spec=ColumnSpec(
        id_col_name="id",
        feature_influence_col_names=list(au_error_infs.columns.drop("id"))
    ),
    model_output_context=model_output_context
)

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.



INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.



INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.



Uploading tmpsm_6sjc0.parquet (33.8KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmpn6g1aai9.parquet (33.7KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmpzg20xp_g.parquet (33.7KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


In [16]:
# check drift results
tru.set_model("model_2")
tru.tester.get_model_test_results(test_types=["stability"])

| Result | Name | Comparison Split | Base Split | Segment | Metric | Score |
|---|---|---|---|---|---|---|
| ❌ | Stability Test - Seattle | Seattle | San Francisco | ALL POINTS | DIFFERENCE_OF_MEAN | -0.9796 |
| ✅ | Stability Test - Austin | Austin | San Francisco | ALL POINTS | DIFFERENCE_OF_MEAN | 57.3777 |


model_2 now passes the stability test in Austin and is ready for production there, but it still fails in Seattle. Let's keep iterating on Seattle.

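In Seattle, the difference of means (-0.9796) lies above the upper end of the passing range (-12.44841), so the model's Seattle scores are too high relative to San Francisco. We should therefore look for the features that contribute most positively to the score drift.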
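For an additive explanation method such as SHAP, each prediction equals a common base value plus the sum of that row's feature influences, provided both splits are explained against the same background split (San Francisco here). Under that assumption, the difference of mean predictions between two splits breaks down exactly into one term per feature: the difference of that feature's mean influence. A minimal pandas sketch with made-up influence values; we assume, without confirming it against TruEra's documentation, that `compute_feature_contributors_to_instability(use_difference_of_means=True)` is built on a similar idea. The values it reports below are on a different scale from the test score, so TruEra may normalize or compute them differently.

```python
import pandas as pd

BASE_VALUE = 200.0  # hypothetical mean prediction over the background split

# Made-up per-row influences for two splits with the same feature columns
base_infs = pd.DataFrame({"availability_90": [-10.0, 0.0, 10.0],
                          "bedrooms": [5.0, -5.0, 0.0]})
comparison_infs = pd.DataFrame({"availability_90": [20.0, 25.0, 30.0],
                                "bedrooms": [0.0, 3.0, -3.0]})

# Additivity: prediction = base value + sum of that row's influences
base_preds = BASE_VALUE + base_infs.sum(axis=1)
comparison_preds = BASE_VALUE + comparison_infs.sum(axis=1)

# Per-feature contribution to the difference of mean predictions
contributions = comparison_infs.mean() - base_infs.mean()
diff_of_means = comparison_preds.mean() - base_preds.mean()

print(contributions)
print(diff_of_means, contributions.sum())
```

The per-feature contributions add up to the overall difference of means, so a large positive contribution identifies a feature that pushes the comparison split's scores upward.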

In [17]:
explainer = tru.get_explainer("San Francisco", comparison_data_splits=["Seattle"])
explainer.compute_feature_contributors_to_instability(use_difference_of_means=True).T

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v1" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.


| Feature | Seattle |
|---|---|
| availability_90 | 0.202483 |
| bathrooms | -0.005774 |
| accommodates | -0.001105 |
| bedrooms | 0.041468 |
| reviews_per_month | -0.098158 |
| beds | -0.013941 |
| minimum_nights | 0.051571 |
| extra_people | -0.033043 |
| availability_365 | -0.019861 |
| features_Host_Is_Superhost | -0.010163 |


`availability_90` is by far the largest positive contributor to score drift in Seattle. Let's remove it, along with the related feature `availability_365`, to mitigate the issue.
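The choice of feature to drop can be read directly off the contributions table. A minimal pandas sketch that re-enters the Seattle values shown above by hand, ranks the positive contributors, and builds the list of columns to exclude when retraining. Pairing `availability_90` with `availability_365` is the judgment made in the text, not something the code infers:

```python
import pandas as pd

# Seattle contributions to instability for model_2, copied from the table above
contributions = pd.Series({
    "availability_90": 0.202483,
    "bathrooms": -0.005774,
    "accommodates": -0.001105,
    "bedrooms": 0.041468,
    "reviews_per_month": -0.098158,
    "beds": -0.013941,
    "minimum_nights": 0.051571,
    "extra_people": -0.033043,
    "availability_365": -0.019861,
    "features_Host_Is_Superhost": -0.010163,
})

# Positive contributors push Seattle scores up; rank them largest first
positive = contributions[contributions > 0].sort_values(ascending=False)
print(positive)

top_feature = positive.index[0]
# Drop the top contributor plus the related feature chosen by hand in the text
features_to_drop = [top_feature, "availability_365"]
print(features_to_drop)
```

`availability_90` (0.202) is roughly four times the next largest positive contributor, `minimum_nights` (0.052).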

In [None]:
# train a new model
xgb_reg = xgb.XGBRegressor(eta=0.2, max_depth=4)
xgb_reg.fit(san_francisco_resampled.drop(["id", "price", "availability_90", "availability_365"], axis=1), san_francisco_resampled.price)

# create a second data collection for the model trained without the availability features
tru.add_data_collection("Data Collection v2")

In [19]:
# add data to the collection we just created
tru.add_data(
    data=san_francisco,
    data_split_name="San Francisco",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=list(san_francisco.columns.drop(["id","price","availability_90","availability_365"])),
        label_col_names="price")
)
tru.add_data(
    data=seattle,
    data_split_name="Seattle",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=list(seattle.columns.drop(["id","price","availability_90","availability_365"])),
        label_col_names="price")
)
tru.add_data(
    data=austin,
    data_split_name="Austin",
    column_spec=ColumnSpec(
        id_col_name="id",
        pre_data_col_names=list(austin.columns.drop(["id","price","availability_90","availability_365"])),
        label_col_names="price")
)
tru.set_influences_background_data_split("San Francisco")

# register the model
tru.add_python_model(
    "model_3",
    xgb_reg,
    train_parameters={"model_type": "xgb.XGBRegressor", "eta": 0.2, "max_depth": 4},
    compute_predictions=False
)



Uploading tmpdb87_vk2.parquet (224.4KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmp_0ig5s9s.parquet (147.9KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


Uploading tmpv8fh5lor.parquet (336.8KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:Uploading xgboost model: XGBRegressor
INFO:truera.client.remote_truera_workspace:Verifying model...
INFO:truera.client.remote_truera_workspace:✔️ Verified packaged model format.
INFO:truera.client.remote_truera_workspace:✔️ Loaded model in current environment.
INFO:truera.client.remote_truera_workspace:✔️ Called predict on model.
INFO:truera.client.remote_truera_workspace:✔️ Verified model output.
INFO:truera.client.remote_truera_workspace:Verification succeeded!


Uploading MLmodel (218.0B) -- ### -- file upload complete.
Uploading conda.yaml (208.0B) -- ### -- file upload complete.
Uploading tmpsez_gvyw.json (170.9KiB) -- ### -- file upload complete.
Uploading xgboost_regression_predict_wrapper.py (459.0B) -- ### -- file upload complete.
Uploading xgboost_regression_predict_wrapper.cpython-310.pyc (1.1KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Model "model_3" is added and associated with remote data collection "Data Collection v2". "model_3" is set as the model for the workspace context.


Model uploaded to: http://localhost:8000/home/p/Starter%20Example%20Companion%20-%20Drift/m/model_3/


In [20]:
tru.set_model("model_3")

In [22]:
tru.set_data_split("San Francisco")
sf_preds = tru.get_ys_pred().reset_index(names="id")
tru.set_data_split("Seattle")
se_preds = tru.get_ys_pred().reset_index(names="id")
tru.set_data_split("Austin")
au_preds = tru.get_ys_pred().reset_index(names="id")

tru.add_data(
    data=sf_preds,
    data_split_name="San Francisco",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

tru.add_data(
    data=se_preds,
    data_split_name="Seattle",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

tru.add_data(
    data=au_preds,
    data_split_name="Austin",
    column_spec=ColumnSpec(
        id_col_name="id",
        prediction_col_names="__truera_prediction__"
    )
)

INFO:truera.client.truera_workspace:Download temp_dir: /var/folders/pg/2f8qcnr92cx4rcwpvm_x2ckc0000gn/T/tmpsxqc3m7a
INFO:truera.client.local.local_truera_workspace:Data collection in local environment is now set to "Data Collection v1". The previous data collection ("Data Collection v2") and its associated data splits and/or models have been cleared from the local environment workspace context.
INFO:truera.client.local.local_truera_workspace:Data collection in local environment is now set to "Data Collection v2". The previous data collection ("Data Collection v1") and its associated data splits and/or models have been cleared from the local environment workspace context.
INFO:truera.client.truera_workspace:Syncing data collection "Data Collection v2" to local.
INFO:truera.client.truera_workspace:Syncing segments groups from remote to local.
INFO:truera.client.local.local_truera_workspace:Setting local model context to "model_3".
INFO:truera.client.truera_workspace:Download temp_dir: /v

Uploading tmpv8dpdlym.parquet (70.6KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:`model_output_context` will be inferred as it was not provided.
INFO:truera.client.remote_truera_workspace:Inferred ModelOutputContext: ModelOutputContext(model_name='model_3', score_type='regression', background_split_name='', influence_type='')


Uploading tmpup9hizoi.parquet (44.3KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...
INFO:truera.client.remote_truera_workspace:`model_output_context` will be inferred as it was not provided.
INFO:truera.client.remote_truera_workspace:Inferred ModelOutputContext: ModelOutputContext(model_name='model_3', score_type='regression', background_split_name='', influence_type='')


Uploading tmp6ys2zsu6.parquet (111.1KiB) -- ### -- file upload complete.
Put resource done.


INFO:truera.client.remote_truera_workspace:Waiting for data split to materialize...


In [23]:
# get the test details from model_2 so we can copy them for model_3
tru.set_model("model_2")
tru.tester.get_model_tests().as_dict()["Stability Tests"]["Rows"]

INFO:truera.client.remote_truera_workspace:Model "model_2" in remote is associated with a different data_collection ("Data Collection v1") than the one in context ("Data Collection v2").
INFO:truera.client.remote_truera_workspace:Data collection in remote environment is now set to "Data Collection v1". The previous data collection ("Data Collection v2") and its associated data splits and/or models have been cleared from the remote environment workspace context.
INFO:truera.client.remote_truera_workspace:Setting remote model context to "model_2".


[['Stability Test - Seattle',
  'Seattle',
  'San Francisco',
  'ALL POINTS',
  'DIFFERENCE_OF_MEAN',
  '',
  'Not specified',
  'DIFFERENCE_OF_MEAN < -142.44841 OR DIFFERENCE_OF_MEAN > -12.44841'],
 ['Stability Test - Austin',
  'Austin',
  'San Francisco',
  'ALL POINTS',
  'DIFFERENCE_OF_MEAN',
  '',
  'Not specified',
  'DIFFERENCE_OF_MEAN < -18.244545 OR DIFFERENCE_OF_MEAN > 61.755455']]

In [24]:
# model_3 is in a new data collection, so define a Seattle stability test for it
tru.set_model("model_3")

# reuse the fail range from model_2's Seattle stability test:
# the test fails if the difference of means falls outside [-142.44841, -12.44841]
tru.tester.add_stability_test(
    test_name="Stability Test - Seattle - v3",
    base_data_split_name="San Francisco",
    comparison_data_split_names=["Seattle"],
    fail_if_outside=[-142.44841, -12.44841]
)


INFO:truera.client.remote_truera_workspace:Model "model_3" in remote is associated with a different data_collection ("Data Collection v2") than the one in context ("Data Collection v1").
INFO:truera.client.remote_truera_workspace:Data collection in remote environment is now set to "Data Collection v2". The previous data collection ("Data Collection v1") and its associated data splits and/or models have been cleared from the remote environment workspace context.
INFO:truera.client.remote_truera_workspace:Setting remote model context to "model_3".


In [25]:
tru.tester.get_model_test_results(test_types=["stability"])

| Result | Name | Comparison Split | Base Split | Segment | Metric | Score |
|---|---|---|---|---|---|---|
| ✅ | Stability Test - Seattle - v3 | Seattle | San Francisco | ALL POINTS | DIFFERENCE_OF_MEAN | -12.9648 |


model_3 now passes the Seattle stability test. At launch, we can deploy model_2 in Austin and model_3 in Seattle, and our startup's investors are satisfied with these results!