# Quickstart: Adding a Virtual Model and Ingesting Predictions and Influences 
This notebook presents an example of ingesting local datasets and model-computed predictions with feature influences via the Python SDK.

Data to be ingested include:

- Input Data
- Label Data
- Prediction Data for provided virtual model
- Feature Influence Data for provided virtual model

## Before you begin
* Install the [TruEra Python SDK](client-installation.md)
* Read [Ingesting a Virtual Model](model_ingestion_guide_virtual.md)

## Set your TruEra URL and authentication token

- Provide your TruEra deployment URL. Free users use `https://app.truera.net`.
- Provide your authentication token, available [here](https://app.truera.net/home/p?modal=workspaceSettings&selectedTab=authentication).
- Both values are used to create your TruEra workspace object once the packages are installed.

In [None]:
# FILL ME!
TRUERA_URL = "https://app.truera.net"
AUTH_TOKEN = ""
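Hardcoding a token makes it easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read from an environment variable. The variable name `TRUERA_AUTH_TOKEN` and the helper `load_auth_token` are hypothetical and are not part of the TruEra SDK.

```python
import os

def load_auth_token(env_var: str = "TRUERA_AUTH_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is unset or empty.

    The default variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your TruEra token."
        )
    return token
```

With the variable set in your shell, `AUTH_TOKEN = load_auth_token()` could replace the empty string above.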

## Install packages


In [1]:
! pip install truera
! pip install s3fs


## Connect to your TruEra Endpoint

In [None]:
from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import TokenAuthentication

auth = TokenAuthentication(AUTH_TOKEN)
tru = TrueraWorkspace(TRUERA_URL, auth)

tru.set_environment("remote") # now we're ready to add projects remotely!

## Adding a Project with Sample Data and Model

We have made a sample project, based on the [Census Income](https://archive.ics.uci.edu/ml/datasets/adult) dataset, available in a public S3 bucket.

The **Census Income** project, used throughout this quickstart tutorial, includes a formatted version of the data to illustrate the data ingestion process. For other frameworks, the process is similar.

Content in the census_income folder comprises:

- **quickstart_model.pkl** – Pickled Python model for quickstart
- **data_raw.csv** – training data before transformation (human-readable)
- **data_num.csv** – data in model-readable form
- **label.csv** – single column containing ground truth labels
- **extra_data.csv** – used for defining segments
- **feature_influence.csv** – feature influence data
- **predictions.csv** – predictions computed locally by the model

Next, to upload the data to TruEra, you'll need to:

1. Create a TruEra project
2. Define a data collection
3. Create a background split for feature influences
4. Add a virtual model
5. Add split data, labels, predictions and feature influences
6. Start using TruEra Diagnostics

### Step 1. Create a TruEra project

In [None]:
tru.add_project("AdultCensus_DemoNB_local_ingestion", score_type="probits")

### Step 2. Define a data collection

In [None]:
tru.add_data_collection("demo_data_collection")

Next, load the data to upload into separate pandas DataFrames.

In [None]:
import pandas as pd

# Public S3 folder that holds the census_income quickstart data
# (reading s3:// paths with pandas relies on the s3fs package installed above)
s3_folder = "s3://truera-examples/data/census_income/"

data = pd.read_csv(s3_folder + "data_num.csv")
labels = pd.read_csv(s3_folder + "label.csv")
predictions = pd.read_csv(s3_folder + "predictions.csv")
# Feature influences can be computed beforehand with a local explainer
feature_influence = pd.read_csv(s3_folder + "feature_influence.csv")
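Because every split in this notebook is ingested with `id_col_name="id"`, it can be worth confirming first that each DataFrame carries a unique `id` column covering the same rows. The helper below is a sketch in plain pandas. `check_id_alignment` is a hypothetical name, the toy frames are made up for illustration, and the check assumes each CSV includes an `id` column, as the `add_data_split` calls in this notebook imply.

```python
import pandas as pd

def check_id_alignment(frames: dict, id_col: str = "id") -> None:
    """Raise ValueError if a frame lacks id_col, repeats ids, or covers different ids."""
    reference_name, reference_ids = None, None
    for name, frame in frames.items():
        if id_col not in frame.columns:
            raise ValueError(f"{name}: missing id column '{id_col}'")
        if frame[id_col].duplicated().any():
            raise ValueError(f"{name}: duplicate values in '{id_col}'")
        ids = set(frame[id_col])
        if reference_ids is None:
            # The first frame defines the set of ids all others must match.
            reference_name, reference_ids = name, ids
        elif ids != reference_ids:
            raise ValueError(f"{name}: ids differ from {reference_name}")

# Toy frames standing in for the real data and labels (illustrative values only).
toy = {
    "data": pd.DataFrame({"id": [1, 2, 3], "age": [39, 50, 38]}),
    "labels": pd.DataFrame({"id": [1, 2, 3], "label": [0, 1, 0]}),
}
check_id_alignment(toy)  # returns None when everything lines up
```

In this notebook the call would look like `check_id_alignment({"data": data, "labels": labels, "predictions": predictions, "feature_influence": feature_influence})`.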

### Step 3. Create Background split for Feature Influence

In [None]:
tru.add_data_split("background_split", data, id_col_name="id")

### Step 4. Add a virtual model

In [None]:
model_name = "quickstart_demo"
tru.add_model(model_name)

### Step 5. Add split data, labels, predictions and feature influences

In [None]:
tru.add_data_split("demo-all", 
                   data, 
                   label_data=labels, 
                   prediction_data=predictions,
                   feature_influence_data=feature_influence, 
                   split_type="all", 
                   id_col_name="id")

### Validation

After the upload, you can validate the ingested data with the following calls:

**To validate feature influence**

In [None]:
tru.get_feature_influences()

**To validate predictions**

In [None]:
tru.get_ys_pred()

**To validate label data**

In [None]:
tru.get_ys()
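To go one step beyond inspecting the returned values by eye, the retrieved object can be compared with its local copy. The sketch below assumes the getter returns a pandas DataFrame or Series, which should be confirmed against the SDK documentation. `summarize_roundtrip` is a hypothetical helper, and the toy frames are illustrative only.

```python
import pandas as pd

def summarize_roundtrip(local, remote) -> dict:
    """Compare row counts and count missing values in the retrieved object."""
    return {
        "local_rows": len(local),
        "remote_rows": len(remote),
        "row_counts_match": len(local) == len(remote),
        # pd.isna works for both DataFrame and Series inputs.
        "remote_missing_values": int(pd.isna(remote).to_numpy().sum()),
    }

# Toy stand-ins for the local labels and the labels retrieved from the platform.
local_labels = pd.DataFrame({"id": [1, 2, 3], "label": [0, 1, 0]})
retrieved_labels = pd.DataFrame({"label": [0, 1, 0]})
summary = summarize_roundtrip(local_labels, retrieved_labels)
```

In this notebook the call might be `summarize_roundtrip(labels, tru.get_ys())`. If a getter returns only a subset of rows by default, the counts will differ for that reason rather than because of an ingestion problem.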