# Download and Install Truera Python Client
Download the Python wheel from the [Downloads](/downloads) page.
Install the wheel in your Python environment using `pip install truera-*.whl`.
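
To confirm that the install succeeded, you can query the package metadata from Python. This is a minimal sketch that uses only the standard library (`importlib.metadata`, available in Python 3.8 and later). The distribution name `truera` is an assumption based on the wheel file name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "truera" is assumed to be the distribution name, based on the wheel file name.
version = installed_version("truera")
print(f"truera version: {version}" if version else "truera is not installed")
```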



# Connect to Truera endpoint
 * Provide your Truera deployment URI as the connection string.
 * Provide your username and password. The example below uses basic authentication.
 * Creating a `TrueraWorkspace` also verifies connectivity to Truera services.

In [3]:
from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import BasicAuthentication
# Change to actual URL here.
connection_string = "http://4rtvnbgbwbkvw.westus.azurecontainer.io/"
auth = BasicAuthentication("user1", "truera2021")
tru = TrueraWorkspace(connection_string, auth)
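
Rather than hard-coding the URL and credentials in the notebook, you may prefer to read them from environment variables. The sketch below shows one way to do that with the standard library. The variable names `TRUERA_URL`, `TRUERA_USER` and `TRUERA_PASSWORD` and the helper `load_connection_settings` are hypothetical and not part of the Truera client.

```python
import os


def load_connection_settings(env=os.environ):
    """Read the deployment URL and credentials from environment variables.

    The variable names are hypothetical; choose whatever suits your setup.
    """
    required = ("TRUERA_URL", "TRUERA_USER", "TRUERA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in required)


# Usage with the classes imported in the cell above
# (left commented out so that this sketch runs on its own):
# connection_string, username, password = load_connection_settings()
# auth = BasicAuthentication(username, password)
# tru = TrueraWorkspace(connection_string, auth)
```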

In [2]:
!pip install /home/apoorv/Downloads/truera-2.9.0-py3-none-any.whl

Processing ./truera-2.9.0-py3-none-any.whl
Collecting pyyaml>=5.3.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting cloudpickle>=1.2.2
  Using cached cloudpickle-2.0.0-py3-none-any.whl (25 kB)
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting requests>=2.25.0
  Using cached requests-2.26.0-py2.py3-none-any.whl (62 kB)
Collecting charset-normalizer~=2.0.0
  Downloading charset_normalizer-2.0.7-py3-none-any.whl (38 kB)
Installing collected packages: python-dateutil, charset-normalizer, requests, pyyaml, cloudpickle, truera
  Attempting uninstall: python-dateutil
    Found existing installation: python-dateutil 2.8.0
    Uninstalling python-dateutil-2.8.0:
      Successfully uninstalled python-dateutil-2.8.0
  Attempting uninstall: requests
    Found existing ins

# Download sample project
Download the sample project from the [Downloads page](/downloads).
The dataset is a formatted version of the [Census Income](https://archive.ics.uci.edu/ml/datasets/adult) dataset. We'll use a pickled scikit-learn model for the purposes of this quickstart, but the process is similar for most model types.

# Create Project
A project is a collection of models and datasets solving a single problem statement.
Users can be provided access to collaborate on a project.

In [4]:
tru.set_environment("remote")
tru.set_project("AdultCensus_DemoNB", create=True, score_type="logits")
tru.get_projects()

['AdultCensus_DemoNB']

# Adding a Data Collection
A data collection is a container for two related things:

* Data splits: A set of in-sample data (train, test, validate) or out-of-sample (OOS) / out-of-time (OOT) data to test model quality, stability and generalizability.
* Feature Metadata: An (optional) set of metadata defining the set of features for a set of splits and the various models trained and evaluated on them. This allows you to group features and provide feature descriptions for use throughout the tool.

Note that all splits associated with a data collection are assumed to share the same set of features. As a rule of thumb, if a model can read one split in a data collection, it should be able to read every other split in that data collection.

In [6]:
tru.set_data_collection("demo_data_collection", create=True)
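
Because every split in a data collection is expected to share the same features, it can help to compare column names before uploading a new split. The following is a minimal sketch of such a check in plain Python. `feature_mismatch` is a hypothetical helper, not a Truera API, and the `zip-code` column is invented for illustration.

```python
def feature_mismatch(reference_columns, other_columns):
    """Return (missing, unexpected) column names of a split relative to a reference split."""
    reference, other = set(reference_columns), set(other_columns)
    missing = sorted(reference - other)      # in the reference split but not in the other
    unexpected = sorted(other - reference)   # in the other split but not in the reference
    return missing, unexpected


# The first three names come from the quickstart dataset; "zip-code" is a made-up extra column.
train_columns = ["age", "fnlwgt", "capital-gain", "capital-loss"]
oot_columns = ["age", "fnlwgt", "capital-gain", "zip-code"]

missing, unexpected = feature_mismatch(train_columns, oot_columns)
print("missing:", missing)
print("unexpected:", unexpected)
```

With pandas DataFrames, the column lists can be obtained with `list(df.columns)`.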

# Train a sample model
As an illustration, we train a scikit-learn `GradientBoostingClassifier` model on pre-processed data here.


In [7]:
import pandas as pd 
import numpy as np 

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Change here to point to local directory where the data is saved.
FOLDER="/home/apoorv/Downloads/quickstart_data"

X = pd.read_csv(f"{FOLDER}/data_num.csv")
Y = pd.read_csv(f"{FOLDER}/label.csv", header=None)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, max_depth=3, subsample=0.7)
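# Flatten the single-column label frame to 1-D, which is the shape scikit-learn expects for y.
model.fit(X_train, y_train.values.ravel())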
model.score(X_test, y_test)

0.86125

# Uploading a split
Now we can upload some data to our data collection to prepare for analyzing the model.
Here we upload the entire dataset as an "all" split type. We could instead upload just the train or test datasets as "train" or "test" split types.
At least one "train" or "all" split is required to generate analysis. You can have zero or more splits of other kinds.
You upload a split by providing:
 * A friendly name to identify the split (required).
 * Input data in the shape the model expects (required). This can be a pandas DataFrame.
 * Labels/target ground-truth values (optional). It is strongly recommended to provide labels when available.

In [8]:
tru.add_data_split("in-sample", X, label_data=Y, split_type="all")
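
The rule that at least one "train" or "all" split must exist can be checked locally before requesting analysis. The sketch below is a plain-Python illustration of that rule. `has_baseline_split` is a hypothetical helper, and the "oot" label in the second example is a placeholder rather than a confirmed split type string.

```python
def has_baseline_split(split_types):
    """Return True if at least one split is of type "train" or "all"."""
    return any(split_type in ("train", "all") for split_type in split_types)


# The split uploaded in this quickstart satisfies the requirement.
print(has_baseline_split(["all"]))

# A collection holding only test-like splits does not ("oot" is a placeholder label).
print(has_baseline_split(["test", "oot"]))
```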

# Uploading the model
This is the last step before we can start analyzing the model in TruEra dashboards.
Model type and dependency versions are automatically inferred from the environment and the model object. The friendly name lets you find the model in the Truera dashboard and refer to it later.
The model is automatically attached to the current data collection, set by invoking `set_data_collection`.

In [9]:
model_name = "sklearnGBM_v1"
tru.add_python_model(model_name, model)

Verification Done
Model uploaded to: http://4rtvnbgbwbkvw.westus.azurecontainer.io/p/AdultCensus_DemoNB/m/81781b98-3bb2-4f58-b1c6-d59ec0310c09/


In [11]:
infs = tru.get_feature_influences(0, 100)
infs

Unnamed: 0,age,fnlwgt,capital-gain,capital-loss,hours-per-week,num-education,workclass_?,workclass_Federal-gov,workclass_Local-gov,workclass_Never-worked,...,native-country_Portugal,native-country_Puerto-Rico,native-country_Scotland,native-country_South,native-country_Taiwan,native-country_Thailand,native-country_Trinadad&Tobago,native-country_United-States,native-country_Vietnam,native-country_Yugoslavia
0,-0.568161,0.003536,-0.210042,-0.068645,-0.031769,-0.781751,0.002835,-0.009707,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003762,0.0,0.0
1,-0.222809,0.005971,-0.250731,-0.055025,-0.055064,-0.331136,0.002835,-0.005460,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.014016,0.0,0.0
2,-0.738630,0.004248,-0.207387,-0.074809,-0.025108,-0.364825,0.002835,-0.008908,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003216,0.0,0.0
3,0.374829,0.005050,-0.226742,-0.051739,-0.048906,-0.336334,0.003271,-0.005259,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001214,0.0,0.0
4,0.071810,0.006652,-0.226043,-0.049545,-0.544024,0.546106,0.002835,-0.004503,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001031,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.372907,0.005068,-0.197569,-0.073845,0.005649,-0.396548,0.003458,-0.009116,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003277,0.0,0.0
96,-0.377120,-0.000343,-0.208799,-0.080118,-0.006212,-0.424548,0.002835,-0.008629,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003701,0.0,0.0
97,0.371649,-0.097063,-0.146637,-0.068560,0.366384,0.822825,0.003520,-0.006331,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003519,0.0,0.0
98,0.387358,0.009360,2.753890,-0.055725,0.046280,0.345745,0.003552,-0.004286,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003641,0.0,0.0
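
A common way to summarize per-row influences like those above is to rank features by their mean absolute influence. The sketch below does this in plain Python, using the first three rows of four columns from the output above. `rank_by_mean_abs_influence` is a hypothetical helper, not part of the Truera client.

```python
def rank_by_mean_abs_influence(rows):
    """Rank feature names by mean absolute influence, largest first.

    `rows` is a list of dicts mapping feature name -> influence for one data point.
    Every row is assumed to contain the same feature names.
    """
    totals = {}
    for row in rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    means = {feature: total / len(rows) for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Rows 0-2 of the influence output above, restricted to four features.
sample_rows = [
    {"age": -0.568161, "fnlwgt": 0.003536, "capital-gain": -0.210042, "num-education": -0.781751},
    {"age": -0.222809, "fnlwgt": 0.005971, "capital-gain": -0.250731, "num-education": -0.331136},
    {"age": -0.738630, "fnlwgt": 0.004248, "capital-gain": -0.207387, "num-education": -0.364825},
]

for feature, mean_abs in rank_by_mean_abs_influence(sample_rows):
    print(f"{feature:15s} {mean_abs:.4f}")
```

If `infs` is a pandas DataFrame, as the output suggests, `infs.abs().mean().sort_values(ascending=False)` should give the same ranking over all rows and features.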


In [13]:
inputs = tru.get_xs(0, 100)
inputs

Unnamed: 0,age,fnlwgt,capital-gain,capital-loss,hours-per-week,num-education,workclass_?,workclass_Federal-gov,workclass_Local-gov,workclass_Never-worked,...,native-country_Portugal,native-country_Puerto-Rico,native-country_Scotland,native-country_South,native-country_Taiwan,native-country_Thailand,native-country_Trinadad&Tobago,native-country_United-States,native-country_Vietnam,native-country_Yugoslavia
0,-0.849080,0.145996,-0.145920,-0.21666,-0.035429,-1.586158,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
1,-0.629143,-0.333415,-0.145920,-0.21666,-0.035429,-0.420060,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,16.870768,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,-2.932948,-0.045408,-0.022173
2,-1.288956,-0.032823,-0.145920,-0.21666,-0.035429,-0.420060,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
3,0.250608,-0.150077,-0.145920,-0.21666,-0.035429,-0.420060,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
4,-0.409205,-0.252665,-0.145920,-0.21666,-1.655225,1.134739,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.250608,-0.111696,-0.145920,-0.21666,-0.035429,-0.420060,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
96,-0.702455,-1.243566,-0.145920,-0.21666,-0.035429,-0.420060,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
97,0.837109,-1.507331,-0.145920,-0.21666,0.774468,1.523438,-0.24445,-0.174295,3.815376,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
98,2.010110,0.076151,1.290064,-0.21666,-0.035429,1.134739,-0.24445,-0.174295,-0.262097,-0.014664,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173


In [16]:
explainer = tru.get_explainer()
explainer.plot_isps(['age'])
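
Assuming `plot_isps` draws an influence sensitivity plot (feature value on the x-axis, influence on the y-axis), the underlying points can be assembled by pairing each row's input value with its influence. This sketch uses the `age` column from rows 0-4 of the two outputs above and shows only the pairing, not the library's plotting logic.

```python
# "age" values from rows 0-4 of the inputs (tru.get_xs) output above.
age_values = [-0.849080, -0.629143, -1.288956, 0.250608, -0.409205]

# "age" influences from rows 0-4 of the tru.get_feature_influences output above.
age_influences = [-0.568161, -0.222809, -0.738630, 0.374829, 0.071810]

# Pair value with influence row by row, then sort by feature value for readability.
points = sorted(zip(age_values, age_influences))

for value, influence in points:
    print(f"age={value:+.6f}  influence={influence:+.6f}")
```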