# Library intro demo

## ValidMind Python Client Introduction

The ValidMind Python client allows model developers and validators to automatically document different aspects of the model development lifecycle. 

For modelers, the client provides the following high-level features:

- Log qualitative data about the model's conceptual soundness
- Log information about datasets and models
- Log training and evaluation metrics about datasets and models
- Run data quality checks
- Run model evaluation tests

For validators, the client is also planned to provide the ability to effectively challenge the model's performance according to its objective, use case and the specific project's requirements (details TBD).

### Notebook Requirements

- This notebook and the ValidMind client must be executed in an environment running Python >= 3.8.
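
As a minimal sketch of how this requirement could be checked before running the rest of the notebook, the snippet below compares the interpreter version against the stated minimum. The helper name `python_is_supported` is hypothetical and not part of the ValidMind client.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirement above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if python_is_supported():
    print("Python version is supported")
else:
    print(
        f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```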

### Installing the client library

While we finish the process of making the library publicly accessible via `pip`, it can be installed with the following command, which directs `pip` to the S3 bucket that contains the latest version of the client.


In [1]:
# Load API key and secret from environment variables
from dotenv import load_dotenv
load_dotenv('./env')

True

## Creating a new project

Before we test the client library with a dataset and a model, we need to create a new project on the ValidMind dashboard:

- Navigate to the dashboard and click on the "Create new Project" button
- Provide a name and description for the project
- Select a model use case
- For modeling objective, we only support automated documentation of `Binary Classification` models at the moment

After creating the project you will be provided with client library setup instructions. We have provided similar instructions below.

### Initializing the client library

Every validation project in the ValidMind dashboard has an associated `project identifier`. In order to initialize the client, we need to provide the following arguments:

- `project`: project identifier. The project identifier can be found in the dashboard URL when navigating to a project page, e.g. for `/projects/cl1jyvh2c000909lg1rk0a0zb` the project identifier is `cl1jyvh2c000909lg1rk0a0zb`
- `api_host`: Location of the ValidMind API. This value is already set in this notebook.
- `api_key`: Account API key. This can be found in the settings page in the ValidMind dashboard
- `api_secret`: Account Secret key. Also found in the settings page in the ValidMind dashboard
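
The project identifier can also be pulled out of a dashboard URL programmatically. The following is a small sketch that assumes the URL layout shown above (`/projects/<identifier>`); the helper `project_id_from_url` is hypothetical and not part of the client library.

```python
from urllib.parse import urlparse


def project_id_from_url(url):
    """Return the path segment that follows 'projects', or None if absent."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "projects" in segments:
        idx = segments.index("projects")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    return None


# Example using the path from the bullet list above
print(project_id_from_url("/projects/cl1jyvh2c000909lg1rk0a0zb"))
```

The returned string is the value that would be passed as `project`.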

In [2]:
# Look up your own project id
# project='cla6walda00001wl6pdzagu9v'
project='clar3ppjg000f1gmikrfmkld6'

We can now initialize the client library with the `vm.init` function:

In [3]:
import validmind as vm

vm.init(
    project=project
)

True

In [4]:
# Necessary imports for training our demo models
import pandas as pd
import xgboost as xgb

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

%matplotlib inline

## ValidMind Client Library Functions

As of version `0.8.x` of the client library, the following logging and testing functions are available:

|Function|Description|
|-|-|
|`log_dataset`|Analyzes a dataset and logs its description, column definitions and summary statistics|
|`run_dataset_tests`|Runs dataset quality tests on the input dataset|
|`analyze_dataset`|Analyzes a dataset, computes summary statistics and runs data quality tests. This function combines `log_dataset` and `run_dataset_tests`|
|`log_model`|Logs information about a model's framework, architecture, target objective and training parameters|
|`log_training_metrics`|Extracts and logs training metrics from a pre-trained model|
|`evaluate_model`|Extracts metadata and metrics from a trained model instance and runs model evaluation tests according to the model objective, use case and specific validation requirements. This function combines `log_model`, `log_training_metrics` and an additional set of preconfigured model evaluation tests|



In the example model training code in this notebook, we will demonstrate each of the documented client library functions.

### `analyze_dataset`

Analyzes a dataset and logs its description, column definitions and summary statistics. The following information is extracted from the dataset:

- Descriptive statistics for numerical and categorical columns
- Histograms and value counts for summarizing distribution of values
- Pearson correlation matrix for numerical columns
- Correlation plots for top 15 correlated features
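
To make the listed statistics concrete, here is a rough pandas sketch of how similar values could be computed by hand. It is not the client library's implementation; the tiny DataFrame is made up and only mimics a few column names from the churn dataset.

```python
from itertools import combinations

import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame(
    {
        "Age": [42, 41, 42, 39, 43, 44],
        "Balance": [0.0, 83807.86, 159660.8, 0.0, 125510.82, 113755.78],
        "Geography": ["France", "Spain", "France", "France", "Spain", "Spain"],
        "Exited": [1, 0, 1, 0, 0, 1],
    }
)

numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")

# Descriptive statistics for numerical columns
numeric_stats = numeric.describe()

# Value counts for categorical columns
category_counts = {col: df[col].value_counts() for col in categorical.columns}

# Pearson correlation matrix for numerical columns
pearson = numeric.corr(method="pearson")

# Rank column pairs by absolute correlation and keep at most 15
pair_scores = {
    (a, b): abs(pearson.loc[a, b]) for a, b in combinations(pearson.columns, 2)
}
top_pairs = sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True)[:15]

print(numeric_stats)
print(top_pairs)
```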

Additionally, it will run a collection of data quality tests such as:

- Class imbalance test on target column
- Duplicate rows and duplicates based on primary key
- High cardinality test on categorical columns
- Missing values
- Highly correlated column pairs
- Skewness test
- Zeros test (columns with too many zeros)

Arguments:

- `dataset`: Input dataset. Only Pandas DataFrames are supported at the moment
- `dataset_type`: Type of dataset, e.g. `training`, `test`, `validation`. Value needs to be set to `training` for now
- `targets`: `vm.DatasetTargets` describing the label column and its values
- `features`: Optional list of properties to specify for some features in the dataset

Returns:

- `results`: List of data quality test results
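
The data quality tests listed above can be approximated with plain pandas. The sketch below is only an illustration of what such checks measure: the function name `quality_report` and every threshold are assumptions, not the values or logic used by `analyze_dataset`.

```python
import pandas as pd


def quality_report(df, target, primary_key=None, min_class_ratio=0.2,
                   max_distinct=100, max_abs_skew=1.0, max_zero_ratio=0.03):
    """Compute simple data quality indicators (all thresholds are assumed)."""
    numeric = df.select_dtypes(include="number").drop(columns=[target], errors="ignore")
    categorical = df.select_dtypes(exclude="number")

    report = {
        # Smallest class share must reach the minimum ratio
        "class_balance_ok": bool(
            df[target].value_counts(normalize=True).min() >= min_class_ratio
        ),
        # Fully duplicated rows
        "duplicate_rows": int(df.duplicated().sum()),
        # Missing values per column
        "missing": df.isna().sum().to_dict(),
        # Categorical columns with too many distinct values
        "high_cardinality": [
            c for c in categorical.columns if categorical[c].nunique() > max_distinct
        ],
        # Numerical columns with strong skew
        "skewed": [
            c for c in numeric.columns if abs(numeric[c].skew()) > max_abs_skew
        ],
        # Numerical columns where the share of zeros is too high
        "too_many_zeros": [
            c for c in numeric.columns if (numeric[c] == 0).mean() > max_zero_ratio
        ],
    }
    if primary_key is not None:
        report["duplicate_keys"] = int(df[primary_key].duplicated().sum())
    return report


# Made-up sample data for illustration only
sample = pd.DataFrame(
    {
        "RowNumber": [1, 2, 3, 3],
        "Geography": ["France", "Spain", "France", "France"],
        "Balance": [0.0, 0.0, 10.0, 10.0],
        "Exited": [0, 1, 0, 0],
    }
)
sample_report = quality_report(sample, target="Exited", primary_key="RowNumber")
print(sample_report)
```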

### `evaluate_model`

Logs the following information about a model:

- Model framework and architecture (e.g. XGBoost, Random Forest, Logistic Regression, etc.)
- Model task details (e.g. binary classification, regression, etc.)
- Model hyperparameters (e.g. number of trees, max depth, etc.)
- Model performance metrics from training, validation and test datasets

Additionally, this function runs model evaluation tests according to the model objective, use case and specific validation requirements. The following tests are available for binary classification models at the moment:

- Accuracy score
- Precision score
- Recall score
- F1 score
- ROC AUC score
- ROC AUC curve
- Confusion matrix
- Precision Recall curve
- Permutation feature importance
- SHAP global importance

Arguments:

- `model`: Trained model instance. Only models compatible with the scikit-learn interface are supported at the moment
- `train_set`: Training dataset tuple (x_train, y_train)
- `val_set`: Validation dataset tuple (x_val, y_val)
- `test_set`: Test dataset tuple (x_test, y_test)
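
For reference, most of the evaluation tests listed above correspond to standard scikit-learn calls. The sketch below computes them by hand on synthetic data with a logistic regression model; it illustrates the metrics themselves and is not how `evaluate_model` is implemented. SHAP global importance and the curve plots are left out to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (not the churn dataset)
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
y_pred = model.predict(x_te)              # hard class labels
y_prob = model.predict_proba(x_te)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred),
    "recall": recall_score(y_te, y_pred),
    "f1": f1_score(y_te, y_pred),
    "roc_auc": roc_auc_score(y_te, y_prob),
}
matrix = confusion_matrix(y_te, y_pred)

# Permutation feature importance on the held-out set
importance = permutation_importance(model, x_te, y_te, n_repeats=5, random_state=0)

print(metrics)
print(matrix)
print(importance.importances_mean)
```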

## Training an Example Model

We'll now train an example model to demonstrate the ValidMind client library functions. The following demo datasets are available to use; in this notebook we'll train a model on the Bank Customer Churn dataset.

In [5]:
# Bank Customer Churn Dataset
churn_dataset = pd.read_csv("https://vmai.s3.us-west-1.amazonaws.com/datasets/bank_customer_churn.csv")

# Health Insurance Cross-Sell Dataset
insurance_dataset = pd.read_csv("https://vmai.s3.us-west-1.amazonaws.com/datasets/health_insurance_cross_sell.csv")

In [6]:
churn_dataset2 = pd.read_csv("https://gist.githubusercontent.com/mehdi0501/5b9e64b51ed3bbddbe8f018fc7caf626/raw/ee9b21e5f5308299eb5f4d9dd251bc1b9c5ecc85/churn_test.csv")

In [7]:
churn_dataset2.head()

| |RowNumber|CustomerId|Surname|CreditScore|Geography|Gender|Age|Tenure|Balance|NumOfProducts|HasCrCard|IsActiveMember|EstimatedSalary|Exited|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|1|15634602|Hargrave|619|France|Female|42|2|0.0|1|1|1|101348.88|1|
|1|2|15647311|Hill|608|Spain|Female|41|1|83807.86|1|0|1|112542.58|0|
|2|3|15619304|Onio|502|France|Female|42|8|159660.8|3|1|0|113931.57|1|
|3|4|15701354|Boni|699|France|Female|39|1|0.0|2|0|0|93826.63|0|
|4|5|15737888|Mitchell|850|Spain|Female|43|2|125510.82|1|1|1|79084.1|0|


In [8]:
churn_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8000 entries, 0 to 7999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        8000 non-null   int64  
 1   CustomerId       8000 non-null   int64  
 2   Surname          8000 non-null   object 
 3   CreditScore      8000 non-null   int64  
 4   Geography        8000 non-null   object 
 5   Gender           8000 non-null   object 
 6   Age              8000 non-null   int64  
 7   Tenure           8000 non-null   int64  
 8   Balance          8000 non-null   float64
 9   NumOfProducts    8000 non-null   int64  
 10  HasCrCard        8000 non-null   int64  
 11  IsActiveMember   8000 non-null   int64  
 12  EstimatedSalary  8000 non-null   float64
 13  Exited           8000 non-null   int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 875.1+ KB


In [None]:
churn_dataset.describe()

We can now go to `Project Overview -> Documentation -> Model Overview` and verify this content has been populated on the dashboard.

### `analyze_dataset`

After loading the dataset, we can log metadata and summary statistics, and run data quality checks for it using `analyze_dataset`. Note that the `analyze_dataset` function expects a `targets` definition. Additional information about columns can be provided with the `features` argument.

In [9]:
churn_targets = vm.DatasetTargets(
    target_column="Exited",
    class_labels={
        "0": "Did not exit",
        "1": "Exited",
    }
)

churn_features = [
    {
        "id": "RowNumber",
        "type_options": {
            "primary_key": True,
        }
    }
]

analyze_results = vm.analyze_dataset(
    dataset=churn_dataset,
    dataset_type="training",
    targets=churn_targets,
    features=churn_features
)

Analyzing dataset...
Pandas dataset detected.
Inferring dataset types...
Preparing in-memory dataset copy...
Calculating field statistics...
Calculating feature correlations...
Generating correlation plots...
Successfully logged dataset metadata and statistics.
Running data quality tests...
Running data quality tests for "training" dataset...

Preparing dataset for tests...
Preparing in-memory dataset copy...


100%|██████████| 6/6 [00:00<00:00, 22.63it/s]



Test suite has completed.
Sending results to ValidMind...
Successfully logged test results for test: class_imbalance
Successfully logged test results for test: duplicates
Successfully logged test results for test: cardinality
Successfully logged test results for test: missing
Successfully logged test results for test: skewness
Successfully logged test results for test: zeros

Summary of results:

Test             Passed      # Passed    # Errors    % Passed
---------------  --------  ----------  ----------  ----------
class_imbalance  True               1           0         100
duplicates       True               2           0         100
cardinality      False              6           1     85.7143
missing          True              14           0         100
skewness         False              6           1     85.7143
zeros            False              0           2           0



After running `analyze_dataset`, we can open the following section of the ValidMind dashboard to verify that the dataset and its data quality checks have been documented correctly:

`Dashboard -> Project Overview -> Documentation -> Data Description`
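
The `% Passed` column in the summary above appears consistent with `# Passed / (# Passed + # Errors)`. This is inferred from the printed rows rather than from the library's source, so treat the following as a sketch; `percent_passed` is a hypothetical helper.

```python
def percent_passed(n_passed, n_errors):
    """Share of passed checks as a percentage (0.0 when there are no checks)."""
    total = n_passed + n_errors
    return 100.0 * n_passed / total if total else 0.0


# Rows taken from the summary table: (test name, # passed, # errors)
rows = [("class_imbalance", 1, 0), ("cardinality", 6, 1), ("zeros", 0, 2)]
for name, passed, errors in rows:
    print(name, round(percent_passed(passed, errors), 4))
```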

### Preparing the training dataset

We are now going to preprocess and prepare our training, validation and test datasets so we can train an example model and evaluate its performance.

In [None]:
def preprocess_churn_dataset(df):
    # Drop columns with no correlation to target.
    # drop() returns a new DataFrame, so the caller's data is left untouched
    # and the cell can be re-run safely.
    df = df.drop(columns=["RowNumber", "CustomerId", "Surname", "CreditScore"])

    # Encode binary features as integers
    genders = {"Male": 0, "Female": 1}
    df["Gender"] = df["Gender"].map(genders)

    # One-hot encode categorical features
    df = pd.concat([df, pd.get_dummies(df["Geography"], prefix="Geography")], axis=1)
    df = df.drop(columns="Geography")

    return df

In [None]:
preprocessed_churn = preprocess_churn_dataset(churn_dataset)

In [None]:
def train_val_test_split_dataset(df):
    train_df, test_df = train_test_split(df, test_size=0.20)

    # This guarantees a 60/20/20 split
    train_ds, val_ds = train_test_split(train_df, test_size=0.25)

    # For training
    x_train = train_ds.drop("Exited", axis=1)
    y_train = train_ds.loc[:, "Exited"].astype(int)
    x_val = val_ds.drop("Exited", axis=1)
    y_val = val_ds.loc[:, "Exited"].astype(int)

    # For testing
    x_test = test_df.drop("Exited", axis=1)
    y_test = test_df.loc[:, "Exited"].astype(int)

    return x_train, y_train, x_val, y_val, x_test, y_test

In [None]:
x_train, y_train, x_val, y_val, x_test, y_test = train_val_test_split_dataset(preprocessed_churn)
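
The 60/20/20 proportions follow from applying the two splits in sequence: 20% is held out for testing, and 25% of the remaining 80% (that is, 20% of the total) becomes the validation set. A short sketch of that arithmetic, using a hypothetical helper `split_fractions`:

```python
def split_fractions(test_size, val_size):
    """Fractions of the full dataset after two successive splits."""
    test = test_size
    remaining = 1.0 - test_size   # share left after holding out the test set
    val = remaining * val_size    # validation share of the full dataset
    train = remaining - val       # what is left for training
    return train, val, test


train_frac, val_frac, test_frac = split_fractions(test_size=0.20, val_size=0.25)
print(f"train={train_frac:.2f} val={val_frac:.2f} test={test_frac:.2f}")
```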

In [None]:
def train_churn_dataset(x_train, y_train, x_val, y_val):
    xgb_model = xgb.XGBClassifier(early_stopping_rounds=10)

    xgb_model.set_params(
        eval_metric=["error", "logloss", "auc"],
    )    

    xgb_model.fit(
        x_train,
        y_train,
        eval_set=[(x_train, y_train), (x_val, y_val)],
        verbose=False,
    )
    return xgb_model

In [None]:
xgb_model = train_churn_dataset(x_train, y_train, x_val, y_val)

In [None]:
def model_accuracy(model, x, y):
    y_pred = model.predict_proba(x)[:, -1]
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y, predictions)

    print(f"Accuracy: {accuracy}")    

In [None]:
model_accuracy(xgb_model, x_val, y_val)
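
Rounding the predicted probabilities, as `model_accuracy` does, amounts to applying a decision threshold of roughly 0.5. An explicit threshold makes that choice visible and easy to change. The helper below is a hypothetical sketch, not part of the notebook or the client library.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert positive-class probabilities into 0/1 labels."""
    return [int(p >= threshold) for p in probabilities]


example_probs = [0.10, 0.49, 0.50, 0.93]
print(to_labels(example_probs))
print(to_labels(example_probs, threshold=0.9))
```

One difference from rounding: Python's built-in `round` sends a value of exactly 0.5 to 0, while the comparison above assigns it to the positive class.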

### `evaluate_model`

Finally, after training our model, we can log its model parameters, collect performance metrics and run model evaluation tests on it using `evaluate_model`:

In [None]:
eval_results = vm.evaluate_model(
    xgb_model,
    train_set=(x_train, y_train),
    val_set=(x_val, y_val),
    test_set=(x_test, y_test)
)

After running `evaluate_model`, we can open the following sections of the ValidMind dashboard to verify that the model evaluation test results have been logged correctly:

- `Dashboard -> Project Overview -> Documentation -> Model Development -> Model Evaluation`
- `Dashboard -> Project Overview -> Documentation -> Model Development -> Model Explainability and Interpretability`