## ValidMind Python Client Introduction

The ValidMind Python client allows model developers and validators to automatically document different aspects of the model development lifecycle. 

For modelers, the client provides the following high-level features:

- Log qualitative data about the model's conceptual soundness
- Log information about datasets and models
- Log training and evaluation metrics about datasets and models
- Run data quality checks
- Run model evaluation tests

For validators, the client will also provide (TBD) the ability to challenge the model's performance against its objective, use case and the specific project's requirements.

### Notebook Requirements

- This notebook and the ValidMind client must be executed in an environment running Python >= 3.8.

### Installing the client library

While we finish the process of making the library publicly accessible via `pip`, it can be installed with the following command, which directs `pip` to the S3 bucket that contains the latest version of the client.


In [None]:
# !pip install https://vmai.s3.us-west-1.amazonaws.com/sdk/validmind-0.4.1-py3-none-any.whl

In [None]:
# !pip install shap

In [None]:
# Quick hack to load local SDK code
import os

os.chdir(os.path.join(os.getcwd(), ".."))

# Load API key and secret from environment variables
from dotenv import load_dotenv
load_dotenv()

## Creating a new project

Before we test the client library with a dataset and a model, we need to create a new project on the ValidMind dashboard:

- Navigate to the dashboard and click on the "Create new Project" button
- Provide a name and description for the project
- Select a model use case
- For modeling objective, we only support automated documentation of `Binary Classification` models at the moment

After creating the project you will be provided with client library setup instructions. We have provided similar instructions below.

### Initializing the client library

Every validation project in the ValidMind dashboard has an associated `project identifier`. In order to initialize the client, we need to provide the following arguments:

- `project`: project identifier. The project identifier can be found in the dashboard URL when navigating to a project page, e.g. for `/projects/cl1jyvh2c000909lg1rk0a0zb` the project identifier is `cl1jyvh2c000909lg1rk0a0zb`
- `api_host`: Location of the ValidMind API. This value is already set in this notebook
- `api_key`: Account API key. This can be found on the settings page in the ValidMind dashboard
- `api_secret`: Account secret key. Also found on the settings page in the ValidMind dashboard
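
The project identifier can also be pulled out of a dashboard URL programmatically. The following is a minimal sketch that assumes the identifier is always the path segment right after `projects`, as in the example above. The helper name `project_id_from_url` and the host `dashboard.example.com` are placeholders for illustration and are not part of the client library.

```python
from urllib.parse import urlparse


def project_id_from_url(url):
    """Return the path segment that follows 'projects' in a dashboard URL."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "projects" not in segments:
        raise ValueError(f"No 'projects' segment found in {url!r}")
    index = segments.index("projects")
    if index + 1 >= len(segments):
        raise ValueError(f"No identifier after 'projects' in {url!r}")
    return segments[index + 1]


# The host is a placeholder; only the /projects/<id> path shape comes from the text
example_id = project_id_from_url(
    "https://dashboard.example.com/projects/cl1jyvh2c000909lg1rk0a0zb"
)
print(example_id)
```

The resulting string is what would be passed as the `project` argument.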

In [None]:
api_host="https://api.test.vm.validmind.ai/api/v1/tracking"

In [None]:
project='cl61pr35n001sv88h5hgws6bq'

In [None]:
# api_key = input()
api_key = 'e22b89a6b9c2a27da47cb0a09febc001'

In [None]:
# api_secret = input()
api_secret = 'a61be901b5596e3c528d94231e4a3c504ef0bb803d16815f8dfd6857fac03e57'
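
Hard-coding the key and secret in a cell makes them easy to leak when the notebook is shared. Since `load_dotenv()` was called earlier, one alternative is to read them from the environment. This is a minimal sketch that assumes the credentials are stored under the hypothetical variable names `VM_API_KEY` and `VM_API_SECRET`. These names are chosen for illustration and are not defined by the client library.

```python
import os


def load_credentials(environ=os.environ):
    """Return (api_key, api_secret) read from a mapping of environment variables.

    VM_API_KEY and VM_API_SECRET are hypothetical names used in this sketch;
    use whatever names your own .env file defines.
    """
    names = ("VM_API_KEY", "VM_API_SECRET")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return environ["VM_API_KEY"], environ["VM_API_SECRET"]


# Demonstration with an explicit mapping so the sketch runs without a real .env file
demo_key, demo_secret = load_credentials(
    {"VM_API_KEY": "my-key", "VM_API_SECRET": "my-secret"}
)
```

Calling `load_credentials()` with no arguments reads from the real process environment and fails early with a readable message if either value is missing.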

We can now initialize the client library with the `vm.init` function:

In [None]:
import validmind as vm

vm.init(
    # api_host=api_host,
    api_key=api_key,
    api_secret=api_secret,
    project="cl649obbs0000aa8hkbbg7fhv"
)

In [None]:
# Necessary imports for training our demo models
import pandas as pd
import xgboost as xgb

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

%matplotlib inline

## ValidMind Client Library Functions

As of version `0.4.x` of the client library, the following logging and testing functions are available:

|Function|Description|
|-|-|
|`log_metadata`|Logs free-form metadata text for a given content ID in the model documentation|
|`log_dataset`|Analyzes a dataset and logs its description, column definitions and summary statistics|
|`run_dataset_tests`|Runs dataset quality tests on the input dataset|
|`log_model`|Logs information about a model's framework, architecture, target objective and training parameters|
|`log_training_metrics`|Extracts and logs training metrics from a pre-trained model|
|`run_model_tests`|Runs model evaluation tests according to the model objective, use case and specific validation requirements|

In the example model training code in this notebook, we will demonstrate each of the documented client library functions.

### `log_metadata`

Logs free-form metadata text for a given content ID in the model documentation.

Arguments:

- `content_id`: Content ID of the model documentation. This is a unique identifier generated by the ValidMind dashboard
- `text`: Free-form text to be logged. A text template can be specified in combination with `extra_json` (see below)
- `extra_json`: JSON object containing variables to be substituted in the text template (support for this is TBD)

The list of available `content_id`s for documentation will be provided in a future release.

### `log_dataset`

Analyzes a dataset and logs its description, column definitions and summary statistics. The following information is extracted from the dataset:

- Descriptive statistics for numerical and categorical columns
- Histograms and value counts for summarizing distribution of values
- Pearson correlation matrix for numerical columns
- Correlation plots for the top 15 correlated features

Arguments:

- `dataset`: Input dataset. Only Pandas DataFrames are supported at the moment
- `dataset_type`: Type of dataset, e.g. `training`, `test`, `validation`. Value needs to be set to `training` for now
- `analyze`: Boolean flag indicating whether to analyze and extract descriptive statistics for the dataset (including computing correlations)
- `targets`: `vm.DatasetTargets` describing the label column and its values
- `features`: Optional list of properties to specify for some features in the dataset

Returns:

- `vm.Dataset`: VM dataset object that can be provided to `run_dataset_tests`
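
To make the Pearson correlation matrix mentioned above concrete, here is a standard-library sketch of the computation on toy data. It illustrates the statistic only and is not how `log_dataset` is implemented. The function names and the toy columns are made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every ordered pair of column names to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy columns, made up for illustration (not taken from the churn dataset)
toy = {
    "age": [20, 30, 40, 50],
    "balance": [1.0, 2.0, 3.0, 4.0],
    "tenure": [4, 3, 2, 1],
}
matrix = correlation_matrix(toy)
```

In this toy data `age` and `balance` rise together and `tenure` falls as `age` rises, so the corresponding entries sit at the two extremes of the [-1, 1] range.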

### `run_dataset_tests`

Runs the dataset quality tests on the input dataset:

- Class imbalance test on target column
- Duplicate rows and duplicates based on primary key
- High cardinality test on categorical columns
- Missing values
- Highly correlated column pairs
- Skewness test
- Zeros test (columns with too many zeros)

Arguments: 

- `df`: Input dataset. Only Pandas DataFrames are supported at the moment
- `dataset_type`: Type of dataset, e.g. `training`, `test`, `validation`. Value needs to be set to `training` for now
- `vm_dataset`: `vm.Dataset` object returned by `log_dataset`
- `send`: Boolean flag indicating whether to send the test results to the ValidMind dashboard. This flag is useful for debugging purposes

Returns:

- `results`: List of test results
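
For intuition about what some of these checks measure, the sketch below computes class shares, a missing-value ratio and a duplicate row count on toy data using only the standard library. It is not ValidMind's implementation. The function names and the `MIN_CLASS_SHARE` threshold are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of rows belonging to each class label (non-empty input)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def missing_ratio(values):
    """Return the fraction of entries that are None (non-empty input)."""
    return sum(value is None for value in values) / len(values)


def duplicate_row_count(rows):
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(tuple(row) for row in rows)
    return sum(count - 1 for count in counts.values())


# Toy target column, made up for illustration
labels = [0, 0, 0, 1]
balance = class_balance(labels)

# MIN_CLASS_SHARE is a hypothetical threshold, not a ValidMind default
MIN_CLASS_SHARE = 0.3
imbalanced = min(balance.values()) < MIN_CLASS_SHARE
```

A real check would compare each of these numbers against a configurable threshold and report pass or fail per column.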

### `log_model`

Logs the following information about a model:

- Model framework and architecture (e.g. XGBoost, Random Forest, Logistic Regression, etc.)
- Model task details (e.g. binary classification, regression, etc.)
- Model hyperparameters (e.g. number of trees, max depth, etc.)

Arguments:

- `model_instance`: Trained model instance. Only Scikit-learn interface compatible models are supported at the moment

### `log_training_metrics`

Extracts and logs training metrics from a pre-trained model. The extracted metrics depend on the evaluation metrics that the model developer configured when training the model.

Arguments:

- `model`: Trained model instance. Only Scikit-learn interface compatible models are supported at the moment
- `x_train`: Training dataset features
- `y_train`: Training dataset labels

### `run_model_tests`

Runs model evaluation tests according to the model objective, use case and specific validation requirements. The following tests are available for binary classification models at the moment:

- Accuracy score
- Precision score
- Recall score
- F1 score
- ROC AUC score
- ROC AUC curve
- Confusion matrix
- Precision Recall curve
- Permutation feature importance
- SHAP global importance

Arguments:

- `model`: Trained model instance. Only Scikit-learn interface compatible models are supported at the moment
- `x_test`: Test dataset features
- `y_test`: Test dataset labels
- `send`: Boolean flag indicating whether to send the test results to the ValidMind dashboard. This flag is useful for debugging purposes
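
Several of the scores listed above derive from the same four confusion matrix counts. The sketch below shows the textbook definitions of accuracy, precision, recall and F1 for binary labels, using only the standard library. It illustrates the metrics and is not the code `run_model_tests` runs. The function names and the toy labels are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up for illustration
y_true_toy = [1, 0, 1, 1, 0, 0]
y_pred_toy = [1, 0, 0, 1, 1, 0]
scores = classification_scores(y_true_toy, y_pred_toy)
```

Here the toy predictions contain two true positives, one false positive, one false negative and two true negatives, so all four scores come out to two thirds.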

## Training an Example Model

We'll now train an example model to demonstrate the ValidMind client library functions. The following demo datasets are available; in this notebook we'll train a model on the Bank Customer Churn dataset.

In [None]:
# Bank Customer Churn Dataset
churn_dataset = pd.read_csv("https://vmai.s3.us-west-1.amazonaws.com/datasets/bank_customer_churn.csv")

# Health Insurance Cross-Sell Dataset
insurance_dataset = pd.read_csv("https://vmai.s3.us-west-1.amazonaws.com/datasets/health_insurance_cross_sell.csv")

### `log_dataset`

After loading the dataset, we can log metadata and summary statistics for it using `log_dataset`. Note that the `log_dataset` function expects a `targets` definition. Additional information about columns can be provided with the `features` argument.

In [None]:
churn_targets = vm.DatasetTargets(
    target_column="Exited",
    class_labels={
        "0": "Did not exit",
        "1": "Exited",
    }
)

churn_features = [
    {
        "id": "id",
        "type_options": {
            "primary_key": True,
        }
    }
]

churn_vm_dataset = vm.log_dataset(
    churn_dataset,
    "training",
    analyze=True,
    targets=churn_targets,
    features=churn_features
)

### `run_dataset_tests`

With the `vm.Dataset` object returned by `log_dataset`, we can run the dataset quality tests using `run_dataset_tests`:

In [None]:
churn_results = vm.run_dataset_tests(
    churn_dataset,
    dataset_type="training",
    vm_dataset=churn_vm_dataset,    
    send=True,
)

After running `log_dataset` and `run_dataset_tests`, we can open the following section of the ValidMind dashboard to verify that the dataset and its data quality checks have been documented correctly:

`Dashboard -> Project Overview -> Documentation -> Data Description`

### Preparing the training dataset

We are now going to preprocess and prepare our training, validation and test datasets so we can train an example model and evaluate its performance.

In [None]:
def preprocess_churn_dataset(df):
    # Drop columns with no correlation to target.
    # drop() returns a new DataFrame, so the caller's raw dataset is not mutated.
    df = df.drop(columns=["RowNumber", "CustomerId", "Surname", "CreditScore"])

    # Encode binary features
    genders = {"Male": 0, "Female": 1}
    df = df.replace({"Gender": genders})

    # One-hot encode categorical features
    df = pd.concat([df, pd.get_dummies(df["Geography"], prefix="Geography")], axis=1)
    df = df.drop(columns="Geography")

    return df

In [None]:
preprocessed_churn = preprocess_churn_dataset(churn_dataset)

In [None]:
def train_val_test_split_dataset(df):
    train_df, test_df = train_test_split(df, test_size=0.20)

    # This guarantees a 60/20/20 split
    train_ds, val_ds = train_test_split(train_df, test_size=0.25)

    # For training
    x_train = train_ds.drop("Exited", axis=1)
    y_train = train_ds.loc[:, "Exited"].astype(int)
    x_val = val_ds.drop("Exited", axis=1)
    y_val = val_ds.loc[:, "Exited"].astype(int)

    # For testing
    x_test = test_df.drop("Exited", axis=1)
    y_test = test_df.loc[:, "Exited"].astype(int)

    return x_train, y_train, x_val, y_val, x_test, y_test

In [None]:
x_train, y_train, x_val, y_val, x_test, y_test = train_val_test_split_dataset(preprocessed_churn)
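
The 60/20/20 claim in the split function follows from simple arithmetic: 20% is held out for testing, and 25% of the remaining 80% (another 20% of the total) becomes the validation set. The sketch below checks this with plain Python. The helper `split_sizes` is hypothetical, and it uses `round` as a simplification, so scikit-learn's own handling of fractional sizes may differ by a row on datasets that do not divide evenly.

```python
def split_sizes(n_rows, test_size=0.20, val_size_of_remainder=0.25):
    """Return approximate (n_train, n_val, n_test) for a two-stage split.

    The first stage holds out `test_size` of all rows; the second stage
    holds out `val_size_of_remainder` of what is left for validation.
    """
    n_test = round(n_rows * test_size)
    n_remaining = n_rows - n_test
    n_val = round(n_remaining * val_size_of_remainder)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


# 10,000 rows is an arbitrary example size chosen so the shares divide evenly
n_train, n_val, n_test = split_sizes(10_000)
shares = tuple(n / 10_000 for n in (n_train, n_val, n_test))
```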

In [None]:
def train_churn_dataset(x_train, y_train, x_val, y_val):
    xgb_model = xgb.XGBClassifier(early_stopping_rounds=10)

    xgb_model.set_params(
        eval_metric=["error", "logloss", "auc"],
    )    

    xgb_model.fit(
        x_train,
        y_train,
        eval_set=[(x_train, y_train), (x_val, y_val)],
        verbose=False,
    )
    return xgb_model

In [None]:
xgb_model = train_churn_dataset(x_train, y_train, x_val, y_val)

In [None]:
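def model_accuracy(model, x, y):
    # Probability of the positive class (last column of predict_proba)
    y_pred = model.predict_proba(x)[:, -1]
    # Round each probability to the nearest class label (0 or 1)
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y, predictions)

    print(f"Accuracy: {accuracy}")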

In [None]:
model_accuracy(xgb_model, x_val, y_val)
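
Rounding a probability to get a class label is almost the same as thresholding at 0.5, with one difference: Python's built-in `round` rounds exact halves to the nearest even integer, so a probability of exactly 0.5 becomes 0. The sketch below contrasts rounding with an explicit threshold, using only the standard library. The helper names and the toy probabilities are made up for illustration.

```python
def to_labels(probabilities, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy probabilities and labels, made up for illustration
probs = [0.1, 0.4, 0.5, 0.9]
truth = [0, 0, 1, 1]

thresholded = to_labels(probs)         # explicit >= 0.5 rule
rounded = [round(p) for p in probs]    # built-in rounding, as in model_accuracy

acc_thresholded = accuracy(truth, thresholded)
acc_rounded = accuracy(truth, rounded)
```

An explicit threshold also makes it easy to experiment with cut-offs other than 0.5 when classes are imbalanced.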

### `log_model`

With our newly trained model we can now call `log_model` and log its framework, architecture, target objective and training parameters.

In [None]:
vm.log_model(xgb_model)

### `log_training_metrics`

Our example model was trained with the following evaluation metrics: "error", "logloss", "auc". We can use `log_training_metrics` to log these metrics on the ValidMind dashboard.

In [None]:
vm.log_training_metrics(xgb_model, x_train, y_train)

After running `log_model` and `log_training_metrics`, we can open the following section of the ValidMind dashboard to verify that the model training summary has been documented correctly:

`Dashboard -> Project Overview -> Documentation -> Model Development -> Model Training`

### `run_model_tests`

Finally, after training our model, we can run the model evaluation tests on it using `run_model_tests`:

In [None]:
eval_results = vm.run_model_tests(xgb_model, x_test, y_test, send=True)

After running `run_model_tests`, we can open the following sections of the ValidMind dashboard to verify that the model evaluation test results have been logged correctly:

- `Dashboard -> Project Overview -> Documentation -> Model Development -> Model Evaluation`
- `Dashboard -> Project Overview -> Documentation -> Model Development -> Model Explainability and Interpretability`