# Quickstart - Customer Churn Full Suite Model Documentation

This interactive notebook will guide you through documenting a model using the ValidMind Developer Framework. We will use a sample dataset provided by the library and train a simple classification model.

For this simple demonstration, we will use the following bank customer churn dataset from Kaggle: https://www.kaggle.com/code/kmalit/bank-customer-churn-prediction/data.

We will train a sample model and demonstrate the following documentation functionalities:

- Initializing the ValidMind Developer Framework
- Using a sample dataset provided by the library to train a simple classification model
- Running a test suite to quickly generate documentation about the data and model

## Install ValidMind Developer Framework


In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.


## Initializing the Python environment

In [2]:
import pandas as pd
import xgboost as xgb

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

%matplotlib inline

## Initializing the ValidMind Client Library

Log in to the ValidMind platform with your registered email address, and navigate to the Documentation Projects page.

### Creating a new Documentation Project 

***(Note: if a documentation project has already been created, you can skip this section and head directly to "Finding the project API key and secret")***

Clicking on "Create a new project" allows you to register a new documentation project for our demo model.

Select "Customer Churn model" from the Model drop-down, and "Initial Validation" as Type. Finally, click on "Create Project".

### Finding the project API key and secret 

In the "Client Integration" page of the newly created project, you will find the initialization code that allows the client library to associate documentation and tests with the appropriate project. The initialization code configures the following arguments: 

* `api_host`: Location of the ValidMind API.
* `api_key`: Account API key.
* `api_secret`: Account secret key.
* `project`: The project identifier. The `project` argument is mandatory since it allows the library to associate all data collected with a specific account project.


The code snippet can be copied and pasted directly in the cell below to initialize the ValidMind Developer Framework when run:  

In [3]:
# Replace the code below with the code snippet from your project

import validmind as vm

vm.init(
    api_host = "https://api.prod.validmind.ai/api/v1/tracking",
    api_key = "...",
    api_secret = "...",
    project = "..."
)

2023-09-26 22:07:16,032 - INFO(validmind.api_client): Connected to ValidMind. Project: Customer Churn - Initial Validation (clkvhtg6g0005q08h5h9uhtjl)
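
If you prefer not to paste the key and secret into the notebook itself, the four arguments described above could be read from environment variables instead. The following is a minimal sketch of that idea; the helper name `load_init_kwargs` and the environment variable names are assumptions made for illustration and are not defined by ValidMind.

```python
import os

# Hypothetical environment variable names; ValidMind does not prescribe these.
ENV_NAMES = {
    "api_host": "VM_API_HOST",
    "api_key": "VM_API_KEY",
    "api_secret": "VM_API_SECRET",
    "project": "VM_API_PROJECT",
}


def load_init_kwargs(env=None):
    """Collect the four vm.init() arguments from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    kwargs = {arg: env.get(name, "") for arg, name in ENV_NAMES.items()}
    missing = [ENV_NAMES[arg] for arg, value in kwargs.items() if not value]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return kwargs


# Usage in the notebook (not executed here):
#   import validmind as vm
#   vm.init(**load_init_kwargs())
```

Failing early with the full list of missing variables makes a misconfigured environment easier to diagnose than a rejected connection attempt.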


## Preview the model's documentation template

All models are assigned a documentation template when registered. The template defines a list of sections that are used to document the model. Each section can contain any number of rich text and test-driven blocks that populate the documentation. Test-driven blocks are populated by running tests against the model.

We can preview the model documentation template for this project by running the following code:

In [4]:
vm.preview_template()

Accordion(children=(Accordion(children=(HTML(value='<p>Empty Section</p>'), Accordion(children=(HTML(value='<p…
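
To make the idea of nested sections with rich text and test-driven blocks more concrete, here is a small sketch of how such a structure could be modelled and traversed. The `Block` and `Section` classes, the subsection titles, and the identifiers are all hypothetical; this is not ValidMind's actual template schema.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str        # "text" for rich text, "test" for a test-driven block
    content_id: str  # hypothetical identifier of the text or test


@dataclass
class Section:
    title: str
    blocks: list = field(default_factory=list)
    subsections: list = field(default_factory=list)


def collect_test_ids(section):
    """Return the IDs of all test-driven blocks in a section, depth first."""
    ids = [b.content_id for b in section.blocks if b.kind == "test"]
    for sub in section.subsections:
        ids.extend(collect_test_ids(sub))
    return ids


# A made-up template with two subsections.
template = Section(
    title="Model Development",
    subsections=[
        Section("Data Description", blocks=[
            Block("text", "dataset_summary_text"),
            Block("test", "class_imbalance"),
        ]),
        Section("Model Evaluation", blocks=[
            Block("test", "confusion_matrix"),
        ]),
    ],
)

print(collect_test_ids(template))
```

Only the test-driven blocks are collected, which mirrors the distinction drawn above: rich text is written by hand, while test-driven blocks are filled in by running tests.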

## Load the demo dataset

For the purpose of this demonstration, we will use a sample dataset provided by the ValidMind library. 

In [5]:
# Import the sample dataset from the library
from validmind.datasets.classification import customer_churn as demo_dataset
# You can try a different dataset with: 
#from validmind.datasets.classification import taiwan_credit as demo_dataset

df = demo_dataset.load_data()

### Initialize a dataset object for ValidMind

Before running the test suite, we must first initialize a ValidMind dataset object using the `init_dataset` function from the `vm` module. This function takes the following arguments: `dataset`, the dataset that we want to analyze; `target_column`, the name of the target variable; and `class_labels`, the labels used for classification model training.

In [6]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels
)

2023-09-26 22:07:17,972 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
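
As an optional sanity check before calling `init_dataset`, one could confirm that every value in the target column has a matching class label. The sketch below assumes that `class_labels` is a mapping from target value to display label; the helper name and the sample values are hypothetical stand-ins, not taken from the library.

```python
def unlabelled_target_values(target_values, class_labels):
    """Return the sorted distinct target values that have no entry in class_labels.

    Values and keys are compared as strings, so 0 and "0" are treated alike.
    """
    known = {str(key) for key in class_labels}
    seen = {str(value) for value in target_values}
    return sorted(seen - known)


# Hypothetical stand-ins for df[demo_dataset.target_column] and demo_dataset.class_labels.
targets = [0, 1, 1, 0, 2]
labels = {"0": "Did not exit", "1": "Exited"}
print(unlabelled_target_values(targets, labels))  # the value 2 has no label here
```

Because the function accepts any iterable, a pandas column can be passed in directly; an empty result means every target value is covered.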


## Documenting the model

We will need to preprocess the dataset and produce the training, test and validation splits first.

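### Preprocess the raw dataset

For demonstration purposes, we simplify preprocessing by calling `demo_dataset.preprocess`, which returns the training, validation and test splits. We then separate the features from the target and train an XGBoost classifier with early stopping on the validation split: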

In [7]:
train_df, validation_df, test_df = demo_dataset.preprocess(df)

x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
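
The internals of `demo_dataset.preprocess` are not shown in this notebook. As a rough illustration of what producing training, validation and test splits involves, the sketch below shuffles a collection of rows and cuts it into three disjoint parts. The function name, the 60/20/20 proportions and the seed are assumptions chosen for illustration and do not describe the library's behavior.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle rows and return (train, validation, test) lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, validation, test


train_rows, val_rows, test_rows = three_way_split(range(100))
print(len(train_rows), len(val_rows), len(test_rows))
```

Keeping the validation split separate from the test split matters here because the validation data steers early stopping during training, so only the test split remains untouched for evaluation.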

We can now initialize the training and test datasets as dataset objects using `vm.init_dataset()`:

In [8]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    type="generic",
    target_column=demo_dataset.target_column
)

2023-09-26 22:07:18,121 - INFO(validmind.client): The 'type' argument to init_dataset() argument is deprecated and no longer required.
2023-09-26 22:07:18,122 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2023-09-26 22:07:18,137 - INFO(validmind.client): The 'type' argument to init_dataset() argument is deprecated and no longer required.
2023-09-26 22:07:18,137 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


We also initialize a model object using `vm.init_model()`:

In [9]:
vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

### Run the template documentation suite

We are now ready to run the model's documentation tests as defined in its template. The following function runs every test in the template and sends all documentation artifacts to the ValidMind platform.

In [10]:
full_suite = vm.run_documentation_tests(
    dataset=vm_dataset,
    model=vm_model
)

HBox(children=(Label(value='Running test suite...'), IntProgress(value=0, max=54)))

VBox(children=(HTML(value='<h2>Test Suite Results: <i style="color: #DE257E">Binary Classification V2</i></h2>…

You can access and review the resulting documentation in the ValidMind UI, in the "Model Development" section of the model documentation. 