# Quickstart for Classification Models (Full Suite)

This notebook provides a quick introduction to documenting a model using the ValidMind developer framework. We will use a sample dataset provided by the library and train a simple classification model.

## Initialize ValidMind

In [1]:
%load_ext dotenv
%dotenv dev.env

import validmind as vm
import xgboost as xgb

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  project = "clhdxzbb700020a8hpu126rq0"
)

Connected to ValidMind. Project: Customer Churn Model - Initial Validation (clhdxzbb700020a8hpu126rq0)


## Load the Demo Dataset

In [2]:
# Load the customer churn demo dataset. To use the Taiwan credit dataset instead, swap in the commented import below:
from validmind.datasets.classification import customer_churn as demo_dataset
# from validmind.datasets.classification import taiwan_credit as demo_dataset

df = demo_dataset.load_data()

In [3]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels
)

Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...


## Run the Full Data and Model Validation Test Suite

Before running the suite, we need to preprocess the dataset and produce the training, validation, and test splits.

### Preprocess the Raw Dataset

In [4]:
train_df, validation_df, test_df = demo_dataset.preprocess(df)
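
For readers unfamiliar with three-way splits, the following is a minimal sketch of how a DataFrame could be divided into training, validation, and test parts. It is not the implementation of `demo_dataset.preprocess`, whose internals are not shown here; the helper name `three_way_split`, the 60/20/20 proportions, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def three_way_split(df, val_frac=0.2, test_frac=0.2, seed=42):
    """Hypothetical helper: shuffle rows, then slice into train/validation/test."""
    # Shuffle all rows reproducibly so the slices below are random samples.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled.iloc[:n_test]
    val = shuffled.iloc[n_test:n_test + n_val]
    train = shuffled.iloc[n_test + n_val:]
    # Same return order as the call above: train, validation, test.
    return train, val, test


# Toy data standing in for the real dataset (assumed column names).
toy_df = pd.DataFrame({"feature": range(10), "target": [0, 1] * 5})
train_part, val_part, test_part = three_way_split(toy_df)
```

The validation part is kept separate from the test part so that it can guide training decisions without touching the data used for the final evaluation.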

In [5]:
x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
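
The `early_stopping_rounds=10` setting asks XGBoost to stop training once the validation metric has gone 10 consecutive rounds without improving. The sketch below illustrates that stopping rule in plain Python on a made-up list of scores; the function name `best_round_with_early_stopping` and the sample numbers are hypothetical, and this is an illustration of the idea rather than XGBoost's actual code.

```python
def best_round_with_early_stopping(scores, patience, maximize=False):
    """Hypothetical helper: return (best_round, stopped_at) for per-round scores."""
    if not scores:
        raise ValueError("scores must contain at least one value")
    best_round = 0
    best_score = scores[0]
    for current_round, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if improved:
            # A new best score resets the patience window.
            best_round, best_score = current_round, score
        elif current_round - best_round >= patience:
            # No improvement for `patience` rounds in a row: stop here.
            return best_round, current_round
    # Training ran to completion without triggering early stopping.
    return best_round, len(scores) - 1


# Made-up validation losses (lower is better).
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48]
loss_result = best_round_with_early_stopping(losses, patience=3)

# Made-up AUC values (higher is better).
aucs = [0.70, 0.80, 0.79, 0.78]
auc_result = best_round_with_early_stopping(aucs, patience=2, maximize=True)
```

When several values are passed to `eval_metric`, as in the cell above, the XGBoost documentation states that the last one in the list is the metric used for early stopping, so the order of the list matters.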

In [6]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...


### Run the Full Suite

In [7]:
full_suite = vm.run_test_suite(
    "binary_classifier_full_suite",
    dataset=vm_dataset,
    model=vm_model
)

| Statistic | RowNumber | CustomerId | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 | 8000.0 |
| mean | 5020.52 | 15690470.0 | 650.159625 | 38.948875 | 5.033875 | 76434.096511 | 1.5325 | 0.702625 | 0.519875 | 99790.187959 | 0.202 |
| std | 2885.718516 | 71902.47 | 96.84623 | 10.458952 | 2.885267 | 62612.251258 | 0.580505 | 0.457132 | 0.499636 | 57520.508892 | 0.401517 |
| min | 1.0 | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 2518.75 | 15628160.0 | 583.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 50857.1025 | 0.0 |
| 50% | 5036.5 | 15690140.0 | 651.5 | 37.0 | 5.0 | 97263.675 | 1.0 | 1.0 | 1.0 | 99504.89 | 0.0 |
| 75% | 7512.25 | 15752380.0 | 717.0 | 44.0 | 8.0 | 128044.5075 | 2.0 | 1.0 | 1.0 | 149216.32 | 0.0 |
| max | 10000.0 | 15815660.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |

