# Assign Prediction Values and Probabilities to ValidMind Datasets

In this notebook, you assign prediction values and prediction probabilities to ValidMind datasets with the `assign_predictions()` method, using the `prediction_values` and `prediction_probabilities` arguments. These two types of predictions are common in classification and logistic regression models, and you'll see how to work with them using a logistic regression model. In this guide, you will learn to:

- Assign prediction values and probabilities that have been computed outside ValidMind (VM).
- Incorporate prediction values and probabilities from datasets that already have prediction columns.
- Automate the assignment of prediction values and probabilities within VM.

## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

In [None]:
%pip install -q validmind

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))

   For example, to register a model for use with this notebook, select:

   - Documentation template: `Binary classification`
   - Use case: `Marketing/Sales - Attrition/Churn Management`

   You can fill in other options according to your preference.

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [None]:
# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

### Preview the documentation template

A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.

You will upload documentation and test results into this template later on. For now, take a look at the structure that the template provides with the `vm.preview_template()` function from the ValidMind library and note the empty sections:

In [None]:
vm.preview_template()

## Load the sample dataset

The sample dataset used here is provided by the ValidMind library. To be able to use it, you need to import the dataset and load it into a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), a two-dimensional tabular data structure that makes use of rows and columns:

In [None]:
import statsmodels.api as sm

%matplotlib inline

In [None]:
# Import the sample dataset from the library

from validmind.datasets.credit_risk import lending_club

df = lending_club.load_data(source="offline")

df.info()

## Preprocess the raw dataset

Preprocessing performs a number of operations to get the data ready for the subsequent steps:

- Preprocess the data: Runs `lending_club.preprocess` and `lending_club.feature_engineering` on the DataFrame (`df`), then splits the result into `train_df` and `test_df` with `lending_club.split`.
- Separate features and targets: In the model training step that follows, the target column is dropped to create the feature sets (`x_train`, `x_test`) and target sets (`y_train`, `y_test`).

In [None]:
preprocess_df = lending_club.preprocess(df)
fe_df = lending_club.feature_engineering(preprocess_df)
train_df, test_df = lending_club.split(fe_df, add_constant=True)

## Train models for testing

- Initialize a GLM Logistic Regression Classifier model

In [None]:
x_train = train_df.drop(lending_club.target_column, axis=1)
y_train = train_df[lending_club.target_column]
x_test = test_df.drop(lending_club.target_column, axis=1)
y_test = test_df[lending_club.target_column]

# Define the model
model = sm.GLM(
    y_train, 
    x_train, 
    family=sm.families.Binomial())

# Fit the model
model = model.fit()
model.summary()
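
A GLM with the Binomial family uses the logit link by default in statsmodels, which is why `model.predict()` returns probabilities between 0 and 1. The following is a minimal sketch of that mapping in plain Python. The coefficients and the feature row are made up for illustration and are not taken from the fitted model.

```python
import math


def logistic(z: float) -> float:
    """Inverse of the logit link: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and one feature row.
# The leading 1.0 in the row plays the role of the constant (intercept) term.
coefficients = [-1.0, 0.8, -0.5]
row = [1.0, 2.0, 1.0]

# Linear predictor: the dot product of coefficients and features
linear_predictor = sum(b * x for b, x in zip(coefficients, row))

# Probability of the positive class
probability = logistic(linear_predictor)

print(round(linear_predictor, 3), round(probability, 3))
```

A linear predictor of zero maps to a probability of exactly 0.5, so the sign of the linear predictor decides which side of a 0.5 cut-off an observation falls on.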

## Initialize ValidMind objects

### Initialize the ValidMind datasets and models

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=lending_club.target_column,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, 
    input_id="test_dataset", 
    target_column=lending_club.target_column
)

vm_model = vm.init_model(
    model,
    input_id="glm_model",
)

## Options to assign prediction values and probabilities to VM datasets

### 1. Assign prediction values and probabilities computed outside VM

In [None]:
# Compute probabilities from the model outside ValidMind
train_probabilities = model.predict(x_train)
test_probabilities = model.predict(x_test)

# Compute binary predictions from the probabilities
cut_off_threshold = 0.5
train_binary_predictions = (train_probabilities > cut_off_threshold).astype(int)
test_binary_predictions = (test_probabilities > cut_off_threshold).astype(int)

# Compute scores from the probabilities 
train_scores = lending_club.compute_scores(train_probabilities)
test_scores = lending_club.compute_scores(test_probabilities)
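
The cut-off step above uses a strict `>` comparison, so a probability exactly equal to the threshold is labeled 0. A small sketch in plain Python, with made-up probabilities and a hypothetical `to_binary` helper, shows the effect of the threshold:

```python
def to_binary(probabilities, threshold=0.5):
    """Label a probability as 1 only when it is strictly greater than the threshold."""
    return [int(p > threshold) for p in probabilities]


# Made-up probabilities for illustration
example_probabilities = [0.10, 0.50, 0.51, 0.93]

default_labels = to_binary(example_probabilities)
strict_labels = to_binary(example_probabilities, threshold=0.9)

print(default_labels)  # [0, 0, 1, 1]
print(strict_labels)   # [0, 0, 0, 1]
```

Raising the threshold produces fewer positive labels, which changes the binary predictions but leaves the probabilities untouched.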

In [None]:
vm_train_ds.assign_predictions(
    model=vm_model,
    prediction_values=train_binary_predictions,
    prediction_probabilities = train_probabilities,
)

vm_test_ds.assign_predictions(
    model=vm_model,
    prediction_values=test_binary_predictions,
    prediction_probabilities = test_probabilities,
)

In [None]:
print(vm_test_ds)
print(vm_train_ds)

#### Run some example tests

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ROCCurve",
        inputs = {
            "dataset": vm_test_ds,
            "model": vm_model,
        }
    )
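
A ROC curve is built from prediction probabilities rather than binary predictions, which is one reason to assign both. As a rough illustration of the idea (not the ValidMind or scikit-learn implementation), the sketch below computes the curve's points by using each distinct score as a threshold. The `roc_points` helper and the data are hypothetical.

```python
def roc_points(y_true, y_score):
    """Return (false positive rate, true positive rate) pairs.

    One point is produced per distinct score, used as a ">=" threshold,
    starting from the origin.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up labels and scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```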

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.statsmodels.GINITable",
        input_grid = {
            "dataset": [vm_train_ds, vm_test_ds],
            "model": [vm_model],
        }
    )
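
The Gini coefficient is conventionally defined as `2 * AUC - 1`. The sketch below uses that definition with a pairwise AUC computed in plain Python. It is an assumption that this matches what `GINITable` reports. The `auc` and `gini` helpers and the data are hypothetical.

```python
def auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, y_score):
    """Gini coefficient under the conventional definition 2 * AUC - 1."""
    return 2 * auc(y_true, y_score) - 1


# Made-up labels and scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

print(auc(y_true, y_score))   # 0.75
print(gini(y_true, y_score))  # 0.5
```

Under this definition, a model that ranks no better than chance has a Gini of 0 and a perfect ranking has a Gini of 1.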

In [None]:
run_test = True
if run_test:

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ClassifierPerformance",
        inputs = {
            "dataset": vm_train_ds,
            "model": vm_model,
        }
    )
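
Classifier performance metrics such as precision, recall, and F1 are computed from the binary prediction values, not the probabilities. The following sketch shows the standard formulas in plain Python. It is illustrative only and is not the `ClassifierPerformance` implementation. The `classification_metrics` helper and the data are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```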

### 2. Assign prediction values and probabilities from datasets with existing prediction columns

In [None]:
train_df2 = train_df.copy()
train_df2["glm_prediction_values"] = train_binary_predictions
train_df2["glm_prediction_probabilities"] = train_probabilities
train_df2.head(5)

In [None]:
test_df2 = test_df.copy()
test_df2["glm_prediction_values"] = test_binary_predictions
test_df2["glm_prediction_probabilities"] = test_probabilities
test_df2.head(5)
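
When you add prediction columns to a DataFrame yourself, note that pandas aligns a Series on its index labels, while a NumPy array is assigned by row position. If `model.predict()` returns a Series indexed like the input features (assumed here), the assignments above line up with the right rows. The sketch below illustrates the difference with made-up data:

```python
import pandas as pd

# Made-up frame with a non-default index, standing in for train_df or test_df
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# Predictions as a Series whose index labels are in a different order
probabilities = pd.Series([0.9, 0.5, 0.1], index=[30, 20, 10])

# Assigning a Series matches rows on index labels
aligned = df.copy()
aligned["probability"] = probabilities

# Assigning a NumPy array matches rows by position
positional = df.copy()
positional["probability"] = probabilities.to_numpy()

print(aligned["probability"].tolist())     # [0.1, 0.5, 0.9]
print(positional["probability"].tolist())  # [0.9, 0.5, 0.1]
```

If predictions arrive as a plain array or list, it is worth confirming that they are in the same row order as the DataFrame before adding them as columns.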

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df2,
    input_id="train_dataset",
    target_column=lending_club.target_column,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df2,
    input_id="test_dataset",
    target_column=lending_club.target_column,
)


In [None]:
vm_train_ds.assign_predictions(
    model=vm_model, 
    prediction_column="glm_prediction_values",
    probability_column="glm_prediction_probabilities"
)

vm_test_ds.assign_predictions(
    model=vm_model, 
    prediction_column="glm_prediction_values",
    probability_column="glm_prediction_probabilities"
)

#### Run some example tests

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ROCCurve",
        inputs = {
            "dataset": vm_test_ds,
            "model": vm_model,
        }
    )

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.statsmodels.GINITable",
        input_grid = {
            "dataset": [vm_train_ds, vm_test_ds],
            "model": [vm_model],
        }
    )
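
The `input_grid` argument takes a list of candidates for each input, unlike `inputs`, which takes a single value for each. Assuming the grid is expanded into every combination of its values (a Cartesian product), the expansion can be sketched as follows. The `expand_grid` helper is hypothetical and is not part of the ValidMind API. Plain strings stand in for the dataset and model objects.

```python
from itertools import product


def expand_grid(input_grid):
    """Expand a dict of lists into one dict per combination of values."""
    keys = list(input_grid)
    value_lists = [input_grid[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Strings stand in for vm_train_ds, vm_test_ds, and vm_model
grid = {
    "dataset": ["train_dataset", "test_dataset"],
    "model": ["glm_model"],
}

combinations = expand_grid(grid)
for inputs in combinations:
    print(inputs)
# {'dataset': 'train_dataset', 'model': 'glm_model'}
# {'dataset': 'test_dataset', 'model': 'glm_model'}
```

With two datasets and one model, this yields two input combinations, one for the training dataset and one for the test dataset.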

In [None]:
run_test = True
if run_test:

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ClassifierPerformance",
        inputs = {
            "dataset": vm_train_ds,
            "model": vm_model,
        }
    )

### 3. Assign prediction values and probabilities computed automatically within VM

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=lending_club.target_column,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=lending_club.target_column,
)

In [None]:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)

In [None]:
print(vm_train_ds)
print(vm_test_ds)

#### Run some example tests

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ROCCurve",
        inputs = {
            "dataset": vm_test_ds,
            "model": vm_model,
        }
    )

In [None]:
run_test = True
if run_test: 

    test= vm.tests.run_test(
        "validmind.model_validation.statsmodels.GINITable",
        input_grid = {
            "dataset": [vm_train_ds, vm_test_ds],
            "model": [vm_model],
        }
    )

In [None]:
run_test = True
if run_test:

    test= vm.tests.run_test(
        "validmind.model_validation.sklearn.ClassifierPerformance",
        inputs = {
            "dataset": vm_train_ds,
            "model": vm_model,
        }
    )