# Configure dataset features

This notebook shows how to use custom feature columns with `init_dataset`. By default, `init_dataset` uses all dataset columns when running tests. You can also pass in a list of features to restrict computations to only those features.


## ValidMind at a glance

ValidMind's platform enables organizations to identify, document, and manage model risks for all types of models, including AI/ML models, LLMs, and statistical models. As a model developer, you use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

If this is your first time trying out ValidMind, we recommend going through the following resources first:

- [Get started](https://docs.validmind.ai/guide/get-started.html) — The basics, including key concepts, and how our products work
- [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html) — The path for developers, more code samples, and our developer reference


## Before you begin

::: {.callout-tip}

### New to ValidMind?

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — [**Sign up now**](https://app.prod.validmind.ai)
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).
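
As a rough sketch, you can check which modules are missing before running the rest of the notebook. The list of module names below is an assumption for illustration and `missing_modules` is a hypothetical helper. Keep in mind that a module's import name does not always match its `pip` package name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list of top-level modules; adjust it to what your notebook imports.
required = ["validmind", "pandas", "matplotlib"]

missing = missing_modules(required)
if missing:
    print("Missing modules, try: pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```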


## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:


In [None]:
%pip install -q validmind

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Binary classification** as the template and **Marketing/Sales - Attrition/Churn Management** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [None]:
# Replace with your code snippet

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="..."
)

## Load the sample dataset


In [None]:
%matplotlib inline

# Import the sample dataset from the library

from validmind.datasets.classification import customer_churn as demo_dataset

# You can also try a different dataset with:
# from validmind.datasets.classification import taiwan_credit as demo_dataset

df = demo_dataset.load_data()

### Initialize the ValidMind dataset

Before you can run a test suite, which is just a collection of tests, you must first initialize a ValidMind dataset object using the [`init_dataset`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) function from the ValidMind (`vm`) module.

This function takes a number of arguments:

- `dataset` - the raw dataset that you want to analyze
- `input_id` - a unique identifier used to track which inputs are used when running each individual test
- `target_column` - the name of the target column in the dataset
- `feature_columns` - (optional) the names of the feature columns in the dataset; when omitted, all dataset columns are used


In [None]:
feature_columns = ['CreditScore', 'Age', 'Tenure', 'Balance',
                   'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

vm_dataset = vm.init_dataset(
    dataset=df,
    input_id="raw_dataset",
    target_column=demo_dataset.target_column,
    feature_columns=feature_columns
)
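
Before passing a feature list to `init_dataset`, it can help to confirm that every name actually exists in the dataset. The sketch below is a hypothetical helper, not part of the ValidMind library, and the column names (including `Exited` as the target) are assumed for illustration.

```python
def check_feature_columns(available_columns, feature_columns, target_column):
    """Hypothetical helper: sanity-check a feature list against a dataset's columns.

    Returns the columns that are neither features nor the target,
    i.e. the ones left out once the feature list is applied.
    """
    available = list(available_columns)

    # Every requested feature must exist in the dataset.
    missing = [col for col in feature_columns if col not in available]
    if missing:
        raise ValueError(f"Columns not found in dataset: {missing}")

    # The target should not be listed as a feature.
    if target_column in feature_columns:
        raise ValueError(f"Target column {target_column!r} should not be a feature")

    return [
        col for col in available
        if col not in feature_columns and col != target_column
    ]


# Illustrative column names; in the notebook you could pass `df.columns` instead.
columns = ["CreditScore", "Geography", "Age", "Balance", "Exited"]
excluded = check_feature_columns(columns, ["CreditScore", "Age", "Balance"], "Exited")
print(excluded)  # ['Geography']
```

In this notebook, the equivalent call would use `df.columns`, the `feature_columns` list, and `demo_dataset.target_column`.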

### Defining custom features

This section shows how to define a subset of features to use when computing dataset metrics. Any feature that is not listed in the `feature_columns` argument is omitted from the computation. The examples below use the `DescriptiveStatistics` metric to show how the output changes when you customize the features.


1. Run the metric with all the features. Because `feature_columns` is not passed, all dataset columns are used:


In [None]:
vm_dataset = vm.init_dataset(
    dataset=df,
    input_id="raw_dataset_all_features",
    target_column=demo_dataset.target_column
)

test = vm.tests.run_test(
    test_id="validmind.data_validation.DescriptiveStatistics",
    inputs={"dataset": vm_dataset}
)

2. Run the metric with a subset of features. Only the columns listed in `feature_columns` are included in the computation:


In [None]:
vm_dataset = vm.init_dataset(
    dataset=df,
    input_id="raw_dataset_subset",
    target_column=demo_dataset.target_column,
    feature_columns=['CreditScore', 'Age', 'Balance', 'Geography']
)

test = vm.tests.run_test(
    test_id="validmind.data_validation.DescriptiveStatistics",
    inputs={"dataset": vm_dataset}
)