# Understand and utilize `RawData` in ValidMind tests

Test functions in ValidMind can return a special object called *`RawData`*, which holds intermediate or unprocessed data that the test logic produces but does not include in the test's visible output, such as its tables or figures.

This data is useful when running post-processing functions with tests to recompute tabular outputs, redraw figures, or even create new outputs entirely. In this notebook, learn how to access, inspect, and utilize `RawData` from ValidMind tests with a couple of examples.
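
To make the idea concrete without depending on ValidMind, here is a minimal sketch. It uses a hypothetical `RawDataSketch` class (not ValidMind's `RawData`) and a toy test, `class_balance_test`, whose visible table shows only class percentages while the raw per-class counts are kept aside. A post-processing function then rebuilds the table from those counts instead of re-running the test. All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class RawDataSketch:
    """Hypothetical stand-in for ValidMind's RawData: keeps named
    intermediate values as attributes."""

    def __init__(self, **kwargs: Any) -> None:
        for name, value in kwargs.items():
            setattr(self, name, value)


@dataclass
class ResultSketch:
    """Hypothetical test result: visible tables plus hidden raw data."""
    tables: List[List[Dict[str, Any]]]
    raw_data: RawDataSketch


def class_balance_test(labels: List[int]) -> ResultSketch:
    """Toy test: the visible table shows class percentages only,
    while the raw per-class counts are kept as raw data."""
    counts: Dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    table = [
        {"class": cls, "percent": round(100 * n / total, 1)}
        for cls, n in sorted(counts.items())
    ]
    return ResultSketch(tables=[table], raw_data=RawDataSketch(counts=counts, total=total))


def add_count_column(result: ResultSketch) -> ResultSketch:
    """Post-processing: rebuild the visible table using the raw counts."""
    counts = result.raw_data.counts
    result.tables[0] = [
        {**row, "count": counts[row["class"]]} for row in result.tables[0]
    ]
    return result


result = add_count_column(class_balance_test([0, 0, 0, 1]))
print(result.tables[0])
# [{'class': 0, 'percent': 75.0, 'count': 3}, {'class': 1, 'percent': 25.0, 'count': 1}]
```

The design point is that the post-processing function only reads the stored raw values, so it can add or reshape outputs without knowing how the test computed them.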

## Setup

### Install and initialize the ValidMind Library

First, let's make sure that the ValidMind Library is installed and ready to go:

```python
%pip install -q validmind
import validmind as vm
```

### Initialize the Python environment

Next, we'll import XGBoost, which we'll use to train a sample model:

```python
import xgboost as xgb
```

### Load the sample dataset

Then, we'll import a sample ValidMind dataset:

```python
from validmind.datasets.classification import customer_churn
raw_df = customer_churn.load_data()
```

#### Preprocess the raw dataset

We'll also perform a number of operations to get ready for the subsequent steps:

- **Preprocess the data:** Splits the raw DataFrame (`raw_df`) into training, validation, and test sets (`train_df`, `validation_df`, and `test_df`) using `customer_churn.preprocess`.
- **Separate features and targets:** Drops the target column to create feature sets (`x_train`, `x_val`) and target sets (`y_train`, `y_val`).
- **Initialize XGBoost classifier:** Creates an `XGBClassifier` object with early stopping rounds set to 10.
- **Set evaluation metrics:** Specifies metrics for model evaluation as `error`, `logloss`, and `auc`.
- **Fit the model:** Trains the model on `x_train` and `y_train`, using the validation set `(x_val, y_val)` to compute the evaluation metrics and drive early stopping. Verbose output is disabled.

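```python
# Split the raw data into training, validation, and test sets
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

# Separate the features from the target column
x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

# Stop training if the validation metric fails to improve for 10 rounds
model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)

# Train, evaluating on the validation set after each boosting round
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
```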
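
Setting `early_stopping_rounds=10` means training halts once the validation metric has gone 10 consecutive rounds without improving; per the XGBoost documentation, when `eval_metric` lists several metrics, the last one (`auc` here) drives early stopping. The sketch below simulates that patience rule on a made-up sequence of per-round AUC values. It is an illustration of the rule, not XGBoost's implementation, and `early_stop` is a hypothetical helper.

```python
from typing import List, Tuple


def early_stop(scores: List[float], patience: int = 10) -> Tuple[int, int]:
    """Simulate patience-based early stopping for a higher-is-better metric.

    Returns (best_round, last_round): training stops once `patience`
    rounds have passed without a new best score.
    """
    best_round, best = 0, scores[0]
    for rnd in range(1, len(scores)):
        if scores[rnd] > best:
            best_round, best = rnd, scores[rnd]
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, len(scores) - 1


# Made-up per-round validation AUC values: the metric peaks at round 2
auc_per_round = [0.70, 0.75, 0.78] + [0.77] * 12
print(early_stop(auc_per_round))  # (2, 12)
```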

### Initialize the ValidMind objects

#### Initialize the datasets

Before you can run tests, you'll need to initialize a ValidMind dataset object using the [`init_dataset`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) function from the ValidMind (`vm`) module.

We'll include the following arguments:

- **`dataset`**: the raw dataset that you want to provide as input to tests
- **`input_id`**: a unique identifier that allows tracking what inputs are used when running each individual test
- **`target_column`**: a required argument if tests require access to true values; this is the name of the target column in the dataset
- **`class_labels`**: an optional value to map predicted classes to class labels

With the datasets ready, you can now initialize the raw, training, and test DataFrames created earlier (`raw_df`, `train_df`, and `test_df`) as their own ValidMind dataset objects using [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset):

```python
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
    __log=False,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)
```

#### Initialize a model object

Additionally, you'll need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on the data. 

Initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):

```python
vm_model = vm.init_model(
    model,
    input_id="model",
    __log=False,
)
```

#### Assign predictions to the datasets

We can now use the `assign_predictions()` method of the dataset objects to link predictions from a model to each dataset.

If no prediction values are passed, the method computes predictions automatically using the model:

```python
vm_train_ds.assign_predictions(
    model=vm_model,
)

vm_test_ds.assign_predictions(
    model=vm_model,
)
```

## Examples

Once you're set up, you can then run the following examples:

### Using `RawData` from the ROC Curve Test
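
Assuming the ROC curve test keeps the false positive and true positive rates in its raw data (the names `fpr` and `tpr` are illustrative, not confirmed ValidMind attribute names), a post-processing function could recompute the AUC or redraw the curve from them. The plain-Python sketch below does not use ValidMind: it derives ROC points from made-up labels and scores, then recomputes the AUC with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], y_score: List[float]) -> Tuple[List[float], List[float]]:
    """Return (fpr, tpr), adding one point per distinct score threshold,
    sweeping from the highest score down. Assumes both classes are present."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(y_score, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every sample tied at this score is counted
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(x: List[float], y: List[float]) -> float:
    """Area under the piecewise-linear curve through (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


# Illustrative labels and scores standing in for a model's test-set output
fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(trapezoid_auc(fpr, tpr))  # 0.75
```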

### Pearson Correlation Matrix
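
Assuming a correlation test keeps the full correlation matrix in its raw data (an assumption about its contents), a post-processing function could derive an entirely new table from it, such as only the strongly correlated feature pairs. The sketch computes Pearson's r in plain Python for some hypothetical columns and filters pairs above an illustrative threshold of 0.8.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations, standing in for a test's raw matrix."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


def high_correlation_pairs(
    matrix: Dict[Tuple[str, str], float], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """New output derived from the raw matrix: pairs with |r| >= threshold,
    strongest first."""
    pairs = [(a, b, r) for (a, b), r in matrix.items() if abs(r) >= threshold]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical feature columns
columns = {"age": [1, 2, 3, 4], "tenure": [2, 4, 6, 8], "balance": [4, 1, 3, 2]}
print(high_correlation_pairs(correlation_matrix(columns)))
# [('age', 'tenure', 1.0)]
```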

### Precision-Recall Curve
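
Similarly, if a precision-recall test keeps the precision and recall values at each threshold in its raw data (assumed here, not confirmed), a post-processing function could summarize them as average precision: the sum of precision at each threshold weighted by the increase in recall, which is the definition scikit-learn uses for `average_precision_score`. The plain-Python sketch below uses made-up labels and scores.

```python
from typing import List, Tuple


def pr_points(y_true: List[int], y_score: List[float]) -> Tuple[List[float], List[float]]:
    """Return (precision, recall), one point per distinct score threshold,
    sweeping from the highest score down."""
    pos = sum(y_true)
    pairs = sorted(zip(y_score, y_true), reverse=True)
    precision, recall = [], []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every sample tied at this score is counted
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            precision.append(tp / (tp + fp))
            recall.append(tp / pos)
    return precision, recall


def average_precision(precision: List[float], recall: List[float]) -> float:
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n, with R_0 = 0."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


precision, recall = pr_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(average_precision(precision, recall), 4))  # 0.8333
```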

### Using `RawData` in custom tests
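
Custom tests can return `RawData` as well. The sketch below shows the general shape in plain Python, without importing ValidMind: a hypothetical `per_class_recall_test` returns a visible per-class recall table together with a hypothetical `RawDataSketch` (not ValidMind's class) holding the raw labels and predictions. A post-processing function then builds a new output, confusion counts, from that raw data without re-running the test.

```python
from typing import Any, Dict, List, Tuple


class RawDataSketch:
    """Hypothetical container standing in for ValidMind's RawData."""

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)


def per_class_recall_test(
    y_true: List[int], y_pred: List[int]
) -> Tuple[List[Dict[str, Any]], RawDataSketch]:
    """Hypothetical custom test: returns a visible per-class recall table
    plus the raw labels and predictions for later post-processing."""
    table = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        table.append({"class": cls, "recall": hits / len(idx)})
    return table, RawDataSketch(y_true=y_true, y_pred=y_pred)


def confusion_counts(raw: RawDataSketch) -> Dict[Tuple[int, int], int]:
    """Post-processing: map each (true, predicted) pair to its count,
    using only the raw data."""
    counts: Dict[Tuple[int, int], int] = {}
    for t, p in zip(raw.y_true, raw.y_pred):
        counts[(t, p)] = counts.get((t, p), 0) + 1
    return counts


table, raw = per_class_recall_test([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(table)
print(confusion_counts(raw))
```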

## Conclusion

ValidMind's `RawData` feature lets you customize the output of tests, including your own custom tests.

This notebook demonstrated how to use the `RawData` returned by ValidMind tests to post-process their output, and how to write custom tests that return `RawData` objects you can use in the same way.

Because `RawData` keeps the intermediate values a test computes, post-processing functions can recompute tables, redraw figures, or generate entirely new outputs from ValidMind tests without changing the tests themselves.