# Understanding and Utilizing RawData in ValidMind Tests

In ValidMind, test functions can return a special object called **RawData**. As the name suggests, this object holds intermediate or unprocessed data that is produced during the test logic but is not part of the test's visible output (tables or figures). This data is useful in post-processing functions that recompute tabular outputs, redraw figures, or create entirely new outputs. In this notebook, we demonstrate how to access, inspect, and use RawData from ValidMind tests through several examples.

In [None]:
import xgboost as xgb
import validmind as vm
from validmind.datasets.classification import customer_churn

raw_df = customer_churn.load_data()

train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)

vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
    __log=False,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)

vm_model = vm.init_model(
    model,
    input_id="model",
    __log=False,
)

vm_train_ds.assign_predictions(
    model=vm_model,
)

vm_test_ds.assign_predictions(
    model=vm_model,
)

## Example 1: Using RawData from the ROC Curve Test

In this example, we run the ROC Curve test, inspect its RawData output, and then create a custom ROC curve using the raw data values.

In [None]:
from validmind.tests import run_test

# Run the ROC Curve test normally
result_roc = run_test(
    "validmind.model_validation.sklearn.ROCCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
)

Now let's assume we want to create a custom version of the above figure. First, let's inspect the raw data that this test produces so we can see what we have to work with.

`RawData` objects have an `inspect()` method that pretty-prints the object's attributes so you can quickly see the data and its types.

In [None]:
# Inspect the RawData output from the ROC test
print("RawData from ROC Curve Test:")
result_roc.raw_data.inspect()

As we can see, the ROC Curve test returns a `RawData` object with the following attributes:
- `fpr`: A list of false positive rates
- `tpr`: A list of true positive rates
- `auc`: The area under the curve

These three attributes are enough to create our own custom ROC curve in a post-processing function, without writing a whole new test from scratch and without recomputing any of the data.

In [None]:
import matplotlib.pyplot as plt

from validmind.vm_models.result import TestResult


def custom_roc_curve(result: TestResult):
    # Extract raw data from the test result
    fpr = result.raw_data.fpr
    tpr = result.raw_data.tpr
    auc = result.raw_data.auc

    # Create a custom ROC curve plot
    fig = plt.figure()
    plt.plot(fpr, tpr, label=f"Custom ROC (AUC = {auc:.2f})", color="blue")
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random Guess")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("Custom ROC Curve from RawData")
    plt.legend()

    # close the plot to avoid it automatically being shown in the notebook
    plt.close()

    # remove existing figure
    result.remove_figure(0)

    # add new figure
    result.add_figure(fig)

    return result

# test it on the existing result
modified_result = custom_roc_curve(result_roc)

# show the modified result
modified_result.show()

Now that we have created a post-processing function and verified that it works on our existing test result, we can use it directly in `run_test()` from now on:

In [None]:
result = run_test(
    "validmind.model_validation.sklearn.ROCCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    post_process_fn=custom_roc_curve,
    generate_description=False,
)
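
Conceptually, a post-processing function is a callable that receives the finished test result and returns a (possibly modified) result. The sketch below illustrates that contract with hypothetical stand-ins: `FakeResult`, `fake_test`, and `run_with_post_process` are not part of ValidMind, and how `run_test()` invokes `post_process_fn` internally is an assumption here.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for a test result (not ValidMind's TestResult)."""

    figures: list = field(default_factory=list)
    raw_data: SimpleNamespace = field(default_factory=SimpleNamespace)


def fake_test() -> FakeResult:
    # Pretend the test computed some values and drew a default figure
    return FakeResult(
        figures=["default figure"],
        raw_data=SimpleNamespace(fpr=[0.0, 1.0], tpr=[0.0, 1.0], auc=0.5),
    )


def run_with_post_process(
    test_fn: Callable[[], FakeResult],
    post_process_fn: Optional[Callable[[FakeResult], FakeResult]] = None,
) -> FakeResult:
    """Run a test, then hand the result to the optional post-processing step."""
    result = test_fn()
    if post_process_fn is not None:
        result = post_process_fn(result)
    return result


def relabel_figure(result: FakeResult) -> FakeResult:
    # Build the replacement from raw data only; nothing is recomputed
    result.figures = [f"custom figure (AUC = {result.raw_data.auc:.2f})"]
    return result


plain = run_with_post_process(fake_test)
custom = run_with_post_process(fake_test, post_process_fn=relabel_figure)
print(plain.figures)
print(custom.figures)
```

The point of the pattern is that the function always returns the result object, so the same function can be applied to an existing result or passed along when the test is run.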

## More Examples

### Pearson Correlation Matrix

Try commenting out the `post_process_fn` argument in the following cell and compare the output with and without it.

In [None]:
import plotly.graph_objects as go


def custom_heatmap(result: TestResult):
    corr_matrix = result.raw_data.correlation_matrix

    heatmap = go.Heatmap(
        z=corr_matrix.values,
        x=list(corr_matrix.columns),
        y=list(corr_matrix.index),
        colorscale="Viridis",
    )
    fig = go.Figure(data=[heatmap])
    fig.update_layout(title="Custom Heatmap from RawData")

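    # swap the default figure for the custom Plotly heatmap
    # (no plt.close() needed here: this figure is Plotly, not matplotlib)
    result.remove_figure(0)
    result.add_figure(fig)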

    return result


result_corr = run_test(
    "validmind.data_validation.PearsonCorrelationMatrix",
    inputs={"dataset": vm_test_ds},
    generate_description=False,
    post_process_fn=custom_heatmap,
)
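
Raw data can also feed new tabular outputs. As a minimal sketch, the helper below lists the feature pairs whose absolute correlation meets a threshold. It works on a plain nested dictionary so that it runs without dependencies; the name `strongest_pairs`, the threshold, and the sample values are hypothetical. With a pandas DataFrame such as the `correlation_matrix` used above, `corr_matrix.to_dict()` would produce a compatible structure.

```python
def strongest_pairs(corr, threshold=0.5):
    """Return (feature_a, feature_b, r) for pairs with |r| >= threshold.

    `corr` maps each column name to a {row name: correlation} dictionary.
    Each unordered pair is reported once and the diagonal is skipped.
    """
    names = sorted(corr)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = corr[a][b]
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    # Strongest relationships first
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Illustrative values only (not taken from the customer churn dataset)
sample_corr = {
    "age": {"age": 1.0, "balance": 0.1, "tenure": -0.7},
    "balance": {"age": 0.1, "balance": 1.0, "tenure": 0.55},
    "tenure": {"age": -0.7, "balance": 0.55, "tenure": 1.0},
}
top_pairs = strongest_pairs(sample_corr, threshold=0.5)
for a, b, r in top_pairs:
    print(f"{a} ~ {b}: r = {r:+.2f}")
```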

### Precision-Recall Curve

Let's try the same thing with the Precision-Recall Curve test.

In [None]:
def custom_pr_curve(result: TestResult):
    precision = result.raw_data.precision
    recall = result.raw_data.recall

    fig = plt.figure()
    plt.plot(recall, precision, label="Precision-Recall Curve")
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("Custom Precision-Recall Curve from RawData")
    plt.legend()

    plt.close()
    result.remove_figure(0)
    result.add_figure(fig)

    return result

result_pr = run_test(
    "validmind.model_validation.sklearn.PrecisionRecallCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
    post_process_fn=custom_pr_curve,
)
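
The same raw values can support analysis beyond plotting. The sketch below finds the point on the curve with the highest F1 score, using only precision and recall lists like the ones read from `result.raw_data` above. The helper `best_f1_point` and the sample values are hypothetical.

```python
def best_f1_point(precision, recall):
    """Return (index, f1) of the precision/recall pair with the highest F1."""
    if len(precision) != len(recall):
        raise ValueError("precision and recall must have the same length")
    if not precision:
        raise ValueError("precision and recall must not be empty")

    best_index, best_f1 = 0, -1.0
    for i, (p, r) in enumerate(zip(precision, recall)):
        # F1 is the harmonic mean of precision and recall; define it as 0
        # when both are 0 to avoid dividing by zero
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if f1 > best_f1:
            best_index, best_f1 = i, f1
    return best_index, best_f1


# Illustrative values only (not produced by the notebook's model)
sample_precision = [0.5, 0.6, 0.8, 1.0]
sample_recall = [1.0, 0.9, 0.5, 0.0]
best_index, best_f1 = best_f1_point(sample_precision, sample_recall)
print(f"Best F1 = {best_f1:.2f} at index {best_index}")
```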

## Using RawData in Custom Tests

These examples demonstrate a few simple ways to use the `RawData` feature of ValidMind tests. Most ValidMind-developed tests return some form of raw data that can be used to customize the test's output. You can also create your own tests that return `RawData` objects and use them in the same way.

Let's take a look at how this can be done in custom tests.

In [None]:
import pandas as pd

from validmind import test, RawData
from validmind.vm_models import VMDataset, VMModel


@test("custom.MyCustomTest")
def MyCustomTest(dataset: VMDataset, model: VMModel) -> tuple[go.Figure, RawData]:
    """Custom test that produces a figure and a RawData object"""
    # pretend we are using the dataset and model to compute some data
    # ...

    # create some fake data that will be used to generate a figure
    data = pd.DataFrame({"x": [10, 20, 30, 40, 50], "y": [10, 20, 30, 40, 50]})

    # create the figure (scatter plot)
    fig = go.Figure(data=go.Scatter(x=data["x"], y=data["y"]))

    # now let's create a RawData object that holds the "computed" data
    raw_data = RawData(scatter_data_df=data)

    # finally, return both the figure and the raw data
    return fig, raw_data


my_result = run_test(
    "custom.MyCustomTest",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
)

We can see that the test result shows the figure. Because we also returned a `RawData` object, we can inspect its contents and see how to use it to customize or regenerate the figure in a post-processing function.

In [None]:
my_result.raw_data.inspect()

We can see that we get a nicely formatted preview of the dataframe we stored in the raw data object. Let's go ahead and use it to re-plot our data.

In [None]:
def custom_plot(result: TestResult):
    data = result.raw_data.scatter_data_df

    # use something other than a scatter plot
    fig = go.Figure(data=go.Bar(x=data["x"], y=data["y"]))
    fig.update_layout(title="Custom Bar Chart from RawData")
    fig.update_xaxes(title="X Axis")
    fig.update_yaxes(title="Y Axis")

    result.remove_figure(0)
    result.add_figure(fig)

    return result

result = run_test(
    "custom.MyCustomTest",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    post_process_fn=custom_plot,
    generate_description=False,
)

## Conclusion

This notebook demonstrated how to use the `RawData` feature of ValidMind tests to customize test output, and how to create custom tests that return `RawData` objects and use them in the same way.

Because post-processing functions work from data the test has already produced, they can generate a wide variety of outputs from ValidMind tests without rewriting the tests or recomputing their results.