# Explore tests

View and learn more about the tests available in the ValidMind Library, including code examples and usage of key functions.

In this notebook, we'll dive deep into the utilities available for viewing and understanding the various tests that ValidMind provides through the `tests` module. Whether you're just getting started or looking for advanced tips, you'll find clear examples and explanations to assist you every step of the way.

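Before we go into the details, let's import the functions we'll use from the `validmind.tests` module. `list_tests` and `describe_test` are the two main ones: the first filters through the available tests, and the second shows the details of an individual test. `list_tasks`, `list_tags`, and `list_tasks_and_tags` help you discover the task types and tags you can filter by.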


```python
from validmind.tests import (
    describe_test,
    list_tests,
    list_tasks,
    list_tags,
    list_tasks_and_tags,
)
```

## Contents

- [Listing All Tests](#toc1_)
- [Understanding Tags and Task Types](#toc2_)
- [Searching for Specific Tests using `tags` and `tasks`](#toc3_)
- [Delving into Test Details with `describe_test`](#toc4_)
- [Next steps](#toc5_)
  - [Discover more learning resources](#toc5_1_)



<a id='toc1_'></a>

## Listing All Tests


The `list_tests` function provides a convenient way to retrieve all available tests in the `validmind.tests` module. When invoked without any parameters, it returns a pandas DataFrame containing detailed information about each test.


```python
list_tests()
```

ID,Name,Description,Required Inputs,Params,Tags,Tasks
validmind.data_validation.ACFandPACFPlot,AC Fand PACF Plot,Analyzes time series data using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to...,['dataset'],{},"['time_series_data', 'forecasting', 'statistical_test', 'visualization']",['regression']
validmind.data_validation.ADF,ADF,Assesses the stationarity of a time series dataset using the Augmented Dickey-Fuller (ADF) test....,['dataset'],{},"['time_series_data', 'statsmodels', 'forecasting', 'statistical_test', 'stationarity']",['regression']
validmind.data_validation.AutoAR,Auto AR,Automatically identifies the optimal Autoregressive (AR) order for a time series using BIC and AIC criteria....,['dataset'],"{'max_ar_order': {'type': 'int', 'default': 3}}","['time_series_data', 'statsmodels', 'forecasting', 'statistical_test']",['regression']
validmind.data_validation.AutoMA,Auto MA,Automatically selects the optimal Moving Average (MA) order for each variable in a time series dataset based on...,['dataset'],"{'max_ma_order': {'type': 'int', 'default': 3}}","['time_series_data', 'statsmodels', 'forecasting', 'statistical_test']",['regression']
validmind.data_validation.AutoStationarity,Auto Stationarity,Automates Augmented Dickey-Fuller test to assess stationarity across multiple time series in a DataFrame....,['dataset'],"{'max_order': {'type': 'int', 'default': 5}, 'threshold': {'type': 'float', 'default': 0.05}}","['time_series_data', 'statsmodels', 'forecasting', 'statistical_test']",['regression']
validmind.data_validation.BivariateScatterPlots,Bivariate Scatter Plots,Generates bivariate scatterplots to visually inspect relationships between pairs of numerical predictor variables...,['dataset'],{},"['tabular_data', 'numerical_data', 'visualization']",['classification']
validmind.data_validation.BoxPierce,Box Pierce,Detects autocorrelation in time-series data through the Box-Pierce test to validate model performance....,['dataset'],{},"['time_series_data', 'forecasting', 'statistical_test', 'statsmodels']",['regression']
validmind.data_validation.ChiSquaredFeaturesTable,Chi Squared Features Table,Assesses the statistical association between categorical features and a target variable using the Chi-Squared test....,['dataset'],"{'p_threshold': {'type': '_empty', 'default': 0.05}}","['tabular_data', 'categorical_data', 'statistical_test']",['classification']
validmind.data_validation.ClassImbalance,Class Imbalance,Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model....,['dataset'],"{'min_percent_threshold': {'type': 'int', 'default': 10}}","['tabular_data', 'binary_classification', 'multiclass_classification', 'data_quality']",['classification']
validmind.data_validation.DatasetDescription,Dataset Description,Provides comprehensive analysis and statistical summaries of each column in a machine learning model's dataset....,['dataset'],{},"['tabular_data', 'time_series_data', 'text_data']","['classification', 'regression', 'text_classification', 'text_summarization']"
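The `Params` column describes each test's configurable parameters as a mapping from parameter name to its type and default value. The sketch below copies three `Params` entries from the output above into a plain dictionary (holding them this way is an illustrative choice, not the library's API) and extracts each parameter's default value.

```python
# Params entries copied from the list_tests() output above, keyed by test ID.
# Storing them in a plain dict is an illustrative choice, not the ValidMind API.
PARAMS = {
    "validmind.data_validation.AutoStationarity": {
        "max_order": {"type": "int", "default": 5},
        "threshold": {"type": "float", "default": 0.05},
    },
    "validmind.data_validation.ClassImbalance": {
        "min_percent_threshold": {"type": "int", "default": 10},
    },
    "validmind.data_validation.ADF": {},
}


def default_params(params: dict) -> dict:
    """Return {parameter name: default value} for one test's Params entry."""
    return {name: spec["default"] for name, spec in params.items()}


for test_id, params in PARAMS.items():
    print(test_id, default_params(params))
```

A test whose `Params` column shows `{}`, such as `ADF`, lists no configurable parameters.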


<a id='toc2_'></a>

## Understanding Tags and Task Types

To use ValidMind's tests effectively, it helps to understand how they are categorized by tags and task types:

- **Task types**: Represent the kind of modeling task a test applies to. For instance:

  - **classification:** Works with classification models and datasets.
  - **regression:** Works with regression models and datasets.
  - **text_classification:** Works with text classification models and datasets.
  - **text_summarization:** Works with text summarization models and datasets.

- **Tags**: Free-form descriptors that provide more detail about a test, such as the data and models it is compatible with and the category it falls into. Some examples include:
  - **llm:** Tests that work with large language models.
  - **nlp:** Tests relevant for natural language processing.
  - **binary_classification:** Tests for binary classification tasks.
  - **forecasting:** Tests for forecasting and time-series analysis.
  - **tabular_data:** Tests for tabular data such as CSVs and Excel spreadsheets.
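Conceptually, every test carries a list of task types and a list of tags, and `list_tasks()` and `list_tags()` report the distinct values across all tests. The sketch below models that idea with a hypothetical `TestMetadata` dataclass filled with three rows from the `list_tests()` output. It illustrates the relationship only and is not how the ValidMind Library stores test metadata; sorting the results is a choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TestMetadata:
    """Hypothetical container for one test's metadata (not a ValidMind class)."""
    test_id: str
    tasks: list = field(default_factory=list)
    tags: list = field(default_factory=list)


# Three rows copied from the list_tests() output
TESTS = [
    TestMetadata("validmind.data_validation.ADF",
                 tasks=["regression"],
                 tags=["time_series_data", "statsmodels", "forecasting",
                       "statistical_test", "stationarity"]),
    TestMetadata("validmind.data_validation.ClassImbalance",
                 tasks=["classification"],
                 tags=["tabular_data", "binary_classification",
                       "multiclass_classification", "data_quality"]),
    TestMetadata("validmind.data_validation.DatasetDescription",
                 tasks=["classification", "regression", "text_classification",
                        "text_summarization"],
                 tags=["tabular_data", "time_series_data", "text_data"]),
]


def distinct(values_per_test):
    """Collect the distinct values across all tests, sorted for stable output."""
    return sorted({value for values in values_per_test for value in values})


all_tasks = distinct(t.tasks for t in TESTS)
all_tags = distinct(t.tags for t in TESTS)
print(all_tasks)
print(all_tags)
```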


You can use the `list_tasks()` and `list_tags()` functions to view all the task types and tags used to categorize the tests available in the ValidMind Library:


```python
list_tasks()
```

['time_series_forecasting',
 'feature_extraction',
 'text_qa',
 'text_generation',
 'residual_analysis',
 'visualization',
 'text_classification',
 'regression',
 'nlp',
 'text_summarization',
 'data_validation',
 'classification',
 'clustering',
 'monitoring']

```python
list_tags()
```

['few_shot',
 'ragas',
 'bias_and_fairness',
 'AUC',
 'visualization',
 'rag_performance',
 'logistic_regression',
 'model_validation',
 'credit_risk',
 'model_selection',
 'linear_regression',
 'clustering',
 'data_distribution',
 'model_explainability',
 'frequency_analysis',
 'model_interpretation',
 'time_series_data',
 'forecasting',
 'llm',
 'multiclass_classification',
 'data_validation',
 'binary_classification',
 'stationarity',
 'senstivity_analysis',
 'retrieval_performance',
 'categorical_data',
 'seasonality',
 'qualitative',
 'model_comparison',
 'model_training',
 'data_quality',
 'regression',
 'anomaly_detection',
 'calibration',
 'model_predictions',
 'dimensionality_reduction',
 'descriptive_statistics',
 'classification',
 'unit_root_test',
 'metadata',
 'threshold_optimization',
 'model_diagnosis',
 'feature_selection',
 'data_analysis',
 'statistical_test',
 'embeddings',
 'analysis',
 'feature_importance',
 'scorecard',
 'correlation',
 'classification_metrics',


If you want to see which tags correspond to which task type, you can use the function `list_tasks_and_tags()`:


```python
list_tasks_and_tags()
```

Task,Tags
regression,"bias_and_fairness, visualization, model_selection, linear_regression, data_distribution, model_explainability, model_interpretation, time_series_data, forecasting, multiclass_classification, data_validation, binary_classification, stationarity, model_performance, senstivity_analysis, categorical_data, seasonality, data_quality, regression, model_predictions, descriptive_statistics, unit_root_test, metadata, model_diagnosis, feature_selection, data_analysis, statistical_test, analysis, feature_importance, correlation, sklearn, statsmodels, numerical_data, text_data, tabular_data, model_training"
classification,"bias_and_fairness, AUC, visualization, logistic_regression, model_validation, credit_risk, linear_regression, data_distribution, time_series_data, multiclass_classification, binary_classification, categorical_data, model_comparison, model_training, data_quality, anomaly_detection, calibration, descriptive_statistics, classification, metadata, model_diagnosis, threshold_optimization, feature_selection, data_analysis, statistical_test, classification_metrics, feature_importance, scorecard, correlation, sklearn, statsmodels, numerical_data, text_data, tabular_data, model_performance"
text_classification,"few_shot, ragas, visualization, frequency_analysis, model_comparison, feature_importance, time_series_data, nlp, llm, sklearn, multiclass_classification, zero_shot, text_data, binary_classification, retrieval_performance, tabular_data, model_performance, model_diagnosis"
text_summarization,"few_shot, ragas, qualitative, visualization, frequency_analysis, embeddings, rag_performance, time_series_data, nlp, llm, zero_shot, text_data, dimensionality_reduction, retrieval_performance, tabular_data"
data_validation,"stationarity, time_series_data, statsmodels, unit_root_test"
time_series_forecasting,"model_explainability, visualization, time_series_data, sklearn, model_predictions, data_validation, model_performance, model_training, metadata"
nlp,"visualization, frequency_analysis, data_validation, nlp, text_data"
clustering,"sklearn, kmeans, clustering, model_performance"
residual_analysis,regression
visualization,regression
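One way to read this table is that, for each task, it gathers the tags of the tests that declare that task. The sketch below derives such a mapping from per-test metadata, using three rows from the `list_tests()` output. This is an assumption about how the table relates to individual tests, not the library's implementation.

```python
from collections import defaultdict

# (test ID, tasks, tags) rows copied from the list_tests() outputs
ROWS = [
    ("validmind.data_validation.BoxPierce",
     ["regression"],
     ["time_series_data", "forecasting", "statistical_test", "statsmodels"]),
    ("validmind.data_validation.ChiSquaredFeaturesTable",
     ["classification"],
     ["tabular_data", "categorical_data", "statistical_test"]),
    ("validmind.data_validation.DescriptiveStatistics",
     ["classification", "regression"],
     ["tabular_data", "time_series_data", "data_quality"]),
]


def tags_by_task(rows):
    """Map each task to the sorted tags used by tests that declare that task."""
    mapping = defaultdict(set)
    for _test_id, tasks, tags in rows:
        for task in tasks:
            mapping[task].update(tags)
    return {task: sorted(tags) for task, tags in mapping.items()}


for task, tags in tags_by_task(ROWS).items():
    print(f"{task}: {', '.join(tags)}")
```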


<a id='toc3_'></a>

## Searching for Specific Tests using `tags` and `tasks`

While listing all tests is valuable, there are times when you need to narrow down your search. The `list_tests` function offers `filter`, `task`, and `tags` parameters to assist in this.


If you're looking for a specific test, or for tests related to a keyword, use the `filter` parameter. It matches the search string against the test metadata, so the results include tests whose ID or tags mention it. For example, to list tests related to `sklearn`:


```python
list_tests(filter="sklearn")
```

ID,Name,Description,Required Inputs,Params,Tags,Tasks
validmind.model_validation.ClusterSizeDistribution,Cluster Size Distribution,Assesses the performance of clustering models by comparing the distribution of cluster sizes in model predictions...,"['dataset', 'model']",{},"['sklearn', 'model_performance']",['clustering']
validmind.model_validation.TimeSeriesR2SquareBySegments,Time Series R2 Square By Segments,Evaluates the R-Squared values of regression models over specified time segments in time series data to assess...,"['dataset', 'model']","{'segments': {'type': '_empty', 'default': None}}","['model_performance', 'sklearn']","['regression', 'time_series_forecasting']"
validmind.model_validation.sklearn.AdjustedMutualInformation,Adjusted Mutual Information,"Evaluates clustering model performance by measuring mutual information between true and predicted labels, adjusting...","['model', 'dataset']",{},"['sklearn', 'model_performance', 'clustering']",['clustering']
validmind.model_validation.sklearn.AdjustedRandIndex,Adjusted Rand Index,Measures the similarity between two data clusters using the Adjusted Rand Index (ARI) metric in clustering machine...,"['model', 'dataset']",{},"['sklearn', 'model_performance', 'clustering']",['clustering']
validmind.model_validation.sklearn.CalibrationCurve,Calibration Curve,Evaluates the calibration of probability estimates by comparing predicted probabilities against observed...,"['model', 'dataset']","{'n_bins': {'type': 'int', 'default': 10}}","['sklearn', 'model_performance', 'classification']",['classification']
validmind.model_validation.sklearn.ClassifierPerformance,Classifier Performance,"Evaluates performance of binary or multiclass classification models using precision, recall, F1-Score, accuracy,...","['dataset', 'model']","{'average': {'type': 'str', 'default': 'macro'}}","['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance']","['classification', 'text_classification']"
validmind.model_validation.sklearn.ClassifierThresholdOptimization,Classifier Threshold Optimization,Analyzes and visualizes different threshold optimization methods for binary classification models....,"['dataset', 'model']","{'methods': {'type': None, 'default': None}, 'target_recall': {'type': None, 'default': None}}","['model_validation', 'threshold_optimization', 'classification_metrics']",['classification']
validmind.model_validation.sklearn.ClusterCosineSimilarity,Cluster Cosine Similarity,Measures the intra-cluster similarity of a clustering model using cosine similarity....,"['model', 'dataset']",{},"['sklearn', 'model_performance', 'clustering']",['clustering']
validmind.model_validation.sklearn.ClusterPerformanceMetrics,Cluster Performance Metrics,Evaluates the performance of clustering machine learning models using multiple established metrics....,"['model', 'dataset']",{},"['sklearn', 'model_performance', 'clustering']",['clustering']
validmind.model_validation.sklearn.CompletenessScore,Completeness Score,Evaluates a clustering model's capacity to categorize instances from a single class into the same cluster....,"['model', 'dataset']",{},"['sklearn', 'model_performance', 'clustering']",['clustering']
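Judging from this output, `filter` matches more than the test ID: `ClusterSizeDistribution` appears even though its ID does not contain `sklearn`, presumably through its `sklearn` tag. The sketch below approximates that behavior with a case-insensitive substring match over the ID and tags. The exact set of fields the library searches is an assumption here, and `text_filter` is a hypothetical helper.

```python
# (test ID, tags) rows copied from the list_tests() outputs: two that match
# "sklearn" and one (ADF) that does not, for contrast
ROWS = [
    ("validmind.model_validation.ClusterSizeDistribution",
     ["sklearn", "model_performance"]),
    ("validmind.model_validation.sklearn.ClassifierThresholdOptimization",
     ["model_validation", "threshold_optimization", "classification_metrics"]),
    ("validmind.data_validation.ADF",
     ["time_series_data", "statsmodels", "forecasting", "statistical_test",
      "stationarity"]),
]


def text_filter(rows, query):
    """Keep IDs whose ID or any tag contains `query`, ignoring case (approximation)."""
    q = query.lower()
    return [
        test_id
        for test_id, tags in rows
        if q in test_id.lower() or any(q in tag.lower() for tag in tags)
    ]


print(text_filter(ROWS, "sklearn"))
```

The first row matches only through its tag and the second only through its ID, which mirrors what the output above shows.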


The `task` parameter returns tests that match a specific task type. For instance, to find tests for `classification` tasks:


```python
list_tests(task="classification")
```

ID,Name,Description,Required Inputs,Params,Tags,Tasks
validmind.data_validation.BivariateScatterPlots,Bivariate Scatter Plots,Generates bivariate scatterplots to visually inspect relationships between pairs of numerical predictor variables...,['dataset'],{},"['tabular_data', 'numerical_data', 'visualization']",['classification']
validmind.data_validation.ChiSquaredFeaturesTable,Chi Squared Features Table,Assesses the statistical association between categorical features and a target variable using the Chi-Squared test....,['dataset'],"{'p_threshold': {'type': '_empty', 'default': 0.05}}","['tabular_data', 'categorical_data', 'statistical_test']",['classification']
validmind.data_validation.ClassImbalance,Class Imbalance,Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model....,['dataset'],"{'min_percent_threshold': {'type': 'int', 'default': 10}}","['tabular_data', 'binary_classification', 'multiclass_classification', 'data_quality']",['classification']
validmind.data_validation.DatasetDescription,Dataset Description,Provides comprehensive analysis and statistical summaries of each column in a machine learning model's dataset....,['dataset'],{},"['tabular_data', 'time_series_data', 'text_data']","['classification', 'regression', 'text_classification', 'text_summarization']"
validmind.data_validation.DatasetSplit,Dataset Split,"Evaluates and visualizes the distribution proportions among training, testing, and validation datasets of an ML...",['datasets'],{},"['tabular_data', 'time_series_data', 'text_data']","['classification', 'regression', 'text_classification', 'text_summarization']"
validmind.data_validation.DescriptiveStatistics,Descriptive Statistics,Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's...,['dataset'],{},"['tabular_data', 'time_series_data', 'data_quality']","['classification', 'regression']"
validmind.data_validation.Duplicates,Duplicates,"Tests dataset for duplicate entries, ensuring model reliability via data quality verification....",['dataset'],"{'min_threshold': {'type': '_empty', 'default': 1}}","['tabular_data', 'data_quality', 'text_data']","['classification', 'regression']"
validmind.data_validation.FeatureTargetCorrelationPlot,Feature Target Correlation Plot,Visualizes the correlation between input features and the model's target output in a color-coded horizontal bar...,['dataset'],"{'fig_height': {'type': '_empty', 'default': 600}}","['tabular_data', 'visualization', 'correlation']","['classification', 'regression']"
validmind.data_validation.HighCardinality,High Cardinality,Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting....,['dataset'],"{'num_threshold': {'type': 'int', 'default': 100}, 'percent_threshold': {'type': 'float', 'default': 0.1}, 'threshold_type': {'type': 'str', 'default': 'percent'}}","['tabular_data', 'data_quality', 'categorical_data']","['classification', 'regression']"
validmind.data_validation.HighPearsonCorrelation,High Pearson Correlation,Identifies highly correlated feature pairs in a dataset suggesting feature redundancy or multicollinearity....,['dataset'],"{'max_threshold': {'type': 'float', 'default': 0.3}, 'top_n_correlations': {'type': 'int', 'default': 10}, 'feature_columns': {'type': 'list', 'default': None}}","['tabular_data', 'data_quality', 'correlation']","['classification', 'regression']"
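A test can declare several task types, so it is returned for each of them. For example, `DatasetDescription` appears under `classification` here and also lists `regression`, `text_classification`, and `text_summarization` among its tasks. A minimal sketch of that membership check, using rows from the outputs and a hypothetical `by_task` helper:

```python
# (test ID, tasks) rows copied from the list_tests() outputs
ROWS = [
    ("validmind.data_validation.ADF", ["regression"]),
    ("validmind.data_validation.ClassImbalance", ["classification"]),
    ("validmind.data_validation.DatasetDescription",
     ["classification", "regression", "text_classification", "text_summarization"]),
]


def by_task(rows, task):
    """Hypothetical helper: keep the IDs of tests that list `task` among their tasks."""
    return [test_id for test_id, tasks in rows if task in tasks]


print(by_task(ROWS, "classification"))
print(by_task(ROWS, "text_summarization"))
```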


The `tags` parameter searches tests by their tags. For instance, to find only tests designed for `model_performance` that also produce a plot (denoted by the `visualization` tag):


```python
list_tests(tags=["model_performance", "visualization"])
```

ID,Name,Description,Required Inputs,Params,Tags,Tasks
validmind.model_validation.RegressionResidualsPlot,Regression Residuals Plot,Evaluates regression model performance using residual distribution and actual vs. predicted plots....,"['model', 'dataset']","{'bin_size': {'type': 'float', 'default': 0.1}}","['model_performance', 'visualization']",['regression']
validmind.model_validation.sklearn.ConfusionMatrix,Confusion Matrix,Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix...,"['dataset', 'model']","{'threshold': {'type': 'float', 'default': 0.5}}","['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.PrecisionRecallCurve,Precision Recall Curve,Evaluates the precision-recall trade-off for binary classification models and visualizes the Precision-Recall curve....,"['model', 'dataset']",{},"['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.ROCCurve,ROC Curve,Evaluates binary classification model performance by generating and plotting the Receiver Operating Characteristic...,"['model', 'dataset']",{},"['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.TrainingTestDegradation,Training Test Degradation,Tests if model performance degradation between training and test datasets exceeds a predefined threshold....,"['datasets', 'model']","{'max_threshold': {'type': 'float', 'default': 0.1}}","['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.ongoing_monitoring.CalibrationCurveDrift,Calibration Curve Drift,Evaluates changes in probability calibration between reference and monitoring datasets....,"['datasets', 'model']","{'n_bins': {'type': 'int', 'default': 10}, 'drift_pct_threshold': {'type': 'float', 'default': 20}}","['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.ongoing_monitoring.ROCCurveDrift,ROC Curve Drift,Compares ROC curves between reference and monitoring datasets....,"['datasets', 'model']",{},"['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
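Every test returned above carries both `model_performance` and `visualization`, which suggests that multiple tags are combined with AND rather than OR. The sketch below expresses that reading as a subset check; `ClassifierPerformance`, which is tagged `model_performance` but not `visualization`, is excluded. `with_all_tags` is a hypothetical helper, not part of the library.

```python
# (test ID, tags) rows copied from the list_tests() outputs
ROWS = [
    ("validmind.model_validation.sklearn.ROCCurve",
     ["sklearn", "binary_classification", "multiclass_classification",
      "model_performance", "visualization"]),
    ("validmind.model_validation.sklearn.ClassifierPerformance",
     ["sklearn", "binary_classification", "multiclass_classification",
      "model_performance"]),
    ("validmind.model_validation.RegressionResidualsPlot",
     ["model_performance", "visualization"]),
]


def with_all_tags(rows, required):
    """Keep IDs of tests whose tags include every tag in `required` (AND semantics)."""
    required = set(required)
    return [test_id for test_id, tags in rows if required.issubset(tags)]


print(with_all_tags(ROWS, ["model_performance", "visualization"]))
```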


These parameters can be combined into more complex queries. For instance, to find `model_performance` tests that are compatible with `sklearn` models, designed for `classification` tasks, and produce a plot:


```python
list_tests(
    tags=["model_performance", "visualization", "sklearn"], task="classification"
)
```

ID,Name,Description,Required Inputs,Params,Tags,Tasks
validmind.model_validation.sklearn.ConfusionMatrix,Confusion Matrix,Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix...,"['dataset', 'model']","{'threshold': {'type': 'float', 'default': 0.5}}","['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.PrecisionRecallCurve,Precision Recall Curve,Evaluates the precision-recall trade-off for binary classification models and visualizes the Precision-Recall curve....,"['model', 'dataset']",{},"['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.ROCCurve,ROC Curve,Evaluates binary classification model performance by generating and plotting the Receiver Operating Characteristic...,"['model', 'dataset']",{},"['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.model_validation.sklearn.TrainingTestDegradation,Training Test Degradation,Tests if model performance degradation between training and test datasets exceeds a predefined threshold....,"['datasets', 'model']","{'max_threshold': {'type': 'float', 'default': 0.1}}","['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.ongoing_monitoring.CalibrationCurveDrift,Calibration Curve Drift,Evaluates changes in probability calibration between reference and monitoring datasets....,"['datasets', 'model']","{'n_bins': {'type': 'int', 'default': 10}, 'drift_pct_threshold': {'type': 'float', 'default': 20}}","['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
validmind.ongoing_monitoring.ROCCurveDrift,ROC Curve Drift,Compares ROC curves between reference and monitoring datasets....,"['datasets', 'model']",{},"['sklearn', 'binary_classification', 'model_performance', 'visualization']","['classification', 'text_classification']"
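Combining parameters narrows the result to tests that satisfy every criterion: compared with the previous query, `RegressionResidualsPlot` drops out because its task is `regression` and it lacks the `sklearn` tag. The sketch below expresses the combined query as a conjunction of the tag check and the task check, over rows copied from the outputs. The AND semantics and the `query` helper are assumptions for illustration.

```python
# (test ID, tags, tasks) rows copied from the list_tests() outputs
ROWS = [
    ("validmind.model_validation.RegressionResidualsPlot",
     ["model_performance", "visualization"],
     ["regression"]),
    ("validmind.model_validation.sklearn.ConfusionMatrix",
     ["sklearn", "binary_classification", "multiclass_classification",
      "model_performance", "visualization"],
     ["classification", "text_classification"]),
    ("validmind.ongoing_monitoring.ROCCurveDrift",
     ["sklearn", "binary_classification", "model_performance", "visualization"],
     ["classification", "text_classification"]),
]


def query(rows, tags=(), task=None):
    """Return IDs matching all `tags` and, if given, `task` (assumed AND semantics)."""
    return [
        test_id
        for test_id, test_tags, test_tasks in rows
        if set(tags).issubset(test_tags) and (task is None or task in test_tasks)
    ]


print(query(ROWS, tags=["model_performance", "visualization", "sklearn"],
            task="classification"))
```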


## Programmatic Use

To work with a specific set of tests programmatically, store the results in a variable. For instance, the cell below lists all tests designed for text summarization tasks and stores them in `text_summarization_tests` for further use. Passing `pretty=False` returns a plain list of test IDs instead of a formatted table.


```python
text_summarization_tests = list_tests(task="text_summarization", pretty=False)
text_summarization_tests
```

['validmind.data_validation.DatasetDescription',
 'validmind.data_validation.DatasetSplit',
 'validmind.data_validation.nlp.CommonWords',
 'validmind.data_validation.nlp.Hashtags',
 'validmind.data_validation.nlp.LanguageDetection',
 'validmind.data_validation.nlp.Mentions',
 'validmind.data_validation.nlp.Punctuations',
 'validmind.data_validation.nlp.StopWords',
 'validmind.data_validation.nlp.TextDescription',
 'validmind.model_validation.BertScore',
 'validmind.model_validation.BleuScore',
 'validmind.model_validation.ContextualRecall',
 'validmind.model_validation.MeteorScore',
 'validmind.model_validation.RegardScore',
 'validmind.model_validation.RougeScore',
 'validmind.model_validation.TokenDisparity',
 'validmind.model_validation.ToxicityScore',
 'validmind.model_validation.embeddings.CosineSimilarityComparison',
 'validmind.model_validation.embeddings.CosineSimilarityHeatmap',
 'validmind.model_validation.embeddings.EuclideanDistanceComparison',
 'validmind.model_validation.
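Because `pretty=False` returns plain strings, ordinary Python works on the result. For example, the sketch below groups test IDs by the namespace between `validmind.` and the test name. So that it runs on its own, it uses a hard-coded subset of the IDs printed above in place of the `text_summarization_tests` variable; `group_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict

# A subset of the IDs printed above, standing in for text_summarization_tests
test_ids = [
    "validmind.data_validation.DatasetDescription",
    "validmind.data_validation.nlp.CommonWords",
    "validmind.data_validation.nlp.StopWords",
    "validmind.model_validation.BleuScore",
    "validmind.model_validation.RougeScore",
    "validmind.model_validation.embeddings.CosineSimilarityHeatmap",
]


def group_by_namespace(ids):
    """Group IDs of the form 'validmind.<namespace>.<TestName>' by <namespace>."""
    groups = defaultdict(list)
    for test_id in ids:
        _root, rest = test_id.split(".", 1)           # drop the leading "validmind"
        namespace, _, name = rest.rpartition(".")      # split off the test name
        groups[namespace].append(name)
    return dict(groups)


for namespace, names in group_by_namespace(test_ids).items():
    print(namespace, names)
```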

<a id='toc4_'></a>

## Delving into Test Details with `describe_test`

After identifying a set of potential tests, you might want to explore the specifics of an individual test. The `describe_test` function shows the details of a test, including its name, description, ID, test type, and required inputs. Below, we describe a test using its ID:


```python
describe_test("validmind.model_validation.sklearn.OverfitDiagnosis")
```

*Output: an interactive panel titled "Overfit Diagnosis".*

<a id='toc5_'></a>

## Next steps

With the functions presented in this guide, you can list and filter ValidMind's available tests to find the ones you want to run against your model or dataset. The next step is to take the IDs of those tests and either run them directly to try them out or group them into a test suite for reuse. See the other notebooks for tutorials on both.

<a id='toc5_1_'></a>

### Discover more learning resources

We offer many interactive notebooks to help you document models:

- [Run tests & test suites](https://docs.validmind.ai/developer/model-testing/testing-overview.html)
- [Code samples](https://docs.validmind.ai/developer/samples-jupyter-notebooks.html)

Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind.
