In [1]:
from validmind.tests import list_tests, load_test, describe_test

list_tests()

Test Type,Name,Description,ID
ThresholdTest,Bias,"**Purpose:** Bias Evaluation is aimed at assessing if and how the distribution and order of exemplars (examples) within a few-shot learning prompt affect the Large Language Model's (LLM) output, potentially introducing biases. By examining these influences, we can optimize the model's performance and mitigate unintended biases in its responses. **Test Mechanism:** 1. **Distribution of Exemplars:** Check how varying the number of positive vs. negative examples in a prompt impacts the LLM's classification of a neutral or ambiguous statement. 2. **Order of Exemplars:** Examine if the sequence in which positive and negative examples are presented can sway the LLM's response. For each test case, an LLM is used to grade the input prompt on a scale from 1 to 10, based on whether the examples in the prompt may lead to biased responses. A minimum threshold must be met in order for the test to pass. By default, this threshold is set to 7, but it can be adjusted as needed via the test parameters.",validmind.prompt_validation.Bias
ThresholdTest,Clarity,"**Purpose:** The Clarity Evaluation is designed to assess whether prompts provided to a Large Language Model (LLM) are unmistakably clear in their instructions. With clear prompts, the LLM is better suited to more accurately and effectively interpret and respond to instructions in the prompt. **Test Mechanism:** Using an LLM, prompts are scrutinized for clarity, considering aspects like detail inclusion, persona adoption, step-by-step instructions, use of examples, and desired output length. Each prompt is graded on a scale from 1 to 10 based on its clarity. Prompts scoring at or above a predetermined threshold (default is 7) are marked as clear. This threshold can be adjusted via the test parameters. **Why Clarity Matters:** Clear prompts minimize the room for misinterpretation, allowing the LLM to generate more relevant and accurate responses. Ambiguous or vague instructions might leave the model guessing, leading to suboptimal outputs. **Tactics for Ensuring Clarity that will be referenced during evaluation:** 1. **Detail Inclusion:** Provide essential details or context to prevent the LLM from making assumptions. 2. **Adopt a Persona:** Use system messages to specify the desired persona for the LLM's responses. 3. **Specify Steps:** For certain tasks, delineate the required steps explicitly, helping the model in sequential understanding. 4. **Provide Examples:** While general instructions are efficient, in some scenarios, ""few-shot"" prompting or style examples can guide the LLM more effectively. 5. **Determine Output Length:** Define the targeted length of the response, whether in terms of paragraphs, bullet points, or other units. While word counts aren't always precise, specifying formats like paragraphs can offer more predictable results.",validmind.prompt_validation.Clarity
ThresholdTest,Specificity,"**Purpose:** The Specificity Test aims to assess the clarity, precision, and effectiveness of prompts provided to a Large Language Model (LLM). Ensuring specificity in the prompts given to an LLM can significantly influence the accuracy and relevance of its outputs. The goal of this test is to ascertain that the instructions in a prompt are unmistakably clear and relevant, eliminating ambiguity and steering the LLM toward desired outcomes. **Test Mechanism:** Utilizing an LLM, each prompt is graded on a specificity scale ranging from 1 to 10. The grade reflects how well the prompt adheres to principles of clarity, detail, and relevancy without being overly verbose. Prompts that achieve a grade equal to or exceeding a predefined threshold (default set to 7) are deemed to pass the evaluation, while those falling below are marked as failing. This threshold can be adjusted as needed. **Why Specificity Matters:** Prompts that are detailed and descriptive often yield better and more accurate results from an LLM. Rather than relying on specific keywords or tokens, it's crucial to have a well-structured and descriptive prompt. Including relevant examples within the prompt can be particularly effective, guiding the LLM to produce outputs in desired formats. However, it's essential to strike a balance. While prompts need to be detailed, they shouldn't be overloaded with unnecessary information. The emphasis should always be on relevancy and conciseness, considering there are limitations to how long a prompt can be. **Example:** Imagine wanting an LLM to extract specific details from a given text. A vague prompt might yield varied results. However, with a prompt like, ""Extract the names of all characters and the cities they visited from the text"", the LLM is guided more precisely towards the desired information extraction.",validmind.prompt_validation.Specificity
ThresholdTest,Robustness,"**Purpose:** The Robustness Integrity Assessment evaluates the resilience and reliability of prompts provided to a Large Language Model (LLM). The primary objective is to ensure that prompts consistently produce accurate and desired outputs, even in diverse or challenging scenarios. **Test Mechanism:** Prompts are subjected to various conditions, alterations, and contexts to check their stability in eliciting consistent responses from the LLM. Factors such as different phrasings, inclusion of potential distractors, and varied input complexities are introduced to test the robustness of the prompt. By default, the test generates 10 inputs for the prompt but this can be adjusted via the test parameters. **Why Robustness Matters:** A robust prompt ensures consistent performance and reduces the likelihood of unexpected or off-tangent outputs. This consistency is vital for applications where predictability and reliability of the LLM's response are paramount.",validmind.prompt_validation.Robustness
ThresholdTest,Negative Instruction,"**Purpose:** The Positive Instructional Assessment evaluates prompts provided to a Large Language Model (LLM) to ensure they are framed using affirmative and proactive language. By focusing on what should be done rather than what should be avoided, prompts can guide the LLM more effectively towards generating appropriate and desired outputs. **Test Mechanism:** Employing an LLM as an evaluator, each prompt is meticulously analyzed and graded on use of positive instructions on a scale from 1 to 10. The grade indicates how well the prompt employs affirmative language while avoiding negative or prohibitive instructions. Prompts that achieve a grade equal to or exceeding a predetermined threshold (default set to 7) are recognized as adhering to positive instruction best practices. This threshold can be adjusted via the test parameters. **Why Positive Instructions Matter:** Prompts that are phrased in the affirmative, emphasizing what to do, tend to direct the LLM more clearly than those that focus on what not to do. Negative instructions can lead to ambiguities and undesired model responses. By emphasizing clarity and proactive guidance, we optimize the chances of obtaining relevant and targeted responses from the LLM. **Example:** Consider a scenario involving a chatbot designed to recommend movies. An instruction framed as, ""Don't recommend movies that are horror or thriller"" might cause the LLM to fixate on the genres mentioned, inadvertently producing undesired results. On the other hand, a positively-framed prompt like, ""Recommend family-friendly movies or romantic comedies"" provides clear guidance on the desired output.",validmind.prompt_validation.NegativeInstruction
ThresholdTest,Conciseness,"**Purpose:** The Conciseness Assessment is designed to evaluate the brevity and succinctness of prompts provided to a Large Language Model (LLM). A concise prompt strikes a balance between offering clear instructions and eliminating redundant or unnecessary information, ensuring that the LLM receives relevant input without being overwhelmed. **Test Mechanism:** Using an LLM, this test puts input prompts through a conciseness analysis where it's graded on a scale from 1 to 10. The grade reflects how well the prompt maintains clarity while avoiding verbosity. Prompts that achieve a grade equal to or surpassing a predefined threshold (default set to 7) are considered successful in being concise. This threshold can be adjusted based on specific requirements. **Why Conciseness Matters:** While detailed prompts can guide an LLM towards accurate results, excessive details can clutter the instruction and potentially lead to undesired outputs. Concise prompts are straightforward, reducing ambiguity and focusing the LLM's attention on the primary task. This is especially important considering there are limitations to the length of prompts that can be fed to an LLM. **Example:** For an LLM tasked with summarizing a document, a verbose prompt might introduce unnecessary constraints or biases. A concise, effective prompt like, ""Provide a brief summary highlighting the main points of the document"" ensures that the LLM captures the essence of the content without being sidetracked.",validmind.prompt_validation.Conciseness
ThresholdTest,Delimitation,"**Purpose:** The Delimitation Test ensures that prompts provided to the Large Language Model (LLM) use delimiters correctly to distinctly mark sections of the input. Properly delimited prompts simplify the LLM's interpretation process, ensuring accurate and precise responses. **Test Mechanism:** Using an LLM, prompts are checked for their appropriate use of delimiters such as triple quotation marks, XML tags, and section titles. Each prompt receives a score from 1 to 10 based on its delimitation integrity. Prompts scoring at or above a set threshold (default is 7) pass the check. This threshold can be modified as needed. **Why Proper Delimitation Matters:** Delimiters play a crucial role in segmenting and organizing prompts, especially when diverse data or multiple tasks are involved. They help in clearly distinguishing between different parts of the input, reducing ambiguity for the LLM. As task complexity increases, the correct use of delimiters becomes even more critical to ensure the LLM understands the prompt's intent. **Example:** When given a prompt like: ```USER: Summarize the text delimited by triple quotes. '''insert text here'''``` or: ```USER: insert first article here insert second article here ``` The LLM can more accurately discern sections of the text to be treated differently, thanks to the clear delimitation.",validmind.prompt_validation.Delimitation
Metric,Model Metadata,"This section describes attributes of the selected model such as its modeling technique, training parameters, and task type. This helps understand the model's capabilities and limitations in the context of a modeling framework.",validmind.model_validation.ModelMetadata
Metric,Classifier Out Of Sample Performance,"This section shows the performance of the model on the test data. Popular metrics such as the accuracy, precision, recall, F1 score, etc. are used to evaluate the model.",validmind.model_validation.sklearn.ClassifierOutOfSamplePerformance
ThresholdTest,Robustness Diagnosis,"The robustness of a machine learning model refers to its ability to maintain performance in the face of perturbations or changes to the input data. One way to test the robustness of a model is by perturbing its input features and observing how the model's performance changes. To perturb the input features, one can add random noise or modify the values of the features within a certain range. By perturbing the input features, one can simulate different scenarios in which the input data may be corrupted or incomplete, and test whether the model is able to handle such scenarios. The performance of the model can be measured in terms of its accuracy, precision, recall, or any other relevant metric, both before and after perturbing the input features. A model that is robust to perturbations should maintain a high level of performance even after the input features have been perturbed.",validmind.model_validation.sklearn.RobustnessDiagnosis
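
The prompt-validation descriptions in the table above share one pass rule: an LLM grades the prompt on a scale from 1 to 10, and the test passes when the grade is at or above a minimum threshold (7 by default). A minimal sketch of that rule in plain Python is shown here. The function name `passes_threshold` and the sample grades are hypothetical, chosen only for illustration; this is not the validmind implementation.

```python
DEFAULT_MIN_THRESHOLD = 7  # default value stated in the test descriptions


def passes_threshold(score: int, min_threshold: int = DEFAULT_MIN_THRESHOLD) -> bool:
    """Return True when a 1-10 grade meets the minimum threshold."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return score >= min_threshold


# Hypothetical grades, as if an LLM evaluator had returned them.
sample_scores = {"Clarity": 8, "Conciseness": 7, "Delimitation": 5}
results = {name: passes_threshold(score) for name, score in sample_scores.items()}
print(results)
```

A grade equal to the threshold passes, which matches the "at or above" wording in the descriptions.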


In [2]:
load_test("classifier_performance")

validmind.tests.model_validation.sklearn.ClassifierPerformance.ClassifierPerformance

In [3]:
list_tests(task="text_summarization")

Test Type,Name,Description,ID
ThresholdTest,Bias,"**Purpose:** Bias Evaluation is aimed at assessing if and how the distribution and order of exemplars (examples) within a few-shot learning prompt affect the Large Language Model's (LLM) output, potentially introducing biases. By examining these influences, we can optimize the model's performance and mitigate unintended biases in its responses. **Test Mechanism:** 1. **Distribution of Exemplars:** Check how varying the number of positive vs. negative examples in a prompt impacts the LLM's classification of a neutral or ambiguous statement. 2. **Order of Exemplars:** Examine if the sequence in which positive and negative examples are presented can sway the LLM's response. For each test case, an LLM is used to grade the input prompt on a scale from 1 to 10, based on whether the examples in the prompt may lead to biased responses. A minimum threshold must be met in order for the test to pass. By default, this threshold is set to 7, but it can be adjusted as needed via the test parameters.",validmind.prompt_validation.Bias
ThresholdTest,Clarity,"**Purpose:** The Clarity Evaluation is designed to assess whether prompts provided to a Large Language Model (LLM) are unmistakably clear in their instructions. With clear prompts, the LLM is better suited to more accurately and effectively interpret and respond to instructions in the prompt. **Test Mechanism:** Using an LLM, prompts are scrutinized for clarity, considering aspects like detail inclusion, persona adoption, step-by-step instructions, use of examples, and desired output length. Each prompt is graded on a scale from 1 to 10 based on its clarity. Prompts scoring at or above a predetermined threshold (default is 7) are marked as clear. This threshold can be adjusted via the test parameters. **Why Clarity Matters:** Clear prompts minimize the room for misinterpretation, allowing the LLM to generate more relevant and accurate responses. Ambiguous or vague instructions might leave the model guessing, leading to suboptimal outputs. **Tactics for Ensuring Clarity that will be referenced during evaluation:** 1. **Detail Inclusion:** Provide essential details or context to prevent the LLM from making assumptions. 2. **Adopt a Persona:** Use system messages to specify the desired persona for the LLM's responses. 3. **Specify Steps:** For certain tasks, delineate the required steps explicitly, helping the model in sequential understanding. 4. **Provide Examples:** While general instructions are efficient, in some scenarios, ""few-shot"" prompting or style examples can guide the LLM more effectively. 5. **Determine Output Length:** Define the targeted length of the response, whether in terms of paragraphs, bullet points, or other units. While word counts aren't always precise, specifying formats like paragraphs can offer more predictable results.",validmind.prompt_validation.Clarity
ThresholdTest,Specificity,"**Purpose:** The Specificity Test aims to assess the clarity, precision, and effectiveness of prompts provided to a Large Language Model (LLM). Ensuring specificity in the prompts given to an LLM can significantly influence the accuracy and relevance of its outputs. The goal of this test is to ascertain that the instructions in a prompt are unmistakably clear and relevant, eliminating ambiguity and steering the LLM toward desired outcomes. **Test Mechanism:** Utilizing an LLM, each prompt is graded on a specificity scale ranging from 1 to 10. The grade reflects how well the prompt adheres to principles of clarity, detail, and relevancy without being overly verbose. Prompts that achieve a grade equal to or exceeding a predefined threshold (default set to 7) are deemed to pass the evaluation, while those falling below are marked as failing. This threshold can be adjusted as needed. **Why Specificity Matters:** Prompts that are detailed and descriptive often yield better and more accurate results from an LLM. Rather than relying on specific keywords or tokens, it's crucial to have a well-structured and descriptive prompt. Including relevant examples within the prompt can be particularly effective, guiding the LLM to produce outputs in desired formats. However, it's essential to strike a balance. While prompts need to be detailed, they shouldn't be overloaded with unnecessary information. The emphasis should always be on relevancy and conciseness, considering there are limitations to how long a prompt can be. **Example:** Imagine wanting an LLM to extract specific details from a given text. A vague prompt might yield varied results. However, with a prompt like, ""Extract the names of all characters and the cities they visited from the text"", the LLM is guided more precisely towards the desired information extraction.",validmind.prompt_validation.Specificity
ThresholdTest,Robustness,"**Purpose:** The Robustness Integrity Assessment evaluates the resilience and reliability of prompts provided to a Large Language Model (LLM). The primary objective is to ensure that prompts consistently produce accurate and desired outputs, even in diverse or challenging scenarios. **Test Mechanism:** Prompts are subjected to various conditions, alterations, and contexts to check their stability in eliciting consistent responses from the LLM. Factors such as different phrasings, inclusion of potential distractors, and varied input complexities are introduced to test the robustness of the prompt. By default, the test generates 10 inputs for the prompt but this can be adjusted via the test parameters. **Why Robustness Matters:** A robust prompt ensures consistent performance and reduces the likelihood of unexpected or off-tangent outputs. This consistency is vital for applications where predictability and reliability of the LLM's response are paramount.",validmind.prompt_validation.Robustness
ThresholdTest,Negative Instruction,"**Purpose:** The Positive Instructional Assessment evaluates prompts provided to a Large Language Model (LLM) to ensure they are framed using affirmative and proactive language. By focusing on what should be done rather than what should be avoided, prompts can guide the LLM more effectively towards generating appropriate and desired outputs. **Test Mechanism:** Employing an LLM as an evaluator, each prompt is meticulously analyzed and graded on use of positive instructions on a scale from 1 to 10. The grade indicates how well the prompt employs affirmative language while avoiding negative or prohibitive instructions. Prompts that achieve a grade equal to or exceeding a predetermined threshold (default set to 7) are recognized as adhering to positive instruction best practices. This threshold can be adjusted via the test parameters. **Why Positive Instructions Matter:** Prompts that are phrased in the affirmative, emphasizing what to do, tend to direct the LLM more clearly than those that focus on what not to do. Negative instructions can lead to ambiguities and undesired model responses. By emphasizing clarity and proactive guidance, we optimize the chances of obtaining relevant and targeted responses from the LLM. **Example:** Consider a scenario involving a chatbot designed to recommend movies. An instruction framed as, ""Don't recommend movies that are horror or thriller"" might cause the LLM to fixate on the genres mentioned, inadvertently producing undesired results. On the other hand, a positively-framed prompt like, ""Recommend family-friendly movies or romantic comedies"" provides clear guidance on the desired output.",validmind.prompt_validation.NegativeInstruction
ThresholdTest,Conciseness,"**Purpose:** The Conciseness Assessment is designed to evaluate the brevity and succinctness of prompts provided to a Large Language Model (LLM). A concise prompt strikes a balance between offering clear instructions and eliminating redundant or unnecessary information, ensuring that the LLM receives relevant input without being overwhelmed. **Test Mechanism:** Using an LLM, this test puts input prompts through a conciseness analysis where it's graded on a scale from 1 to 10. The grade reflects how well the prompt maintains clarity while avoiding verbosity. Prompts that achieve a grade equal to or surpassing a predefined threshold (default set to 7) are considered successful in being concise. This threshold can be adjusted based on specific requirements. **Why Conciseness Matters:** While detailed prompts can guide an LLM towards accurate results, excessive details can clutter the instruction and potentially lead to undesired outputs. Concise prompts are straightforward, reducing ambiguity and focusing the LLM's attention on the primary task. This is especially important considering there are limitations to the length of prompts that can be fed to an LLM. **Example:** For an LLM tasked with summarizing a document, a verbose prompt might introduce unnecessary constraints or biases. A concise, effective prompt like, ""Provide a brief summary highlighting the main points of the document"" ensures that the LLM captures the essence of the content without being sidetracked.",validmind.prompt_validation.Conciseness
ThresholdTest,Delimitation,"**Purpose:** The Delimitation Test ensures that prompts provided to the Large Language Model (LLM) use delimiters correctly to distinctly mark sections of the input. Properly delimited prompts simplify the LLM's interpretation process, ensuring accurate and precise responses. **Test Mechanism:** Using an LLM, prompts are checked for their appropriate use of delimiters such as triple quotation marks, XML tags, and section titles. Each prompt receives a score from 1 to 10 based on its delimitation integrity. Prompts scoring at or above a set threshold (default is 7) pass the check. This threshold can be modified as needed. **Why Proper Delimitation Matters:** Delimiters play a crucial role in segmenting and organizing prompts, especially when diverse data or multiple tasks are involved. They help in clearly distinguishing between different parts of the input, reducing ambiguity for the LLM. As task complexity increases, the correct use of delimiters becomes even more critical to ensure the LLM understands the prompt's intent. **Example:** When given a prompt like: ```USER: Summarize the text delimited by triple quotes. '''insert text here'''``` or: ```USER: insert first article here insert second article here ``` The LLM can more accurately discern sections of the text to be treated differently, thanks to the clear delimitation.",validmind.prompt_validation.Delimitation
Metric,Model Metadata,"This section describes attributes of the selected model such as its modeling technique, training parameters, and task type. This helps understand the model's capabilities and limitations in the context of a modeling framework.",validmind.model_validation.ModelMetadata
Metric,Dataset Description,Collects a set of descriptive statistics for a dataset,validmind.data_validation.DatasetDescription
Metric,Dataset Split,"This section shows the size of the dataset split into training, test (and validation) sets where applicable. The size of each dataset is shown in absolute terms and as a proportion of the total dataset size. The dataset split is important to understand because it can affect the performance of the model. For example, if the training set is too small, the model may not be able to learn the patterns in the data and will perform poorly on the test set. On the other hand, if the test set is too small, the model may not be able to generalize well to unseen data and will perform poorly on the validation set.",validmind.data_validation.DatasetSplit
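
The Dataset Split description above reports each split's size in absolute terms and as a proportion of the total. As a rough illustration of that arithmetic only, the sketch below computes both figures from a dictionary of split sizes. The helper `split_summary` and the 800/200 sizes are hypothetical and are not taken from validmind.

```python
def split_summary(sizes: dict) -> dict:
    """Map each split name to its absolute size and its share of the total."""
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("the dataset is empty")
    return {
        name: {"size": n, "proportion": n / total}
        for name, n in sizes.items()
    }


# Hypothetical split sizes used only for illustration.
summary = split_summary({"train": 800, "test": 200})
for name, row in summary.items():
    print(f"{name}: {row['size']} rows ({row['proportion']:.0%})")
```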


In [4]:
list_tests(filter="timeseries")

Test Type,Name,Description,ID
Metric,Zivot Andrews Arch,Zivot-Andrews unit root test for establishing the order of integration of time series,validmind.model_validation.statsmodels.ZivotAndrewsArch
Metric,Rolling Stats Plot,"This class provides a metric to visualize the stationarity of a given time series dataset by plotting the rolling mean and rolling standard deviation. The rolling mean represents the average of the time series data over a fixed-size sliding window, which helps in identifying trends in the data. The rolling standard deviation measures the variability of the data within the sliding window, showing any changes in volatility over time. By analyzing these plots, users can gain insights into the stationarity of the time series data and determine if any transformations or differencing operations are required before applying time series models.",validmind.data_validation.RollingStatsPlot
ThresholdTest,Time Series Outliers,Test that finds outliers for time series data using the z-score method,validmind.data_validation.TimeSeriesOutliers
Metric,Engle Granger Coint,Test for cointegration between pairs of time series variables in a given dataset using the Engle-Granger test.,validmind.data_validation.EngleGrangerCoint
Metric,Seasonal Decompose,Calculates seasonal_decompose metric for each of the dataset features,validmind.data_validation.SeasonalDecompose
Metric,Time Series Histogram,Generates a visual analysis of time series data by plotting the histogram. The input dataset can have multiple time series if necessary. In this case we produce a separate plot for each time series.,validmind.data_validation.TimeSeriesHistogram
ThresholdTest,Time Series Frequency,Test that detects frequencies in the data,validmind.data_validation.TimeSeriesFrequency
Metric,ACF and PACF Plot,Plots ACF and PACF for a given time series dataset.,validmind.data_validation.ACFandPACFPlot
Metric,Auto Stationarity,Automatically detects stationarity for each time series in a DataFrame using the Augmented Dickey-Fuller (ADF) test.,validmind.data_validation.AutoStationarity
DatasetMetadata,Dataset Metadata,Custom class to collect a set of descriptive statistics for a dataset. This class will log dataset metadata via `log_dataset` instead of a metric. Dataset metadata is necessary to initialize a dataset object that can be related to different metrics and test results.,validmind.data_validation.DatasetMetadata
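
The Time Series Outliers entry above mentions the z-score method. A minimal sketch of that idea, using only the standard library, flags points whose distance from the mean exceeds a chosen number of standard deviations. The cutoff of 3, the use of the population standard deviation, and the sample series are assumptions made for this example; the validmind test may use different defaults.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of points whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical series: a stable signal with one injected spike at index 13.
series = [10.0, 10.2, 9.9, 10.1, 9.8] * 4
series[13] = 25.0
print(zscore_outliers(series))
```

Note that a single extreme value also inflates the standard deviation, so very short series can hide outliers under this method.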


In [17]:
list_tests(filter="regression", tags=["time_series_data"])

Test Type,Name,Description,ID
Metric,Zivot Andrews Arch,Zivot-Andrews unit root test for establishing the order of integration of time series,validmind.model_validation.statsmodels.ZivotAndrewsArch
Metric,Rolling Stats Plot,"This class provides a metric to visualize the stationarity of a given time series dataset by plotting the rolling mean and rolling standard deviation. The rolling mean represents the average of the time series data over a fixed-size sliding window, which helps in identifying trends in the data. The rolling standard deviation measures the variability of the data within the sliding window, showing any changes in volatility over time. By analyzing these plots, users can gain insights into the stationarity of the time series data and determine if any transformations or differencing operations are required before applying time series models.",validmind.data_validation.RollingStatsPlot
ThresholdTest,Time Series Outliers,Test that finds outliers for time series data using the z-score method,validmind.data_validation.TimeSeriesOutliers
Metric,Engle Granger Coint,Test for cointegration between pairs of time series variables in a given dataset using the Engle-Granger test.,validmind.data_validation.EngleGrangerCoint
Metric,Seasonal Decompose,Calculates seasonal_decompose metric for each of the dataset features,validmind.data_validation.SeasonalDecompose
Metric,Time Series Histogram,Generates a visual analysis of time series data by plotting the histogram. The input dataset can have multiple time series if necessary. In this case we produce a separate plot for each time series.,validmind.data_validation.TimeSeriesHistogram
ThresholdTest,Time Series Frequency,Test that detects frequencies in the data,validmind.data_validation.TimeSeriesFrequency
Metric,ACF and PACF Plot,Plots ACF and PACF for a given time series dataset.,validmind.data_validation.ACFandPACFPlot
Metric,Auto Stationarity,Automatically detects stationarity for each time series in a DataFrame using the Augmented Dickey-Fuller (ADF) test.,validmind.data_validation.AutoStationarity
DatasetMetadata,Dataset Metadata,Custom class to collect a set of descriptive statistics for a dataset. This class will log dataset metadata via `log_dataset` instead of a metric. Dataset metadata is necessary to initialize a dataset object that can be related to different metrics and test results.,validmind.data_validation.DatasetMetadata
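
The Rolling Stats Plot entry above relies on a rolling mean and a rolling standard deviation computed over a fixed-size sliding window. The sketch below computes both with the standard library so the mechanics are visible. The helper `rolling_stats`, the window of 3, and the sample series are hypothetical; the population standard deviation is used here, whereas a plotting library may default to the sample version.

```python
from statistics import mean, pstdev


def rolling_stats(values, window):
    """Return (mean, std) for every full window sliding over values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    stats = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        stats.append((mean(chunk), pstdev(chunk)))
    return stats


# Hypothetical upward-trending series.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stats = rolling_stats(series, window=3)
for m, s in stats:
    print(f"mean={m:.2f} std={s:.4f}")
```

In this series the rolling mean climbs while the rolling standard deviation stays constant, the pattern of a trend with stable volatility, which suggests the series is not stationary in its mean.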


In [5]:
list_tests(pretty=False)

['validmind.prompt_validation.Bias',
 'validmind.prompt_validation.Clarity',
 'validmind.prompt_validation.Specificity',
 'validmind.prompt_validation.Robustness',
 'validmind.prompt_validation.NegativeInstruction',
 'validmind.prompt_validation.Conciseness',
 'validmind.prompt_validation.Delimitation',
 'validmind.model_validation.ModelMetadata',
 'validmind.model_validation.sklearn.ClassifierOutOfSamplePerformance',
 'validmind.model_validation.sklearn.RobustnessDiagnosis',
 'validmind.model_validation.sklearn.SHAPGlobalImportance',
 'validmind.model_validation.sklearn.ConfusionMatrix',
 'validmind.model_validation.sklearn.ClassifierInSamplePerformance',
 'validmind.model_validation.sklearn.OverfitDiagnosis',
 'validmind.model_validation.sklearn.PermutationFeatureImportance',
 'validmind.model_validation.sklearn.MinimumROCAUCScore',
 'validmind.model_validation.sklearn.PrecisionRecallCurve',
 'validmind.model_validation.sklearn.ClassifierPerformance',
 'validmind.model_validation.skl
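
With `pretty=False` the result is a plain list of ID strings, so it can be processed with ordinary Python. As a small sketch, the code below groups a few of the IDs shown above by their namespace (everything before the final dot). The IDs are copied by hand from the output rather than fetched from the library, and `group_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict

# A handful of IDs copied from the output above.
test_ids = [
    "validmind.prompt_validation.Bias",
    "validmind.prompt_validation.Clarity",
    "validmind.model_validation.ModelMetadata",
    "validmind.model_validation.sklearn.ConfusionMatrix",
    "validmind.model_validation.sklearn.OverfitDiagnosis",
]


def group_by_namespace(ids):
    """Group test IDs by the dotted prefix before the test's class name."""
    groups = defaultdict(list)
    for test_id in ids:
        namespace, _, name = test_id.rpartition(".")
        groups[namespace].append(name)
    return dict(groups)


groups = group_by_namespace(test_ids)
for namespace, names in groups.items():
    print(namespace, names)
```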

In [6]:
describe_test("validmind.model_validation.ModelMetadata")

ID:,validmind.model_validation.ModelMetadata
Name:,Model Metadata
Description:,"This section describes attributes of the selected model such as its modeling technique, training parameters, and task type. This helps understand the model's capabilities and limitations in the context of a modeling framework."
Test Type:,Metric
Required Inputs:,['model']
Params:,{}


In [7]:
describe_test("ModelMetadata")

ID:,validmind.model_validation.ModelMetadata
Name:,Model Metadata
Description:,"This section describes attributes of the selected model such as its modeling technique, training parameters, and task type. This helps understand the model's capabilities and limitations in the context of a modeling framework."
Test Type:,Metric
Required Inputs:,['model']
Params:,{}
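
The two calls above return the same record for the full ID and for the short name `ModelMetadata`. How the library resolves short names is not shown in this notebook; the sketch below is only one plausible way such a lookup could work, matching the last dotted component against a list of known IDs. `resolve_test_id` and `known_ids` are hypothetical names, not part of validmind.

```python
def resolve_test_id(name, known_ids):
    """Return the full ID for a full ID or an unambiguous short class name."""
    if name in known_ids:
        return name
    matches = [tid for tid in known_ids if tid.rsplit(".", 1)[-1] == name]
    if len(matches) != 1:
        raise LookupError(f"{name!r} matched {len(matches)} test IDs")
    return matches[0]


# IDs copied from earlier output in this notebook.
known_ids = [
    "validmind.model_validation.ModelMetadata",
    "validmind.model_validation.sklearn.ConfusionMatrix",
    "validmind.data_validation.DatasetMetadata",
]

print(resolve_test_id("ModelMetadata", known_ids))
print(resolve_test_id("validmind.data_validation.DatasetMetadata", known_ids))
```

Raising an error on zero or multiple matches keeps an ambiguous short name from silently selecting the wrong test.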


In [8]:
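# Load the test class by its full ID and inspect it
test = load_test("validmind.model_validation.ModelMetadata")
print(test)  # the class object itself
print(f"Test name is: {test.name}")
print(test.__doc__)  # the class docstring
print(test.required_inputs)  # inputs the test expects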

<class 'validmind.tests.model_validation.ModelMetadata.ModelMetadata'>
Test name is: model_metadata

    Custom class to collect the following metadata for a model:
    - Model architecture
    
['model']


In [9]:
load_test("validmind.model_validation.sklearn.ConfusionMatrix")

validmind.tests.model_validation.sklearn.ConfusionMatrix.ConfusionMatrix

In [10]:
load_test("validmind.data_validation.ClassImbalance")

validmind.tests.data_validation.ClassImbalance.ClassImbalance

In [11]:
# Load every test class returned for the model_validation filter
for test in list_tests(pretty=False, filter="validmind.model_validation"):
    print(load_test(test))

<class 'validmind.tests.model_validation.statsmodels.ZivotAndrewsArch.ZivotAndrewsArch'>
<class 'validmind.tests.model_validation.statsmodels.RegressionModelForecastPlot.RegressionModelForecastPlot'>
<class 'validmind.tests.model_validation.statsmodels.RegressionModelOutsampleComparison.RegressionModelOutsampleComparison'>
<class 'validmind.tests.model_validation.statsmodels.KolmogorovSmirnov.KolmogorovSmirnov'>
<class 'validmind.tests.model_validation.statsmodels.ScorecardProbabilitiesHistogram.ScorecardProbabilitiesHistogram'>
<class 'validmind.tests.model_validation.statsmodels.RegressionModelsPerformance.RegressionModelsPerformance'>
<class 'validmind.tests.model_validation.sklearn.SHAPGlobalImportance.SHAPGlobalImportance'>
<class 'validmind.tests.model_validation.statsmodels.ScorecardBucketHistogram.ScorecardBucketHistogram'>
<class 'validmind.tests.model_validation.sklearn.ClassifierInSamplePerformance.ClassifierInSamplePerformance'>
<class 'validmind.tests.model_validation.Mode

In [12]:
from validmind.tests.data_validation.ClassImbalance import ClassImbalance
ClassImbalance

validmind.tests.data_validation.ClassImbalance.ClassImbalance

In [13]:
from validmind.tests.model_validation.sklearn.ConfusionMatrix import ConfusionMatrix
ConfusionMatrix

validmind.tests.model_validation.sklearn.ConfusionMatrix.ConfusionMatrix