
# Simple Constraints - Examples and Usage

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/advanced/Constraints_Suite.ipynb)

> This is a `whylogs v1` example. For the analogous feature in `v0`, please refer to [this example](https://github.com/whylabs/whylogs/blob/maintenance/0.7.x/examples/Constraints_Suite.ipynb).

In this example, we'll define a number of simple constraints and show how to use them. For the basics of building your own set of constraints, see the example [Data Validation with Metric Constraints](https://whylogs.readthedocs.io/en/stable/examples/advanced/Metric_Constraints.html).

The constraints are listed according to the metric namespace used when defining them. For each category, we import helper functions for simple and popular constraints. Each helper function has a brief explanation in its docstring. After importing the helper functions, we'll show a simple example of how to build constraints from them and visualize the results as a report with the visualization module.

> Note: The constraints shown here are still experimental and subject to further changes. Stay tuned for upgrades!

## Table of Contents

- [Installing and Importing Modules](#pre)
- [Distribution Metrics Constraints](#distribution)
    - greater_than_number
    - smaller_than_number
    - mean_between_range
    - stddev_between_range
    - quantile_between_range
- [Frequent Items/Frequent Strings Metrics Constraints](#frequent)
    - frequent_strings_in_reference_set
    - n_most_common_items_in_set
- [Counters Constraints](#counts)
    - count_below_number
    - null_percentage_below_number
- [Cardinality Constraints](#card)
    - distinct_number_in_range

## Installing whylogs and importing modules <a class="anchor" id="pre"></a>

If you haven't already, install whylogs:

In [1]:
%pip install 'whylogs[viz]'

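## Distribution Metrics Constraints <a class="anchor" id="distribution"></a>

Let's import `ConstraintsBuilder` and the helper functions needed to define the distribution constraints: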

In [1]:
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import (
    greater_than_number,
    mean_between_range,
    smaller_than_number,
    stddev_between_range,
    quantile_between_range
)

### Examples - Distribution Metrics Constraints

In [2]:
import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, 4, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [3]:
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(greater_than_number(column_name="weight", number=0.14))
builder.add_constraint(mean_between_range(column_name="weight", lower=2, upper=3))
builder.add_constraint(smaller_than_number(column_name="weight", number=20.5))
builder.add_constraint(stddev_between_range(column_name="weight", lower=1, upper=3))
builder.add_constraint(quantile_between_range(column_name="weight", quantile=0.5, lower=1.5, upper=2.0))

# `animal` is a string column, so it has no distribution metrics.
# This constraint passes if skip_missing=True and fails otherwise.
builder.add_constraint(
    quantile_between_range(
        column_name="animal", 
        quantile=0.5, 
        lower=1.5, 
        upper=2.0, 
        skip_missing=False
    )
)

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)
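
To see why each constraint in the report passes or fails, the same checks can be approximated in plain Python. This is only a sketch using the standard `statistics` module: whylogs evaluates constraints against the summary statistics stored in the profile (some of them estimated), not against the raw values. It assumes that `greater_than_number` compares against the column minimum and `smaller_than_number` against the column maximum. The `checks` dictionary is a hypothetical name used for illustration.

```python
import statistics

weight = [4.3, 1.8, 1.3, 4.1, 5.5e-6]

# Assumed semantics: each entry mirrors one constraint from the cell above.
checks = {
    "min greater than 0.14": min(weight) > 0.14,
    "mean between 2 and 3": 2 <= statistics.mean(weight) <= 3,
    "max smaller than 20.5": max(weight) < 20.5,
    "stddev between 1 and 3": 1 <= statistics.stdev(weight) <= 3,
    "median between 1.5 and 2.0": 1.5 <= statistics.median(weight) <= 2.0,
}

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Under these assumptions only the first check fails, because the mosquito's weight (5.5e-6) is far below 0.14.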

## Frequent Items/Frequent Strings Constraints <a class="anchor" id="frequent"></a>

In [4]:
from whylogs.core.constraints.factories import n_most_common_items_in_set, frequent_strings_in_reference_set

### Examples - Frequent Items/Frequent Strings Constraints

In [5]:
import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, 4, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [6]:
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
reference_set = {"cat", "snake"}
builder.add_constraint(frequent_strings_in_reference_set(column_name="animal", reference_set=reference_set))
builder.add_constraint(n_most_common_items_in_set(column_name="animal", n=2, reference_set=reference_set))
constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)
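
The difference between the two constraints can be illustrated with `collections.Counter`. This is a rough sketch, not the whylogs implementation: it assumes that `frequent_strings_in_reference_set` requires every frequent string to be in the reference set, while `n_most_common_items_in_set` only looks at the `n` most common items. whylogs works with approximate frequent-item estimates, whereas this sketch counts exactly.

```python
from collections import Counter

animal = ["cat", "snake", "snake", "cat", "mosquito"]
reference_set = {"cat", "snake"}

counts = Counter(animal)

# Assumed semantics: every frequent string must belong to the reference set.
all_in_reference = set(counts) <= reference_set

# Assumed semantics: the n most common items must belong to the reference set.
n = 2
top_n = {item for item, _ in counts.most_common(n)}
top_n_in_reference = top_n <= reference_set

print(all_in_reference)    # False: "mosquito" is not in the reference set
print(top_n_in_reference)  # True: the top 2 items are "cat" and "snake"
```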

## Counters Constraints <a class="anchor" id="counts"></a>

In [7]:
from whylogs.core.constraints.factories import count_below_number, null_percentage_below_number

### Examples - Counters Constraints

In [8]:
import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [9]:
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(count_below_number(column_name="legs", number=10))
builder.add_constraint(null_percentage_below_number(column_name="legs", number=0.05))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)
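
As a sanity check on the thresholds, the two counters can be reproduced by hand. The sketch below assumes that the total count includes null rows and that the null "percentage" is the fraction of null rows, expressed between 0 and 1; the variable names are illustrative only.

```python
legs = [4, 2, 0, None, 6]

total_count = len(legs)  # null rows are assumed to be counted too
null_count = sum(value is None for value in legs)
null_fraction = null_count / total_count

count_ok = total_count < 10     # mirrors count_below_number(number=10)
null_ok = null_fraction < 0.05  # mirrors null_percentage_below_number(number=0.05)

print(total_count, null_fraction)  # 5 0.2
print(count_ok, null_ok)           # True False
```

With one null out of five rows, the null fraction is 0.2, so a threshold of 0.05 is expected to fail for `legs`.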

## Cardinality Constraints <a class="anchor" id="card"></a>

In [10]:
from whylogs.core.constraints.factories import distinct_number_in_range

### Examples - Cardinality Constraints

In [11]:
import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [12]:
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(distinct_number_in_range(column_name="animal", lower=3, upper=6))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)
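
For intuition, the cardinality check amounts to counting distinct values and comparing the count against the bounds. The sketch below uses an exact count and assumes inclusive bounds; whylogs relies on a cardinality estimate, so its value may differ slightly from the exact one on larger datasets.

```python
animal = ["cat", "snake", "snake", "cat", "mosquito"]

distinct_count = len(set(animal))
in_range = 3 <= distinct_count <= 6  # assumed inclusive bounds

print(distinct_count, in_range)  # 3 True
```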

## Types Metrics

### Examples - Types Metrics

In [13]:
import whylogs as why
import pandas as pd

data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
    "flies": [False, False, "False", False, True],
    "obj": [{"a":1}, None, {"a":1}, {"a":1}, {"a":1}]
}
df = pd.DataFrame(data)
results = why.log(df)
profile_view = results.view()

#### Check Nullable Types

In [14]:

from whylogs.core.constraints.factories import ( 
    column_is_nullable_integral,
    column_is_nullable_boolean, 
    column_is_nullable_fractional,
    column_is_nullable_object,
    column_is_nullable_string,
)
from whylogs.core.constraints import ConstraintsBuilder


builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(column_is_nullable_string(column_name="animal"))
builder.add_constraint(column_is_nullable_integral(column_name="legs"))
builder.add_constraint(column_is_nullable_fractional(column_name="weight"))
builder.add_constraint(column_is_nullable_boolean(column_name="flies"))
builder.add_constraint(column_is_nullable_object(column_name="obj"))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Each constraint above passes if all non-null values in the column are of the given type. Null values are accepted.

Note that the constraint for `legs` failed. That is because whylogs leverages __pandas' dtypes__ when they are available: when a `None` is present, pandas stores the column as floating point, so whylogs considers it `fractional`, even though the remaining values were originally integers.
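
The dtype change can be observed directly in pandas, independently of whylogs. A minimal illustration, with hypothetical variable names:

```python
import pandas as pd

without_null = pd.Series([4, 2, 0, 4, 6])
with_null = pd.Series([4, 2, 0, None, 6])

print(without_null.dtype)  # int64
print(with_null.dtype)     # float64: None is stored as NaN
```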

## Combined Metrics

### Examples - Combined Metrics

In [15]:
import whylogs as why
import pandas as pd

data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
    "flies": [False, False, "False", False, True],
    "obj": [{"a":1}, None, {"a":1}, {"a":1}, {"a":1}]
}
df = pd.DataFrame(data)
results = why.log(df)
profile_view = results.view()

#### Check Non-nullable Types

In [16]:
from whylogs.core.constraints.factories import ( 
    column_is_nullable_integral,
    column_is_nullable_boolean, 
    column_is_nullable_fractional,
    column_is_nullable_object,
    column_is_nullable_string,
    null_values_below_number,
)
from whylogs.core.constraints import ConstraintsBuilder


builder = ConstraintsBuilder(dataset_profile_view=profile_view)
# The combination of these metrics makes a check of non-nullable string
builder.add_constraint(column_is_nullable_string(column_name="animal"))
builder.add_constraint(null_values_below_number(column_name="animal", number=1))

# The combination of these metrics makes a check of non-nullable integral
builder.add_constraint(column_is_nullable_integral(column_name="legs"))
builder.add_constraint(null_values_below_number(column_name="legs", number=1))

# The combination of these metrics makes a check of non-nullable fractional
builder.add_constraint(column_is_nullable_fractional(column_name="weight"))
builder.add_constraint(null_values_below_number(column_name="weight", number=1))

# The combination of these metrics makes a check of non-nullable boolean
builder.add_constraint(column_is_nullable_boolean(column_name="flies"))
builder.add_constraint(null_values_below_number(column_name="flies", number=1))

# The combination of these metrics makes a check of non-nullable object
builder.add_constraint(column_is_nullable_object(column_name="obj"))
builder.add_constraint(null_values_below_number(column_name="obj", number=1))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

To create a constraint that checks for a non-nullable type, we combine two separate constraints:

- `column is nullable datatype`
- `null values below 1`
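
The logic of that combination can be sketched in plain Python. The helpers `is_nullable_type` and `is_non_nullable_type` below are hypothetical and not part of whylogs; they only mirror the idea of pairing a type check with a null-count check on raw values.

```python
def is_nullable_type(values, expected_type):
    """True if every non-null value is an instance of expected_type."""
    return all(isinstance(v, expected_type) for v in values if v is not None)


def is_non_nullable_type(values, expected_type):
    """Combine the type check with a 'null values below 1' check."""
    null_count = sum(v is None for v in values)
    return is_nullable_type(values, expected_type) and null_count < 1


animal = ["cat", "snake", "snake", "cat", "mosquito"]
obj = [{"a": 1}, None, {"a": 1}, {"a": 1}, {"a": 1}]

print(is_non_nullable_type(animal, str))  # True
print(is_nullable_type(obj, dict))        # True
print(is_non_nullable_type(obj, dict))    # False: one value is None
```

In the report, this corresponds to reading the two constraints for a column together: the column is non-nullable of the given type only if both pass.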