# Table of Contents

In this notebook, you'll learn how to:

- Create customized constraints that can operate on individual column values or on overall feature summaries
- Auto-generate constraints based on the profile's statistics

# Creating Customized Constraints

> This notebook shows how you can create customized constraints for specific needs. There is a large number of common constraints that can be seen in the [Constraints Suite](Constraints_Suite.ipynb) example. Please refer to the list of existing constraints before going through the work of creating your own set of constraints.

Constraints test whether values in the data are consistent with expectations.

For the moment, constraints apply only to numeric (fraction) features. String features can still be constrained through their numeric statistics, such as `number_unique` or `maximum_length`.


There are two types of constraints: `ValueConstraint` and `SummaryConstraint`.
- A `ValueConstraint` is applied to each value as it is processed by whylogs.
- A `SummaryConstraint` is applied to a feature summary.
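
To make the distinction concrete, here is a minimal sketch in plain Python that does not use whylogs at all. The helper names `check_values` and `check_summary` and the sample loan amounts are hypothetical, chosen only for illustration. The first helper tests every value as it streams past; the second tests a single aggregated statistic once.

```python
import operator

def check_values(values, op, threshold):
    """Value-style check: test each value, return (total_run, failed)."""
    total = failed = 0
    for v in values:
        total += 1
        if not op(v, threshold):
            failed += 1
    return total, failed

def check_summary(summary, field, op, threshold):
    """Summary-style check: test one aggregated statistic, evaluated once."""
    return op(summary[field], threshold)

# Made-up sample values, not taken from the lending club data.
loan_amounts = [1000.0, 5000.0, 12000.0, 600000.0]
summary = {"min": min(loan_amounts), "max": max(loan_amounts)}

value_result = check_values(loan_amounts, operator.lt, 548250)
summary_result = check_summary(summary, "min", operator.ge, 0)
print(value_result)    # (4, 1): four values tested, one is not below 548250
print(summary_result)  # True: the minimum is non-negative
```

A value-style check runs once per row and can count failures, while a summary-style check runs once per summary and simply passes or fails.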



In [2]:
from whylogs.core.statistics.constraints import SummaryConstraint, ValueConstraint, Op 
from whylogs.util.protobuf import message_to_json

v = ValueConstraint(Op.LT, 3.6)
print(message_to_json(v.to_protobuf()))

# constraints may have an optional name
s = SummaryConstraint('min', Op.LT, 300000, name='< 300K')
print(message_to_json(s.to_protobuf()))



{
  "name": "value LT 3.6",
  "value": 3.6,
  "op": "LT",
  "verbose": false
}
{
  "name": "< 300K",
  "field": "min",
  "value": 300000.0,
  "op": "LT",
  "verbose": false
}


Constraints are internally converted to Python lambda functions that are faster to evaluate.

In [3]:
print(s.func)

<function <lambda>.<locals>.<lambda> at 0x7f9486b97f70>
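
The following is a rough sketch of how an operator name and a threshold could be turned into a callable. It is not the whylogs implementation: the `_OPS` table and the `make_value_check` and `make_summary_check` helpers are assumptions made for illustration, built on the standard `operator` module.

```python
import operator

# Hypothetical mapping from operator names to Python comparison functions.
_OPS = {
    "LT": operator.lt,
    "LE": operator.le,
    "GE": operator.ge,
    "GT": operator.gt,
}

def make_value_check(op_name, threshold):
    """Return a function that tests a single value against the threshold."""
    compare = _OPS[op_name]
    return lambda x: compare(x, threshold)

def make_summary_check(field, op_name, threshold):
    """Return a function that tests one field of a summary dictionary."""
    compare = _OPS[op_name]
    return lambda summary: compare(summary[field], threshold)

less_than_3_6 = make_value_check("LT", 3.6)
min_below_300k = make_summary_check("min", "LT", 300000)

print(less_than_3_6(2.0))                 # True
print(less_than_3_6(4.0))                 # False
print(min_below_300k({"min": 1000.0}))    # True
```

Building the callable once means the operator lookup happens at construction time rather than on every value.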


Constraints may be applied across multiple features, and a single feature may have multiple constraints.

The `verbose` option will log every failure of a constraint.

In [6]:
from whylogs.core.statistics.constraints import DatasetConstraints

conforming_loan = ValueConstraint(Op.LT, 548250)
smallest_loan = ValueConstraint(Op.GT, 2500.0, verbose=True)

high_fico = ValueConstraint(Op.GT, 4000)

non_negative = SummaryConstraint('min', Op.GE, 0)

dc = DatasetConstraints(
    None,
    value_constraints={'loan_amnt': [conforming_loan, smallest_loan], 'fico_range_high': [high_fico]},
    summary_constraints={'annual_inc': [non_negative]},
)

with open("constraints.json", "w") as f:
    f.write(dc.to_json())
    print(dc.to_json())

with open("constraints.json", "r") as f:
    data = f.read()
    dc = DatasetConstraints.from_json(data)


{
  "valueConstraints": {
    "fico_range_high": {
      "constraints": [
        {
          "name": "value GT 4000",
          "value": 4000.0,
          "op": "GT",
          "verbose": false
        }
      ]
    },
    "loan_amnt": {
      "constraints": [
        {
          "name": "value LT 548250",
          "value": 548250.0,
          "op": "LT",
          "verbose": false
        },
        {
          "name": "value GT 2500.0",
          "value": 2500.0,
          "op": "GT",
          "verbose": true
        }
      ]
    }
  },
  "summaryConstraints": {
    "annual_inc": {
      "constraints": [
        {
          "name": "summary min GE 0/None",
          "field": "min",
          "value": 0.0,
          "op": "GE",
          "verbose": false
        }
      ]
    }
  }
}


The assembled DatasetConstraints may be applied to a whylogs logging session.

In [7]:
import pandas as pd
from whylogs import get_or_create_session
from whylogs.logs import display_logging

# turn on logging to show verbose constraints.
display_logging('info')

session = get_or_create_session()
data_file = "data/lending_club_1000.csv"
data = pd.read_csv(data_file)
# value constraints in dc are evaluated while the dataframe is logged
profile = session.log_dataframe(data, 'test.data', constraints=dc)


WARN: Missing config


The `DatasetConstraints.report()` API reports results as a list of tuples, one per feature, each pairing the feature name with the results of its constraints.

In [8]:
from tabulate import tabulate

def indent(txt, spaces=4):
    return "\n".join(" " * spaces + ln for ln in txt.splitlines())

def format_report(report):
    # report failures in tabular form
    print("Constraint failures by feature - ")
    for feature, results in report:
        print(f"{feature}:")
        print(indent(tabulate(results, tablefmt="plain", headers=['test_name', 'total_run', 'failed'])))

format_report(dc.report())


Constraint failures by feature - 
loan_amnt:
    test_name          total_run    failed
    value LT 548250         1000         2
    value GT 2500.0         1000        20
fico_range_high:
    test_name        total_run    failed
    value GT 4000         1000      1000
annual_inc:
    test_name                total_run    failed
    summary min GE 0/None            0         0
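
Because the report is plain nested data, it can also be post-processed directly. The sketch below assumes the shape implied by `format_report`: a list of `(feature, results)` pairs, where each result is a `(test_name, total_run, failed)` tuple. The `report` literal is copied by hand from the output above rather than taken from a live call, and `failure_rates` is a hypothetical helper.

```python
# Report rows copied from the printed output above.
report = [
    ("loan_amnt", [("value LT 548250", 1000, 2), ("value GT 2500.0", 1000, 20)]),
    ("fico_range_high", [("value GT 4000", 1000, 1000)]),
    ("annual_inc", [("summary min GE 0/None", 0, 0)]),
]

def failure_rates(report):
    """Return (feature, test_name, rate) rows, worst failure rate first."""
    rows = []
    for feature, results in report:
        for test_name, total_run, failed in results:
            if total_run == 0:
                continue  # constraint has not been evaluated yet
            rows.append((feature, test_name, failed / total_run))
    return sorted(rows, key=lambda row: row[2], reverse=True)

rates = failure_rates(report)
for feature, test_name, rate in rates:
    print(f"{feature:<16} {test_name:<18} {rate:.1%}")
```

Constraints with `total_run` equal to zero are skipped, which here excludes the summary constraint on `annual_inc` that has not been run yet.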


So far we have only seen the value constraints applied during whylogs processing.

What about the summary constraint defined above?

The `apply_summary_constraints()` API applies summary constraints to an existing profile. Without an argument, it applies the constraints that were supplied when the profile was created. Use the `summary_constraints` argument to pass in your own map of summary constraints to apply instead.

In [8]:
print("Apply existing constraints to summary:")
r = profile.apply_summary_constraints()
format_report(r)

print("\n\n")
print("Apply new constraints to summary:")
# this constraint requires the minimum to be negative, so it is expected to fail
negative_min = SummaryConstraint('min', Op.LT, 0)
r = profile.apply_summary_constraints(summary_constraints={'funded_amnt':[negative_min]})
format_report(r)

Apply existing constraints to summary:
Constraint failures by feature - 
annual_inc:
    test_name                total_run    failed
    summary min GE 0/None            1         0



Apply new constraints to summary:
Constraint failures by feature - 
funded_amnt:
    test_name                total_run    failed
    summary min LT 0/None            1         1


Some simple summary constraints can be automatically generated from a whylogs profile.
By definition, generated constraints are valid for the profile used to generate them.
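
To illustrate why generated constraints always hold for their source profile, here is a small sketch in plain Python. It does not reproduce the rules whylogs uses to generate constraints; the bound-from-observed-range rule, the `generate_bounds` and `failures` helpers, and the sample summaries are all assumptions made for illustration.

```python
import operator

OPS = {"GE": operator.ge, "LE": operator.le}

def generate_bounds(summaries):
    """Derive (field, op, value) constraints from each column's own summary."""
    constraints = {}
    for column, summary in summaries.items():
        constraints[column] = [
            ("min", "GE", summary["min"]),
            ("max", "LE", summary["max"]),
        ]
    return constraints

def failures(constraints, summaries):
    """List every constraint that the given summaries do not satisfy."""
    failed = []
    for column, items in constraints.items():
        for field, op_name, value in items:
            if not OPS[op_name](summaries[column][field], value):
                failed.append((column, field, op_name, value))
    return failed

# Made-up summaries for a baseline profile and a later batch.
baseline = {"loan_amnt": {"min": 1000.0, "max": 40000.0}}
new_batch = {"loan_amnt": {"min": -5.0, "max": 35000.0}}

generated = generate_bounds(baseline)
print(failures(generated, baseline))   # []
print(failures(generated, new_batch))  # [('loan_amnt', 'min', 'GE', 1000.0)]
```

The generated constraints pass trivially on the baseline they came from; they become informative only when applied to a different profile.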


In [9]:
auto_constraints = profile.generate_constraints() # returns DatasetConstraints
print(message_to_json(auto_constraints.to_protobuf()))

{
  "properties": {
    "schemaMajorVersion": 1,
    "schemaMinorVersion": 1,
    "sessionId": "eb4e4d8b-8a52-477d-8f73-2b52aa51de92",
    "sessionTimestamp": "1613231839754",
    "tags": {
      "name": "test.data"
    },
    "dataTimestamp": "0",
    "metadata": {}
  },
  "summaryConstraints": {
    "dti_joint": {
      "constraints": [
        {
          "name": "summary min GT 0/None",
          "field": "min",
          "value": 0.0,
          "op": "GT"
        }
      ]
    },
    "funded_amnt_inv": {
      "constraints": [
        {
          "name": "summary min GT 0/None",
          "field": "min",
          "value": 0.0,
          "op": "GT"
        }
      ]
    },
    "num_rev_accts": {
      "constraints": [
        {
          "name": "summary min GT 0/None",
          "field": "min",
          "value": 0.0,
          "op": "GT"
        }
      ]
    },
    "settlement_term": {
      "constraints": [
        {
          "name": "summary min GT 0/None",
          "field"