# Smart Config demo

This demo shows how easy it is to create a set of monitors using the WhyLabs smart config library (a proposed extension to the whylabs_toolkit).

You can use the library to set up a monitored resource with recommended monitors even before you have data. If you have a reference profile, the recommendations will be tailored based on that profile.

## Install requirements

In [7]:
%pip install whylabs_toolkit pandas requests


Note: you may need to restart the kernel to use updated packages.


## Set up the WhyLabs API connection

First, set up the information needed to connect to WhyLabs. Update `demo_org_id` and `demo_api_key` in the following cell before running it.

The specified organization needs to exist, but the resource (model or dataset) doesn't have to.


In [8]:
import pandas as pd
import whylogs as why
from smart_config.config import env_setup

# Development environment example:
# demo_org_id = 'org-JR37ks'
# demo_dataset_id = 'hack-25'
# demo_api_key = '<your-development-api-key>'
# demo_endpoint = 'https://songbird.development.whylabsdev.com'

demo_org_id = 'org-Ae8Sen'
demo_dataset_id = 'hack-1'
demo_api_key = ''  # paste your WhyLabs API key here; never commit a real key
demo_endpoint = None  # None uploads to production

if not demo_api_key:
    raise Exception('Please provide an API key')

env_setup(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    api_key=demo_api_key,
    whylabs_endpoint=demo_endpoint
)
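
To avoid pasting credentials into the notebook, the same settings could be read from environment variables. The following is a minimal sketch, not part of the smart config library: the helper name `load_whylabs_settings` and the environment variable names are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def load_whylabs_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings from environment variables.

    The helper and the variable names are assumptions chosen for this sketch.
    """
    env = os.environ if env is None else env
    settings = {
        'org_id': env.get('WHYLABS_DEFAULT_ORG_ID', ''),
        'dataset_id': env.get('WHYLABS_DEFAULT_DATASET_ID', ''),
        'api_key': env.get('WHYLABS_API_KEY', ''),
        # An unset or empty endpoint becomes None, which the cell above uses for production.
        'whylabs_endpoint': env.get('WHYLABS_API_ENDPOINT') or None,
    }
    # Fail early, listing every required setting that is missing.
    missing = [key for key in ('org_id', 'dataset_id', 'api_key') if not settings[key]]
    if missing:
        raise ValueError('Missing WhyLabs settings: ' + ', '.join(missing))
    return settings
```

Because the returned keys match the keyword arguments used in the cell above, the result could be passed on as `env_setup(**load_whylabs_settings())`.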

## Set up the monitored resource

Now let's set up our monitored resource. The `get_or_create_resource` helper makes this trivial.

In [9]:
from smart_config.resource import get_or_create_resource

resource = get_or_create_resource(demo_org_id, demo_dataset_id, f'Hackathon model {demo_dataset_id}')
resource

{'active': True,
 'creation_time': 1689603596259,
 'id': 'hack-1',
 'model_category': 'MODEL',
 'name': 'Hackathon model hack-1',
 'org_id': 'org-Ae8Sen',
 'time_period': 'P1D'}
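
The `creation_time` value in this output looks like a Unix timestamp in milliseconds, and `P1D` is the ISO 8601 duration for one day. Assuming that reading of `creation_time` is right, a small standard-library helper (hypothetical, not part of the library) makes the value readable:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids floating-point rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


created = from_epoch_millis(1689603596259)
print(created.isoformat())  # 2023-07-17T14:19:56.259000+00:00
```

The decoded time falls about a minute before the upload timestamps in the log output further down, which supports the milliseconds interpretation.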

## Log and upload a reference profile

Next, we load the demo reference dataset, profile it, and upload it. The cell that does this appears below, after the schema update.

## Update entity schema

Let's check the inferred schema for the dataset. We may need to wait for the upload to be processed.

In [10]:
from smart_config.resource import wait_for_nonempty_schema, set_outputs, set_data_type
from whylabs_toolkit.monitor import ColumnDataType

# schema = wait_for_nonempty_schema(demo_org_id, demo_dataset_id)
# schema

Our dataset has some output fields that don't use the naming convention, and some fields with the wrong inferred type. Let's fix that.

In [11]:
schema = set_outputs(demo_org_id, demo_dataset_id, ['predicted', 'income'])
schema = set_data_type(demo_org_id, demo_dataset_id, ColumnDataType.fractional, ['capital-gain', 'capital-loss'])
schema

{'columns': {},
 'metadata': {'author': 'system',
              'updated_timestamp': 1689603648383,
              'version': 6},
 'metrics': {}}

In [12]:
from smart_config.ref_profile import check_create_ref_profile
from smart_config.ref_profile import get_ref_profile_by_name

ref_dataset_name = 'adult-train'
ref_data = './data_randomized_drift_missing/adult/adult_reference_dataset.csv'

ref_df = pd.read_csv(ref_data)
demo_ref_profile = why.log(ref_df)

check_create_ref_profile(demo_org_id, demo_dataset_id, ref_dataset_name, demo_ref_profile)

INFO:whylogs.api.writer.whylabs:checking: https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_config/whylabs_condition_count_disabled
INFO:whylogs.api.writer.whylabs:checking: https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_config/whylabs_condition_count_disabled
INFO:whylogs.api.writer.whylabs:headers are: {'x-amz-id-2': 'D0t6UI5MoUHyBVpDFO42jSrI+iAEUs/fwO3vm4yIuvqnpZvwHlXuiVxQdY+M4gSROIUwTYl1GsI=', 'x-amz-request-id': 'N3TBRND9R276VN14', 'Date': 'Mon, 17 Jul 2023 14:20:51 GMT', 'Last-Modified': 'Thu, 23 Feb 2023 17:53:29 GMT', 'ETag': '"c4ca4238a0b923820dcc509a6f75849b"', 'x-amz-server-side-encryption': 'AES256', 'Accept-Ranges': 'bytes', 'Content-Type': 'binary/octet-stream', 'Server': 'AmazonS3', 'Content-Length': '1'} code: 200
INFO:whylogs.api.writer.whylabs:found the whylabs condition count disabled file so running uncompound on condition count metrics
INFO:whylogs.api.writer.whylabs:Done uploading org-Ae8Sen/hack-1/1689603649163 to https://api.whylabsapp.com wit

## Set up recommended monitors

We're ready to set up the recommended monitors. We need to get the unique ID for the uploaded reference profile, so we use `get_ref_profile_by_name` to find the profile metadata.

In [13]:
ref_profile_metadata = get_ref_profile_by_name(demo_org_id, demo_dataset_id, ref_dataset_name)
ref_profile_id = ref_profile_metadata.id
ref_profile_id

'ref-jmHJ6e4fgUIde1FL'

Then we pass the reference profile into `setup_recommended_monitors`. This will look at the profile and generate tailored monitors.

In [None]:
from smart_config.recommenders.whylabs_recommender import setup_recommended_monitors

setup_recommended_monitors(org_id=demo_org_id, dataset_id=demo_dataset_id, ref_profile=demo_ref_profile, ref_profile_id=ref_profile_id)


INFO:whylabs_toolkit.monitor.manager.monitor_setup:Did not find a monitor with unique-count-no-more-than-ref, creating a new one.
INFO:whylabs_toolkit.monitor.manager.monitor_setup:Did not find a monitor with missing-value-ratio-stddev, creating a new one.


## Upload batch data

To see the monitors in action, we need some batch data. We'll upload 14 days of prepared demo data.

In [None]:
from datetime import datetime, timedelta
current = datetime.now()
ts = current - timedelta(days=13)

for num in range(0, 14):
    df = pd.read_csv(f'./data_randomized_drift_missing/adult/adult_monitored_dataset_{num:0>2}.csv')
    results = why.log(df)
    results.profile().set_dataset_timestamp(ts)
    results.writer('whylabs').write()
    ts = ts + timedelta(days=1)
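
To check which timestamp each file receives before uploading anything, the loop's schedule can be computed on its own. This is a sketch under two assumptions: the helper name `batch_schedule` is made up for illustration, and it uses timezone-aware UTC datetimes, whereas the cell above uses a naive `datetime.now()`.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def batch_schedule(days: int = 14, now: Optional[datetime] = None) -> List[Tuple[str, datetime]]:
    """Pair each demo file name with the dataset timestamp it would receive."""
    now = now or datetime.now(timezone.utc)
    # The first file is dated (days - 1) days ago so that the last file lands on `now`.
    start = now - timedelta(days=days - 1)
    return [
        (f'adult_monitored_dataset_{num:0>2}.csv', start + timedelta(days=num))
        for num in range(days)
    ]


for name, ts in batch_schedule():
    print(name, ts.date())
```

The schedule contains one entry per day, with file `00` dated 13 days ago and file `13` dated today, which matches the upload loop above.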

In [None]:
from IPython.display import display, HTML
endpoint = 'https://observatory.development.whylabsdev.com' if demo_endpoint else 'https://hub.whylabsapp.com'
display(HTML(f"""Go to the <a target="_blank" href="{endpoint}/resources/{demo_dataset_id}/columns/gender?targetOrgId={demo_org_id}">WhyLabs feature view</a> and click Preview to see the results."""))

display(HTML(f"""Investigate the differences between the batch profiles and the reference profile in the <a target="_blank" href="{endpoint}/resources/{demo_dataset_id}/profiles?targetOrgId={demo_org_id}">WhyLabs profile view</a>."""))
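
Both links in the cell above follow the same pattern: a base URL, a `/resources/<dataset id>/...` path, and a `targetOrgId` query parameter. A small helper could build them with proper escaping. The function name `resource_url` is hypothetical, and the URL layout is taken only from the two links in this notebook.

```python
from urllib.parse import quote, urlencode


def resource_url(base: str, dataset_id: str, org_id: str, *path: str) -> str:
    """Build a link under /resources/<dataset_id>/ with the org as a query parameter."""
    # Escape each path segment separately so unusual IDs cannot break the URL.
    segments = [quote(part, safe='') for part in ('resources', dataset_id, *path)]
    query = urlencode({'targetOrgId': org_id})
    return base.rstrip('/') + '/' + '/'.join(segments) + '?' + query


print(resource_url('https://hub.whylabsapp.com', 'hack-1', 'org-Ae8Sen', 'profiles'))
```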

## Further recommendation scenarios

All of these are **to be implemented**.

### Using semantic type information
The recommendations could make use of additional information about the semantic type of the columns (e.g., Age, Gender, Country).
```
type_hints = get_semantic_types(demo_ref_profile)
setup_recommended_monitors(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    ref_profile=demo_ref_profile,
    ref_profile_id=ref_profile_id,
    type_hints=type_hints)
```
The same information might be used to recommend condition counts:
```
conditions = recommend_condition_counts(type_hints)
schema = DeclarativeSchema(STANDARD_RESOLVER)
for c in conditions:
    schema.add_resolver_spec(column_name=c.column, metrics=[ConditionCountMetricSpec(c.condition)])
why.log(batch_df, schema=schema)
```

The same information might be used to recommend segments for fairness and bias analysis:
```
segments = recommend_segments(type_hints)
segmented_ref_profile = why.log(ref_df, schema=DatasetSchema(segments=segments))
```

The type hints could be stored as part of the dataset metadata and retrieved when needed:
```
# TODO
```
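
Since `get_semantic_types` is not implemented yet, here is one way a first version might work: a heuristic based on column names. Everything in this sketch is an assumption, including the function name `guess_semantic_types`, the keyword table, and the choice to work from column names rather than from a profile as in the proposed call above.

```python
from typing import Dict, Iterable

# Hypothetical keyword table: semantic type -> name tokens that suggest it.
SEMANTIC_KEYWORDS = {
    'Age': ('age',),
    'Gender': ('gender', 'sex'),
    'Country': ('country',),
}


def guess_semantic_types(column_names: Iterable[str]) -> Dict[str, str]:
    """Map a column to a semantic type when a whole token of its name matches a keyword."""
    hints = {}
    for name in column_names:
        # Split on '-' and '_' so 'native-country' yields the tokens {'native', 'country'}.
        tokens = set(name.lower().replace('-', '_').split('_'))
        for semantic_type, keywords in SEMANTIC_KEYWORDS.items():
            if tokens & set(keywords):
                hints[name] = semantic_type
                break
    return hints


print(guess_semantic_types(['age', 'gender', 'capital-gain']))
# {'age': 'Age', 'gender': 'Gender'}
```

Matching whole tokens instead of substrings keeps a column such as `percentage` from being tagged as Age. A real implementation would probably also look at value distributions in the profile.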

### Monitor restrictions

Restrict which recommendations are generated by specifying the categories or metrics you want monitored.

```
setup_recommended_monitors(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    ref_profile=demo_ref_profile,
    ref_profile_id=ref_profile_id,
    categories=[MetricCategory.DataQuality])
```

### Column restrictions

Limit which columns are included in the recommended monitors, for example based on feature weights.

```
setup_recommended_monitors(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    ref_profile=demo_ref_profile,
    ref_profile_id=ref_profile_id,
    analysis_metrics=[ComplexMetrics.histogram],
    columns=top_n_by_weight(demo_org_id, demo_dataset_id, includeDiscrete=False, includeNonDiscrete=True))
```
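
The `top_n_by_weight` helper in the proposal above does not exist yet. As a rough illustration of the selection step only, the sketch below ranks columns by absolute weight from a plain dictionary. The name `top_n_columns` and the example weights are made up, and fetching the weights for an org and dataset is left out.

```python
from typing import List, Mapping


def top_n_columns(weights: Mapping[str, float], n: int = 5) -> List[str]:
    """Return the n column names with the largest absolute weight, heaviest first."""
    ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    return ranked[:n]


# Illustrative weights only; not taken from the demo dataset.
example_weights = {'capital-gain': 0.42, 'capital-loss': -0.31, 'gender': 0.05}
print(top_n_columns(example_weights, n=2))  # ['capital-gain', 'capital-loss']
```

Ranking by absolute value treats strongly negative weights as important too, which is usually what is wanted when weights come from feature importance scores.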


### Policy-adjusted recommendations

Specify policies to help tailor the recommendation.

```
setup_recommended_monitors(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    ref_profile=demo_ref_profile,
    ref_profile_id=ref_profile_id,
    policies=[FalsePositiveTolerance('low'), SeasonalData(period='weekly')],
    )
```

### Explanations

Ask for explanations of the recommendations.

```
explain_recommended_monitors(
    org_id=demo_org_id,
    dataset_id=demo_dataset_id,
    ref_profile=demo_ref_profile,
    ref_profile_id=ref_profile_id)

[
   { 'monitor': 'unique-count-no-more-than-ref', 'reasons': ['Column has low cardinality values that likely come from a fixed set of categorical values'] },
   ...
]
```



### Default recommendation with no reference profile

Ask for a default set of recommended monitors, with no reference data available.

In [None]:
# setup_recommended_monitors(org_id=demo_org_id, dataset_id=demo_dataset_id)