# Auto Segmentation Example

First, let us create a sample dataset that resembles the output of a classification model.

In [65]:
import pandas as pd
df = pd.DataFrame({"target_var": ["cat", "dog", "pig", "dog", "cat", "dog"],
                   "confidence": [1.2, 3.4, 4.5, 0.1, 4.5, -0.19],
                   "prediction": ["good boy", "good boy", "good boy", "good boy", "bad", "bad"]})

In this case, one interesting question is how `confidence` varies across the segments defined by the other features.
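To make that question concrete, here is a minimal sketch in plain pandas (not whylogs) that summarises `confidence` within each value of the other two columns. The names `candidate_features` and `summaries` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"target_var": ["cat", "dog", "pig", "dog", "cat", "dog"],
                   "confidence": [1.2, 3.4, 4.5, 0.1, 4.5, -0.19],
                   "prediction": ["good boy", "good boy", "good boy", "good boy", "bad", "bad"]})

# Hypothetical helper names: the columns we might segment by, and the per-column summaries.
candidate_features = ["target_var", "prediction"]
summaries = {}

for feature in candidate_features:
    # Row count and mean confidence for each distinct value of the feature.
    summaries[feature] = df.groupby(feature)["confidence"].agg(["count", "mean"])
    print(summaries[feature])
```

A feature whose groups show clearly different `confidence` statistics is a plausible segmentation candidate. This is only a manual check, not a description of how whylogs chooses segments.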

## Whylogs Auto Segmentation

In [80]:
from whylogs import get_or_create_session
sess = get_or_create_session()

WARN: Missing config


Before logging the data, we first run auto segmentation to infer which features define the important segments in the data.

In [81]:
sess.estimate_segments(df, name="demo1", target_field="target_var", max_segments=3)

['prediction']

This writes a segmentation file, `segments.json`, under the default output path `output/{name}`, where `name` is the `demo1` we passed above.

In [82]:
!ls output/demo1/metadata/*

output/demo1/metadata/segments.json


The file contains the features to be segmented.

In [83]:
!jq . output/demo1/metadata/segments.json

[
  "prediction"
]
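The same file can be read from Python instead of `jq`. The sketch below assumes the file is a flat JSON list of column names, as in the output above. `load_segment_features` is a hypothetical helper, not part of whylogs.

```python
import json
from pathlib import Path


def load_segment_features(path):
    """Return the list of feature names stored in a segments.json file.

    Hypothetical helper: assumes the file holds a flat JSON list of strings.
    """
    with open(path) as f:
        features = json.load(f)
    if not isinstance(features, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(features).__name__}")
    return features


# Path taken from the listing above; skipped quietly if the file is not there.
segments_path = Path("output/demo1/metadata/segments.json")
if segments_path.exists():
    print(load_segment_features(segments_path))
```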


## Log Segmented Data

You can then proceed to log the data; whylogs will automatically segment it by each of the features above.

In [84]:
sess.log_dataframe(df, dataset_name="demo1")
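Conceptually, segmenting by a feature means partitioning the rows by that feature's distinct values. The pandas sketch below illustrates the idea for the inferred `prediction` feature. It is an assumption about the general idea, not a description of whylogs internals, and `segment_feature` and `segments` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"target_var": ["cat", "dog", "pig", "dog", "cat", "dog"],
                   "confidence": [1.2, 3.4, 4.5, 0.1, 4.5, -0.19],
                   "prediction": ["good boy", "good boy", "good boy", "good boy", "bad", "bad"]})

# Illustrative name: the single feature listed in segments.json for this dataset.
segment_feature = "prediction"

# One sub-DataFrame per distinct value of the segment feature.
segments = {value: rows for value, rows in df.groupby(segment_feature)}

for value, rows in segments.items():
    print(f"{segment_feature}={value!r}: {len(rows)} rows")
```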

In [85]:
!cat output/demo1/

cat: output/demo1/: Is a directory


In [86]:
!ls output/demo1/**/*

output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/flat_table/dataset_profile.csv
output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/freq_numbers/dataset_profile.json
output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/frequent_strings/dataset_profile.json
output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/histogram/dataset_profile.json
output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/json/dataset_profile.json
output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/protobuf/dataset_profile.bin
output/demo1/metadata/segments.json

output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7:
[1m[36mflat_table[m[m       [1m[36mfrequent_strings[m[m [1m[36mjson[m[m
[1m[36mfreq_numbers[m[m     [1m[36mhistogram[m[m        [1m[36mprotobuf[m[m

output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/flat_table:
dataset_profile.csv

output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/freq_numbers:
dataset_profile.json

output/demo1/27cbaff0-c7c7-4805-ae89-9ed4f548eda7/frequent_stri
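The listing can also be gathered from Python, which avoids depending on shell glob behaviour. This sketch assumes the layout visible above, `output/<name>/<run id>/<format>/dataset_profile.<ext>`. `list_profile_files` is a hypothetical helper written for this example.

```python
from pathlib import Path


def list_profile_files(dataset_dir):
    """Map each output format directory name to the profile files inside it.

    Hypothetical helper: assumes <dataset_dir>/<run id>/<format>/dataset_profile.<ext>.
    """
    files_by_format = {}
    # Two directory levels below dataset_dir, so metadata/segments.json is not matched.
    for path in sorted(Path(dataset_dir).glob("*/*/dataset_profile.*")):
        files_by_format.setdefault(path.parent.name, []).append(path)
    return files_by_format


for fmt, paths in list_profile_files("output/demo1").items():
    print(fmt, [str(p) for p in paths])
```

Grouping by the parent directory name keeps the result readable when several logging runs have written profiles under the same dataset directory.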

In [73]:
sess.close()

In [64]:
!rm -rf output/demo1/**/*