>### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> 
>*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started) to leverage the power of whylogs and WhyLabs together!*

# Getting Started

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Getting_Started.ipynb)

whylogs provides a standard to log any kind of data.

This example shows how to log data with whylogs, which generates statistical summaries called *profiles*. These profiles can be used in a number of ways, such as:

* Data Visualization
* Data Validation
* Tracking changes in your datasets

## Table of Contents

In this example, we'll explore the basics of logging data with whylogs:

- Installing whylogs
- Profiling data
- Interacting with the profile
- Writing/Reading profiles to/from disk

## Installing whylogs

In [None]:
# Note: you may need to restart the kernel to use updated packages.
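# The "viz" extra is needed for the NotebookProfileVisualizer used later in this example.
%pip install "whylogs[viz]"
# The PyPI package is named scikit-learn; "sklearn" is only the import name.
%pip install scikit-learn
%pip install pmdarima tqdm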

import pandas as pd

## Load the MSFT stock dataset from pmdarima

Before showing how we can log data, we first need the data itself. Let's load the Microsoft (MSFT) stock price dataset from pmdarima as a Pandas DataFrame and parse its `Date` column into datetimes:

In [1]:
import pandas as pd
from pmdarima.datasets import load_msft

df = load_msft()
df['Date'] = pd.to_datetime(df['Date'])

## Profiling with whylogs

To obtain a profile of your data, you can simply use whylogs' `log` call, and navigate through the result to a specific profile with `profile()`:

In [2]:
import whylogs as why

results = why.log(df)
profile = results.profile()



## Analyzing Profiles

Once you're done logging the data, you can generate a `Profile View` and inspect it as a Pandas DataFrame:

In [3]:
prof_view = profile.view()
prof_view.to_pandas()

| column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor | frequent_items/frequent_strings | ints/max | ints/min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Close | 3675.406005 | 3628.467234 | 3723.515055 | 0 | 7983 | 0 | 0 | 84.56 | 18.9847 | 20.476 | ... | SummaryType.COLUMN | 0 | 7983 | 0 | 0 | 0 | 0 | | | |
| Date | | | | 0 | 7983 | 0 | 0 | | | | ... | SummaryType.COLUMN | 0 | 0 | 0 | 7983 | 0 | 0 | | | |
| High | 3665.277479 | 3618.46806 | 3713.253953 | 0 | 7983 | 0 | 0 | 86.2 | 19.18722 | 20.714 | ... | SummaryType.COLUMN | 0 | 7983 | 0 | 0 | 0 | 0 | | | |
| Low | 3647.135967 | 3600.558235 | 3694.874978 | 0 | 7983 | 0 | 0 | 84.0825 | 18.77364 | 20.26 | ... | SummaryType.COLUMN | 0 | 7983 | 0 | 0 | 0 | 0 | | | |
| Open | 3693.316842 | 3646.149332 | 3741.660336 | 0 | 7983 | 0 | 0 | 84.77 | 18.97786 | 20.477 | ... | SummaryType.COLUMN | 0 | 7983 | 0 | 0 | 0 | 0 | | | |
| OpenInt | 1.0 | 1.0 | 1.00005 | 0 | 7983 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | SummaryType.COLUMN | 0 | 0 | 7983 | 0 | 0 | 0 | [FrequentItem(value='0', est=7983, upper=7983,... | 0.0 | 0.0 |
| Volume | 7774.780604 | 7675.488543 | 7876.548222 | 0 | 7983 | 0 | 0 | 1371331000.0 | 79458000.0 | 70660300.0 | ... | SummaryType.COLUMN | 0 | 0 | 7983 | 0 | 0 | 0 | [] | 1371331000.0 | 0.0 |
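
To make a few of the columns above more concrete, here is a minimal plain-Python sketch of a mergeable column summary that tracks the same kind of values as `counts/n`, `counts/null`, `distribution/max`, and `distribution/mean`. It is an illustration only and not how whylogs computes its metrics (whylogs relies on approximate sketches for values such as cardinality and the median). The `ColumnSummary` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnSummary:
    """Hypothetical, simplified stand-in for a per-column profile."""

    n: int = 0                       # all values seen, nulls included (like counts/n)
    null: int = 0                    # missing values (like counts/null)
    total: float = 0.0               # running sum of the non-null values
    maximum: Optional[float] = None  # largest non-null value (like distribution/max)

    def track(self, values: Iterable[Optional[float]]) -> "ColumnSummary":
        """Update the summary in place from a batch of values."""
        for value in values:
            self.n += 1
            if value is None:
                self.null += 1
                continue
            self.total += value
            self.maximum = value if self.maximum is None else max(self.maximum, value)
        return self

    @property
    def mean(self) -> Optional[float]:
        """Mean of the non-null values (like distribution/mean)."""
        non_null = self.n - self.null
        return self.total / non_null if non_null else None

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        """Combine two summaries without revisiting the raw data."""
        maxima = [m for m in (self.maximum, other.maximum) if m is not None]
        return ColumnSummary(
            n=self.n + other.n,
            null=self.null + other.null,
            total=self.total + other.total,
            maximum=max(maxima) if maxima else None,
        )


first = ColumnSummary().track([20.0, None, 22.0])
second = ColumnSummary().track([18.0, 24.0])
merged = first.merge(second)
print(merged.n, merged.null, merged.maximum, merged.mean)  # prints: 5 1 24.0 21.0
```

Because every field can be combined without access to the original rows, summaries built on separate batches can be merged afterwards, which is the property that makes this kind of summary convenient to aggregate over time.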


In [6]:
from whylogs.api.logger.experimental.multi_dataset_logger.multi_dataset_rolling_logger import MultiDatasetRollingLogger
from whylogs.api.logger.experimental.multi_dataset_logger.time_util import TimeGranularity

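# NOTE: AnonymousSessionWriter is not imported anywhere in this example; import it before running this cell.
anonymous_writer = AnonymousSessionWriter()

# Aggregate logged rows into one profile per year and hand the profiles to the writer
logger = MultiDatasetRollingLogger(aggregate_by=TimeGranularity.Year, writers=[anonymous_writer])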

In [18]:
# Profile all of the msft data
from whylogs.api.logger.experimental.multi_dataset_logger.multi_dataset_rolling_logger import MultiDatasetRollingLogger
from whylogs.api.logger.experimental.multi_dataset_logger.time_util import TimeGranularity

from tqdm import tqdm

logger = MultiDatasetRollingLogger(aggregate_by=TimeGranularity.Year, writers=[])

# Log one row at a time, passing each row's date as an epoch timestamp in milliseconds
for row in tqdm(df.to_dict(orient="records")):
    timestamp_ms = int(row['Date'].timestamp() * 1000)
    logger.log(row, timestamp_ms=timestamp_ms, sync=True)


100%|██████████| 7983/7983 [00:04<00:00, 1919.18it/s]
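
The loop above passes each row's timestamp in epoch milliseconds, and the logger is configured to aggregate by year. As a rough sketch of that grouping idea using only the standard library (not the logger's actual implementation), the hypothetical helper `year_start_ms` below truncates a millisecond timestamp to the start of its UTC year and uses the result as a bucket key. The sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def year_start_ms(timestamp_ms: float) -> int:
    """Truncate an epoch-millisecond timestamp to the start of its UTC year."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    start = datetime(moment.year, 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000)


# Hypothetical rows; only the dates matter for this illustration.
rows = [
    {"Date": datetime(2016, 3, 1, tzinfo=timezone.utc), "Close": 50.0},
    {"Date": datetime(2016, 9, 15, tzinfo=timezone.utc), "Close": 57.0},
    {"Date": datetime(2017, 2, 1, tzinfo=timezone.utc), "Close": 63.0},
]

# Group the rows under the start-of-year key they fall into.
buckets = defaultdict(list)
for row in rows:
    timestamp_ms = row["Date"].timestamp() * 1000
    buckets[year_start_ms(timestamp_ms)].append(row)

bucket_sizes = {key: len(group) for key, group in buckets.items()}
print(bucket_sizes)  # prints: {1451606400000: 2, 1483228800000: 1}
```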


In [25]:
# Retrieve the profile views collected by the logger, keyed by timestamp in epoch milliseconds
results = logger.get_profile_views()

# TODO prints a link to the profile page after writing
print('https://observatory.development.whylabsdev.com/assets/model-1/summary?sessionToken=session-F8p5qS&sortModelBy=LatestAlert&sortModelDirection=DESC')
# Sort the (timestamp, profile views) pairs so the most recent comes first
profiles = sorted(results.items(), reverse=True)
profiles


https://observatory.development.whylabsdev.com/assets/model-1/summary?sessionToken=session-F8p5qS&sortModelBy=LatestAlert&sortModelDirection=DESC


[(1483228800000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24169810>]),
 (1451606400000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24169510>]),
 (1420070400000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24169210>]),
 (1388534400000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24168f10>]),
 (1356998400000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24168c10>]),
 (1325376000000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d24168910>]),
 (1293840000000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d2415e0e0>]),
 (1262304000000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d2415fe50>]),
 (1230768000000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d2415fdf0>]),
 (1199145600000,
  [<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7f2d241
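
The integer keys in the output above are epoch timestamps in milliseconds. Assuming they are expressed in UTC (which is consistent with the dataset timestamps printed later in this example), they can be decoded with the standard library:

```python
from datetime import datetime, timezone

# The first three keys, copied from the output above.
keys_ms = [1483228800000, 1451606400000, 1420070400000]

# Convert milliseconds to seconds, then to a UTC calendar date.
decoded = [
    datetime.fromtimestamp(key / 1000, tz=timezone.utc).date().isoformat()
    for key in keys_ms
]
print(decoded)  # prints: ['2017-01-01', '2016-01-01', '2015-01-01']
```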

In [26]:
# Take the two most recent profiles and visualize them here
from whylogs.viz import NotebookProfileVisualizer

profile_2017 = profiles[0][1][0]
profile_2016 = profiles[1][1][0]

print(f'Using profiles for dates {profile_2016.dataset_timestamp} and {profile_2017.dataset_timestamp}')

visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=profile_2016, reference_profile_view=profile_2017)

Using profiles for dates 2016-01-01 00:00:00+00:00 and 2017-01-01 00:00:00+00:00


In [28]:
# TODO prints a link to the profile page with these two profiles pre populated
print('https://observatory.development.whylabsdev.com/assets/model-1/summary?sessionToken=session-F8p5qS&sortModelBy=LatestAlert&sortModelDirection=DESC')

visualization.summary_drift_report()

https://observatory.development.whylabsdev.com/assets/model-1/summary?sessionToken=session-F8p5qS&sortModelBy=LatestAlert&sortModelDirection=DESC


In [50]:
visualization.double_histogram(feature_name="Close")


In [9]:
from whylogs.viz.drift.column_drift_algorithms import calculate_drift_scores

scores = calculate_drift_scores(target_view=profile_2016, reference_view=profile_2017, with_thresholds=True)

scores

{'Date': None,
 'Open': {'algorithm': 'ks',
  'pvalue': 3.5014812407320652e-90,
  'statistic': 0.8095238095238095,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'High': {'algorithm': 'ks',
  'pvalue': 3.5014812407320652e-90,
  'statistic': 0.8095238095238095,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'Low': {'algorithm': 'ks',
  'pvalue': 3.5014812407320652e-90,
  'statistic': 0.8095238095238095,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'Close': {'algorithm': 'ks',
  'pvalue': 3.5014812407320652e-90,
  'statistic': 0.8095238095238095,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'Volume': {'algorithm': 'ks',
  'pvalue': 3.6748919179147516e-19,
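
Each numeric entry in the output above reports a Kolmogorov-Smirnov (`ks`) statistic, a p-value, and a `drift_category` taken from the `thresholds` ranges. As a rough illustration, the sketch below computes an exact two-sample KS statistic from raw values and maps a p-value onto the ranges shown above. The helper names are hypothetical, the handling of interval boundaries is an assumption, and whylogs works from profile summaries rather than raw values, so this is not its implementation.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple

# Threshold ranges copied from the output above: (lower, upper) p-value bounds.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "NO_DRIFT": (0.15, 1),
    "POSSIBLE_DRIFT": (0.05, 0.15),
    "DRIFT": (0, 0.05),
}


def ks_statistic(target: Sequence[float], reference: Sequence[float]) -> float:
    """Largest gap between the two empirical cumulative distribution functions."""
    t, r = sorted(target), sorted(reference)
    return max(
        abs(bisect_right(t, x) / len(t) - bisect_right(r, x) / len(r))
        for x in t + r
    )


def categorize_pvalue(
    pvalue: float, thresholds: Dict[str, Tuple[float, float]] = THRESHOLDS
) -> str:
    """Return the category whose range contains the p-value."""
    for category, (low, high) in thresholds.items():
        # Assumption: ranges are [low, high), except that an upper bound of 1 is inclusive.
        if low <= pvalue < high or pvalue == high == 1:
            return category
    raise ValueError(f"p-value {pvalue} is outside every threshold range")


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))   # prints: 0.5
print(categorize_pvalue(3.5014812407320652e-90))  # prints: DRIFT
```

A small p-value means the two samples are unlikely to come from the same distribution, which is why the lowest range maps to `DRIFT`.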
  

## What's Next?

There's a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!

- Basic
    - [Visualizing Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Notebook_Profile_Visualizer.html) - Compare profiles to detect distribution shifts, visualize histograms and bar charts and explore your data
    - [Logging Data](https://whylogs.readthedocs.io/en/stable/examples/basic/Logging_Different_Data.html) - See the different ways you can log your data with whylogs
    - [Inspecting Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Inspecting_Profiles.html) - A deeper dive on the metrics generated by whylogs
    - [Schema Configuration for Tracking Metrics](https://whylogs.readthedocs.io/en/stable/examples/basic/Schema_Configuration.html) - Configure tracking metrics according to data type or column features
    - [Data Constraints](https://whylogs.readthedocs.io/en/stable/examples/advanced/Metric_Constraints.html) - Set constraints on your data to ensure its quality
    - [Merging Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Merging_Profiles.html) - Merge your profiles logged across different computing instances, time periods or data segments
- Integrations
    - [WhyLabs](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_to_WhyLabs.html) - Monitor your profiles continuously with the WhyLabs Observability Platform
    - [PySpark](https://whylogs.readthedocs.io/en/stable/examples/integrations/Pyspark_Profiling.html) - Use whylogs with PySpark
    - [Writing Profiles](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_Profiles.html) - See different ways and locations to output your profiles
    - [Flask](https://whylogs.readthedocs.io/en/stable/examples/integrations/flask_streaming/flask_with_whylogs.html) - See how you can create a Flask app with whylogs and WhyLabs integration
    - [Feature Stores](https://whylogs.readthedocs.io/en/stable/examples/integrations/Feature_Stores_and_whylogs.html) - Learn how to log features from your Feature Store with feast and whylogs
    - [BigQuery](https://whylogs.readthedocs.io/en/stable/examples/integrations/BigQuery_Example.html) - Profile data queried from a Google BigQuery table
    - [MLflow](https://whylogs.readthedocs.io/en/stable/examples/integrations/Mlflow_Logging.html) - Log your whylogs profiles to an MLflow environment

Or go to the [examples page](https://whylogs.readthedocs.io/en/stable/examples.html) for the complete list of examples!