The open standard for data logging

What is whylogs

whylogs is an open-source library for logging any kind of data. With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to:

  1. Track changes in their dataset
  2. Create data constraints to know whether their data looks the way it should
  3. Quickly visualize key summary statistics about their datasets

These three capabilities support a variety of use cases for data scientists, machine learning engineers, and data engineers:

  • Detect data drift in model input features
  • Detect training-serving skew, concept drift, and model performance degradation
  • Validate data quality in model inputs or in a data pipeline
  • Perform exploratory data analysis of massive datasets
  • Track data distributions & data quality for ML experiments
  • Enable data auditing and governance across the organization
  • Standardize data documentation practices across the organization
  • And more

Quickstart

Install whylogs using the pip package manager in a terminal by running:

pip install whylogs

Then you can log data in Python as simply as this:

import whylogs as why
import pandas as pd

df = pd.read_csv("path/to/file.csv")
results = why.log(df)

And voilà, you now have a whylogs profile. To learn more about what a whylogs profile is and what you can do with it, check out our docs and our examples.