Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
Updated Jun 7, 2024 - Python
Always know what to expect from your data.
The Open Source Feature Store for Machine Learning
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
The open-source tool for building high-quality datasets and computer vision models
Compare tables within or across databases
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Data governance and data quality checking/monitoring platform (Django + jQuery + MySQL)
Automatically find issues in image datasets and practice data-centric computer vision.
Great Expectations Airflow operator
FeatHub - A stream-batch unified feature store for real-time machine learning
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
re_data - fix data issues before your users and CEO discover them 😊
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Code review for data in dbt
Possibly the fastest DataFrame-agnostic quality check library in town.
Swiple enables you to easily observe, understand, validate and improve the quality of your data
hive_compared_bq compares and validates two SQL-like tables and graphically shows the rows and columns that differ.