One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Updated Jun 11, 2024 - Python
Always know what to expect from your data.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Visualize and compare datasets, target values and associations, with one line of code.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Automatically find issues in image datasets and practice data-centric computer vision.
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Code review for data in dbt
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Swiple enables you to easily observe, understand, validate and improve the quality of your data
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
Open-source metadata collector based on ODD Specification
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
Client interface for all things Cleanlab Studio
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Drift detection module for machine learning pipelines.