Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Updated Oct 24, 2019 - Scala
DISTOD algorithm: Distributed discovery of bidirectional order dependencies
Example API implementation for Data Caterer