The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Jul 9, 2024 - Python
The open-source tool for building high-quality datasets and computer vision models
A Doctor for your data
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
A tool for downloading from public image boards (those that allow scraping), previewing your images & tags, and editing your images & tags. Additional tabs let you download other code repositories as well as S.O.T.A. diffusion and auto-tag/caption models. Custom datasets can be added!
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023)
Client interface for all things Cleanlab Studio
Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.
AqSolDB: A curated aqueous solubility dataset containing 9,982 unique compounds.
🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.
Data Cleaning and Data Profiling Library for Python
Rebalancing chemical reactions
tranSMART Arborist ETL toolkit
Archaeological Map of the Czech Republic (Archeologická mapa České republiky)
HISDAC-ES: Creating historical settlement data for Spain (1900-2020) based on cadastral building footprint data
Python package to make URL extraction, generalization, validation, and filtration easy.