The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Jun 20, 2024 - Python
The open-source tool for building high-quality datasets and computer vision models
A light-weight, flexible, and expressive statistical data testing library
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Prepping tables for machine learning
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
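An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large language model. (General-purpose base model, GPU deployment, data cleaning.) With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM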
Easy to use Python library of customized functions for cleaning and analyzing data.
LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
This is a repo for my Bertelsmann Data Science Scholarship Challenge: notes, exercises, quizzes.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
A Machine Learning System for Data Enrichment.
A package for MD, docking, and machine learning drug discovery pipelines
A list of ~2000 Ukrainian stopwords (with numbers)
Wikidata and Wikipedia language data extraction
Neural Machine Translation on the Nepali-English language pair
Data Cleaning with Python