The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
NFStream: a Flexible Network Data Analysis Framework.
A plugin for GTAV that transforms it into a vision-based self-driving car research environment.
🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Convert a face dataset to a masked-face dataset
Computer vision utils for Blender (generate instance annotations, depth, and 6D pose with one line of code)
Compose multimodal datasets 🎹
A collection of public Korean instruction datasets for training language models.
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
A command-line interface to generate textual and conversational datasets with LLMs.
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Creates an index of images, queries a local LLM and adds tags to the image metadata
Data release for the ImageInWords (IIW) paper.
DataGene - Identify How Similar Time Series (TS) Datasets Are to One Another (by @firmai)
👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5
[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Objects365; built by active learning.