Stars
Synthetic data curation for post-training and structured data extraction
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
SGLang is a fast serving framework for large language models and vision language models.
Toolkit for linearizing PDFs for LLM datasets/training
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Empowering RAG with a memory-based data interface for all-purpose applications!
Govern, Secure, and Optimize your AI Traffic. AI Gateway provides a unified interface to all LLMs using the OpenAI API format, with a focus on performance and reliability. Built in Rust.
Integrate cutting-edge LLM technology quickly and easily into your apps
Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics…
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge ba…
The RedStone repository includes code for preparing extensive datasets used in training large language models.
Making large AI models cheaper, faster and more accessible
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Customizable implementation of the self-instruct paper.
Open source annotation tool for machine learning practitioners.
A pure Rust Excel/OpenDocument SpreadSheets file reader: rust on metal sheets
A text extraction library supporting PDFs, images, office documents and more
📄 A curated list of awesome .cursorrules files