Stars
Fully open reproduction of DeepSeek-R1
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Reinforcement learning tutorial in Chinese (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/
User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)
🚀🚀 "Large model": train a small 26M-parameter GPT entirely from scratch in just 2h! 🌏
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
[ArXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
✨✨Latest Advances on Multimodal Large Language Models
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
SGLang is a fast serving framework for large language models and vision language models.
Famous Vision Language Models and Their Architectures
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
MINT-1T: A one trillion token multimodal interleaved dataset.
[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Officially recommended ChatTTS resource collection project, gathering related resources from across the web and frequently asked questions
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Convert PDF to markdown + JSON quickly with high accuracy
[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"