Stars
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A generative world for general-purpose robotics & embodied AI learning.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
PhD Dissertation Template for UNC Computer Science
Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
Jupyter notebook server extension to proxy web services.
PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
Official implementation of SEED-LLaMA (ICLR 2024).
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model (ICLR 2025 Oral)
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts (CVPR 2024)
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
OCR, layout analysis, reading order, table recognition in 90+ languages
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Code for the paper "pix2gestalt: Amodal Segmentation by Synthesizing Wholes" (CVPR 2024)
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
[CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation"
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloading the trained model checkpoints, and example notebooks / gra…