Starred repositories
[CVPR 2025] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Repo of "GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving"
Explore the Multimodal “Aha Moment” on 2B Model
Wan: Open and Advanced Large-Scale Video Generative Models
Latest Advances on System-2 Reasoning
Official Code for Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
🚀🚀 "Large Model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Fully open reproduction of DeepSeek-R1
Call Arxiv API and automatically update paper list
A PyTorch native library for large model training
Explore Python's charms by asking WHY questions
This repo contains the code for 1D tokenizer and generator
[CVPR 2025🔥] Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Machine Learning Engineering Open Book
Awesome-LLM: a curated list of Large Language Models
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Download the NuPlan Dataset directly from the terminal
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Code for "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT"
ScholArxiv is an open-source, aesthetic, minimal, AI-powered app that lets users search, read, bookmark, share, download, and view summaries of academic papers from the arXiv repository.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding