
Starred repositories
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Calculate token/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
🐶 Kubernetes CLI To Manage Your Clusters In Style!
FUSE-based file system backed by Amazon S3
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Asynchronous HTTP client/server framework for asyncio and Python
KuntaiDu / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
An easy-to-use PyTorch to TensorRT converter
A Kubernetes media gateway for WebRTC. Contact: info@l7mp.io
Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop Streaming Platform for Self-Hosting, Containers, Kubernetes, or Cloud/HPC
**Official** 李宏毅 (Hung-yi Lee) Machine Learning 2022 Spring
Model Compression Toolbox for Large Language Models and Diffusion Models
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
OneDiff: An out-of-the-box acceleration library for diffusion models.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
PyTorch native quantization and sparsity for training and inference
FlashInfer: Kernel Library for LLM Serving
A modern replacement for Redis and Memcached
Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
Hackable and optimized Transformers building blocks, supporting a composable construction.