Repositories
- llm-compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- production-stack: vLLM's reference system for K8s-native cluster-wide deployment with community-driven performance optimization
- flash-attention (forked from Dao-AILab/flash-attention): Fast and memory-efficient exact attention
- vllm-project.github.io
- vllm-openvino
- buildkite-ci