The stack deployment of vLLM
What's Changed
- feat(helm) add PDB per deployment by @enneitex in #851
- Add production-ready vLLM CoreWeave CKS terraform stack by @brokedba in #834
- [Feat][Router] Add disaggregated prefill orchestrated routing by @yahavb in #777
- [bugfix] deprecate disable log request by @ruizhang0101 in #885
- feat(helm): add configurable NodePort to router service by @keyuchen21 in #875
- fix(benchmark/multi-round-qa): fix TTFT None Type crash caused by reasoning models (reasoning_content) by @brokedba in #873
- [bugfix] fix cache server start command by @ruizhang0101 in #872
- feat(helm) add monitoring conf as a sub chart by @enneitex in #860
- [Bugfix] Forward backend Content-Type in StreamingResponse by @shernshiou in #880
- fix(service_discovery): correctly return 503 on missing endpoints by @nejch in #889
- bugfix: omit replicas field when autoscaling is enabled by @Isakgicu in #891
- [Feat][Operator] Add prefixaware and kvaware routing options to VLLMRouter CRD by @keyuchen21 in #881
- [Bugfix] Reduce RBAC permissions for secrets to least privilege by @EzgiTastan in #894
- [CI/Build] Add .dockerignore to exclude test files from Docker builds by @EzgiTastan in #895
- [Feat] Add generic cache-server resources support for InfiniBand/RDMA by @happytreees in #898
- [Feat][Router] make healthcheck values configurable by @max-wittig in #906
- [Router] Add reply and heartbeat port options for KV-aware routing by @can-sun in #908
- [helm] document every value, update JSON schema and various fixes by @enneitex in #886
- [BugFix] Omit .spec.replicas when KEDA is enabled to prevent field ownership conflict by @lriverawong in #907
- [Feat] Support KEDA Auto Scaling in Production Stack Operator by @aeon-x in #903
- [Feat] Helm: add support for per-model tolerations by @AlexanderSing in #897
- [CI/Build] Pin GitHub Actions to commit SHAs by @xiaotian-yu in #909
- [CI] remove local registry and add runner cleanup by @ruizhang0101 in #922
- [Feat] Implement OpenAI external provider by @shernshiou in #902
- [Minor Improvements] VLLMRuntime Autoscaling in Operator by @aeon-x in #918
- [Bugfix][Router] Fix router auth for transcription proxy by @yzhan1 in #914
- fix(helm) fix default values for cache deployment by @enneitex in #917
- [Bugfix] fix(vllm-router): keep roundrobin state per endpoint set / model by @Killusions in #916
- Feat/implement streaming path in audio transcription by @WaelRabah in #926
- [Feat][Router] Add per-model request latency histogram by @banlor in #940
- [Bugfix] Fix shared storage not working with dynamic PV provisioning by @NiccoloTosato in #933
- [Bugfix][Router] Preserve full backend model metadata in /v1/models by @yzhan1 in #927
- [Misc] Bump chart version to 0.1.11 by @ruizhang0101 in #942
New Contributors
- @yahavb made their first contribution in #777
- @Isakgicu made their first contribution in #891
- @EzgiTastan made their first contribution in #894
- @happytreees made their first contribution in #898
- @can-sun made their first contribution in #908
- @lriverawong made their first contribution in #907
- @aeon-x made their first contribution in #903
- @AlexanderSing made their first contribution in #897
- @xiaotian-yu made their first contribution in #909
- @yzhan1 made their first contribution in #914
- @Killusions made their first contribution in #916
- @WaelRabah made their first contribution in #926
- @banlor made their first contribution in #940
- @NiccoloTosato made their first contribution in #933
Full Changelog: vllm-stack-0.1.10...vllm-stack-0.1.11