vllm-stack-0.1.11

Latest

github-actions released this 07 May 18:23
4fcec0d

The stack deployment of vLLM

What's Changed

  • feat(helm) add PDB per deployment by @enneitex in #851
  • Add production-ready vLLM CoreWeave CKS terraform stack by @brokedba in #834
  • [Feat][Router] Add disaggregated prefill orchestrated routing by @yahavb in #777
  • [bugfix] deprecate disable log request by @ruizhang0101 in #885
  • feat(helm): add configurable NodePort to router service by @keyuchen21 in #875
  • fix(benchmark/multi-round-qa): fix TTFT None Type crash caused by reasoning models (reasoning_content) by @brokedba in #873
  • [bugfix] fix cache server start command by @ruizhang0101 in #872
  • feat(helm) add monitoring conf as a sub chart by @enneitex in #860
  • [Bugfix] Forward backend Content-Type in StreamingResponse by @shernshiou in #880
  • fix(service_discovery): correctly return 503 on missing endpoints by @nejch in #889
  • bugfix: omit replicas field when autoscaling is enabled by @Isakgicu in #891
  • [Feat][Operator] Add prefixaware and kvaware routing options to VLLMRouter CRD by @keyuchen21 in #881
  • [Bugfix] Reduce RBAC permissions for secrets to least privilege by @EzgiTastan in #894
  • [CI/Build] Add .dockerignore to exclude test files from Docker builds by @EzgiTastan in #895
  • [Feat] Add generic cache-server resources support for InfiniBand/RDMA by @happytreees in #898
  • [Feat][Router] make healthcheck values configurable by @max-wittig in #906
  • [Router] Add reply and heartbeat port options for KV-aware routing by @can-sun in #908
  • [helm] document every value, update json schema and various fixes by @enneitex in #886
  • [BugFix] Omit .spec.replicas when KEDA is enabled to prevent field ownership conflict by @lriverawong in #907
  • [Feat] Support KEDA Auto Scaling in Production Stack Operator by @aeon-x in #903
  • [Feat] Helm: add support for per-model tolerations by @AlexanderSing in #897
  • [CI/Build] Pin GitHub Actions to commit SHAs by @xiaotian-yu in #909
  • [CI] remove local registry and add runner cleanup by @ruizhang0101 in #922
  • [Feat] Implement OpenAI external provider by @shernshiou in #902
  • [Minor Improvements] Vllmruntime Autoscaling in Operator by @aeon-x in #918
  • [Bugfix][Router] Fix router auth for transcription proxy by @yzhan1 in #914
  • fix(helm) fix default values for cache deployment by @enneitex in #917
  • [Bugfix] fix(vllm-router): keep roundrobin state per endpoint set / model by @Killusions in #916
  • Feat/implement streaming path in audio transcription by @WaelRabah in #926
  • [Feat][Router] Add per-model request latency histogram by @banlor in #940
  • [Bugfix] Fix shared storage not working with dynamic PV provisioning by @NiccoloTosato in #933
  • [Bugfix][Router] Preserve full backend model metadata in /v1/models by @yzhan1 in #927
  • [Misc] Bump chart version to 0.1.11 by @ruizhang0101 in #942

New Contributors

Full Changelog: vllm-stack-0.1.10...vllm-stack-0.1.11