vllm-stack-0.1.10

github-actions released this 27 Feb 23:44
62e8137

The stack deployment of vLLM

What's Changed

  • Add servingEngineSpec environment variable by @shernshiou in #799
  • [Fix] Handle missing max_tokens in disaggregated prefill requests by @keyuchen21 in #797
  • [Router]: add routes for Image and Audio API by @nmiguel in #820
  • [Router][Fix]: fixed name of images/edits endpoint by @nmiguel in #822
  • Update contact information in README.md by @ruizhang0101 in #821
  • Fix OCI OKE deployment script (entry_point.sh) — end-to-end tested by @fede-kamel in #811
  • mention resources at the values.yaml as valid option by @eladmotola in #806
  • [Doc] Update README for global env on servingEngineSpec by @shernshiou in #814
  • feat(helm): add standard Kubernetes labels to deployments and services by @keyuchen21 in #810
  • [BugFix][Feat]: fix serviceEngineSpec probe field and improve probe management in helm template by @emanuelecassese in #809
  • [Bugfix] Increase router default memory size by @ruizhang0101 in #804
  • [FEAT] Add per-model token and error Prometheus metrics (part of #699) by @ardecode in #813
  • [CI/CD] Add stable router image by @ruizhang0101 in #823
  • [Feat] Add toleration for vllmRunTimes by @mahmoudk1000 in #825
  • [Feat] Operator : add GPUType for resources to replace "nvidia.com/gpu" in vllmruntime by @dotmobo in #829
  • [Bugfix] Update aiohttp and python-multipart by @shernshiou in #831
  • fix: make --log-level CLI argument actually control router log levels by @keyuchen21 in #832
  • fix: Exclude content-length from response headers in route_general_transcriptions by @fidoriel in #733
  • [Feat] Reorder hfTokenSecret for vllmRunTimes by @mahmoudk1000 in #826
  • feat(router): add initial support for anthropic messages endpoint by @nejch in #775
  • [Feat] Add token redaction for logger debug by @shernshiou in #824
  • refactor: replace logging.getLogger() with init_logger() across codebase by @keyuchen21 in #835
  • [CI/CD] add ci/cd for production stack operator by @ruizhang0101 in #843
  • fix: filter hop-by-hop headers from streaming responses by @keyuchen21 in #836
  • fix: upgrade h11 to 0.16.0 to resolve GHSA-vqfr-h8mv-ghfj by @keyuchen21 in #837
  • Increase timeout values in e2e test workflow by @ruizhang0101 in #848
  • [Feat][Router] Add request migration with configurable failover reroute attempts by @ikaadil in #839
  • feat(helm) add support for extra manifests and annotation on pvc by @enneitex in #847
  • feat: add --root-path CLI option for hosting router under a subpath by @keyuchen21 in #844
  • [Misc] Expose LMCache log level as configurable Helm value and default to INFO. by @NargiT in #846
  • [Feat] Add --log-format json option for structured logging by @keyuchen21 in #849
  • [Router]: image edit routes multi-part form request by @nmiguel in #850
  • [Docs] Update readme by @ruizhang0101 in #856
  • Bump chart version to 0.1.10 by @ruizhang0101 in #859
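Two of the router fixes above (#733 and #836) concern which upstream response headers a proxy should pass back to the client. The sketch below illustrates the general technique only and is not the router's actual code: `filter_response_headers` and its `drop_content_length` parameter are hypothetical names, and the header set is the one commonly treated as hop-by-hop (it originates from RFC 2616 section 13.5.1).

```python
# Headers commonly treated as hop-by-hop; a proxy should not forward them.
# Both "trailer" and "trailers" are listed to be tolerant of either spelling.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
})


def filter_response_headers(headers, drop_content_length=False):
    """Return a copy of `headers` without hop-by-hop entries.

    Hypothetical helper for illustration; not taken from the router source.
    """
    # Any header named in the Connection header is also hop-by-hop
    # for this particular message.
    connection_value = next(
        (v for k, v in headers.items() if k.lower() == "connection"), ""
    )
    named = {t.strip().lower() for t in connection_value.split(",") if t.strip()}

    excluded = HOP_BY_HOP | named
    if drop_content_length:
        # Useful when the proxy re-encodes or streams the body, so the
        # upstream length may no longer match what the client receives.
        excluded = excluded | {"content-length"}

    return {k: v for k, v in headers.items() if k.lower() not in excluded}


upstream = {
    "Content-Type": "application/json",
    "Transfer-Encoding": "chunked",
    "Connection": "keep-alive, X-Internal",
    "X-Internal": "1",
    "Content-Length": "42",
}
forwarded = filter_response_headers(upstream, drop_content_length=True)
```

Header names are matched case-insensitively while the original casing of the forwarded headers is preserved.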

Full Changelog: vllm-stack-0.1.9...vllm-stack-0.1.10