vllm-stack-0.1.10
The stack deployment of vLLM
What's Changed
- Add servingEngineSpec environment variable by @shernshiou in #799
- [Fix] Handle missing max_tokens in disaggregated prefill requests by @keyuchen21 in #797
- [Router]: add routes for Image and Audio API by @nmiguel in #820
- [Router][Fix]: fixed name of images/edits endpoint by @nmiguel in #822
- Update contact information in README.md by @ruizhang0101 in #821
- Fix OCI OKE deployment script (entry_point.sh) — end-to-end tested by @fede-kamel in #811
- Mention resources in values.yaml as a valid option by @eladmotola in #806
- [Doc] Update README for global env on servingEngineSpec by @shernshiou in #814
- feat(helm): add standard Kubernetes labels to deployments and services by @keyuchen21 in #810
- [BugFix][Feat]: fix servingEngineSpec probe field and improve probe management in helm template by @emanuelecassese in #809
- [Bugfix] Increase router default memory size by @ruizhang0101 in #804
- [FEAT] Add per-model token and error Prometheus metrics (part of #699) by @ardecode in #813
- [CI/CD] Add stable router image by @ruizhang0101 in #823
- [Feat] Add toleration for vllmRunTimes by @mahmoudk1000 in #825
- [Feat] Operator: add GPUType for resources to replace "nvidia.com/gpu" in vllmruntime by @dotmobo in #829
- [Bugfix] Update aiohttp and python-multipart by @shernshiou in #831
- fix: make --log-level CLI argument actually control router log levels by @keyuchen21 in #832
- fix: Exclude content-length from response headers in route_general_transcriptions by @fidoriel in #733
- [Feat] Reorder hfTokenSecret for vllmRunTimes by @mahmoudk1000 in #826
- feat(router): add initial support for anthropic messages endpoint by @nejch in #775
- [Feat] Add token redaction for logger debug by @shernshiou in #824
- refactor: replace logging.getLogger() with init_logger() across codebase by @keyuchen21 in #835
- [CI/CD] add ci/cd for production stack operator by @ruizhang0101 in #843
- fix: filter hop-by-hop headers from streaming responses by @keyuchen21 in #836
- fix: upgrade h11 to 0.16.0 to resolve GHSA-vqfr-h8mv-ghfj by @keyuchen21 in #837
- Increase timeout values in e2e test workflow by @ruizhang0101 in #848
- [Feat][Router] Add request migration with configurable failover reroute attempts by @ikaadil in #839
- feat(helm): add support for extra manifests and annotations on PVC by @enneitex in #847
- feat: add --root-path CLI option for hosting router under a subpath by @keyuchen21 in #844
- [Misc] Expose LMCache log level as configurable Helm value and default to INFO by @NargiT in #846
- [Feat] Add --log-format json option for structured logging by @keyuchen21 in #849
- [Router]: handle multi-part form requests on image edit routes by @nmiguel in #850
- [Docs] Update readme by @ruizhang0101 in #856
- Bump chart version to 0.1.10 by @ruizhang0101 in #859
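
For readers unfamiliar with the header fixes in #836 and #733, the following is a minimal sketch of what filtering hop-by-hop headers from a proxied response can look like. It is not the router's actual code: the function name `filter_hop_by_hop` and the constant `HOP_BY_HOP` are hypothetical, and the header set is the one listed in RFC 2616 section 13.5.1.

```python
# Illustrative sketch only; names here are hypothetical, not the router's API.

# Hop-by-hop headers as listed in RFC 2616 section 13.5.1 (lowercased).
HOP_BY_HOP = frozenset({
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailers",
    "transfer-encoding",
    "upgrade",
})


def filter_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` without hop-by-hop entries."""
    # Any header named in the Connection header value is also hop-by-hop.
    connection_tokens = {
        token.strip().lower()
        for name, value in headers.items()
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    blocked = HOP_BY_HOP | connection_tokens
    # Header names are case-insensitive, so compare in lowercase.
    return {k: v for k, v in headers.items() if k.lower() not in blocked}


if __name__ == "__main__":
    backend_headers = {
        "Content-Type": "text/event-stream",
        "Transfer-Encoding": "chunked",
        "Connection": "keep-alive, X-Custom",
        "X-Custom": "1",
    }
    print(filter_hop_by_hop(backend_headers))
```

A proxy that forwards these headers unchanged can confuse clients, because they describe the connection between the proxy and the backend rather than the connection between the client and the proxy.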
New Contributors
- @nmiguel made their first contribution in #820
- @emanuelecassese made their first contribution in #809
- @dotmobo made their first contribution in #829
- @fidoriel made their first contribution in #733
- @nejch made their first contribution in #775
- @enneitex made their first contribution in #847
Full Changelog: vllm-stack-0.1.9...vllm-stack-0.1.10