Releases · vllm-project/aibrix

21 May 21:42

github-actions

v0.3.0

ecc3529

v0.3.0 Latest

Latest

Automatically generated release for tag v0.3.0.

🚀 New Features Highlights

AIBrix KVCache Offloading Framework: Introduces a pluggable multi-tier KVCache architecture with support for DRAM and remote backends, enabling efficient offloading of KV states to reduce GPU memory pressure and increase deployment density. (#1057, #1061, #1062, #1063, #1064, #1068, #1069, #1080, #1107)
New KVCache orchestration API: Refactors the orchestration layer to support distributed hashing based caching solutions. (#971, #984, #985, #1037, #1055, #1071, #1114)
Prefix Cache and Load aware Routing: Uses hash token-based prefix matching and load awareness to reduce latency by increasing prefix cache hit rate and routing efficiency (#838, #774, #933, #1067)
Preble Routing (ICLR’25): An implementation of Preble, it balances KV cache reuse and GPU load by comparing prefix lengths and computing prompt-aware cost scores for optimal routing. (#678, #719, #730, #1024)
Fairness-oriented Routing (OSDI’24 VTC): Introduces the vtc-basic router with Windowed Adaptive Fairness Routing, which dynamically tracks token usage and ensures fair load distribution across pods. (#964, #1011, #1065)

📊 Feature Enhancements

Gateway Enhancements

Support for OpenAI-compatible APIs, including streaming responses, usage reporting, asynchronous handling, and standardized error responses for seamless end-to-end integration. (#703, #788, #799)
Introduced the /v1/models endpoint for compatibility with OpenAI-style API clients. (#802)
Refactored gateway-plugins with an extensible ext-proc server architecture, laying the foundation for pluggable policies. (#810)
Improved concurrency safety and routing stability through major cache and router redesigns (#878, #884)

Control Plane:

Added Kubernetes webhook validation for CRDs, providing early error feedback during resource creation (#748, #786).
Improve RayClusterFleet to fully support Deepseek-r1/v3 models (#789, #826, #835, #914, #954).
Add scale subresource in RayClusterFleet CRD and enable HPA support (#1082, #1109)

Installation Experiences:

Introduced Terraform modules for GCP and Kubernetes deployment (#823).
Added setup guides for Minikube on Lambda Cloud and AWS in the documentation (#1020).
Enabled standalone controller installation for simplified system bootstrapping.(#930, #931)
Streamlined upgrade workflows by introducing kubectl apply support. CRDs are now split and applied with --server-side, avoiding annotation size limits and enabling smooth incremental updates. (#793)
Enabled container image publishing to Github Container Registry (GHCR) (#1041).
Support ARM container Images (#1090)

Observability & Stability:

Shipped prebuilt Grafana dashboards covering control plane, gateway, and KV cache components for out-of-the-box observability. (#1048)
Tuned Envoy proxy memory and buffer configurations for better performance under high concurrency. (#825)
Tuned Envoy proxy configurations for memory and buffer management under high concurrency (#967).
Added graceful shutdown, liveness, and readiness probes to improve service resilience (#962).
Delivered production-ready monitoring setups for all major system components (#1048).

New Contributors

@gaocegege made their first contribution in #731
@eltociear made their first contribution in #736
@terrytangyuan made their first contribution in #746
@jolfr made their first contribution in #744
@Abirdcfly made their first contribution in #763
@pierDipi made their first contribution in #764
@Xunzhuo made their first contribution in #810
@zjd0112 made their first contribution in #849
@SongGuyang made their first contribution in #850
@vaaandark made their first contribution in #856
@vie-serendipity made their first contribution in #860
@nurali-techie made their first contribution in #867
@legendtkl made their first contribution in #870
@ronaldosaheki made their first contribution in #886
@nadongjun made their first contribution in #890
@cr7258 made their first contribution in #893
@thomasjpfan made their first contribution in #883
@runzhen made their first contribution in #896
@my-git9 made their first contribution in #895
@googs1025 made their first contribution in #908
@Iceber made their first contribution in #926
@ModiIntel made their first contribution in #954
@Venkat2811 made their first contribution in #964
@SuperMohit made their first contribution in #992
@weapons97 made their first contribution in #990
@zhixian82 made their first contribution in #1082

What's Changed

Full Changelog: v0.2.0...v0.3.0

[Docs] fix format of the dist kv cache doc by @DwyaneShi in #714
complete the 'make generate' command by @kerthcet in #711
Update organization reference in code base by @Jeffwan in #717
[Misc] Update the documentation link by @Jeffwan in #720
Initial implementation of radix tree-based cache by @gangmuk in #678
Add model adapter e2e tests by @varungup90 in #701
Add vllm cpu alternative for local development by @varungup90 in #721
Add white paper file by @Jeffwan in #724
Adding streaming client for AIbrix experiments by @happyandslow in #676
[Docs] Update Readme with new links and blog post, and update white paper by @xieus in #725
Recording failed requests in benchmark client by @gangmuk in #727
Process response headers in gateway by @varungup90 in #703
[misc] Fix white paper link by @Jeffwan in #728
Prefix and load aware routing with radix tree kv cache by @gangmuk in #719
Fix slack link in README.md by @Jeffwan in #729
[readme] Fix wrong link by @gaocegege in #731
[Misc] update scheduler.py by @eltociear in #736
Improve thread safety for TreeNode data structure and refactor related codes by @gangmuk in #730
Fix CacheSpec api scheme by @kerthcet in #740
docs: Fix link to license by @terrytangyuan in #746
Use native codegen cmd generating client-go by @kerthcet in #741
[Docs]: Fixed kubectl commands for install of components by @jolfr in #744
[fix] fixing bug in using AsyncOpenAI client (header setting, token counting, etc) by @gangmuk in #738
Add webhook framework by @kerthcet in #748
Use random seed for xxhash by @varungup90 in #752
Create SECURITY.md to enable security policy by @xieus in #756
[CI] Add integration test by @kerthcet in #759
[Bug] fix: correct non-inherited context by @Abirdcfly in #763
[Misc] Parametrize Makefile for mocked vLLM apps by @pierDipi in #764
Support benchmarking script by using real application trace by @nwangfw in #737
Maintaining common benchmarks utils in a separate dir by @gangmuk in #770
Ignore worker pods for gateway routing by @varungup90 in #776
Disable ENABLE_PROBES_INJECTION in correct way by @Jeffwan in #779
Make stream include usage as optional by @varungup90 in #788
Append ray head label selector in PodAutoscaler by @Jeffwan in #789
Remove redundant install crds in makefile by @varungup90 in #792
Update request message processing for /v1/completion input by @varungup90 in #794
Added target...

Contributors

zhangjyr, ronaldosaheki, and 33 other contributors

Assets 4

21 May 04:13

github-actions

v0.3.0-rc.2

c3bb240

v0.3.0-rc.2 Pre-release

Pre-release

Automatically generated release for tag v0.3.0-rc.2.

What's Changed

[Bug] fix: condition nil panic in FindStatusCondition func by @googs1025 in #1078
Refactor request body processing and add multi-turn conversation support by @varungup90 in #1067
Upload arm build images with git.ref_name by @varungup90 in #1090
Update documentation and add openai sdk samples by @varungup90 in #1092
Rename preble based prefix routing strategy by @varungup90 in #1104
Add v0.3.0 ps performance regression test scenario by @Jeffwan in #1099
Migrating benchmark entrypoints to python client by @happyandslow in #1066
[Misc] Add demo manifests for volcano engine by @Jeffwan in #1105
[Integration] KVCache: update vLLM integration by @DwyaneShi in #1107
[Bug]fix: add scale subresource to rayclusterfleet by @zhixian82 in #1082
[Feature] KVCache: Suppport InfiniStore GID and enhance cluster mode by @DwyaneShi in #1106
[Chore] fix: regenerate crd by @zhixian82 in #1109
[Chore] KVCache: enhance format and dependencies by @DwyaneShi in #1108
Polish benchmark manifests and VE samples by @Jeffwan in #1113
[API] Support customized template for cache by @Jeffwan in #1114
Bump version to v0.3.0-rc.2 by @Jeffwan in #1115
[Fix] Move pdb from patch to resources by @Jeffwan in #1117

New Contributors

@zhixian82 made their first contribution in #1082

Full Changelog: v0.3.0-rc.1...v0.3.0-rc.2

Contributors

Jeffwan, DwyaneShi, and 4 other contributors

Assets 4

13 May 07:12

github-actions

v0.3.0-rc.1

575aa5d

v0.3.0-rc.1 Pre-release

Pre-release

What's Changed

[Docs] fix format of the dist kv cache doc by @DwyaneShi in #714
complete the 'make generate' command by @kerthcet in #711
Update organization reference in code base by @Jeffwan in #717
[Misc] Update the documentation link by @Jeffwan in #720
Initial implementation of radix tree-based cache by @gangmuk in #678
Add model adapter e2e tests by @varungup90 in #701
Add vllm cpu alternative for local development by @varungup90 in #721
Add white paper file by @Jeffwan in #724
Adding streaming client for AIbrix experiments by @happyandslow in #676
[Docs] Update Readme with new links and blog post, and update white paper by @xieus in #725
Recording failed requests in benchmark client by @gangmuk in #727
Process response headers in gateway by @varungup90 in #703
[misc] Fix white paper link by @Jeffwan in #728
Prefix and load aware routing with radix tree kv cache by @gangmuk in #719
Fix slack link in README.md by @Jeffwan in #729
[readme] Fix wrong link by @gaocegege in #731
[Misc] update scheduler.py by @eltociear in #736
Improve thread safety for TreeNode data structure and refactor related codes by @gangmuk in #730
Fix CacheSpec api scheme by @kerthcet in #740
docs: Fix link to license by @terrytangyuan in #746
Use native codegen cmd generating client-go by @kerthcet in #741
[Docs]: Fixed kubectl commands for install of components by @jolfr in #744
[fix] fixing bug in using AsyncOpenAI client (header setting, token counting, etc) by @gangmuk in #738
Add webhook framework by @kerthcet in #748
Use random seed for xxhash by @varungup90 in #752
Create SECURITY.md to enable security policy by @xieus in #756
[CI] Add integration test by @kerthcet in #759
[Bug] fix: correct non-inherited context by @Abirdcfly in #763
[Misc] Parametrize Makefile for mocked vLLM apps by @pierDipi in #764
Support benchmarking script by using real application trace by @nwangfw in #737
Maintaining common benchmarks utils in a separate dir by @gangmuk in #770
Ignore worker pods for gateway routing by @varungup90 in #776
Disable ENABLE_PROBES_INJECTION in correct way by @Jeffwan in #779
Make stream include usage as optional by @varungup90 in #788
Append ray head label selector in PodAutoscaler by @Jeffwan in #789
Remove redundant install crds in makefile by @varungup90 in #792
Update request message processing for /v1/completion input by @varungup90 in #794
Added target pod to client result and made clients consistent by @gangmuk in #799
Enable CI tests for release branch by @Jeffwan in #805
Move modelAdapter runtime validation to webhook by @kerthcet in #786
[Misc] Adding model field to each request by @happyandslow in #812
[Refactor]: gateway-plugins ext-proc server codebase by @Xunzhuo in #810
[CI]: update release tags pattern by @Xunzhuo in #815
[Docs]: fix vllm mock app Unauthorized response by @Xunzhuo in #817
Reconfigure workload generator for predefined synthetic patterns by @happyandslow in #771
Workload generation scripts for prefix aware routing by @gangmuk in #820
Fix the paths in lambda cloud doc by @gangmuk in #824
[Bug] Added Startup Probe in Quickstart Model by @jolfr in #773
Add /v1/models endpoint to gateway by @varungup90 in #802
Increase envoy proxy memory config and client connection buffersize by @varungup90 in #825
Support to create default HttpRoute for RayClusterFleet by @Jeffwan in #826
[Misc] Fix CI issue on release branch and clean up logs by @Jeffwan in #837
Fix repeated initialization of gateway routers and add unit test for prefix cache by @varungup90 in #838
Add deepseek-r1 671B deployment sample and docs by @Jeffwan in #835
Bump AIBrix version to v0.2.1 in manifests by @Jeffwan in #839
[Docs] Update Slack link by @gaocegege in #841
[Docs] Remove repeated lines by @zjd0112 in #849
Bump AIBrix version to v0.2.1 for standalone distributed inference by @SongGuyang in #850
Support OpenAI api style /v1/models response by @Jeffwan in #829
[Misc] Resolve symlink ambiguity when generating codes by @vaaandark in #856
Introduce RoutingContext in Route interface and clean up stale codes by @Jeffwan in #855
[Misc]: sync hpa status to podAutoScaler by @vie-serendipity in #860
Generate workload based on prefix sharing synthetic data by @happyandslow in #840
Fixing missing image link in #840 by @happyandslow in #871
Cite Melange paper in heterogeneous feature by @Jeffwan in #872
[Misc] support linux for vllm cpu local development by @nurali-techie in #867
Refactor make deploy to use apply instead of create by @varungup90 in #793
Use string based tokenizer in prefix cache by @varungup90 in #774
Add profiling support for gateway plugins and bug fix to close stream decoder by @varungup90 in #857
Add flag to enable/disable GPU Optimizer tracing by @varungup90 in #875
[Docs] fix typo in runtime feature page by @legendtkl in #870
chore: clean-up mock yaml by @Xunzhuo in #877
Fixing image link error in workload generator README.md by @happyandslow in #888
Update Synthetic Load Prodefined Config for Geneerator by @happyandslow in #889
[Misc] Fix plot_workload to pass dirname to makedirs by @ronaldosaheki in #886
[Misc] Fix client.py in case workload has model null and client has default_model by @ronaldosaheki in #887
[WIP] Adding input/output distribution argument to constant load generator by @happyandslow in #882
[Docs] Fix broken contributing guidelines link in README by @nadongjun in #890
[Bug] fix install script PATH environment variable by @cr7258 in #893
[Docs] Link to dynamic lora from docs by @thomasjpfan in #883
[API] Refactor: core cache design and impl by @Xunzhuo in #878
Added antiaffinity in kvcache crd by @gangmuk in #865
[Docs] Fix tpm and rpm typo in gateway-plugins.rst by @runzhen in #896
[Misc] Remove unused function in pkg/utils by @my-git9 in #895
Remove model name from client and generator by @happyandslow in #894
[Misc] Add PS benchmark manifests and scripts by @Jeffwan in #899
Add release overlays to update control plane config for production deployment by @varungup90 in https://github.com/vllm-project/aibri...

Contributors

zhangjyr, ronaldosaheki, and 32 other contributors

Assets 4

09 Mar 13:25

github-actions

v0.2.1

858ec82

v0.2.1

Automatically generated release for tag v0.2.1.

What's Changed

Cherry-pick Enable CI tests for release branch (#805) by @Jeffwan in #808
Cherry pick #776 #779 #788 #789 #794 to release branch by @Jeffwan @varungup90 in #809
Cherry-pick #825 #826 part of #717 in release branch by @varungup90 @Jeffwan in #828
Update version and tags to v0.2.1 by @Jeffwan in #833

Full Changelog: v0.2.0...v0.2.1

Contributors

Jeffwan and varungup90

Assets 5

19 Feb 18:31

github-actions

v0.2.0

0a21d77

v0.2.0

Automatically generated release for tag v0.2.0.

🚀 New Features Highlights

Distributed KV Cache: Implemented support for managing KV cache across multiple nodes, enhancing performance.
Cost-Driven Heterogenous Serving: Improved scheduling and inference strategies for mixed GPU environments, optimizing cost and resource utilization. (#371 #430, #509, #598, #554, #598)
Optimizer Based Autoscaling: Leverage offline profiles of inference server to calculate the number of replicas. (#430, #500, #692, #508)
Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641, #657)

📊 Feature Enhancements

LoRA Scheduling Enhancements: Introduced multiple scheduling strategies, including bin packing, least latency, least throughput, and random. (#544)
Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641)
Gateway Enhancements: Improved request handling efficiency by enabling streaming in the Envoy gateway. (#377) Enhanced the handling of model registration and invalid cache scenarios. (#542), Introduced fallback strategies to ensure robust request allocation. (#445) Optimized cache store retrieval, reducing unnecessary overhead. (#639) Addressed missing Prometheus config preventing gateway startup. (#441)
PodAutoscaler Scaling improvements: Improved scaling logic to handle edge cases more efficiently. (#508, #515)

🛠Infrastructure & CI/CD Upgrades

Parallelized Build Tasks: CI efficiency improvements by running builds in parallel. (#398)
CrashLoopBackOff Detection in CI: Added monitoring for pod failures in testing workflows. (#444)
Improved GitHub Actions Cost Efficiency: Optimized triggers and removed unnecessary nightly builds. (#411, #422)
Integration Tests for Core Components: Added integration tests for autoscalers, routing policies, and deployment configurations. (#616, #620)

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516
[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by ...

Contributors

zhangjyr, Jeffwan, and 10 other contributors

Assets 5

23 Jan 22:23

github-actions

v0.2.0-rc.2

6ee2f11

v0.2.0-rc.2 Pre-release

Pre-release

Automatically generated release for tag v0.2.0-rc.2.

What's Changed

[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by @nwangfw in #545
Fix race condition in cache by @varungup90 in #550
Fix pod internal cache delete handling by @varungup90 in #552
Handle terminating pod for request routing by @varungup90 in #549
Support absolute path as lora adapter artifact path by @Jeffwan in #556
Deadlock fix for cache by @varungup90 in #557
Mock app log fix for missing metrics warning by @varungup90 in #564
Add vllm graceful termination configuration by @nwangfw in #568
Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
Update pyproject.toml to support python 3.12 by @Jeffwan in #579
[Docs ]Update ai runtime management api and downloader docs by @Jeffwan in #577
Check the HPA ownerReference in request enqueue by @Jeffwan in #582
Add request length for traces by @happyandslow in #569
Support model registration flow using aibrix runtime api by @Jeffwan in #580
Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
Support distributed kv cache orchestration by @Jeffwan in #583
Grant workflow action permission to write packages by @Jeffwan in #586
Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
Update upload/download artifact github actions version to v4 by @varungup90 in #591
Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594

New Contributors

@scarlet25151 made their first contribution in #548
@Aspirin96 made their first contribution in #544

Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2

Contributors

zhangjyr, Jeffwan, and 7 other contributors

Assets 5

09 Jan 06:44

Jeffwan

v0.1.2

b0766a9

v0.1.2

What's Changed

Support absolute path as lora adapter artifact path (#556) by @Jeffwan in #558
Cherry pick streaming and client traffic policy by @varungup90 in #560
Cut v0.1.2 release by @Jeffwan in #561

Full Changelog: v0.1.1...v0.1.2

Contributors

Jeffwan and varungup90

Assets 4

10 Dec 20:16

Jeffwan

v0.2.0-rc.1

0d40fbd

v0.2.0-rc.1 Pre-release

Pre-release

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516

Full Changelog: v0.1.1...v0.2.0-rc.1

Contributors

zhangjyr, Jeffwan, and 5 other contributors

Assets 6

21 Nov 23:02

github-actions

v0.1.1

1e7b918

v0.1.1

Automatically generated release for tag v0.1.1.

What's Changed

Cherry-pick - Fix the ticker interval by removing unnecessary ms by @Jeffwan in #425
Cut v0.1.1 release by @Jeffwan in #427

Full Changelog: v0.1.0...v0.1.1

Contributors

Jeffwan

Assets 4

12 Nov 22:33

github-actions

v0.1.0

d885131

v0.1.0

Feature Highlights

1. Dynamic LoRa Adapter

The Dynamic LoRa Adapter introduces a flexible approach to model adaptation, allowing dynamic management of LoRa models within Kubernetes. This new functionality includes efficient handling of model registration, unloading, and routing, significantly enhancing operational control and scalability for production environments.

2. Gateway Extension Server with Multi-Algorithm Routing Support

We extend the Envoy Gateway through an extension server and the external processing service can inspect and mutate requests and responses. We use this way to extend some features not directly supported in kubernetes service like various routing algorithms, such as least request, least throughput, and random and rate limit feature. This flexibility allows users to fine-tune routing strategies based on their specific application needs, ultimately improving traffic distribution and system performance.

3. LLM-specific Autoscaler

This release integrates multiple autoscaling algorithms, including HPA (Horizontal Pod Autoscaler), KPA (Knative Pod Autoscaler), and APA (AIBrix Pod Autoscaler). The autoscaling framework now features a direct connection to fetch metrics from pods, enabling real-time adjustments based on load and optimized resource utilization.

4. Unified AI Runtime

The AI runtime has been created to support faster model downloading through GPU streaming way, streamlined metrics aggregation, and efficient LoRa request delegation to abstract underlying engine complexities. This runtime provides an optimized environment for deploying and managing machine learning models, making it easier to handle high-volume requests.

Additional Enhancements:

Doc website: Updated documents, including quick-start guides, installation instructions, and tutorials for autoscaling, make setup and onboarding smoother.
Benchmarking and Performance Analysis Tools: Integrated tools for benchmarking autoscalers, gateways and lora to monitor and improve system efficiency and performance.
CI/CD Workflow: The new CI/CD pipeline includes automated image builds, GitHub Actions for testing and linting, and release pipelines for simplified deployment.

What's Changed

Add common project documents and skeleton folders by @Jeffwan in #4
Scaffolding aibrix project using kubebuilder by @Jeffwan in #17
Optimize project layouts by moving controllers to pkg folder by @Jeffwan in #21
Create Lora api and controller by @Jeffwan in #23
Rename LoraAdapter to ModelAdapter by @Jeffwan in #25
Add ModelAdapter API by @Jeffwan in #26
Use better way to set up controller with Manager by @Jeffwan in #27
Initial model adapter controller implementation by @Jeffwan in #32
Add mocked model container for lora adapter fast prototyping by @Jeffwan in #33
[Misc] Add the PR and issues template by @jsw-zorro in #38
[Docs] Add example to run vLLM distributed inference using Ray by @Jeffwan in #39
[Doc] Improve the model adapter mock service by @Jeffwan in #45
[Misc] Simplify the feature/bug/enhancement template. by @jsw-zorro in #48
[Misc] Make model adapter controller e2e work by @Jeffwan in #50
[Docs] A draft version of the contributing guideline document by @kr11 in #47
[Core] Improve model adapter controller by handling existing resources by @Jeffwan in #54
[Feat] Initial Implementation of PodAutoscaler Reconciler by @kr11 in #55
[Docs] Move the sample mocked application to common folder by @Jeffwan in #64
[Misc] Minor refactor the PodAutoscaler codes by @Jeffwan in #68
[Core] Add model router controller by @varungup90 in #57
Add rbac rules in model router by @varungup90 in #71
[bugs] Add autoscaler RBAC to successfully list horizontalpodautoscalers by @kr11 in #72
[Misc] Update license info; Add license check by @happyandslow in #73
add github workflow to lint & test code by @M00nF1sh in #74
[CI] Fix the golang lint issues by @Jeffwan in #77
[CI] fix the failures from make test by @Jeffwan in #80
[Misc] Add code-generator and openapi-gen as dependencies by @Jeffwan in #59
[Misc] Reconcile hpa, kpa and apa separately by @Jeffwan in #83
[feat] Add rpm/tpm extension proc plugin by @varungup90 in #79
Add kpa scale algorithm implementation by @kr11 in #87
Add host override to query specific pod by @varungup90 in #86
[Core] init aibrix runtime framework by @brosoul in #88
Support kpa/apa autoscaling workflow part I by @Jeffwan in #85
Fix Dockerfile Packaging Issues Related to Go Version and Missing Utils by @kr11 in #92
Autoscaling Workflow Enhancement - Part 2 by @kr11 in #94
Add custom CRD clientset by @varungup90 in #97
Autoscaling Workflow Enhancement - Part 3 by @kr11 in #101
[Core] Add Downloader implementation for runtime by @brosoul in #96
Add RayClusterReplicaSet and RayClusterFleet apis by @Jeffwan in #103
Apply crd:maxDescLen=0 in manifest generation by @Jeffwan in #108
Apply filter to objects owned by model adapters by @varungup90 in #111
Add custom cache and interface for model adapter scheduling by @varungup90 in #100
Refactor gateway package by @varungup90 in #112
BatchAPI storage component together with test by @xinchen384 in #104
Update the installation guidance and README.md by @Jeffwan in #115
[CI] Package AI Runtime by @brosoul in #118
Add gateway installation by @varungup90 in #122
[CI] Support container image build and push in CI by @Jeffwan in #120
[CI] Fix nightly image push error by @Jeffwan in #127
[Bug] Fix download bugs during download benchmark by @brosoul in #134
Autoscaling Workflow Enhancement - Part 4: Integrating MetricClient into Autoscaling Workflow by @kr11 in #116
Update make generate by @varungup90 in #132
Model adapter controller improvement and refactor by @Jeffwan in #135
Improve the aibrix installation scripts by @Jeffwan in #141
[CI] Support python package publish by @brosoul in #138
Fix some typo and naming issues by @Jeffwan in #150
Fix gateway bootstrap issues by @varungup90 in #154
Add kubeconfig flag for cache initialization by @varungup90 in #155
Using sphinx to generate html pages for our project static site by @xinchen384 in #153
Add finalizer and handle the model unload requests by @Jeffwan in #152
Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
Add routing algorithms by @varungup90 in #143
Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
Add template page for the docs by @Jeffwan in #170
Remove myst_parser from sphinx extensions by @Jeffwan in #172
Update quickstart in the doc by @Jeffwan in #174
Metric standardizing in ai runtime by @brosoul in #163
[Misc] Rename env in runtime by @brosoul in #176
Add readiness check for redis in gateway plugin by @varungup90 in #173
[batch] job manager handles job state transition by @xinchen384 in #180
Add users CRUD API by @varungup90 in #181
Add routing for model adapter by @varungup90 in https:/...

Contributors

Jeffwan, kr11, and 9 other contributors

Assets 4

Releases: vllm-project/aibrix

v0.3.0

🚀 New Features Highlights

📊 Feature Enhancements

Gateway Enhancements

Control Plane:

Installation Experiences:

Observability & Stability:

New Contributors

What's Changed

Contributors

Uh oh!

v0.3.0-rc.2

What's Changed

New Contributors

Contributors

Uh oh!

v0.3.0-rc.1

What's Changed

Contributors

Uh oh!

v0.2.1

What's Changed

Contributors

Uh oh!

v0.2.0

🚀 New Features Highlights

📊 Feature Enhancements

🛠Infrastructure & CI/CD Upgrades

What's Changed

Contributors

Uh oh!

v0.2.0-rc.2

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.2

What's Changed

Contributors

Uh oh!

v0.2.0-rc.1

What's Changed

Contributors

Uh oh!

v0.1.1

What's Changed

Contributors

Uh oh!

v0.1.0

Feature Highlights

1. Dynamic LoRa Adapter

2. Gateway Extension Server with Multi-Algorithm Routing Support

3. LLM-specific Autoscaler

4. Unified AI Runtime

Additional Enhancements:

What's Changed

Contributors

Uh oh!