AIBrix v0.7.0 is here! This release lands 242 merged PRs over three months and pushes AIBrix toward a composable, self-service inference platform. The theme this cycle is composability across the operational, workload, engine, and gateway layers: a new web Console for self-service operations, a production OpenAI-compatible Batch API, first-class multi-engine support (vLLM, SGLang, TensorRT-LLM), a KV-cache-centric P/D disaggregation data plane, and a highly-available gateway with pluggable, blendable routing.
📖 Read the full release blog: https://aibrix.github.io/posts/2026-06-16-v0.7.0-release/
⚠️ Maturity note: The Console, Batch API, and Resource Manager / Cloud GPU execution are new or rebuilt in this cycle and are evolving quickly. Treat them as preview features for now — APIs and behavior may change in v0.8.0.
🚀 New Features Highlights
AIBrix Management Console (Preview)
- Web-based control plane: A new React frontend + Go backend that lets users register models, deploy from reusable versioned templates, submit and track batch jobs, and download results — no
kubectl/YAML required. (#2094, #2095, #2176) - Model & template-centric UX:
ModelDeploymentTemplatewith model-centric workflows,CreateModelAPI with HDFS path support, and provider-agnostic templates. (#2141, #2144, #2175, #2214) - Enterprise & auth: OIDC login/callback with real user avatar rendering, MySQL backend, file proxy, and feature flags to gate Deployments/Playground. (#2100, #2177, #2178, #2187, #2188, #2314)
- Batch experience in console: job execution details, pagination, owner filtering, owner-only downloads, cursor-based listing, and an error-injection framework for resilience testing. (#2244, #2268, #2272, #2274, #2317, #2336)
- Storage backends: URI-based store factory with pure-Go SQLite driver and a hardened DB schema for self-hosted deployments. (#2174, #2182, #2208, #2209, #2212)
OpenAI-Compatible Batch API (Rebuilt for Production)
- Wire-compatible Batch API: Self-hosted async batch processing for
/v1/chat/completions,/v1/completions, and/v1/embeddings, backed by a persistent metadata store and an async job state machine. (#2136, #2147, #2185, #2203) - Config-driven deployment:
ModelDeploymentTemplate+BatchProfile, inline template specs end-to-end, and the optionalaibrix.model_templateextension to specify deployment. (#2134, #2207, #2236, #2306) - Execution engine rebuild: Reworked around a
Runtime+compute.providermodel (retiring kopf/JobCache), with SSH-launch runtimes for cloud providers and a smart client with transport retry. (#2257, #2261, #2267, #2339) - Robust job lifecycle: scheduling concurrency fixes, double-release prevention, job informer + pagination, and improved resilience across scheduler, console, and engine adapter. (#2217, #2218, #2226, #2240, #2270, #2322)
- New API surface: OpenAI Responses API support. (#2312)
Resource Manager & Cloud GPU Execution (Preview)
- Pluggable provider model: Resource Manager interfaces with a GORM-backed store and k8s-backed provisioning. (#2171, #2172, #2183)
- Cloud GPU providers: Lambda Cloud and RunPod via a registry/provider pattern, enabling batch jobs to burst to cloud GPUs. (#2248)
- Non-blocking planner: policy-plugin planner with exponential backoff on provisioning failures and provider-agnostic core. (#2239, #2280, #2319)
Multi-Engine Support (vLLM, SGLang, TensorRT-LLM)
- TensorRT-LLM as a first-class engine: tensor-rt inference engine support, TRT-LLM v1.1.0 metrics integration, and PD support for TRT-LLM 1.3.x. (#2000, #2005, #2043)
- Engine-aware routing & metrics: model validation and routing context carry engine information; per-engine metrics fixes for load- and KV-aware routing. (#2022, #2118)
- Cross-engine PD validation: PD disaggregation e2e tests across vLLM, SGLang, and TRT-LLM. (#2080)
- vLLM-Omni / multimodal: vLLM-Omni endpoints in mock + Dockerfile, multi-model per-service config, and v0.14.0 integration. (#2036, #2037, #2056, #2129)
KV-Cache-Centric P/D Disaggregation
- Unified KV data plane: L2 KVCache zero-copy APIs and vLLM v0.14.0 integration over a single
aibrix_kvcachesubstrate (L1 DRAM + pluggable L2, PrisKV production backend). (#2056, #2060) - Connectors:
AIBrixPDReuseConnectorwith prefix-cache support, per-pod KV connector type selection via pod labels, and Type2-inherits-Type1 connector refactor. (#2092, #2125, #2238) - Pluggable PD routing: pluggable prefill score policies (least_request, prefix_cache), configurable decode scorers, decode pod load balancing, and a
KVTransferAgentabstraction (Mooncake stub). (#2070, #2087, #2105, #2284) - PD refactors & hardening: split router into focused files, extracted
EngineHandler/PodSelectorandPrefillExecutor, and fixed stale-handle/slot_mapping issues in connector type2. (#2121, #2232, #2308, #2320)
Highly-Available Gateway with Composable Routing
- Cross-replica state sync: Redis-backed state sync for the in-memory gateway cache, aggregating running requests and prefix-cache state across gateway instances for consistent routing. (#1989, #2159)
- Composable, blendable routing: multi-strategy routing with normalized soft-scoring and weighting (e.g.
"least-request:2,throughput:1"), routing profiles, and a power-of-two router with request-tracker callbacks. (#1944, #2024, #2124) - Production hardening: per-model RPS rate limiting, always-on prefix-cache metrics, HTTPRoute status caching to drop per-request API calls, GOMAXPROCS tuning, and graceful ext_proc shutdown. (#2137, #2200, #2283, #2313, #2334)
📊 Feature Enhancements
- Local mode: Run gateway, router, and KV cache without Kubernetes, with Redis now optional and a local
/v1/modelsendpoint. (#2039, #2055, #2058) - Anthropic compatibility: New
/v1/messagesendpoint. (#2115) - OpenTelemetry tracing: Optional end-to-end tracing with upstream
x-request-idpreservation. (#2157, #2255, #2271) - Pluggable service discovery: Unified
Providerinterface (static / Consul / etcd) with refreshed static discovery. (#2034, #2035) - Autoscaling: KV cache usage percentage added to APA, plus documented PodAutoscaler annotations. (#2057, #2282)
- Chat app: backend service, Dockerfile/compose/k8s manifests, image attachments, edit/retry persistence and UI cleanup. (#1971, #1996, #2102, #2278)
- brixbench: benchmark provisioning harness for release validation and regression testing, with PD routing scenarios and docs. (#2165, #2273, #2298, #2352)
📦 Installation & Tooling & CI
- Helm: external Redis config with component-level password validation, controller-manager env support, router idleTimeout, and CRDs separated from operator manifests. (#2201, #2216, #2222, #2230, #2234)
- CI: build & preload AIBrix images into kind for chart-testing, multi-arch vllm-mock builds, reduced e2e workflow time, ruff bump/format, and CI action upgrades. (#2059, #2081, #2093, #2219, #2259, #2349)
- Docs: production gateway deployment guide, expanded routing/PD guides, vLLM semantic router integration, local-mode, console production setup, batch inference, and brixbench usage. (#2189, #2192, #2193, #2337, #2347, #2348)
🐞 Critical Bug Fixes
- Fix per-model metrics cross-talk on multi-model pods and drop duplicate metric-label sanitization. (#2228, #2331)
- Fix session-affinity routing by preserving the
x-session-idheader and prevent nil-pointer panic in request tracking on context cancellation. (#2122, #2338) - Fix several data races:
TreeNode.lastAccessin prefix cache, SLO router fallback init, and sharedSyncPrefixHashTableinstance. (#2096, #2106, #2327) - Fix
/v1/modelsreturning 404 without a trailing slash and removemin_tokensfrom PD prefill requests to avoid vLLM validation failure. (#2194, #2237) - Prevent goroutine leaks in periodical sync loops and make the gRPC max message size configurable via env var. (#2077, #2364)
- Clean up orphan resources when RoleSet
podGroupSizechanges. (#2131)
New Contributors
- @jasonlee-1024 made their first contribution in #1990
- @Lucas-Qian6 made their first contribution in #1996
- @xvchris made their first contribution in #2007
- @NJX-njx made their first contribution in #1982
- @DhyeyTr made their first contribution in #2057
- @gabrnavarro made their first contribution in #2069
- @tmchow made their first contribution in #2076
- @Peakpine made their first contribution in #2108
- @naroam1 made their first contribution in #2119
- @Yang1032 made their first contribution in #2122
- @DaveLi8086 made their first contribution in #2153
- @Genmin made their first contribution in #2168
- @ianliuy made their first contribution in #2118
- @HeyZackWang made their first contribution in #2157
- @zhutong196 made their first contribution in #2194
- @justinchen033 made their first contribution in #2226
- @Jing-ze made their first contribution in #2228
- @NelZyhh made their first contribution in #2230
- @JustAnotherDevv made their first contribution in #2259
- @xiaoyu-xyz made their first contribution in #2279
- @arnavnagzirkar made their first contribution in #2264
- @SarthakB11 made their first contribution in #2296
- @whalepark made their first contribution in #2165
- @JinKim48 made their first contribution in #2232
- @jan-stanek made their first contribution in #2312
- @V-3604 made their first contribution in #2331
- @DebugSy made their first contribution in #2131
What's Changed
Full Changelog: v0.6.0...v0.7.0
- feat: add sglang gateway metrics and gateway dashboard by @scarlet25151 in #1959
- feat: add support for routing-profiles by @varungup90 in #1944
- Feat: Support vllm new kvevent format by @penfree in #1962
- feat: add queue for prometheus query by @scarlet25151 in #1964
- fix: update metric name for routing-algorithms by @varungup90 in #1968
- fix: add shared path for the downloaded artifacts by @scarlet25151 in #1972
- fix: panic inconsistent label cardinality in emit metrics by @varungup90 in #1976
- chore: add s3 example by @scarlet25151 in #1988
- [feat] Add backend for chat app by @Jeffwan in #1971
- Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1987
- Cut v0.6.0 release by @varungup90 in #1986
- Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1991
- fix: add missing imports in chat app routers by @jasonlee-1024 in #1990
- [Bug]: broken binary search in GetSignature func by @omerap12 in #1993
- [API] Support image attachments in chat flow with backend images handling by @Lucas-Qian6 in #1996
- fix: gateway metrics initialization and nit refactoring in gateway.go by @varungup90 in #1997
- [Misc] use DescribeTable for GetSignature tests by @omerap12 in #1998
- feat: add tensor-rt inference engine support by @varungup90 in #2000
- [Bug]: fix flaky TestLRUStore_TTL by using injectable clock in Put by @xvchris in #2007
- feat: integrate trtllm v1.1.0 metrics by @varungup90 in #2005
- Samples and readme for audio endpoints by @dittops in #1973
- Fix: Support Chat Template Tokenization with vLLM Parameters in Prefix Cache Router by @penfree in #2002
- chore: enhance model validation and routing context with engine information by @varungup90 in #2022
- refactor: add AlgorithmConfig to ModelConfigProfile by @Jeffwan in #2027
- feat(batch): add multi-endpoint body validation and testing by @NJX-njx in #1982
- feat(metadata): introduce MetadataStore abstraction layer by @NJX-njx in #1981
- refactor: Update static service discovery by @Jeffwan in #2034
- [Misc] Fix ruff issues and address review comments by @Jeffwan in #2030
- feat(mock): add vLLM-Omni endpoint support to mock app by @Jeffwan in #2036
- refactor: per-service URL/key config for vLLM-Omni multi-model setup by @Jeffwan in #2037
- Feat: Support running AIBrix in local mode by @Jeffwan in #2039
- fix: crashloop issue in metadata service by @varungup90 in #2044
- feat: add pd support for trtllm 1.3.x by @varungup90 in #2043
- fix: for trtllm update input prompt with prompt_token_ids in /v1/completions by @varungup90 in #2047
- [App][API] Centralize default model names in config for easy switching by @Lucas-Qian6 in #2048
- Feat: Support RequestTracker callback & add power of two router by @penfree in #2024
- [bug] Converted tree from recursive to iterative by @Jeffwan in #2052
- [Feature] AIBrix L2 KVCache Zero-Copy APIs and vLLM v0.14.0 integration by @DwyaneShi in #2056
- feat: add /v1/models endpoint to gateway plugin for local mode by @Jeffwan in #2055
- [Feat] Make Redis optional in local mode by @Lucas-Qian6 in #2058
- [Bug] Add KV Cache Usage Percentage to APA by @DhyeyTr in #2057
- [Feature] vLLM integration by @DwyaneShi in #2060
- [bug] Fix flaky test TestRandomRouting by @googs1025 in #2059
- Fix: Non-blocking metrics worker pool by @Jeffwan in #2063
- Fix: Replace fmt.Sprintf("%d", n) with strconv.Itoa(n) by @gabrnavarro in #2069
- Fix lint and types error for apps/chat by @Jeffwan in #2072
- refactor: replace fmt.Sprintf("%d") with strconv by @tmchow in #2076
- fix(controller): handle io.ReadAll errors in lora_client.go by @tmchow in #2075
- fix(gateway): handle strconv.Atoi error in response header processing by @tmchow in #2074
- feat: improve decode pod load balancing in PD disaggregation by @varungup90 in #2070
- fix(controller): prevent goroutine leaks in periodical sync loops by @googs1025 in #2077
- fix(test): fix flaky TestPrefixCacheRouting by using distinct message prefix by @googs1025 in #2079
- [Misc]: PD disaggregation e2e tests (vLLM, SGLang, TRT-LLM) by @varungup90 in #2080
- Add lint and type check for multi-modality chat app by @Jeffwan in #2073
- [CI] Reduce installation e2e workflow time by @varungup90 in #2081
- perf(gateway): faster chat-completions request body validation by @varungup90 in #2084
- fix(test): fix flaky TestVTCHighUtilizationFairness by @googs1025 in #2083
- fix: Optimize MatchPrefix hot path with pre-sized result map and deferred percent calculation by @varungup90 in #2085
- refactor(pd): pluggable prefill score policy with least_request and prefix_cache impls by @varungup90 in #2087
- test(gateway): add PD disaggregation benchmark suite for routing hot paths by @varungup90 in #2088
- [Fix] PD reuse connector supports prefix cache enabled by @DwyaneShi in #2092
- fix preble prefix cache crashes from map race and other issues by @Jeffwan in #2091
- ci: build vllm-mock as multi-arch image (linux/amd64,linux/arm64) by @googs1025 in #2093
- feat: Unify service discovery with Provider interface by @Jeffwan in #2035
- [feat] Add AIBrix management console frontend by @Jeffwan in #2094
- fix: fix the failed after add mdoeladapter by @scarlet25151 in #2097
- [feat] Add backend service for aibrix console by @Jeffwan in #2095
- feat: add enterprise features to console (MySQL, auth, file proxy) by @Jeffwan in #2100
- fix(console): address review findings (CORS, ListJobs bug, interval cleanup) by @Jeffwan in #2101
- fix: ensure single shared SyncPrefixHashTable instance across Store a… by @penfree in #2096
- Add Dockerfile, docker-compose, and Kubernetes manifest for chat app by @Jeffwan in #2102
- Fix data race on TreeNode.lastAccess in prefix cache MatchPrefix path by @varungup90 in #2106
- feat: add configurable decode scorer policies by @varungup90 in #2105
- [Bug] Modify the way container environment variables are rendered by @Peakpine in #2108
- [Bug]: The podset built-in envs is placed before container envs.(#2113) by @Peakpine in #2114
- feat:ignore pods with label podGroupIndex > 0 (#2111) by @rayne-Li in #2112
- feat: add /v1/messages endpoint by @varungup90 in #2115
- refactor: split PD disaggregation router into focused files by @varungup90 in #2121
- fix: Dockerfile for new vLLM versions + implement get_num_new_matched_tokens by @naroam1 in #2119
- [Bug]: Fix session-affinity routing by preserving x-session-id header by @Yang1032 in #2122
- refactor(kv_connector): Type2 Connector inherits from Type1 Connector by @naroam1 in #2125
- Revert "fix: Dockerfile for new vLLM versions + implement get_num_new… by @DwyaneShi in #2132
- feat: add semantic routing e2e sample with Envoy ext_proc and vLLM backends by @varungup90 in #2120
- feat(batch): Config driven ModelDeploymentTemplate and BatchProfile by @Jeffwan in #2134
- feat(batch): expose OpenAI Batch usage + model fields, flatten state enum by @Jeffwan in #2136
- feat(console): make /api/v1/jobs a BFF over metadata service /v1/batches by @Jeffwan in #2139
- feat(console): introduce ModelDeploymentTemplate with model-centric UX by @Jeffwan in #2141
- refactor(batch): migrate extra_body.aibrix to nested structure by @Jeffwan in #2142
- feat(console): batch flow picks a deployment template after the model by @Jeffwan in #2144
- feat: replace k8s annotation with data store as source of truth by @Jeffwan in #2147
- Update OpenAI compatible file and batch interface tests by @Jeffwan in #2150
- feat(batch): --dry-run mode + fix request_counts.total by @Jeffwan in #2151
- refactor(batch): collapse BatchJobStore into batch metastore helpers by @Jeffwan in #2152
- feat(gateway): add per-model requests-per-second rate limiting by @varungup90 in #2137
- test: run metadata-service in --dry-run mode under config/test by @Jeffwan in #2155
- fix: update metric name in throughput routing strategy by @DaveLi8086 in #2153
- fix(storage): make S3 put_object work with non-tellable Readers by @Jeffwan in #2156
- fix(batch): unblock end-to-end K8s submission path by @Jeffwan in #2158
- chore(batch): drop dead K8s-API path from aibrix_batch_worker by @Jeffwan in #2160
- fix(batch): disable sevice links to avoid env-naming collision by @Jeffwan in #2161
- refactor(batch): always persist BatchJob to metastore by @Jeffwan in #2162
- fix(batch): compute usage from output file and add in_progress_at time by @Jeffwan in #2163
- feat(batch): Support worker level REDIS override by @Jeffwan in #2164
- feat(console/web): batch overrides and model/template UX by @Jeffwan in #2170
- chore: migrate OpenAI Go SDK to v3 by @Genmin in #2168
- feat(console/web): JSONL validation hardening, playground API wiring, template fix, nav cleanup by @Jeffwan in #2173
- feat(console): URI-based store factory, serving_name for JSONL validation by @Jeffwan in #2174
- refactor(console,batch): keep ModelDeploymentTemplate provider-agnostic by @Jeffwan in #2175
- feat(console): introduce URL-based routing with React Router and SPA fallback by @Jeffwan in #2176
- feat(console/auth): implement OIDC login and callback handlers by @Jeffwan in #2177
- chore(console): batch e2e plumbing + OIDC auth hardening by @Jeffwan in #2178
- feat(rm): resource manager interfaces by @DwyaneShi in #2171
- fix: resolve trtllm metrics showing undefined model_name and engine_type by @ianliuy in #2118
- feat(rm): provision result DB CRUD & GORM-backed store impl by @DwyaneShi in #2172
- console: Dockerfile + default sqlite store + BFF↔MDS HTTP logging by @Jeffwan in #2182
- feat(rm): k8s-backed resource manager by @DwyaneShi in #2183
- feat(batch): basic planner passthrough for integration test by @nwangfw in #2184
- console: show real user in sidebar/header, add login button when unauthenticated by @Jeffwan in #2187
- console: expose OIDC username + picture, render avatar by @Jeffwan in #2188
- feat(cache): aggregate running requests across gateway instances via Redis snapshots by @varungup90 in #2159
- docs: add production gateway deployment guide and expand routing algorithm docs by @varungup90 in #2189
- feat: support preserving upstream x-request-id for e2e tracing(#2157) by @HeyZackWang in #2157
- docs: add vLLM semantic router integration guide by @varungup90 in #2192
- chore: docs: restructure and expand gateway, PD disaggregation, and production deployment guides by @varungup90 in #2193
- feat(gateway): implement multi-strategy routing by @DaveLi8086 in #2124
- [Bugfix]: Remove min_tokens from PD prefill requests to avoid vLLM validation failure by @zhutong196 in #2194
- feat(batch): planner-batch-intergation test by @nwangfw in #2186
- fix(rm): fix provision store apis by @DwyaneShi in #2195
- optimize: gateway-plugin cpu usage optimize when stream is true #2196 by @rayne-Li in #2196
- perf(gateway): optimize GOMAXPROCS for K8s limits to reduce futex contention by @rayne-Li in #2200
- chart: add router idleTimeout in chart by @rayne-Li in #2201
- feat(batch): implement async planner for batch orchestration by @nwangfw in #2197
- Batch refactoring to support dynamic worker. Deployment can be used as job worker now. by @zhangjyr in #2185
- batch(console): persisted job state machine with MDS lazy sync by @Jeffwan in #2203
- fix[batch]: several job status and db issues by @Jeffwan in #2206
- batch: inline ModelDeploymentTemplate spec end-to-end by @Jeffwan in #2207
- fix(store): enhance db schema by @DwyaneShi in #2209
- fix(store): use pure go sqlite driver by @DwyaneShi in #2208
- fix(console): update key to snake_case by @nwangfw in #2211
- fix(store): fix schema by @DwyaneShi in #2212
- feat(console): add CreateModel API and support hdfs path by @Jeffwan in #2214
- feat: add state sync for in-memory cache of aibrix-gateway instances by @varungup90 in #1989
- ci(chart): build and preload Aibrix images into kind for chart-testing by @varungup90 in #2219
- chore(console): update gpu list and add provisioner config by @nwangfw in #2221
- [Bug] Fixed scheduling logic to avoid repeat job scheduling. by @zhangjyr in #2218
- [Bug] Job entity manager supports full async methods by @zhangjyr in #2217
- [Bug] Prevent double-release on submitted job cancellation by @justinchen033 in #2226
- bugfix: add completed check after process envoy request by @rayne-Li in #2225
- [Bug] Drop duplicate sanitizeMetricValueLabels call in worker by @Jing-ze in #2228
- feat(helm): support external Redis config and component-level password validation by @NelZyhh in #2230
- chart: optimize redis passwd logic in helper.tpl by @rayne-Li in #2234
- chart: add env for controller-manager by @rayne-Li in #2216
- [Bug] Fix /v1/models returning 404 without a trailing slash by @Jing-ze in #2237
- [Misc] Batch: apply inline template specs support by @zhangjyr in #2236
- [bug] fix console and planner job fetching interaction logics for terminal jobs by @nwangfw in #2235
- fix(batch): improve batch service resilience across scheduler, console, and engine adapter by @Jeffwan in #2240
- fix: fix unused fields of provision results by @DwyaneShi in #2242
- fix(chart): fix helm chart-testing CI failures by @Jeffwan in #2241
- fix(install): separate CRDs from operator manifests by @Jeffwan in #2222
- batch: upstreamable storage, drivers, resource schema, and model discovery by @Jeffwan in #2243
- refactor[planner] split provider-agnostic core from planner by @nwangfw in #2239
- feat(console): batch jobs list pagination, owner filter, and owner-only downloads by @Jeffwan in #2244
- feat(RM): add extension support by @DwyaneShi in #2245
- feat(RM): add Lambda Cloud & RunPod providers via registry/provider pattern by @Jeffwan in #2248
- refactor(rm): refactor k8s provider by @DwyaneShi in #2251
- fix(rm): make k8s clientset self-contained by @DwyaneShi in #2252
- [CI]Add vLLM-Omni support to Dockerfile.vllm and sample deployment by @Lucas-Qian6 in #2129
- [feat]: Support per-pod KV connector type selection via pod labels by @zhutong196 in #2238
- chore: increase warm up period for gateway state sync in e2e tests to prevent flakiness by @varungup90 in #2256
- refactor(batch): rebuild execution around Runtime + compute.provider; retire kopf/JobCache by @Jeffwan in #2257
- Refactor batch AIBrix runtime payload and Kubernetes execution flow by @Jeffwan in #2261
- feat(batch): support SSH-launch runtimes for cloud providers by @Jeffwan in #2267
- feat: add openTelemetry support by @rayne-Li in #2255
- docs: adjust doc level and add guide of enable openTelemetry by @rayne-Li in #2271
- feat(batch): Expose batch job execution details in console by @Jeffwan in #2268
- [CI] chore(python): bump ruff to 0.15.12 and apply format by @JustAnotherDevv in #2259
- [Bug] Fix scheduler concurrency scheduling by @zhangjyr in #2270
- fix(console): improve batch creation, file selection, and job controls by @Jeffwan in #2272
- fix(console): improve batch job creation UX by @Jeffwan in #2274
- Fix: lazy import redis dependencies in mds by @DwyaneShi in #2277
- fix(rm): use UTC time by @DwyaneShi in #2276
- fix(chat): edit/retry persistence, model selector, remove projects, UI cleanup by @Jeffwan in #2278
- [Misc] Bump Python Ruff dependency by @xiaoyu-xyz in #2279
- fix(gateway): interrupt idle ext_proc Recv on shutdown to fix slow pod termination by @varungup90 in #2283
- [Docs] Document PodAutoscaler annotations by @xiaoyu-xyz in #2282
- fix(batch): correct resource-failed job finalization and runtime display by @Jeffwan in #2291
- feat(batch): surface CREATED as 'scheduling' status by @Jeffwan in #2292
- docs: Examples should come with health and readiness checks by @arnavnagzirkar in #2264
- fix(console): order timeline events by lifecycle on same-second tie, fix dot colors by @Jeffwan in #2294
- fix(console): mark required fields in deployment template form by @Jeffwan in #2295
- feat(pd): introduce pluggable KVTransferAgent abstraction with Mooncake stub by @varungup90 in #2284
- feat(planner): non-blocking planner with policy plugin by @DwyaneShi in #2280
- [Bug] Validate lora_name in ArtifactDelegationService by @SarthakB11 in #2296
- fix(console): sort batch job list by creation time, page size 10 by @Jeffwan in #2299
- fix(console/batch): anchor batch model field to serving_name across the stack by @Jeffwan in #2300
- [Misc] Add brixbench benchmark module by @whalepark in #2165
- refactor(planner): refactor backend APIs by @DwyaneShi in #2303
- Chore(planner): fix provision result and enhance logging by @DwyaneShi in #2305
- fix(batch): accept aibrix.model on the request entry schema by @Jeffwan in #2306
- fix cn character error in auth header by @scarlet25151 in #2311
- feat(console/web): gate Deployments and Playground behind feature flags by @Jeffwan in #2314
- [Misc] Improve brixbench runner cleanup and vLLM argument handling by @whalepark in #2298
- [Misc] Add Qwen3-8B 4P4D PD routing benchmark scenarios by @whalepark in #2273
- chore: add license header to brixbench files by @varungup90 in #2315
- fix(console/web): page through all jobs via cursor instead of capping the list by @Jeffwan in #2317
- chore: fix race condition tests by @varungup90 in #2316
- fix(console): frontend passes through request count by @DwyaneShi in #2318
- [Misc] Add exponential backoff to provisioning failure. by @zhangjyr in #2319
- perf(gateway): cache HTTPRoute status to eliminate per-request Kubernetes API calls by @varungup90 in #2313
- feat(pd): extract EngineHandler and PodSelector abstractions for PD routing by @varungup90 in #2308
- chore: refine kvcache related dockerfile and docs by @DwyaneShi in #2297
- feat(rm): add time window to resource listing options by @DwyaneShi in #2325
- [Bug] Fix stale-handle KeyError and slot_mapping buffer overflow in connector type2 by @JinKim48 in #2232
- fix(console): add extraBody field to job struct by @DwyaneShi in #2321
- refactor(pd): extract PrefillExecutor into pd/prefill/ package by @varungup90 in #2320
- fix(gateway): fix SLO router fallback initialization race against global RouterManager by @varungup90 in #2327
- chore: nit fix in slo_test race condition test by @varungup90 in #2328
- [API] Add support for OpenAI Responses API by @jan-stanek in #2312
- Restore stand alone driver mode by @zhangjyr in #2323
- Decoupling redis client and redis libs. by @zhangjyr in #2324
- feat(gateway): always-on prefix cache metrics with routing selection, error, and load imbalance counters by @varungup90 in #2334
- [Docs] Update batch inference docs by @Jeffwan in #2337
- [Docs] Add local-mode doc and update stable install to v0.6.0 by @Jeffwan in #2347
- [Docs] Add console production setup docs by @Jeffwan in #2348
- chore: upgrade CI action versions by @Jeffwan in #2349
- Bump version to v0.7.0-rc.2 by @Jeffwan in #2350
- fix(race-test): slo queue router by @varungup90 in #2353
- fix(gateway): prevent nil pointer panic in request tracking on context cancellation by @Yang1032 in #2338
- [Docs] Add Brixbench usage documentation by @xiaoyu-xyz in #2352
- fix(docs): document AIBRIX_STATESYNC_ENABLED requirement and fix Helm chart env var name by @varungup90 in #2355
- fix(chart): remove redundant openTelemetry provider and prioritize backendRefs by @rayne-Li in #2356
- [docs] Add TRT-LLM support to multi-engine page by @varungup90 in #2357
- feat(batch): Enabling job informer + job list pagination by @zhangjyr in #2322
- feat(console): error injection framework by @DwyaneShi in #2336
- feat(batch): Add batch smart client transport retry foundation by @Jeffwan in #2339
- Fix smart client regression by @Jeffwan in #2361
- fix: fix error injection's lint errors by @DwyaneShi in #2363
- [Bug] Fix per-model metrics cross-talk on multi-model pods by @V-3604 in #2331
- fix(gateway): make gRPC max message size configurable via env var by @varungup90 in #2364
- fix(roleset): cleanup orphan resources when podGroupSize changes by @DebugSy in #2131
- [Misc] Fix the gofmt issue by @Jeffwan in #2369
- Bump version to v0.7.0-rc.3 by @Jeffwan in #2370
- Bump version to v0.7.0 by @Jeffwan in #2371