Release v0.7.0 · vllm-project/aibrix

AIBrix v0.7.0 is here! This release lands 242 merged PRs over three months and pushes AIBrix toward a composable, self-service inference platform. The theme this cycle is composability across the operational, workload, engine, and gateway layers: a new web Console for self-service operations, a production OpenAI-compatible Batch API, first-class multi-engine support (vLLM, SGLang, TensorRT-LLM), a KV-cache-centric P/D disaggregation data plane, and a highly-available gateway with pluggable, blendable routing.

📖 Read the full release blog: https://aibrix.github.io/posts/2026-06-16-v0.7.0-release/

⚠️ Maturity note: The Console, Batch API, and Resource Manager / Cloud GPU execution are new or rebuilt in this cycle and are evolving quickly. Treat them as preview features for now — APIs and behavior may change in v0.8.0.

🚀 New Features Highlights

AIBrix Management Console (Preview)

Web-based control plane: A new React frontend + Go backend that lets users register models, deploy from reusable versioned templates, submit and track batch jobs, and download results — no kubectl/YAML required. (#2094, #2095, #2176)
Model & template-centric UX: ModelDeploymentTemplate with model-centric workflows, CreateModel API with HDFS path support, and provider-agnostic templates. (#2141, #2144, #2175, #2214)
Enterprise & auth: OIDC login/callback with real user avatar rendering, MySQL backend, file proxy, and feature flags to gate Deployments/Playground. (#2100, #2177, #2178, #2187, #2188, #2314)
Batch experience in console: job execution details, pagination, owner filtering, owner-only downloads, cursor-based listing, and an error-injection framework for resilience testing. (#2244, #2268, #2272, #2274, #2317, #2336)
Storage backends: URI-based store factory with pure-Go SQLite driver and a hardened DB schema for self-hosted deployments. (#2174, #2182, #2208, #2209, #2212)

OpenAI-Compatible Batch API (Rebuilt for Production)

Wire-compatible Batch API: Self-hosted async batch processing for /v1/chat/completions, /v1/completions, and /v1/embeddings, backed by a persistent metadata store and an async job state machine. (#2136, #2147, #2185, #2203)
Config-driven deployment: ModelDeploymentTemplate + BatchProfile, inline template specs end-to-end, and the optional aibrix.model_template extension to specify deployment. (#2134, #2207, #2236, #2306)
Execution engine rebuild: Reworked around a Runtime + compute.provider model (retiring kopf/JobCache), with SSH-launch runtimes for cloud providers and a smart client with transport retry. (#2257, #2261, #2267, #2339)
Robust job lifecycle: scheduling concurrency fixes, double-release prevention, job informer + pagination, and improved resilience across scheduler, console, and engine adapter. (#2217, #2218, #2226, #2240, #2270, #2322)
New API surface: OpenAI Responses API support. (#2312)

Resource Manager & Cloud GPU Execution (Preview)

Pluggable provider model: Resource Manager interfaces with a GORM-backed store and k8s-backed provisioning. (#2171, #2172, #2183)
Cloud GPU providers: Lambda Cloud and RunPod via a registry/provider pattern, enabling batch jobs to burst to cloud GPUs. (#2248)
Non-blocking planner: policy-plugin planner with exponential backoff on provisioning failures and provider-agnostic core. (#2239, #2280, #2319)

Multi-Engine Support (vLLM, SGLang, TensorRT-LLM)

TensorRT-LLM as a first-class engine: tensor-rt inference engine support, TRT-LLM v1.1.0 metrics integration, and PD support for TRT-LLM 1.3.x. (#2000, #2005, #2043)
Engine-aware routing & metrics: model validation and routing context carry engine information; per-engine metrics fixes for load- and KV-aware routing. (#2022, #2118)
Cross-engine PD validation: PD disaggregation e2e tests across vLLM, SGLang, and TRT-LLM. (#2080)
vLLM-Omni / multimodal: vLLM-Omni endpoints in mock + Dockerfile, multi-model per-service config, and v0.14.0 integration. (#2036, #2037, #2056, #2129)

KV-Cache-Centric P/D Disaggregation

Unified KV data plane: L2 KVCache zero-copy APIs and vLLM v0.14.0 integration over a single aibrix_kvcache substrate (L1 DRAM + pluggable L2, PrisKV production backend). (#2056, #2060)
Connectors: AIBrixPDReuseConnector with prefix-cache support, per-pod KV connector type selection via pod labels, and Type2-inherits-Type1 connector refactor. (#2092, #2125, #2238)
Pluggable PD routing: pluggable prefill score policies (least_request, prefix_cache), configurable decode scorers, decode pod load balancing, and a KVTransferAgent abstraction (Mooncake stub). (#2070, #2087, #2105, #2284)
PD refactors & hardening: split router into focused files, extracted EngineHandler/PodSelector and PrefillExecutor, and fixed stale-handle/slot_mapping issues in connector type2. (#2121, #2232, #2308, #2320)

Highly-Available Gateway with Composable Routing

Cross-replica state sync: Redis-backed state sync for the in-memory gateway cache, aggregating running requests and prefix-cache state across gateway instances for consistent routing. (#1989, #2159)
Composable, blendable routing: multi-strategy routing with normalized soft-scoring and weighting (e.g. "least-request:2,throughput:1"), routing profiles, and a power-of-two router with request-tracker callbacks. (#1944, #2024, #2124)
Production hardening: per-model RPS rate limiting, always-on prefix-cache metrics, HTTPRoute status caching to drop per-request API calls, GOMAXPROCS tuning, and graceful ext_proc shutdown. (#2137, #2200, #2283, #2313, #2334)

📊 Feature Enhancements

Local mode: Run gateway, router, and KV cache without Kubernetes, with Redis now optional and a local /v1/models endpoint. (#2039, #2055, #2058)
Anthropic compatibility: New /v1/messages endpoint. (#2115)
OpenTelemetry tracing: Optional end-to-end tracing with upstream x-request-id preservation. (#2157, #2255, #2271)
Pluggable service discovery: Unified Provider interface (static / Consul / etcd) with refreshed static discovery. (#2034, #2035)
Autoscaling: KV cache usage percentage added to APA, plus documented PodAutoscaler annotations. (#2057, #2282)
Chat app: backend service, Dockerfile/compose/k8s manifests, image attachments, edit/retry persistence and UI cleanup. (#1971, #1996, #2102, #2278)
brixbench: benchmark provisioning harness for release validation and regression testing, with PD routing scenarios and docs. (#2165, #2273, #2298, #2352)

📦 Installation & Tooling & CI

Helm: external Redis config with component-level password validation, controller-manager env support, router idleTimeout, and CRDs separated from operator manifests. (#2201, #2216, #2222, #2230, #2234)
CI: build & preload AIBrix images into kind for chart-testing, multi-arch vllm-mock builds, reduced e2e workflow time, ruff bump/format, and CI action upgrades. (#2059, #2081, #2093, #2219, #2259, #2349)
Docs: production gateway deployment guide, expanded routing/PD guides, vLLM semantic router integration, local-mode, console production setup, batch inference, and brixbench usage. (#2189, #2192, #2193, #2337, #2347, #2348)

🐞 Critical Bug Fixes

Fix per-model metrics cross-talk on multi-model pods and drop duplicate metric-label sanitization. (#2228, #2331)
Fix session-affinity routing by preserving the x-session-id header and prevent nil-pointer panic in request tracking on context cancellation. (#2122, #2338)
Fix several data races: TreeNode.lastAccess in prefix cache, SLO router fallback init, and shared SyncPrefixHashTable instance. (#2096, #2106, #2327)
Fix /v1/models returning 404 without a trailing slash and remove min_tokens from PD prefill requests to avoid vLLM validation failure. (#2194, #2237)
Prevent goroutine leaks in periodical sync loops and make the gRPC max message size configurable via env var. (#2077, #2364)
Clean up orphan resources when RoleSet podGroupSize changes. (#2131)

New Contributors

@jasonlee-1024 made their first contribution in #1990
@Lucas-Qian6 made their first contribution in #1996
@xvchris made their first contribution in #2007
@NJX-njx made their first contribution in #1982
@DhyeyTr made their first contribution in #2057
@gabrnavarro made their first contribution in #2069
@tmchow made their first contribution in #2076
@Peakpine made their first contribution in #2108
@naroam1 made their first contribution in #2119
@Yang1032 made their first contribution in #2122
@DaveLi8086 made their first contribution in #2153
@Genmin made their first contribution in #2168
@ianliuy made their first contribution in #2118
@HeyZackWang made their first contribution in #2157
@zhutong196 made their first contribution in #2194
@justinchen033 made their first contribution in #2226
@Jing-ze made their first contribution in #2228
@NelZyhh made their first contribution in #2230
@JustAnotherDevv made their first contribution in #2259
@xiaoyu-xyz made their first contribution in #2279
@arnavnagzirkar made their first contribution in #2264
@SarthakB11 made their first contribution in #2296
@whalepark made their first contribution in #2165
@JinKim48 made their first contribution in #2232
@jan-stanek made their first contribution in #2312
@V-3604 made their first contribution in #2331
@DebugSy made their first contribution in #2131

What's Changed

Full Changelog: v0.6.0...v0.7.0

feat: add sglang gateway metrics and gateway dashboard by @scarlet25151 in #1959
feat: add support for routing-profiles by @varungup90 in #1944
Feat: Support vllm new kvevent format by @penfree in #1962
feat: add queue for prometheus query by @scarlet25151 in #1964
fix: update metric name for routing-algorithms by @varungup90 in #1968
fix: add shared path for the downloaded artifacts by @scarlet25151 in #1972
fix: panic inconsistent label cardinality in emit metrics by @varungup90 in #1976
chore: add s3 example by @scarlet25151 in #1988
[feat] Add backend for chat app by @Jeffwan in #1971
Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1987
Cut v0.6.0 release by @varungup90 in #1986
Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1991
fix: add missing imports in chat app routers by @jasonlee-1024 in #1990
[Bug]: broken binary search in GetSignature func by @omerap12 in #1993
[API] Support image attachments in chat flow with backend images handling by @Lucas-Qian6 in #1996
fix: gateway metrics initialization and nit refactoring in gateway.go by @varungup90 in #1997
[Misc] use DescribeTable for GetSignature tests by @omerap12 in #1998
feat: add tensor-rt inference engine support by @varungup90 in #2000
[Bug]: fix flaky TestLRUStore_TTL by using injectable clock in Put by @xvchris in #2007
feat: integrate trtllm v1.1.0 metrics by @varungup90 in #2005
Samples and readme for audio endpoints by @dittops in #1973
Fix: Support Chat Template Tokenization with vLLM Parameters in Prefix Cache Router by @penfree in #2002
chore: enhance model validation and routing context with engine information by @varungup90 in #2022
refactor: add AlgorithmConfig to ModelConfigProfile by @Jeffwan in #2027
feat(batch): add multi-endpoint body validation and testing by @NJX-njx in #1982
feat(metadata): introduce MetadataStore abstraction layer by @NJX-njx in #1981
refactor: Update static service discovery by @Jeffwan in #2034
[Misc] Fix ruff issues and address review comments by @Jeffwan in #2030
feat(mock): add vLLM-Omni endpoint support to mock app by @Jeffwan in #2036
refactor: per-service URL/key config for vLLM-Omni multi-model setup by @Jeffwan in #2037
Feat: Support running AIBrix in local mode by @Jeffwan in #2039
fix: crashloop issue in metadata service by @varungup90 in #2044
feat: add pd support for trtllm 1.3.x by @varungup90 in #2043
fix: for trtllm update input prompt with prompt_token_ids in /v1/completions by @varungup90 in #2047
[App][API] Centralize default model names in config for easy switching by @Lucas-Qian6 in #2048
Feat: Support RequestTracker callback & add power of two router by @penfree in #2024
[bug] Converted tree from recursive to iterative by @Jeffwan in #2052
[Feature] AIBrix L2 KVCache Zero-Copy APIs and vLLM v0.14.0 integration by @DwyaneShi in #2056
feat: add /v1/models endpoint to gateway plugin for local mode by @Jeffwan in #2055
[Feat] Make Redis optional in local mode by @Lucas-Qian6 in #2058
[Bug] Add KV Cache Usage Percentage to APA by @DhyeyTr in #2057
[Feature] vLLM integration by @DwyaneShi in #2060
[bug] Fix flaky test TestRandomRouting by @googs1025 in #2059
Fix: Non-blocking metrics worker pool by @Jeffwan in #2063
Fix: Replace fmt.Sprintf("%d", n) with strconv.Itoa(n) by @gabrnavarro in #2069
Fix lint and types error for apps/chat by @Jeffwan in #2072
refactor: replace fmt.Sprintf("%d") with strconv by @tmchow in #2076
fix(controller): handle io.ReadAll errors in lora_client.go by @tmchow in #2075
fix(gateway): handle strconv.Atoi error in response header processing by @tmchow in #2074
feat: improve decode pod load balancing in PD disaggregation by @varungup90 in #2070
fix(controller): prevent goroutine leaks in periodical sync loops by @googs1025 in #2077
fix(test): fix flaky TestPrefixCacheRouting by using distinct message prefix by @googs1025 in #2079
[Misc]: PD disaggregation e2e tests (vLLM, SGLang, TRT-LLM) by @varungup90 in #2080
Add lint and type check for multi-modality chat app by @Jeffwan in #2073
[CI] Reduce installation e2e workflow time by @varungup90 in #2081
perf(gateway): faster chat-completions request body validation by @varungup90 in #2084
fix(test): fix flaky TestVTCHighUtilizationFairness by @googs1025 in #2083
fix: Optimize MatchPrefix hot path with pre-sized result map and deferred percent calculation by @varungup90 in #2085
refactor(pd): pluggable prefill score policy with least_request and prefix_cache impls by @varungup90 in #2087
test(gateway): add PD disaggregation benchmark suite for routing hot paths by @varungup90 in #2088
[Fix] PD reuse connector supports prefix cache enabled by @DwyaneShi in #2092
fix preble prefix cache crashes from map race and other issues by @Jeffwan in #2091
ci: build vllm-mock as multi-arch image (linux/amd64,linux/arm64) by @googs1025 in #2093
feat: Unify service discovery with Provider interface by @Jeffwan in #2035
[feat] Add AIBrix management console frontend by @Jeffwan in #2094
fix: fix the failed after add mdoeladapter by @scarlet25151 in #2097
[feat] Add backend service for aibrix console by @Jeffwan in #2095
feat: add enterprise features to console (MySQL, auth, file proxy) by @Jeffwan in #2100
fix(console): address review findings (CORS, ListJobs bug, interval cleanup) by @Jeffwan in #2101
fix: ensure single shared SyncPrefixHashTable instance across Store a… by @penfree in #2096
Add Dockerfile, docker-compose, and Kubernetes manifest for chat app by @Jeffwan in #2102
Fix data race on TreeNode.lastAccess in prefix cache MatchPrefix path by @varungup90 in #2106
feat: add configurable decode scorer policies by @varungup90 in #2105
[Bug] Modify the way container environment variables are rendered by @Peakpine in #2108
[Bug]: The podset built-in envs is placed before container envs.(#2113) by @Peakpine in #2114
feat:ignore pods with label podGroupIndex > 0 (#2111) by @rayne-Li in #2112
feat: add /v1/messages endpoint by @varungup90 in #2115
refactor: split PD disaggregation router into focused files by @varungup90 in #2121
fix: Dockerfile for new vLLM versions + implement get_num_new_matched_tokens by @naroam1 in #2119
[Bug]: Fix session-affinity routing by preserving x-session-id header by @Yang1032 in #2122
refactor(kv_connector): Type2 Connector inherits from Type1 Connector by @naroam1 in #2125
Revert "fix: Dockerfile for new vLLM versions + implement get_num_new… by @DwyaneShi in #2132
feat: add semantic routing e2e sample with Envoy ext_proc and vLLM backends by @varungup90 in #2120
feat(batch): Config driven ModelDeploymentTemplate and BatchProfile by @Jeffwan in #2134
feat(batch): expose OpenAI Batch usage + model fields, flatten state enum by @Jeffwan in #2136
feat(console): make /api/v1/jobs a BFF over metadata service /v1/batches by @Jeffwan in #2139
feat(console): introduce ModelDeploymentTemplate with model-centric UX by @Jeffwan in #2141
refactor(batch): migrate extra_body.aibrix to nested structure by @Jeffwan in #2142
feat(console): batch flow picks a deployment template after the model by @Jeffwan in #2144
feat: replace k8s annotation with data store as source of truth by @Jeffwan in #2147
Update OpenAI compatible file and batch interface tests by @Jeffwan in #2150
feat(batch): --dry-run mode + fix request_counts.total by @Jeffwan in #2151
refactor(batch): collapse BatchJobStore into batch metastore helpers by @Jeffwan in #2152
feat(gateway): add per-model requests-per-second rate limiting by @varungup90 in #2137
test: run metadata-service in --dry-run mode under config/test by @Jeffwan in #2155
fix: update metric name in throughput routing strategy by @DaveLi8086 in #2153
fix(storage): make S3 put_object work with non-tellable Readers by @Jeffwan in #2156
fix(batch): unblock end-to-end K8s submission path by @Jeffwan in #2158
chore(batch): drop dead K8s-API path from aibrix_batch_worker by @Jeffwan in #2160
fix(batch): disable sevice links to avoid env-naming collision by @Jeffwan in #2161
refactor(batch): always persist BatchJob to metastore by @Jeffwan in #2162
fix(batch): compute usage from output file and add in_progress_at time by @Jeffwan in #2163
feat(batch): Support worker level REDIS override by @Jeffwan in #2164
feat(console/web): batch overrides and model/template UX by @Jeffwan in #2170
chore: migrate OpenAI Go SDK to v3 by @Genmin in #2168
feat(console/web): JSONL validation hardening, playground API wiring, template fix, nav cleanup by @Jeffwan in #2173
feat(console): URI-based store factory, serving_name for JSONL validation by @Jeffwan in #2174
refactor(console,batch): keep ModelDeploymentTemplate provider-agnostic by @Jeffwan in #2175
feat(console): introduce URL-based routing with React Router and SPA fallback by @Jeffwan in #2176
feat(console/auth): implement OIDC login and callback handlers by @Jeffwan in #2177
chore(console): batch e2e plumbing + OIDC auth hardening by @Jeffwan in #2178
feat(rm): resource manager interfaces by @DwyaneShi in #2171
fix: resolve trtllm metrics showing undefined model_name and engine_type by @ianliuy in #2118
feat(rm): provision result DB CRUD & GORM-backed store impl by @DwyaneShi in #2172
console: Dockerfile + default sqlite store + BFF↔MDS HTTP logging by @Jeffwan in #2182
feat(rm): k8s-backed resource manager by @DwyaneShi in #2183
feat(batch): basic planner passthrough for integration test by @nwangfw in #2184
console: show real user in sidebar/header, add login button when unauthenticated by @Jeffwan in #2187
console: expose OIDC username + picture, render avatar by @Jeffwan in #2188
feat(cache): aggregate running requests across gateway instances via Redis snapshots by @varungup90 in #2159
docs: add production gateway deployment guide and expand routing algorithm docs by @varungup90 in #2189
feat: support preserving upstream x-request-id for e2e tracing(#2157) by @HeyZackWang in #2157
docs: add vLLM semantic router integration guide by @varungup90 in #2192
chore: docs: restructure and expand gateway, PD disaggregation, and production deployment guides by @varungup90 in #2193
feat(gateway): implement multi-strategy routing by @DaveLi8086 in #2124
[Bugfix]: Remove min_tokens from PD prefill requests to avoid vLLM validation failure by @zhutong196 in #2194
feat(batch): planner-batch-intergation test by @nwangfw in #2186
fix(rm): fix provision store apis by @DwyaneShi in #2195
optimize: gateway-plugin cpu usage optimize when stream is true #2196 by @rayne-Li in #2196
perf(gateway): optimize GOMAXPROCS for K8s limits to reduce futex contention by @rayne-Li in #2200
chart: add router idleTimeout in chart by @rayne-Li in #2201
feat(batch): implement async planner for batch orchestration by @nwangfw in #2197
Batch refactoring to support dynamic worker. Deployment can be used as job worker now. by @zhangjyr in #2185
batch(console): persisted job state machine with MDS lazy sync by @Jeffwan in #2203
fix[batch]: several job status and db issues by @Jeffwan in #2206
batch: inline ModelDeploymentTemplate spec end-to-end by @Jeffwan in #2207
fix(store): enhance db schema by @DwyaneShi in #2209
fix(store): use pure go sqlite driver by @DwyaneShi in #2208
fix(console): update key to snake_case by @nwangfw in #2211
fix(store): fix schema by @DwyaneShi in #2212
feat(console): add CreateModel API and support hdfs path by @Jeffwan in #2214
feat: add state sync for in-memory cache of aibrix-gateway instances by @varungup90 in #1989
ci(chart): build and preload Aibrix images into kind for chart-testing by @varungup90 in #2219
chore(console): update gpu list and add provisioner config by @nwangfw in #2221
[Bug] Fixed scheduling logic to avoid repeat job scheduling. by @zhangjyr in #2218
[Bug] Job entity manager supports full async methods by @zhangjyr in #2217
[Bug] Prevent double-release on submitted job cancellation by @justinchen033 in #2226
bugfix: add completed check after process envoy request by @rayne-Li in #2225
[Bug] Drop duplicate sanitizeMetricValueLabels call in worker by @Jing-ze in #2228
feat(helm): support external Redis config and component-level password validation by @NelZyhh in #2230
chart: optimize redis passwd logic in helper.tpl by @rayne-Li in #2234
chart: add env for controller-manager by @rayne-Li in #2216
[Bug] Fix /v1/models returning 404 without a trailing slash by @Jing-ze in #2237
[Misc] Batch: apply inline template specs support by @zhangjyr in #2236
[bug] fix console and planner job fetching interaction logics for terminal jobs by @nwangfw in #2235
fix(batch): improve batch service resilience across scheduler, console, and engine adapter by @Jeffwan in #2240
fix: fix unused fields of provision results by @DwyaneShi in #2242
fix(chart): fix helm chart-testing CI failures by @Jeffwan in #2241
fix(install): separate CRDs from operator manifests by @Jeffwan in #2222
batch: upstreamable storage, drivers, resource schema, and model discovery by @Jeffwan in #2243
refactor[planner] split provider-agnostic core from planner by @nwangfw in #2239
feat(console): batch jobs list pagination, owner filter, and owner-only downloads by @Jeffwan in #2244
feat(RM): add extension support by @DwyaneShi in #2245
feat(RM): add Lambda Cloud & RunPod providers via registry/provider pattern by @Jeffwan in #2248
refactor(rm): refactor k8s provider by @DwyaneShi in #2251
fix(rm): make k8s clientset self-contained by @DwyaneShi in #2252
[CI]Add vLLM-Omni support to Dockerfile.vllm and sample deployment by @Lucas-Qian6 in #2129
[feat]: Support per-pod KV connector type selection via pod labels by @zhutong196 in #2238
chore: increase warm up period for gateway state sync in e2e tests to prevent flakiness by @varungup90 in #2256
refactor(batch): rebuild execution around Runtime + compute.provider; retire kopf/JobCache by @Jeffwan in #2257
Refactor batch AIBrix runtime payload and Kubernetes execution flow by @Jeffwan in #2261
feat(batch): support SSH-launch runtimes for cloud providers by @Jeffwan in #2267
feat: add openTelemetry support by @rayne-Li in #2255
docs: adjust doc level and add guide of enable openTelemetry by @rayne-Li in #2271
feat(batch): Expose batch job execution details in console by @Jeffwan in #2268
[CI] chore(python): bump ruff to 0.15.12 and apply format by @JustAnotherDevv in #2259
[Bug] Fix scheduler concurrency scheduling by @zhangjyr in #2270
fix(console): improve batch creation, file selection, and job controls by @Jeffwan in #2272
fix(console): improve batch job creation UX by @Jeffwan in #2274
Fix: lazy import redis dependencies in mds by @DwyaneShi in #2277
fix(rm): use UTC time by @DwyaneShi in #2276
fix(chat): edit/retry persistence, model selector, remove projects, UI cleanup by @Jeffwan in #2278
[Misc] Bump Python Ruff dependency by @xiaoyu-xyz in #2279
fix(gateway): interrupt idle ext_proc Recv on shutdown to fix slow pod termination by @varungup90 in #2283
[Docs] Document PodAutoscaler annotations by @xiaoyu-xyz in #2282
fix(batch): correct resource-failed job finalization and runtime display by @Jeffwan in #2291
feat(batch): surface CREATED as 'scheduling' status by @Jeffwan in #2292
docs: Examples should come with health and readiness checks by @arnavnagzirkar in #2264
fix(console): order timeline events by lifecycle on same-second tie, fix dot colors by @Jeffwan in #2294
fix(console): mark required fields in deployment template form by @Jeffwan in #2295
feat(pd): introduce pluggable KVTransferAgent abstraction with Mooncake stub by @varungup90 in #2284
feat(planner): non-blocking planner with policy plugin by @DwyaneShi in #2280
[Bug] Validate lora_name in ArtifactDelegationService by @SarthakB11 in #2296
fix(console): sort batch job list by creation time, page size 10 by @Jeffwan in #2299
fix(console/batch): anchor batch model field to serving_name across the stack by @Jeffwan in #2300
[Misc] Add brixbench benchmark module by @whalepark in #2165
refactor(planner): refactor backend APIs by @DwyaneShi in #2303
Chore(planner): fix provision result and enhance logging by @DwyaneShi in #2305
fix(batch): accept aibrix.model on the request entry schema by @Jeffwan in #2306
fix cn character error in auth header by @scarlet25151 in #2311
feat(console/web): gate Deployments and Playground behind feature flags by @Jeffwan in #2314
[Misc] Improve brixbench runner cleanup and vLLM argument handling by @whalepark in #2298
[Misc] Add Qwen3-8B 4P4D PD routing benchmark scenarios by @whalepark in #2273
chore: add license header to brixbench files by @varungup90 in #2315
fix(console/web): page through all jobs via cursor instead of capping the list by @Jeffwan in #2317
chore: fix race condition tests by @varungup90 in #2316
fix(console): frontend passes through request count by @DwyaneShi in #2318
[Misc] Add exponential backoff to provisioning failure. by @zhangjyr in #2319
perf(gateway): cache HTTPRoute status to eliminate per-request Kubernetes API calls by @varungup90 in #2313
feat(pd): extract EngineHandler and PodSelector abstractions for PD routing by @varungup90 in #2308
chore: refine kvcache related dockerfile and docs by @DwyaneShi in #2297
feat(rm): add time window to resource listing options by @DwyaneShi in #2325
[Bug] Fix stale-handle KeyError and slot_mapping buffer overflow in connector type2 by @JinKim48 in #2232
fix(console): add extraBody field to job struct by @DwyaneShi in #2321
refactor(pd): extract PrefillExecutor into pd/prefill/ package by @varungup90 in #2320
fix(gateway): fix SLO router fallback initialization race against global RouterManager by @varungup90 in #2327
chore: nit fix in slo_test race condition test by @varungup90 in #2328
[API] Add support for OpenAI Responses API by @jan-stanek in #2312
Restore stand alone driver mode by @zhangjyr in #2323
Decoupling redis client and redis libs. by @zhangjyr in #2324
feat(gateway): always-on prefix cache metrics with routing selection, error, and load imbalance counters by @varungup90 in #2334
[Docs] Update batch inference docs by @Jeffwan in #2337
[Docs] Add local-mode doc and update stable install to v0.6.0 by @Jeffwan in #2347
[Docs] Add console production setup docs by @Jeffwan in #2348
chore: upgrade CI action versions by @Jeffwan in #2349
Bump version to v0.7.0-rc.2 by @Jeffwan in #2350
fix(race-test): slo queue router by @varungup90 in #2353
fix(gateway): prevent nil pointer panic in request tracking on context cancellation by @Yang1032 in #2338
[Docs] Add Brixbench usage documentation by @xiaoyu-xyz in #2352
fix(docs): document AIBRIX_STATESYNC_ENABLED requirement and fix Helm chart env var name by @varungup90 in #2355
fix(chart): remove redundant openTelemetry provider and prioritize backendRefs by @rayne-Li in #2356
[docs] Add TRT-LLM support to multi-engine page by @varungup90 in #2357
feat(batch): Enabling job informer + job list pagination by @zhangjyr in #2322
feat(console): error injection framework by @DwyaneShi in #2336
feat(batch): Add batch smart client transport retry foundation by @Jeffwan in #2339
Fix smart client regression by @Jeffwan in #2361
fix: fix error injection's lint errors by @DwyaneShi in #2363
[Bug] Fix per-model metrics cross-talk on multi-model pods by @V-3604 in #2331
fix(gateway): make gRPC max message size configurable via env var by @varungup90 in #2364
fix(roleset): cleanup orphan resources when podGroupSize changes by @DebugSy in #2131
[Misc] Fix the gofmt issue by @Jeffwan in #2369
Bump version to v0.7.0-rc.3 by @Jeffwan in #2370
Bump version to v0.7.0 by @Jeffwan in #2371

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.7.0

Choose a tag to compare

Sorry, something went wrong.