Skip to content

v0.7.0

Latest

Choose a tag to compare

@github-actions github-actions released this 18 Jun 00:23
c546589

AIBrix v0.7.0 is here! This release lands 242 merged PRs over three months and pushes AIBrix toward a composable, self-service inference platform. The theme this cycle is composability across the operational, workload, engine, and gateway layers: a new web Console for self-service operations, a production OpenAI-compatible Batch API, first-class multi-engine support (vLLM, SGLang, TensorRT-LLM), a KV-cache-centric P/D disaggregation data plane, and a highly-available gateway with pluggable, blendable routing.

📖 Read the full release blog: https://aibrix.github.io/posts/2026-06-16-v0.7.0-release/

⚠️ Maturity note: The Console, Batch API, and Resource Manager / Cloud GPU execution are new or rebuilt in this cycle and are evolving quickly. Treat them as preview features for now — APIs and behavior may change in v0.8.0.

🚀 New Features Highlights

AIBrix Management Console (Preview)

  • Web-based control plane: A new React frontend + Go backend that lets users register models, deploy from reusable versioned templates, submit and track batch jobs, and download results — no kubectl/YAML required. (#2094, #2095, #2176)
  • Model & template-centric UX: ModelDeploymentTemplate with model-centric workflows, CreateModel API with HDFS path support, and provider-agnostic templates. (#2141, #2144, #2175, #2214)
  • Enterprise & auth: OIDC login/callback with real user avatar rendering, MySQL backend, file proxy, and feature flags to gate Deployments/Playground. (#2100, #2177, #2178, #2187, #2188, #2314)
  • Batch experience in console: job execution details, pagination, owner filtering, owner-only downloads, cursor-based listing, and an error-injection framework for resilience testing. (#2244, #2268, #2272, #2274, #2317, #2336)
  • Storage backends: URI-based store factory with pure-Go SQLite driver and a hardened DB schema for self-hosted deployments. (#2174, #2182, #2208, #2209, #2212)

OpenAI-Compatible Batch API (Rebuilt for Production)

  • Wire-compatible Batch API: Self-hosted async batch processing for /v1/chat/completions, /v1/completions, and /v1/embeddings, backed by a persistent metadata store and an async job state machine. (#2136, #2147, #2185, #2203)
  • Config-driven deployment: ModelDeploymentTemplate + BatchProfile, inline template specs end-to-end, and the optional aibrix.model_template extension to specify deployment. (#2134, #2207, #2236, #2306)
  • Execution engine rebuild: Reworked around a Runtime + compute.provider model (retiring kopf/JobCache), with SSH-launch runtimes for cloud providers and a smart client with transport retry. (#2257, #2261, #2267, #2339)
  • Robust job lifecycle: scheduling concurrency fixes, double-release prevention, job informer + pagination, and improved resilience across scheduler, console, and engine adapter. (#2217, #2218, #2226, #2240, #2270, #2322)
  • New API surface: OpenAI Responses API support. (#2312)

Resource Manager & Cloud GPU Execution (Preview)

  • Pluggable provider model: Resource Manager interfaces with a GORM-backed store and k8s-backed provisioning. (#2171, #2172, #2183)
  • Cloud GPU providers: Lambda Cloud and RunPod via a registry/provider pattern, enabling batch jobs to burst to cloud GPUs. (#2248)
  • Non-blocking planner: policy-plugin planner with exponential backoff on provisioning failures and provider-agnostic core. (#2239, #2280, #2319)

Multi-Engine Support (vLLM, SGLang, TensorRT-LLM)

  • TensorRT-LLM as a first-class engine: tensor-rt inference engine support, TRT-LLM v1.1.0 metrics integration, and PD support for TRT-LLM 1.3.x. (#2000, #2005, #2043)
  • Engine-aware routing & metrics: model validation and routing context carry engine information; per-engine metrics fixes for load- and KV-aware routing. (#2022, #2118)
  • Cross-engine PD validation: PD disaggregation e2e tests across vLLM, SGLang, and TRT-LLM. (#2080)
  • vLLM-Omni / multimodal: vLLM-Omni endpoints in mock + Dockerfile, multi-model per-service config, and v0.14.0 integration. (#2036, #2037, #2056, #2129)

KV-Cache-Centric P/D Disaggregation

  • Unified KV data plane: L2 KVCache zero-copy APIs and vLLM v0.14.0 integration over a single aibrix_kvcache substrate (L1 DRAM + pluggable L2, PrisKV production backend). (#2056, #2060)
  • Connectors: AIBrixPDReuseConnector with prefix-cache support, per-pod KV connector type selection via pod labels, and Type2-inherits-Type1 connector refactor. (#2092, #2125, #2238)
  • Pluggable PD routing: pluggable prefill score policies (least_request, prefix_cache), configurable decode scorers, decode pod load balancing, and a KVTransferAgent abstraction (Mooncake stub). (#2070, #2087, #2105, #2284)
  • PD refactors & hardening: split router into focused files, extracted EngineHandler/PodSelector and PrefillExecutor, and fixed stale-handle/slot_mapping issues in connector type2. (#2121, #2232, #2308, #2320)

Highly-Available Gateway with Composable Routing

  • Cross-replica state sync: Redis-backed state sync for the in-memory gateway cache, aggregating running requests and prefix-cache state across gateway instances for consistent routing. (#1989, #2159)
  • Composable, blendable routing: multi-strategy routing with normalized soft-scoring and weighting (e.g. "least-request:2,throughput:1"), routing profiles, and a power-of-two router with request-tracker callbacks. (#1944, #2024, #2124)
  • Production hardening: per-model RPS rate limiting, always-on prefix-cache metrics, HTTPRoute status caching to drop per-request API calls, GOMAXPROCS tuning, and graceful ext_proc shutdown. (#2137, #2200, #2283, #2313, #2334)

📊 Feature Enhancements

  • Local mode: Run gateway, router, and KV cache without Kubernetes, with Redis now optional and a local /v1/models endpoint. (#2039, #2055, #2058)
  • Anthropic compatibility: New /v1/messages endpoint. (#2115)
  • OpenTelemetry tracing: Optional end-to-end tracing with upstream x-request-id preservation. (#2157, #2255, #2271)
  • Pluggable service discovery: Unified Provider interface (static / Consul / etcd) with refreshed static discovery. (#2034, #2035)
  • Autoscaling: KV cache usage percentage added to APA, plus documented PodAutoscaler annotations. (#2057, #2282)
  • Chat app: backend service, Dockerfile/compose/k8s manifests, image attachments, edit/retry persistence and UI cleanup. (#1971, #1996, #2102, #2278)
  • brixbench: benchmark provisioning harness for release validation and regression testing, with PD routing scenarios and docs. (#2165, #2273, #2298, #2352)

📦 Installation & Tooling & CI

  • Helm: external Redis config with component-level password validation, controller-manager env support, router idleTimeout, and CRDs separated from operator manifests. (#2201, #2216, #2222, #2230, #2234)
  • CI: build & preload AIBrix images into kind for chart-testing, multi-arch vllm-mock builds, reduced e2e workflow time, ruff bump/format, and CI action upgrades. (#2059, #2081, #2093, #2219, #2259, #2349)
  • Docs: production gateway deployment guide, expanded routing/PD guides, vLLM semantic router integration, local-mode, console production setup, batch inference, and brixbench usage. (#2189, #2192, #2193, #2337, #2347, #2348)

🐞 Critical Bug Fixes

  • Fix per-model metrics cross-talk on multi-model pods and drop duplicate metric-label sanitization. (#2228, #2331)
  • Fix session-affinity routing by preserving the x-session-id header and prevent nil-pointer panic in request tracking on context cancellation. (#2122, #2338)
  • Fix several data races: TreeNode.lastAccess in prefix cache, SLO router fallback init, and shared SyncPrefixHashTable instance. (#2096, #2106, #2327)
  • Fix /v1/models returning 404 without a trailing slash and remove min_tokens from PD prefill requests to avoid vLLM validation failure. (#2194, #2237)
  • Prevent goroutine leaks in periodical sync loops and make the gRPC max message size configurable via env var. (#2077, #2364)
  • Clean up orphan resources when RoleSet podGroupSize changes. (#2131)

New Contributors

What's Changed

Full Changelog: v0.6.0...v0.7.0

  • feat: add sglang gateway metrics and gateway dashboard by @scarlet25151 in #1959
  • feat: add support for routing-profiles by @varungup90 in #1944
  • Feat: Support vllm new kvevent format by @penfree in #1962
  • feat: add queue for prometheus query by @scarlet25151 in #1964
  • fix: update metric name for routing-algorithms by @varungup90 in #1968
  • fix: add shared path for the downloaded artifacts by @scarlet25151 in #1972
  • fix: panic inconsistent label cardinality in emit metrics by @varungup90 in #1976
  • chore: add s3 example by @scarlet25151 in #1988
  • [feat] Add backend for chat app by @Jeffwan in #1971
  • Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1987
  • Cut v0.6.0 release by @varungup90 in #1986
  • Misc: replace deprecated vllm entrypoint with vllm serve by @omerap12 in #1991
  • fix: add missing imports in chat app routers by @jasonlee-1024 in #1990
  • [Bug]: broken binary search in GetSignature func by @omerap12 in #1993
  • [API] Support image attachments in chat flow with backend images handling by @Lucas-Qian6 in #1996
  • fix: gateway metrics initialization and nit refactoring in gateway.go by @varungup90 in #1997
  • [Misc] use DescribeTable for GetSignature tests by @omerap12 in #1998
  • feat: add tensor-rt inference engine support by @varungup90 in #2000
  • [Bug]: fix flaky TestLRUStore_TTL by using injectable clock in Put by @xvchris in #2007
  • feat: integrate trtllm v1.1.0 metrics by @varungup90 in #2005
  • Samples and readme for audio endpoints by @dittops in #1973
  • Fix: Support Chat Template Tokenization with vLLM Parameters in Prefix Cache Router by @penfree in #2002
  • chore: enhance model validation and routing context with engine information by @varungup90 in #2022
  • refactor: add AlgorithmConfig to ModelConfigProfile by @Jeffwan in #2027
  • feat(batch): add multi-endpoint body validation and testing by @NJX-njx in #1982
  • feat(metadata): introduce MetadataStore abstraction layer by @NJX-njx in #1981
  • refactor: Update static service discovery by @Jeffwan in #2034
  • [Misc] Fix ruff issues and address review comments by @Jeffwan in #2030
  • feat(mock): add vLLM-Omni endpoint support to mock app by @Jeffwan in #2036
  • refactor: per-service URL/key config for vLLM-Omni multi-model setup by @Jeffwan in #2037
  • Feat: Support running AIBrix in local mode by @Jeffwan in #2039
  • fix: crashloop issue in metadata service by @varungup90 in #2044
  • feat: add pd support for trtllm 1.3.x by @varungup90 in #2043
  • fix: for trtllm update input prompt with prompt_token_ids in /v1/completions by @varungup90 in #2047
  • [App][API] Centralize default model names in config for easy switching by @Lucas-Qian6 in #2048
  • Feat: Support RequestTracker callback & add power of two router by @penfree in #2024
  • [bug] Converted tree from recursive to iterative by @Jeffwan in #2052
  • [Feature] AIBrix L2 KVCache Zero-Copy APIs and vLLM v0.14.0 integration by @DwyaneShi in #2056
  • feat: add /v1/models endpoint to gateway plugin for local mode by @Jeffwan in #2055
  • [Feat] Make Redis optional in local mode by @Lucas-Qian6 in #2058
  • [Bug] Add KV Cache Usage Percentage to APA by @DhyeyTr in #2057
  • [Feature] vLLM integration by @DwyaneShi in #2060
  • [bug] Fix flaky test TestRandomRouting by @googs1025 in #2059
  • Fix: Non-blocking metrics worker pool by @Jeffwan in #2063
  • Fix: Replace fmt.Sprintf("%d", n) with strconv.Itoa(n) by @gabrnavarro in #2069
  • Fix lint and types error for apps/chat by @Jeffwan in #2072
  • refactor: replace fmt.Sprintf("%d") with strconv by @tmchow in #2076
  • fix(controller): handle io.ReadAll errors in lora_client.go by @tmchow in #2075
  • fix(gateway): handle strconv.Atoi error in response header processing by @tmchow in #2074
  • feat: improve decode pod load balancing in PD disaggregation by @varungup90 in #2070
  • fix(controller): prevent goroutine leaks in periodical sync loops by @googs1025 in #2077
  • fix(test): fix flaky TestPrefixCacheRouting by using distinct message prefix by @googs1025 in #2079
  • [Misc]: PD disaggregation e2e tests (vLLM, SGLang, TRT-LLM) by @varungup90 in #2080
  • Add lint and type check for multi-modality chat app by @Jeffwan in #2073
  • [CI] Reduce installation e2e workflow time by @varungup90 in #2081
  • perf(gateway): faster chat-completions request body validation by @varungup90 in #2084
  • fix(test): fix flaky TestVTCHighUtilizationFairness by @googs1025 in #2083
  • fix: Optimize MatchPrefix hot path with pre-sized result map and deferred percent calculation by @varungup90 in #2085
  • refactor(pd): pluggable prefill score policy with least_request and prefix_cache impls by @varungup90 in #2087
  • test(gateway): add PD disaggregation benchmark suite for routing hot paths by @varungup90 in #2088
  • [Fix] PD reuse connector supports prefix cache enabled by @DwyaneShi in #2092
  • fix preble prefix cache crashes from map race and other issues by @Jeffwan in #2091
  • ci: build vllm-mock as multi-arch image (linux/amd64,linux/arm64) by @googs1025 in #2093
  • feat: Unify service discovery with Provider interface by @Jeffwan in #2035
  • [feat] Add AIBrix management console frontend by @Jeffwan in #2094
  • fix: fix the failed after add mdoeladapter by @scarlet25151 in #2097
  • [feat] Add backend service for aibrix console by @Jeffwan in #2095
  • feat: add enterprise features to console (MySQL, auth, file proxy) by @Jeffwan in #2100
  • fix(console): address review findings (CORS, ListJobs bug, interval cleanup) by @Jeffwan in #2101
  • fix: ensure single shared SyncPrefixHashTable instance across Store a… by @penfree in #2096
  • Add Dockerfile, docker-compose, and Kubernetes manifest for chat app by @Jeffwan in #2102
  • Fix data race on TreeNode.lastAccess in prefix cache MatchPrefix path by @varungup90 in #2106
  • feat: add configurable decode scorer policies by @varungup90 in #2105
  • [Bug] Modify the way container environment variables are rendered by @Peakpine in #2108
  • [Bug]: The podset built-in envs is placed before container envs.(#2113) by @Peakpine in #2114
  • feat:ignore pods with label podGroupIndex > 0 (#2111) by @rayne-Li in #2112
  • feat: add /v1/messages endpoint by @varungup90 in #2115
  • refactor: split PD disaggregation router into focused files by @varungup90 in #2121
  • fix: Dockerfile for new vLLM versions + implement get_num_new_matched_tokens by @naroam1 in #2119
  • [Bug]: Fix session-affinity routing by preserving x-session-id header by @Yang1032 in #2122
  • refactor(kv_connector): Type2 Connector inherits from Type1 Connector by @naroam1 in #2125
  • Revert "fix: Dockerfile for new vLLM versions + implement get_num_new… by @DwyaneShi in #2132
  • feat: add semantic routing e2e sample with Envoy ext_proc and vLLM backends by @varungup90 in #2120
  • feat(batch): Config driven ModelDeploymentTemplate and BatchProfile by @Jeffwan in #2134
  • feat(batch): expose OpenAI Batch usage + model fields, flatten state enum by @Jeffwan in #2136
  • feat(console): make /api/v1/jobs a BFF over metadata service /v1/batches by @Jeffwan in #2139
  • feat(console): introduce ModelDeploymentTemplate with model-centric UX by @Jeffwan in #2141
  • refactor(batch): migrate extra_body.aibrix to nested structure by @Jeffwan in #2142
  • feat(console): batch flow picks a deployment template after the model by @Jeffwan in #2144
  • feat: replace k8s annotation with data store as source of truth by @Jeffwan in #2147
  • Update OpenAI compatible file and batch interface tests by @Jeffwan in #2150
  • feat(batch): --dry-run mode + fix request_counts.total by @Jeffwan in #2151
  • refactor(batch): collapse BatchJobStore into batch metastore helpers by @Jeffwan in #2152
  • feat(gateway): add per-model requests-per-second rate limiting by @varungup90 in #2137
  • test: run metadata-service in --dry-run mode under config/test by @Jeffwan in #2155
  • fix: update metric name in throughput routing strategy by @DaveLi8086 in #2153
  • fix(storage): make S3 put_object work with non-tellable Readers by @Jeffwan in #2156
  • fix(batch): unblock end-to-end K8s submission path by @Jeffwan in #2158
  • chore(batch): drop dead K8s-API path from aibrix_batch_worker by @Jeffwan in #2160
  • fix(batch): disable sevice links to avoid env-naming collision by @Jeffwan in #2161
  • refactor(batch): always persist BatchJob to metastore by @Jeffwan in #2162
  • fix(batch): compute usage from output file and add in_progress_at time by @Jeffwan in #2163
  • feat(batch): Support worker level REDIS override by @Jeffwan in #2164
  • feat(console/web): batch overrides and model/template UX by @Jeffwan in #2170
  • chore: migrate OpenAI Go SDK to v3 by @Genmin in #2168
  • feat(console/web): JSONL validation hardening, playground API wiring, template fix, nav cleanup by @Jeffwan in #2173
  • feat(console): URI-based store factory, serving_name for JSONL validation by @Jeffwan in #2174
  • refactor(console,batch): keep ModelDeploymentTemplate provider-agnostic by @Jeffwan in #2175
  • feat(console): introduce URL-based routing with React Router and SPA fallback by @Jeffwan in #2176
  • feat(console/auth): implement OIDC login and callback handlers by @Jeffwan in #2177
  • chore(console): batch e2e plumbing + OIDC auth hardening by @Jeffwan in #2178
  • feat(rm): resource manager interfaces by @DwyaneShi in #2171
  • fix: resolve trtllm metrics showing undefined model_name and engine_type by @ianliuy in #2118
  • feat(rm): provision result DB CRUD & GORM-backed store impl by @DwyaneShi in #2172
  • console: Dockerfile + default sqlite store + BFF↔MDS HTTP logging by @Jeffwan in #2182
  • feat(rm): k8s-backed resource manager by @DwyaneShi in #2183
  • feat(batch): basic planner passthrough for integration test by @nwangfw in #2184
  • console: show real user in sidebar/header, add login button when unauthenticated by @Jeffwan in #2187
  • console: expose OIDC username + picture, render avatar by @Jeffwan in #2188
  • feat(cache): aggregate running requests across gateway instances via Redis snapshots by @varungup90 in #2159
  • docs: add production gateway deployment guide and expand routing algorithm docs by @varungup90 in #2189
  • feat: support preserving upstream x-request-id for e2e tracing(#2157) by @HeyZackWang in #2157
  • docs: add vLLM semantic router integration guide by @varungup90 in #2192
  • chore: docs: restructure and expand gateway, PD disaggregation, and production deployment guides by @varungup90 in #2193
  • feat(gateway): implement multi-strategy routing by @DaveLi8086 in #2124
  • [Bugfix]: Remove min_tokens from PD prefill requests to avoid vLLM validation failure by @zhutong196 in #2194
  • feat(batch): planner-batch-intergation test by @nwangfw in #2186
  • fix(rm): fix provision store apis by @DwyaneShi in #2195
  • optimize: gateway-plugin cpu usage optimize when stream is true #2196 by @rayne-Li in #2196
  • perf(gateway): optimize GOMAXPROCS for K8s limits to reduce futex contention by @rayne-Li in #2200
  • chart: add router idleTimeout in chart by @rayne-Li in #2201
  • feat(batch): implement async planner for batch orchestration by @nwangfw in #2197
  • Batch refactoring to support dynamic worker. Deployment can be used as job worker now. by @zhangjyr in #2185
  • batch(console): persisted job state machine with MDS lazy sync by @Jeffwan in #2203
  • fix[batch]: several job status and db issues by @Jeffwan in #2206
  • batch: inline ModelDeploymentTemplate spec end-to-end by @Jeffwan in #2207
  • fix(store): enhance db schema by @DwyaneShi in #2209
  • fix(store): use pure go sqlite driver by @DwyaneShi in #2208
  • fix(console): update key to snake_case by @nwangfw in #2211
  • fix(store): fix schema by @DwyaneShi in #2212
  • feat(console): add CreateModel API and support hdfs path by @Jeffwan in #2214
  • feat: add state sync for in-memory cache of aibrix-gateway instances by @varungup90 in #1989
  • ci(chart): build and preload Aibrix images into kind for chart-testing by @varungup90 in #2219
  • chore(console): update gpu list and add provisioner config by @nwangfw in #2221
  • [Bug] Fixed scheduling logic to avoid repeat job scheduling. by @zhangjyr in #2218
  • [Bug] Job entity manager supports full async methods by @zhangjyr in #2217
  • [Bug] Prevent double-release on submitted job cancellation by @justinchen033 in #2226
  • bugfix: add completed check after process envoy request by @rayne-Li in #2225
  • [Bug] Drop duplicate sanitizeMetricValueLabels call in worker by @Jing-ze in #2228
  • feat(helm): support external Redis config and component-level password validation by @NelZyhh in #2230
  • chart: optimize redis passwd logic in helper.tpl by @rayne-Li in #2234
  • chart: add env for controller-manager by @rayne-Li in #2216
  • [Bug] Fix /v1/models returning 404 without a trailing slash by @Jing-ze in #2237
  • [Misc] Batch: apply inline template specs support by @zhangjyr in #2236
  • [bug] fix console and planner job fetching interaction logics for terminal jobs by @nwangfw in #2235
  • fix(batch): improve batch service resilience across scheduler, console, and engine adapter by @Jeffwan in #2240
  • fix: fix unused fields of provision results by @DwyaneShi in #2242
  • fix(chart): fix helm chart-testing CI failures by @Jeffwan in #2241
  • fix(install): separate CRDs from operator manifests by @Jeffwan in #2222
  • batch: upstreamable storage, drivers, resource schema, and model discovery by @Jeffwan in #2243
  • refactor[planner] split provider-agnostic core from planner by @nwangfw in #2239
  • feat(console): batch jobs list pagination, owner filter, and owner-only downloads by @Jeffwan in #2244
  • feat(RM): add extension support by @DwyaneShi in #2245
  • feat(RM): add Lambda Cloud & RunPod providers via registry/provider pattern by @Jeffwan in #2248
  • refactor(rm): refactor k8s provider by @DwyaneShi in #2251
  • fix(rm): make k8s clientset self-contained by @DwyaneShi in #2252
  • [CI]Add vLLM-Omni support to Dockerfile.vllm and sample deployment by @Lucas-Qian6 in #2129
  • [feat]: Support per-pod KV connector type selection via pod labels by @zhutong196 in #2238
  • chore: increase warm up period for gateway state sync in e2e tests to prevent flakiness by @varungup90 in #2256
  • refactor(batch): rebuild execution around Runtime + compute.provider; retire kopf/JobCache by @Jeffwan in #2257
  • Refactor batch AIBrix runtime payload and Kubernetes execution flow by @Jeffwan in #2261
  • feat(batch): support SSH-launch runtimes for cloud providers by @Jeffwan in #2267
  • feat: add openTelemetry support by @rayne-Li in #2255
  • docs: adjust doc level and add guide of enable openTelemetry by @rayne-Li in #2271
  • feat(batch): Expose batch job execution details in console by @Jeffwan in #2268
  • [CI] chore(python): bump ruff to 0.15.12 and apply format by @JustAnotherDevv in #2259
  • [Bug] Fix scheduler concurrency scheduling by @zhangjyr in #2270
  • fix(console): improve batch creation, file selection, and job controls by @Jeffwan in #2272
  • fix(console): improve batch job creation UX by @Jeffwan in #2274
  • Fix: lazy import redis dependencies in mds by @DwyaneShi in #2277
  • fix(rm): use UTC time by @DwyaneShi in #2276
  • fix(chat): edit/retry persistence, model selector, remove projects, UI cleanup by @Jeffwan in #2278
  • [Misc] Bump Python Ruff dependency by @xiaoyu-xyz in #2279
  • fix(gateway): interrupt idle ext_proc Recv on shutdown to fix slow pod termination by @varungup90 in #2283
  • [Docs] Document PodAutoscaler annotations by @xiaoyu-xyz in #2282
  • fix(batch): correct resource-failed job finalization and runtime display by @Jeffwan in #2291
  • feat(batch): surface CREATED as 'scheduling' status by @Jeffwan in #2292
  • docs: Examples should come with health and readiness checks by @arnavnagzirkar in #2264
  • fix(console): order timeline events by lifecycle on same-second tie, fix dot colors by @Jeffwan in #2294
  • fix(console): mark required fields in deployment template form by @Jeffwan in #2295
  • feat(pd): introduce pluggable KVTransferAgent abstraction with Mooncake stub by @varungup90 in #2284
  • feat(planner): non-blocking planner with policy plugin by @DwyaneShi in #2280
  • [Bug] Validate lora_name in ArtifactDelegationService by @SarthakB11 in #2296
  • fix(console): sort batch job list by creation time, page size 10 by @Jeffwan in #2299
  • fix(console/batch): anchor batch model field to serving_name across the stack by @Jeffwan in #2300
  • [Misc] Add brixbench benchmark module by @whalepark in #2165
  • refactor(planner): refactor backend APIs by @DwyaneShi in #2303
  • Chore(planner): fix provision result and enhance logging by @DwyaneShi in #2305
  • fix(batch): accept aibrix.model on the request entry schema by @Jeffwan in #2306
  • fix cn character error in auth header by @scarlet25151 in #2311
  • feat(console/web): gate Deployments and Playground behind feature flags by @Jeffwan in #2314
  • [Misc] Improve brixbench runner cleanup and vLLM argument handling by @whalepark in #2298
  • [Misc] Add Qwen3-8B 4P4D PD routing benchmark scenarios by @whalepark in #2273
  • chore: add license header to brixbench files by @varungup90 in #2315
  • fix(console/web): page through all jobs via cursor instead of capping the list by @Jeffwan in #2317
  • chore: fix race condition tests by @varungup90 in #2316
  • fix(console): frontend passes through request count by @DwyaneShi in #2318
  • [Misc] Add exponential backoff to provisioning failure. by @zhangjyr in #2319
  • perf(gateway): cache HTTPRoute status to eliminate per-request Kubernetes API calls by @varungup90 in #2313
  • feat(pd): extract EngineHandler and PodSelector abstractions for PD routing by @varungup90 in #2308
  • chore: refine kvcache related dockerfile and docs by @DwyaneShi in #2297
  • feat(rm): add time window to resource listing options by @DwyaneShi in #2325
  • [Bug] Fix stale-handle KeyError and slot_mapping buffer overflow in connector type2 by @JinKim48 in #2232
  • fix(console): add extraBody field to job struct by @DwyaneShi in #2321
  • refactor(pd): extract PrefillExecutor into pd/prefill/ package by @varungup90 in #2320
  • fix(gateway): fix SLO router fallback initialization race against global RouterManager by @varungup90 in #2327
  • chore: nit fix in slo_test race condition test by @varungup90 in #2328
  • [API] Add support for OpenAI Responses API by @jan-stanek in #2312
  • Restore stand alone driver mode by @zhangjyr in #2323
  • Decoupling redis client and redis libs. by @zhangjyr in #2324
  • feat(gateway): always-on prefix cache metrics with routing selection, error, and load imbalance counters by @varungup90 in #2334
  • [Docs] Update batch inference docs by @Jeffwan in #2337
  • [Docs] Add local-mode doc and update stable install to v0.6.0 by @Jeffwan in #2347
  • [Docs] Add console production setup docs by @Jeffwan in #2348
  • chore: upgrade CI action versions by @Jeffwan in #2349
  • Bump version to v0.7.0-rc.2 by @Jeffwan in #2350
  • fix(race-test): slo queue router by @varungup90 in #2353
  • fix(gateway): prevent nil pointer panic in request tracking on context cancellation by @Yang1032 in #2338
  • [Docs] Add Brixbench usage documentation by @xiaoyu-xyz in #2352
  • fix(docs): document AIBRIX_STATESYNC_ENABLED requirement and fix Helm chart env var name by @varungup90 in #2355
  • fix(chart): remove redundant openTelemetry provider and prioritize backendRefs by @rayne-Li in #2356
  • [docs] Add TRT-LLM support to multi-engine page by @varungup90 in #2357
  • feat(batch): Enabling job informer + job list pagination by @zhangjyr in #2322
  • feat(console): error injection framework by @DwyaneShi in #2336
  • feat(batch): Add batch smart client transport retry foundation by @Jeffwan in #2339
  • Fix smart client regression by @Jeffwan in #2361
  • fix: fix error injection's lint errors by @DwyaneShi in #2363
  • [Bug] Fix per-model metrics cross-talk on multi-model pods by @V-3604 in #2331
  • fix(gateway): make gRPC max message size configurable via env var by @varungup90 in #2364
  • fix(roleset): cleanup orphan resources when podGroupSize changes by @DebugSy in #2131
  • [Misc] Fix the gofmt issue by @Jeffwan in #2369
  • Bump version to v0.7.0-rc.3 by @Jeffwan in #2370
  • Bump version to v0.7.0 by @Jeffwan in #2371