What's new in 2.12.0 (2026-07-04)

These are the changes in inference v2.12.0.

New features

feat: gate model remote-code execution behind a config switch by @qxyuan853 in #5027
FEAT: [UI] add API key management and user management pages by @yiboyasss in #5065
FEAT: [UI] ability panel add(Embed、Rerank) by @maoyuehui in #5078
feat(monitor): split Grafana dashboard into 6 sub-dashboards with config store by @m199369309 in #5073
feat(metrics): unify worker labels with supervisor and add replica_index by @m199369309 in #5075
feat: register VibeThinker 1.5B / 3B (transformers + vLLM) by @xiaoyesoso in #5085
feat: register Nex-N2 series (mini / Pro / Pro-fp8) by @xiaoyesoso in #5094
feat(log): add daily+size rotation mode with multi-process safety by @m199369309 in #5083
feat(models): add model autostart config by @bluefish-08 in #5076
FEAT(webui): replace worker IP input with a filtered multi-select selector by @leslie2046 in #5099
feat: add Unlimited-OCR support by @xiaoyesoso in #5103
Feat: (UI) running model add Try to API and add useMenuAuth hook by @maoyuehui in #5098
feat(llm): register Ornith-1.0-35B by @m199369309 in #5119
feat: permission/scope alignment (core — no live-read) by @m199369309 in #5133

Enhancements

ENH: update models JSON [embedding] by @XprobeBot in #5067
ENH: update models JSON [llm] by @XprobeBot in #5093
ENH: update models JSON [llm] by @XprobeBot in #5096
ENH: update model "qwen3.5" JSON by @llyycchhee in #5100
ENH: update models JSON [image] by @XprobeBot in #5109
ENH: update models JSON [llm] by @XprobeBot in #5123
ENH: update models JSON [llm] by @XprobeBot in #5124

Bug fixes

fix: avoid blocking the event loop in async update_model_type by @kratos0718 in #5055
BUG: Fix llama.cpp backend by @codingl2k1 in #5056
FIX: [model] chat template analysis error by @llyycchhee in #5060
fix(embedding): pin peft/transformers for jina-embeddings-v5 series by @m199369309 in #5061
fix(docker): cap fastapi<0.137 in image requirements to match setup.cfg (#5063) by @Anai-Guo in #5072
fix(device_utils): tolerate unsupported nvmlDeviceGetUtilizationRates in WSL2 by @Anai-Guo in #5070
fix(audio): resolve Qwen3-ASR forced aligner via ModelScope when it is the active source by @Anai-Guo in #5077
fix(core): prevent worker metrics loss when cache_tracker init fails by @m199369309 in #5071
fix(monitor): add model_name label to api_key_requests_total counter by @m199369309 in #5074
fix: drop stale running models without replica info in supervisor.list_models by @xiaoyesoso in #5087
fix(core): harden sub-pool creation and clean GPU orphans on startup by @m199369309 in #5095
fix: handle DynamicCache in get_batch_size_and_seq_len_from_kv_cache when HybridCache is importable by @xiaoyesoso in #5086
fix(oauth2): verify JWT expiration in AuthService (#5058) by @qfmy83 in #5088
fix(oauth2): block privilege escalation in user permission grants (#5058) by @qfmy83 in #5089
fix(llm/cache): clear error and ensure cache parent dir on file:// model_uri (#5091) by @Anai-Guo in #5097
fix(vllm): silence repeated GuidedDecodingParams info log per request by @m199369309 in #5120
fix(llm): install ninja in qwen3.5/qwen3.6 vLLM venv for FlashInfer JIT by @m199369309 in #5118
fix(llm): normalize tool_call arguments to dict for Qwen3 family templates by @m199369309 in #5129

Documentation

doc: add v2.11.0 release notes by @qinxuye in #5057
doc: add security policy by @qinxuye in #5092
doc: add Spanish (es) documentation translations by @kejhz653-stack in #5112
doc: add French (fr) documentation translations by @kejhz653-stack in #5113
doc: add i18n translations for Japanese by @kejhz653-stack in #5082
doc: add German (de) documentation translations by @kejhz653-stack in #5105
doc: add Italian (it) documentation translations by @kejhz653-stack in #5114
doc: add Korean (ko) documentation translations by @kejhz653-stack in #5115
doc: add Portuguese (Brazil) (pt-br) documentation translations by @kejhz653-stack in #5116
doc: add Traditional Chinese (zh-tw) documentation translations by @kejhz653-stack in #5117
doc: expand scan_po_issues.py for all translated locales by @kejhz653-stack in #5128

Others

ci: prepare GPU T4 runner migration by @qinrui777 in #5081
docs: add and update multi-language README files by @kejhz653-stack in #5068
chore: rename XProbe Inc. to Xinference Holdings Pte. Ltd and unify copyright year to 2026 by @yiboyasss in #5110
chore: update copyright and restrict Ascend NPU docs to zh-CN by @kejhz653-stack in #5126

New Contributors

@kratos0718 made their first contribution in #5055
@qxyuan853 made their first contribution in #5027
@kejhz653-stack made their first contribution in #5068
@qfmy83 made their first contribution in #5088

Full Changelog: v2.11.0...v2.12.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v2.12.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

What's new in 2.12.0 (2026-07-04)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors

Uh oh!