What's new in 2.12.0 (2026-07-04)
These are the changes in inference v2.12.0.
New features
- feat: gate model remote-code execution behind a config switch by @qxyuan853 in #5027
- FEAT: [UI] add API key management and user management pages by @yiboyasss in #5065
- FEAT: [UI] add Embed and Rerank to the ability panel by @maoyuehui in #5078
- feat(monitor): split Grafana dashboard into 6 sub-dashboards with config store by @m199369309 in #5073
- feat(metrics): unify worker labels with supervisor and add replica_index by @m199369309 in #5075
- feat: register VibeThinker 1.5B / 3B (transformers + vLLM) by @xiaoyesoso in #5085
- feat: register Nex-N2 series (mini / Pro / Pro-fp8) by @xiaoyesoso in #5094
- feat(log): add daily+size rotation mode with multi-process safety by @m199369309 in #5083
- feat(models): add model autostart config by @bluefish-08 in #5076
- FEAT(webui): replace worker IP input with a filtered multi-select selector by @leslie2046 in #5099
- feat: add Unlimited-OCR support by @xiaoyesoso in #5103
- Feat: (UI) add "Try to API" to running models and add useMenuAuth hook by @maoyuehui in #5098
- feat(llm): register Ornith-1.0-35B by @m199369309 in #5119
- feat: permission/scope alignment (core — no live-read) by @m199369309 in #5133
Enhancements
- ENH: update models JSON [embedding] by @XprobeBot in #5067
- ENH: update models JSON [llm] by @XprobeBot in #5093
- ENH: update models JSON [llm] by @XprobeBot in #5096
- ENH: update model "qwen3.5" JSON by @llyycchhee in #5100
- ENH: update models JSON [image] by @XprobeBot in #5109
- ENH: update models JSON [llm] by @XprobeBot in #5123
- ENH: update models JSON [llm] by @XprobeBot in #5124
Bug fixes
- fix: avoid blocking the event loop in async update_model_type by @kratos0718 in #5055
- BUG: Fix llama.cpp backend by @codingl2k1 in #5056
- FIX: [model] chat template analysis error by @llyycchhee in #5060
- fix(embedding): pin peft/transformers for jina-embeddings-v5 series by @m199369309 in #5061
- fix(docker): cap fastapi<0.137 in image requirements to match setup.cfg (#5063) by @Anai-Guo in #5072
- fix(device_utils): tolerate unsupported nvmlDeviceGetUtilizationRates in WSL2 by @Anai-Guo in #5070
- fix(audio): resolve Qwen3-ASR forced aligner via ModelScope when it is the active source by @Anai-Guo in #5077
- fix(core): prevent worker metrics loss when cache_tracker init fails by @m199369309 in #5071
- fix(monitor): add model_name label to api_key_requests_total counter by @m199369309 in #5074
- fix: drop stale running models without replica info in supervisor.list_models by @xiaoyesoso in #5087
- fix(core): harden sub-pool creation and clean GPU orphans on startup by @m199369309 in #5095
- fix: handle DynamicCache in get_batch_size_and_seq_len_from_kv_cache when HybridCache is importable by @xiaoyesoso in #5086
- fix(oauth2): verify JWT expiration in AuthService (#5058) by @qfmy83 in #5088
- fix(oauth2): block privilege escalation in user permission grants (#5058) by @qfmy83 in #5089
- fix(llm/cache): clear error and ensure cache parent dir on file:// model_uri (#5091) by @Anai-Guo in #5097
- fix(vllm): silence repeated GuidedDecodingParams info log per request by @m199369309 in #5120
- fix(llm): install ninja in qwen3.5/qwen3.6 vLLM venv for FlashInfer JIT by @m199369309 in #5118
- fix(llm): normalize tool_call arguments to dict for Qwen3 family templates by @m199369309 in #5129
Documentation
- doc: add v2.11.0 release notes by @qinxuye in #5057
- doc: add security policy by @qinxuye in #5092
- doc: add Spanish (es) documentation translations by @kejhz653-stack in #5112
- doc: add French (fr) documentation translations by @kejhz653-stack in #5113
- doc: add i18n translations for Japanese by @kejhz653-stack in #5082
- doc: add German (de) documentation translations by @kejhz653-stack in #5105
- doc: add Italian (it) documentation translations by @kejhz653-stack in #5114
- doc: add Korean (ko) documentation translations by @kejhz653-stack in #5115
- doc: add Portuguese (Brazil) (pt-br) documentation translations by @kejhz653-stack in #5116
- doc: add Traditional Chinese (zh-tw) documentation translations by @kejhz653-stack in #5117
- doc: expand scan_po_issues.py for all translated locales by @kejhz653-stack in #5128
Others
- ci: prepare GPU T4 runner migration by @qinrui777 in #5081
- docs: add and update multi-language README files by @kejhz653-stack in #5068
- chore: rename XProbe Inc. to Xinference Holdings Pte. Ltd. and unify copyright year to 2026 by @yiboyasss in #5110
- chore: update copyright and restrict Ascend NPU docs to zh-CN by @kejhz653-stack in #5126
New Contributors
- @kratos0718 made their first contribution in #5055
- @qxyuan853 made their first contribution in #5027
- @kejhz653-stack made their first contribution in #5068
- @qfmy83 made their first contribution in #5088
Full Changelog: v2.11.0...v2.12.0