v2.12.0

@XprobeBot XprobeBot released this 04 Jul 11:45
0a310a5

What's new in 2.12.0 (2026-07-04)

These are the changes in inference v2.12.0.

New features

  • feat: gate model remote-code execution behind a config switch by @qxyuan853 in #5027
  • FEAT: [UI] add API key management and user management pages by @yiboyasss in #5065
  • FEAT: [UI] add Embed and Rerank to the ability panel by @maoyuehui in #5078
  • feat(monitor): split Grafana dashboard into 6 sub-dashboards with config store by @m199369309 in #5073
  • feat(metrics): unify worker labels with supervisor and add replica_index by @m199369309 in #5075
  • feat: register VibeThinker 1.5B / 3B (transformers + vLLM) by @xiaoyesoso in #5085
  • feat: register Nex-N2 series (mini / Pro / Pro-fp8) by @xiaoyesoso in #5094
  • feat(log): add daily+size rotation mode with multi-process safety by @m199369309 in #5083
  • feat(models): add model autostart config by @bluefish-08 in #5076
  • FEAT(webui): replace worker IP input with a filtered multi-select selector by @leslie2046 in #5099
  • feat: add Unlimited-OCR support by @xiaoyesoso in #5103
  • Feat: (UI) add "Try to API" to running models and add useMenuAuth hook by @maoyuehui in #5098
  • feat(llm): register Ornith-1.0-35B by @m199369309 in #5119
  • feat: permission/scope alignment (core — no live-read) by @m199369309 in #5133
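
As an illustration of the first item above (gating remote-code execution behind a config switch), here is a minimal sketch of the general pattern, not the code from #5027. The environment variable name `EXAMPLE_ALLOW_REMOTE_CODE` and the function `resolve_remote_code_permission` are hypothetical; the actual switch name and its location are defined in the PR.

```python
import os
from typing import Mapping, Optional

# Hypothetical switch name, used only for this sketch.
SWITCH_NAME = "EXAMPLE_ALLOW_REMOTE_CODE"
TRUTHY = {"1", "true", "yes", "on"}


def resolve_remote_code_permission(
    requested: bool, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True only if the caller asked for remote code AND the operator enabled it."""
    env = os.environ if env is None else env
    # The switch defaults to off, so remote code stays disabled unless explicitly enabled.
    operator_allows = env.get(SWITCH_NAME, "0").strip().lower() in TRUTHY
    return requested and operator_allows
```

With this shape, a launch request that asks for remote code is silently downgraded unless the operator has opted in; a stricter variant could raise an error instead so the caller learns why the request was refused.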

Bug fixes

  • fix: avoid blocking the event loop in async update_model_type by @kratos0718 in #5055
  • BUG: Fix llama.cpp backend by @codingl2k1 in #5056
  • FIX: [model] chat template analysis error by @llyycchhee in #5060
  • fix(embedding): pin peft/transformers for jina-embeddings-v5 series by @m199369309 in #5061
  • fix(docker): cap fastapi<0.137 in image requirements to match setup.cfg (#5063) by @Anai-Guo in #5072
  • fix(device_utils): tolerate unsupported nvmlDeviceGetUtilizationRates in WSL2 by @Anai-Guo in #5070
  • fix(audio): resolve Qwen3-ASR forced aligner via ModelScope when it is the active source by @Anai-Guo in #5077
  • fix(core): prevent worker metrics loss when cache_tracker init fails by @m199369309 in #5071
  • fix(monitor): add model_name label to api_key_requests_total counter by @m199369309 in #5074
  • fix: drop stale running models without replica info in supervisor.list_models by @xiaoyesoso in #5087
  • fix(core): harden sub-pool creation and clean GPU orphans on startup by @m199369309 in #5095
  • fix: handle DynamicCache in get_batch_size_and_seq_len_from_kv_cache when HybridCache is importable by @xiaoyesoso in #5086
  • fix(oauth2): verify JWT expiration in AuthService (#5058) by @qfmy83 in #5088
  • fix(oauth2): block privilege escalation in user permission grants (#5058) by @qfmy83 in #5089
  • fix(llm/cache): clear error and ensure cache parent dir on file:// model_uri (#5091) by @Anai-Guo in #5097
  • fix(vllm): silence repeated GuidedDecodingParams info log per request by @m199369309 in #5120
  • fix(llm): install ninja in qwen3.5/qwen3.6 vLLM venv for FlashInfer JIT by @m199369309 in #5118
  • fix(llm): normalize tool_call arguments to dict for Qwen3 family templates by @m199369309 in #5129
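
The last fix above concerns tool-call arguments that may arrive either as a JSON string or as an already-parsed mapping. A minimal sketch of that kind of normalization follows, assuming a hypothetical helper named `normalize_tool_call_arguments`; it illustrates the idea only and is not the code from #5129.

```python
import json
from typing import Any, Dict


def normalize_tool_call_arguments(arguments: Any) -> Dict[str, Any]:
    """Coerce tool-call arguments into a dict so a chat template can iterate over them."""
    if isinstance(arguments, dict):
        return arguments
    if arguments is None or arguments == "":
        # Treat missing arguments as an empty mapping.
        return {}
    if isinstance(arguments, str):
        # json.loads raises json.JSONDecodeError (a ValueError subclass) on invalid JSON.
        parsed = json.loads(arguments)
        if not isinstance(parsed, dict):
            raise ValueError("tool_call arguments must decode to a JSON object")
        return parsed
    raise TypeError(f"unsupported arguments type: {type(arguments).__name__}")
```

For example, `normalize_tool_call_arguments('{"city": "Paris"}')` yields a dict with the single key `city`, while a JSON array such as `'[1, 2]'` is rejected with a `ValueError`.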

Full Changelog: v2.11.0...v2.12.0