feat: backport monitoring metrics and fix stream label / serve_count bugs by m199369309 · Pull Request #4899 · xorbitsai/inference

m199369309 · 2026-05-12T03:05:59Z

Summary

- Monitoring metrics backport (2.7.0 → 2.8.0): Restores 5 customized metric changes that were overwritten during the 2.8.0 upgrade:
  - model_last_load_duration_seconds Gauge with load timing in worker
  - _WORKER_ONLY_METRICS set + Supervisor-side deregister to avoid empty HELP/TYPE headers
  - image/audio/video engine/format label differentiation (empty labels instead of "unknown")
  - grafana_alert_datasource in UI config API + grafanaUtils.js URL params
  - @request_limit decorator on image_to_image, inpainting, ocr methods
  - Exception protection for record_metrics
- Stream label fix: the request_limit decorator now recognizes xoscar IteratorWrapper as a stream type, fixing the stream="true" label never being recorded for LLM requests
- serve_count double-decrement fix: Added a stream type detection guard (isasyncgen/isgenerator/IteratorWrapper) to the finally blocks of 3 stream_results() functions, ensuring decrease_serve_count() is only called when the decorator has deferred the decrement to the caller. This prevents model_serve_count from going negative in two scenarios:
  - model.chat() fails before the iterator is assigned (iterator is None)
  - model.chat() returns a non-stream value on the stream=True path (iterator is not a generator/IteratorWrapper)
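
The double-decrement guard described above can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the code in restful_api.py: `IteratorWrapper` here is a hypothetical stand-in for the xoscar class, and `FakeModel`, `request_limit_call`, and the three `*_chat` functions are names invented for the example.

```python
import inspect


class IteratorWrapper:
    """Hypothetical stand-in for xoscar's IteratorWrapper (not the real class)."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


class FakeModel:
    """Hypothetical model handle that only tracks the serve count."""

    def __init__(self):
        self.serve_count = 0

    def decrease_serve_count(self):
        self.serve_count -= 1


def is_stream_result(obj):
    # The three checks named in the PR description.
    return (
        inspect.isasyncgen(obj)
        or inspect.isgenerator(obj)
        or isinstance(obj, IteratorWrapper)
    )


def request_limit_call(model, chat_fn):
    """Mimics the decorator: it decrements itself unless the result is a stream."""
    model.serve_count += 1
    try:
        result = chat_fn()
    except Exception:
        model.decrease_serve_count()  # failure: the decorator already decrements
        raise
    if not is_stream_result(result):
        model.decrease_serve_count()  # non-stream: the decorator already decrements
    return result  # stream: the decrement is deferred to the caller


def stream_results(model, chat_fn):
    """Mimics a stream_results() body with the guarded finally block."""
    iterator = None
    chunks = []
    try:
        iterator = request_limit_call(model, chat_fn)
        if is_stream_result(iterator):
            chunks.extend(iterator)
    except RuntimeError:
        pass  # a real handler would report the error to the client
    finally:
        # Guard: only decrement when the decorator deferred it to us.
        if is_stream_result(iterator):
            model.decrease_serve_count()
    return chunks


def failing_chat():
    raise RuntimeError("failed before an iterator was assigned")


def non_stream_chat():
    return {"text": "done"}


def stream_chat():
    yield "a"
    yield "b"
```

In this sketch, `failing_chat` leaves `iterator` as `None` and `non_stream_chat` leaves it as a plain dict, so the finally block skips the decrement in both cases and the count returns to zero instead of going negative.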

Test plan

- Deploy and verify model_last_load_duration_seconds is reported after model load
- Verify Supervisor /metrics endpoint no longer shows worker-only metric HELP/TYPE headers
- Send stream + non-stream LLM requests, confirm model_request_total{stream="true"} and stream="false" are both recorded correctly
- Verify model_serve_count stays ≥ 0 after failed stream requests
- Verify grafana_alert_datasource appears in /v1/cluster/ui_config response
- Verify image/audio model metrics show empty engine/format labels instead of "unknown"
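
Two of the checks above (serve count never negative, both stream labels present) could be automated against the text returned by a /metrics endpoint. The following is a minimal sketch under stated assumptions: the parser handles only simple `name{labels} value` lines without escaped quotes or `}` inside label values, and `SAMPLE_METRICS` is illustrative text written for this example, not output captured from a real deployment.

```python
import re

# Matches `name{label="v",...} value` or `name value` sample lines.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Illustrative text only; the help strings and the "model" label are assumptions.
SAMPLE_METRICS = """\
# HELP model_serve_count Illustrative help text
# TYPE model_serve_count gauge
model_serve_count{model="demo"} 0.0
# TYPE model_request_total counter
model_request_total{model="demo",stream="true"} 3.0
model_request_total{model="demo",stream="false"} 2.0
"""


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def check_metrics(text):
    """Return (serve_count_is_non_negative, set of stream label values seen)."""
    samples = parse_samples(text)
    serve_ok = all(
        value >= 0 for name, _, value in samples if name == "model_serve_count"
    )
    stream_labels = {
        labels.get("stream")
        for name, labels, _ in samples
        if name == "model_request_total"
    }
    return serve_ok, stream_labels
```

After sending one stream and one non-stream request, a passing check would report a non-negative serve count and both `"true"` and `"false"` among the stream labels.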

- Add model_last_load_duration_seconds Gauge and load timing in worker
- Add _WORKER_ONLY_METRICS set and deregister from Supervisor registry
- Differentiate engine/format labels for image/audio/video model types
- Add @request_limit to image_to_image, inpainting, and ocr methods
- Add grafana_alert_datasource to UI config API
- Add alert_datasource and from/to params to grafanaUtils.js
- Add exception protection to record_metrics

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
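
The load timing and the exception protection for record_metrics from this commit might look roughly like the sketch below. It uses only the standard library: the `Gauge` class is a hypothetical minimal stand-in for a Prometheus client gauge, and the signatures of `record_metrics` and `load_model_timed` are assumptions made for the example, not the project's actual functions.

```python
import logging
import time

logger = logging.getLogger(__name__)


class Gauge:
    """Hypothetical minimal gauge; a real worker would use a Prometheus client gauge."""

    def __init__(self, name):
        self.name = name
        self.values = {}

    def set(self, labels, value):
        # Key the stored value by the sorted label pairs.
        self.values[tuple(sorted(labels.items()))] = value


model_last_load_duration_seconds = Gauge("model_last_load_duration_seconds")


def record_metrics(gauge, labels, value):
    """Record a value, but never let a metrics failure break the caller."""
    try:
        gauge.set(labels, value)
    except Exception:
        logger.exception(
            "Failed to record metric %s", getattr(gauge, "name", gauge)
        )


def load_model_timed(load_fn, labels, gauge=model_last_load_duration_seconds):
    """Run load_fn, then publish how long the load took in seconds."""
    start = time.monotonic()
    model = load_fn()
    record_metrics(gauge, labels, time.monotonic() - start)
    return model
```

`time.monotonic()` is used here instead of wall-clock time so that system clock adjustments during a long model load do not distort the measured duration.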

- Add IteratorWrapper to stream detection in request_limit decorator, fixing stream="true" label never being recorded for LLM requests
- Guard decrease_serve_count() with iterator-not-None check in 3 stream_results() finally blocks, preventing double-decrement when model.chat() fails before iterator is assigned

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
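
To illustrate why the stream label was never recorded, here is a small sketch of label selection by result type. `IteratorWrapper` is again a hypothetical stand-in for the xoscar class, `request_total` stands in for the model_request_total counter, and `record_stream_label` is an invented decorator name that covers only the labeling part of what request_limit does.

```python
import functools
import inspect
from collections import Counter


class IteratorWrapper:
    """Hypothetical stand-in for xoscar's IteratorWrapper (not the real class)."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


# Stand-in for the model_request_total counter, keyed by the stream label value.
request_total = Counter()


def stream_label(result, stream_types=(IteratorWrapper,)):
    """Return "true" when the result is a stream type, else "false"."""
    is_stream = (
        inspect.isasyncgen(result)
        or inspect.isgenerator(result)
        or isinstance(result, stream_types)
    )
    return "true" if is_stream else "false"


def record_stream_label(fn):
    """Count each call under the stream label derived from its return value."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        request_total[stream_label(result)] += 1
        return result

    return wrapper


@record_stream_label
def chat(stream):
    # Streamed responses arrive wrapped, not as a plain generator.
    if stream:
        return IteratorWrapper(["he", "llo"])
    return "hello"
```

Passing `stream_types=()` to `stream_label` reproduces the behavior before the fix in this sketch: a wrapped stream is neither a generator nor an async generator, so it is labeled `"false"`.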

gemini-code-assist

Code Review

This pull request enhances metrics management by introducing model load duration tracking and separating worker-only metrics. It also improves API robustness by adding request limits to image models and refining serve count handling in streaming endpoints. Additionally, the UI and Grafana utilities were updated to support alert datasources and time ranges. Review feedback identifies opportunities to further stabilize serve count decrements by aligning API stream detection with decorator logic and suggests using relative imports for consistency.

- Align API stream_results() finally blocks with request_limit decorator logic: check iterator type (isasyncgen/isgenerator/IteratorWrapper) before calling decrease_serve_count, preventing double-decrement when model returns non-stream value on stream=True path
- Use relative import for _WORKER_ONLY_METRICS for consistency

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

qinxuye

LGTM

m199369309 and others added 2 commits May 12, 2026 11:02

XprobeBot added the feature label May 12, 2026

XprobeBot added this to the v2.x milestone May 12, 2026

gemini-code-assist Bot reviewed May 12, 2026

Comment thread xinference/api/restful_api.py Outdated

Comment thread xinference/api/restful_api.py Outdated

Comment thread xinference/api/restful_api.py Outdated

Comment thread xinference/api/restful_api.py Outdated

m199369309 and others added 3 commits May 12, 2026 11:21

style: fix prettier formatting in grafanaUtils.js

d6cdbcb

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

style: fix prettier v2 formatting for long template literals

5d5e8da

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

m199369309 mentioned this pull request May 12, 2026

feat: enhance monitoring UI with time picker, auto-refresh, and dashboard optimizations #4900

Merged

9 tasks

qinxuye approved these changes May 12, 2026

qinxuye merged commit 99fab44 into main May 12, 2026
10 of 13 checks passed

qinxuye deleted the feat/metrics-backport-and-stream-fix branch May 12, 2026 07:56

qinxuye merged 5 commits into main from feat/metrics-backport-and-stream-fix

3 participants
