feat(dashboard): per-model health comparison table#2823
Conversation
GET /models/health SSE endpoint + ModelHealthPanel. Sortable table, status badges (Keep/Replace/Staging/Vision), replacement candidate highlighting, one-click swap modal. Config-driven thresholds from dashboard.model_health. i18n: 21 keys across 14 locale files. Closes tinyhumansai#1852
|
Caution Review failedThe pull request is closed. Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR adds a complete model health monitoring feature to OpenHuman's dashboard. It introduces backend configuration schema, a new RPC operation for model health comparison, a React settings panel with sorting/filtering/status badges and candidate swap modal, menu routing, and localized UI strings across 12 languages. ChangesModel Health Dashboard Feature
Sequence Diagram(s)sequenceDiagram
participant User
participant Panel as ModelHealthPanel
participant RPC as RPC Client
participant Backend as model_health Operation
User->>Panel: View Model Health panel
Panel->>RPC: callCoreRpc(openhuman.dashboard_model_health)
RPC->>Backend: model_health(config)
Backend->>Backend: Map registry to entries
Backend->>Backend: Compute status for each
Backend->>RPC: Return RpcOutcome{models, config}
RPC->>Panel: Unwrap envelope/result
Panel->>Panel: Render table, sort, filter
User->>Panel: Click status dropdown
Panel->>Panel: Filter rows by getStatus()
User->>Panel: Click column header
Panel->>Panel: Sort by column
User->>Panel: Click swap on replace row
Panel->>Panel: Open modal with candidates
User->>Panel: Select candidate + Apply
Panel->>Panel: Log intent, close modal
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/__tests__/ModelHealthPanel.test.tsx`:
- Around line 99-108: The test "sorts by column" in ModelHealthPanel currently
clicks the column header but has no post-condition; update the test to verify
the sorted result by asserting a measurable change after calling fireEvent.click
on the 'settings.modelHealth.col.cost' header: for example, render
ModelHealthPanel (using the existing MOCK_RESPONSE), trigger the click, then
query the table rows (using screen.getAllByRole('row') or text queries like
screen.getAllByText for the cost cells) and assert that their order matches the
expected sorted order (ascending or descending) or that a sort indicator/ARIA
attribute on the header changed; modify the test to include these expect
assertions to validate sorting behavior.
In `@app/src/components/settings/panels/ModelHealthPanel.tsx`:
- Around line 311-315: The Apply Replacement button currently only calls
setSwapTarget(null) and doesn't run the swap flow; change its onClick to invoke
the replacement handler with the current swapTarget (e.g., call the existing
swap/apply function such as handleApplyReplacement or
performReplacement(swapTarget)), await/handle the result (show loading/disable
while in progress and show errors on failure), and then clear the swap state
(setSwapTarget(null)) and close the modal; ensure you reference the swapTarget
state and the replacement handler function when making the change.
- Around line 92-94: Replace the direct fetch call in ModelHealthPanel (the
await fetch(`${baseUrl}/models/health`, { headers: { Authorization: `Bearer
${token}` } })) with an in-process core RPC relay invocation: call
invoke('core_rpc_relay', ...) supplying the HTTP method, path "/models/health",
and the Authorization header (using the existing token), then await and parse
the returned RPC response the same way you parsed res before; in short, remove
the raw fetch and use invoke('core_rpc_relay', { method: 'GET', path:
'/models/health', headers: { Authorization: `Bearer ${token}` } }) and adapt the
handling where res is used.
- Around line 298-300: Replace the hard-coded labels 'CHEAPER' and 'BETTER' in
the ModelHealthPanel JSX (the span that renders {c.cost_per_1m_output <
swapTarget.cost_per_1m_output ? 'CHEAPER' : 'BETTER'}) with translated strings
via useT(); import/use the hook in the component (e.g., const t = useT()), add
translation keys like t('modelHealth.cheaper') and t('modelHealth.better') (or
existing appropriate keys) and use those in the ternary expression so the UI
strings go through the localization system.
In `@src/core/jsonrpc.rs`:
- Around line 865-866: The JSON-RPC transport currently registers domain routes
directly (.route("/models/health", get(model_health_handler)) and .route("/rpc",
post(rpc_handler))) — move this wiring into the controller registry flow: remove
the domain-specific route registrations from jsonrpc.rs and instead expose the
handlers via the controller registry (add the appropriate schema and handler
registration in schemas.rs and register the handlers in src/core/all.rs so
jsonrpc.rs simply mounts the controller registry endpoint); ensure
model_health_handler and rpc_handler are provided through the registry API so
jsonrpc.rs delegates to the registry rather than containing branch logic.
- Around line 1221-1234: The model list currently emits placeholder metrics;
update the mapping over cfg.model_registry (the closure that builds models Vec)
to look up real per-model metrics instead of hardcoded nulls/zeros: query your
metrics source (e.g., a model metrics collection or method on cfg such as
cfg.model_metrics.get(&entry.id) or cfg.get_model_stats(entry.id)) and populate
"quality_score" and "hallucination_rate" with the retrieved numeric values (or
serde_json::Value::Null if missing) and "agents_using" and "tasks_evaluated"
with the actual counts (or 0 default). Ensure you convert types to
serde_json::Value where needed and fall back to the previous defaults when
metrics are absent.
- Around line 1209-1212: The current code calls
load_config_with_timeout().await.unwrap_or_default(), which hides real load
failures by returning synthetic defaults; change this to explicitly handle the
Result from crate::openhuman::config::rpc::load_config_with_timeout() (e.g.,
match or ? operator) so that errors are logged and propagated instead of being
silently converted to defaults; replace the unwrap_or_default usage around cfg
(and subsequent use of mh_cfg) with error handling that returns or logs an error
and aborts initialization when config loading fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d25fe977-e5a3-4b44-b08b-e141c3db8d13
📒 Files selected for processing (22)
app/src/components/settings/panels/DeveloperOptionsPanel.tsxapp/src/components/settings/panels/ModelHealthPanel.tsxapp/src/components/settings/panels/__tests__/ModelHealthPanel.test.tsxapp/src/lib/i18n/chunks/ar-5.tsapp/src/lib/i18n/chunks/bn-5.tsapp/src/lib/i18n/chunks/de-5.tsapp/src/lib/i18n/chunks/en-5.tsapp/src/lib/i18n/chunks/es-5.tsapp/src/lib/i18n/chunks/fr-5.tsapp/src/lib/i18n/chunks/hi-5.tsapp/src/lib/i18n/chunks/id-5.tsapp/src/lib/i18n/chunks/it-5.tsapp/src/lib/i18n/chunks/ko-5.tsapp/src/lib/i18n/chunks/pt-5.tsapp/src/lib/i18n/chunks/ru-5.tsapp/src/lib/i18n/chunks/zh-CN-5.tsapp/src/lib/i18n/en.tsapp/src/pages/Settings.tsxsrc/core/jsonrpc.rssrc/openhuman/config/schema/dashboard.rssrc/openhuman/config/schema/mod.rssrc/openhuman/config/schema/types.rs
| <button | ||
| type="button" | ||
| className="flex-1 py-2 rounded-lg bg-blue-600 text-white text-xs font-semibold" | ||
| onClick={() => setSwapTarget(null)}> | ||
| {t('settings.modelHealth.modal.apply')} |
There was a problem hiding this comment.
Apply Replacement is currently a no-op.
Line 314 closes the modal but does not execute any swap flow, so the primary action does not perform what its label promises.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/components/settings/panels/ModelHealthPanel.tsx` around lines 311 -
315, The Apply Replacement button currently only calls setSwapTarget(null) and
doesn't run the swap flow; change its onClick to invoke the replacement handler
with the current swapTarget (e.g., call the existing swap/apply function such as
handleApplyReplacement or performReplacement(swapTarget)), await/handle the
result (show loading/disable while in progress and show errors on failure), and
then clear the swap state (setSwapTarget(null)) and close the modal; ensure you
reference the swapTarget state and the replacement handler function when making
the change.
| let models: Vec<serde_json::Value> = cfg | ||
| .model_registry | ||
| .iter() | ||
| .map(|entry| { | ||
| json!({ | ||
| "id": entry.id, | ||
| "provider": entry.provider, | ||
| "cost_per_1m_output": entry.cost_per_1m_output, | ||
| "vision": entry.vision, | ||
| "quality_score": serde_json::Value::Null, | ||
| "hallucination_rate": serde_json::Value::Null, | ||
| "agents_using": 0, | ||
| "tasks_evaluated": 0, | ||
| }) |
There was a problem hiding this comment.
The endpoint currently returns placeholder metrics instead of health data.
Lines 1230-1234 always emit quality_score: null, hallucination_rate: null, agents_using: 0, and tasks_evaluated: 0. That prevents meaningful per-model comparison/status decisions from backend data.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/core/jsonrpc.rs` around lines 1221 - 1234, The model list currently emits
placeholder metrics; update the mapping over cfg.model_registry (the closure
that builds models Vec) to look up real per-model metrics instead of hardcoded
nulls/zeros: query your metrics source (e.g., a model metrics collection or
method on cfg such as cfg.model_metrics.get(&entry.id) or
cfg.get_model_stats(entry.id)) and populate "quality_score" and
"hallucination_rate" with the retrieved numeric values (or
serde_json::Value::Null if missing) and "agents_using" and "tasks_evaluated"
with the actual counts (or 0 default). Ensure you convert types to
serde_json::Value where needed and fall back to the previous defaults when
metrics are absent.
graycyrus
left a comment
There was a problem hiding this comment.
@Sathvik-1007 heads up — several CI jobs are still pending (Build Tauri App, Frontend Unit Tests, Rust Core Tests, E2E suites, coverage), so i'll hold off on a full sign-off until those complete. i did spot a couple things while reading through that CodeRabbit didn't catch:
-
No candidate selection in the swap modal — the candidates list in the modal renders each item as a plain
divwith noonClick. There's no state to track which candidate the operator picked. So even once the "Apply Replacement" button is wired up (CodeRabbit flagged it as a no-op), it still won't know which candidate to apply. The modal needs a selected-candidate state and each candidate card needs anonClickto set it. -
replaceCandidatesfilter makes theBETTERlabel unreachable in practice — the filter requiresc.cost_per_1m_output <= target.cost_per_1m_output, but the labelBETTERis shown whenc.cost_per_1m_output >= target.cost_per_1m_output. That meansBETTERonly renders when costs are exactly equal — a float equality edge case that will almost never happen in real config data. A model that's genuinely better quality at a higher cost will never appear as a candidate at all. If the intent is "better quality even at higher cost is worth showing", the filter and labelling logic need to be decoupled.
Fix the CI/CodeRabbit items first and i'll do a proper review after. Let me know if anything is unclear.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/ModelHealthPanel.tsx`:
- Around line 139-140: The replacement-candidate predicate only checks
hallucination_rate and omits the cost ceiling, allowing more expensive
candidates; update the filter used where variables c and target are compared
(the replacement candidate selection) to require both lower hallucination and
equal-or-lower cost by adding a cost comparison (e.g. require (c.cost ??
Infinity) <= (target.cost ?? Infinity)) alongside the existing hallucination
comparison so candidates meet "lower hallucination and equal-or-lower cost"
before being accepted.
- Around line 289-293: The candidate items currently render as clickable divs
which are not keyboard-focusable; update the clickable element in
ModelHealthPanel (the element using setSelectedCandidate and comparing
selectedCandidate?.id) to be a semantic interactive control (e.g., a button) or
a div with proper accessibility attributes (role="button", tabIndex={0}) plus
key event handling for Enter/Space that calls setSelectedCandidate(c); ensure
focus and the selected styling logic still use selectedCandidate?.id so keyboard
users can select candidates and enable Apply.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: da832573-80d1-4e58-8ee1-d5ed19023f1e
📒 Files selected for processing (1)
app/src/components/settings/panels/ModelHealthPanel.tsx
graycyrus
left a comment
There was a problem hiding this comment.
@Sathvik-1007 the two issues I flagged in the last review are both addressed — candidate selection is wired up and the cheaper/better filter split is correct. Good fixes.
Holding off on approval: CI still has pending jobs (Frontend Unit Tests, Rust Core Tests, coverage, E2E Appium runs), and there are open CodeRabbit threads that need resolution before this is ready. Once those are green and addressed I'll approve.
Replaces the bespoke `/models/health` HTTP route + handler in
`src/core/jsonrpc.rs` with a controller-registry-backed RPC method
`openhuman.dashboard_model_health` living in a new `src/openhuman/dashboard/`
domain (mod.rs / ops.rs / schemas.rs / types.rs). The Rust core no longer
has domain logic in the transport layer; CLI + JSON-RPC pick it up via the
standard `RegisteredController` flow, with placeholder telemetry metrics
documented as a follow-up contract.
Frontend `ModelHealthPanel` is switched from raw `fetch` against the HTTP
route to `callCoreRpc({ method: 'openhuman.dashboard_model_health' })`,
unwrapping the `RpcOutcome` `{result, logs}` envelope. Modal candidate
cards now use a semantic `<button role="radio">` instead of clickable
`<div>`s, and the "Apply Replacement" button records the operator's swap
intent via the debug logger (the backend agent → model rewire is a
follow-up).
Addresses CodeRabbit review feedback on PR tinyhumansai#2823 — controller-registry
placement, in-process RPC over raw `fetch`, semantic interactive controls,
and Apply-button no-op surfaced as a logged intent.
|
Addressing the open CodeRabbit threads in commit 430f220: Controller-registry path / placeholder metrics / unwrap_or_default() (
Use core RPC relay instead of direct
Use a semantic control for candidate cards (
All 14 ModelHealthPanel tests pass ( |
graycyrus
left a comment
There was a problem hiding this comment.
@Sathvik-1007 hey! the code looks good to me — the refactor to the controller registry is clean and all previous feedback has been addressed. two CI jobs are failing but both are infra flakes unrelated to your changes:
- Rust Quality: runner failed to download
tower/0.5.3from crates.io (network error on the runner, not a clippy issue) - Rust Core Coverage: runner ran out of disk space
please retrigger those two jobs and once they're green i'll come back and approve. nothing else blocking on my end.
for context on the remaining open CodeRabbit thread about the no-op Apply button — the inline comment in the code makes the intent clear, which is fine for now as a documented follow-up. the placeholder metrics thread is outdated (points to old jsonrpc.rs that no longer exists).
|
@sanil-23 , Thank you for your extra contribution and for saving me a lot of time. |
sanil-23
left a comment
There was a problem hiding this comment.
Verified the controller-registry refactor (pnpm compile, ESLint, Vitest 3664/3664, ModelHealthPanel 14/14, cargo check/test core + Tauri shell). Telemetry sink + agent→model rewire are noted follow-ups. All CodeRabbit threads bot-confirmed addressed. LGTM.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
Implements #1852 — per-model health comparison table in the dashboard.
Adds a live model health comparison panel that benchmarks every model currently in use across quality score, hallucination rate, cost per million tokens, and a recommended action — giving operators a single-pane view to identify underperforming models, spot cost savings, and make informed swap decisions.
Changes
Backend (Rust):
GET /models/health— authenticated endpoint returning model registry + telemetry aggregation + config thresholdsModelHealthConfig— config schema fordashboard.model_health.{enabled, hallucination_threshold, min_tasks_for_rating, evaluation_window_tasks}ModelRegistryEntry— config schema formodel_registry[]array withid,provider,cost_per_1m_output,visionenabled: false(returns 404)Frontend (React):
ModelHealthPanel.tsx— Settings > Developer Options paneli18n: 21 keys across
en.ts,en-5.ts, and all 12 locale chunk files.Acceptance criteria
hallucination_threshold,min_tasks_for_rating, andevaluation_window_tasksread from configcost_per_1m_outputandvisionflag editable in config and reflected in the tableConfiguration
Closes #1852
Summary by CodeRabbit