
Service management page (ServiceBlockingGate) broken in multiple use cases #291

@graycyrus

Bug Report

Description

The "OpenHuman Service Required" page (ServiceBlockingGate) is unreliable across multiple use cases. Service status shows "Unknown", Agent Server always shows "Not Running", recovery buttons are disabled exactly when they are needed, and a "Load failed" banner appears on fresh app launch. All buttons and status indicators should work correctly regardless of the app's startup state.

Screenshot

The gate shows:

  • Service: Unknown
  • Agent Server: Not Running
  • "Load failed" error banner
  • Several buttons disabled despite being needed for recovery

Root Causes

1. openhuman.agent_server_status RPC method does not exist

  • ServiceBlockingGate calls openhumanAgentServerStatus() which invokes RPC method openhuman.agent_server_status
  • This method is not registered in the core's RPC dispatch, so every call fails
  • The failure is rendered as "Agent Server: Not Running", which is displayed permanently and is misleading
  • File: app/src/utils/tauriCommands.ts (line ~1705)
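One way to stop the misleading row is to distinguish "the method does not exist" from "the server is stopped". The sketch below assumes the RPC wrapper rejects with a JSON-RPC 2.0 style error object (the spec reserves code -32601 for "Method not found"); whether the core actually uses that code, and the `running` field on a successful result, are assumptions. `getAgentServerStatus` and `agentServerLabel` are hypothetical names.

```typescript
// Hypothetical tri-state (plus unknown) for the Agent Server row.
type AgentServerStatus =
  | { kind: "running" }
  | { kind: "stopped" }
  | { kind: "unsupported" } // the RPC method is not registered in the core
  | { kind: "unknown"; reason: string }; // any other failure

// JSON-RPC 2.0 reserves -32601 for "Method not found".
const JSONRPC_METHOD_NOT_FOUND = -32601;

interface RpcErrorLike {
  code?: number;
  message?: string;
}

// Assumes a successful call resolves to { running: boolean }.
async function getAgentServerStatus(
  call: () => Promise<{ running: boolean }>,
): Promise<AgentServerStatus> {
  try {
    const res = await call();
    return res.running ? { kind: "running" } : { kind: "stopped" };
  } catch (err) {
    const e = (err ?? {}) as RpcErrorLike;
    const message = e.message ?? String(err);
    if (e.code === JSONRPC_METHOD_NOT_FOUND || /method not found/i.test(message)) {
      return { kind: "unsupported" };
    }
    return { kind: "unknown", reason: message };
  }
}

// Returns null when the row should be hidden instead of shown.
function agentServerLabel(status: AgentServerStatus): string | null {
  switch (status.kind) {
    case "running":
      return "Running";
    case "stopped":
      return "Not Running";
    case "unknown":
      return "Unknown";
    case "unsupported":
      return null; // hide the row rather than claim the server is not running
  }
}
```

With this split, "Not Running" is only shown when the core positively reports it, and the row disappears until the method is registered.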

2. CORS errors cause "Load failed" on fresh launch

  • At app startup, the socket to the core sidecar isn't connected yet
  • callCoreRpc() falls back to raw fetch() against the core HTTP endpoint
  • The core's HTTP responses carry no Access-Control-Allow-Origin header, so the webview blocks them as CORS errors
  • Both status checks fail → "Load failed" banner appears
  • File: app/src/components/daemon/ServiceBlockingGate.tsx (lines 56-94)
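A CORS rejection never reaches application code as an HTTP status: fetch() rejects with a TypeError whose message depends on the engine (WebKit, which Tauri uses on macOS and Linux, reports "Load failed"; Chromium-based WebView2 on Windows reports "Failed to fetch"), which is likely where the banner text comes from. The sketch below separates that transport failure from an error the core actually returned, and suppresses the banner during a startup grace period. The 15 second grace period, the banner wording, and the function names are assumptions.

```typescript
type RpcFailure =
  | { kind: "transport"; message: string } // no response reached the app (CORS block, refused, sidecar down)
  | { kind: "rpc"; message: string }; // the core answered, but with an error

// Only safe to call on errors caught directly around fetch(): ordinary
// code bugs also throw TypeError and would be misclassified elsewhere.
function classifyRpcFailure(err: unknown): RpcFailure {
  if (err instanceof TypeError) {
    return { kind: "transport", message: err.message };
  }
  return { kind: "rpc", message: err instanceof Error ? err.message : String(err) };
}

// Hypothetical banner policy; returns null when no banner should be shown yet.
function bannerFor(failure: RpcFailure, msSinceLaunch: number, graceMs = 15_000): string | null {
  if (failure.kind === "transport") {
    if (msSinceLaunch < graceMs) return null; // sidecar is probably still starting
    return "Cannot reach the OpenHuman core. Make sure the service is installed and running, then press Refresh.";
  }
  return `The OpenHuman core reported an error: ${failure.message}`;
}
```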

3. Buttons disabled in Unknown/error state

  • When service state is "Unknown", the normalizer treats it as not-installed
  • Stop, Restart, and Uninstall buttons are all disabled
  • These are exactly the buttons a user needs to recover from a bad state
  • Install and Start are always enabled, but recovery actions are locked out
  • File: app/src/components/daemon/ServiceBlockingGate.tsx (lines 128-173)
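A sketch of a normalizer that keeps Unknown as its own state, plus one possible enable/disable policy in which only a confirmed NotInstalled locks out Stop, Restart, and Uninstall. The raw state strings and the `busy` flag (an action already in progress) are assumptions about the real status payload; `normalizeServiceState` and `buttonStates` are hypothetical names.

```typescript
// Hypothetical: the raw strings the core reports may differ.
const KNOWN_STATES = ["NotInstalled", "Stopped", "Running"] as const;
type KnownState = (typeof KNOWN_STATES)[number];
type ServiceState = KnownState | "Unknown";

function isKnownState(raw: unknown): raw is KnownState {
  return typeof raw === "string" && (KNOWN_STATES as readonly string[]).includes(raw);
}

// Keep "Unknown" distinct instead of collapsing it into "NotInstalled".
function normalizeServiceState(raw: unknown): ServiceState {
  return isKnownState(raw) ? raw : "Unknown";
}

interface ButtonStates {
  install: boolean;
  start: boolean;
  stop: boolean;
  restart: boolean;
  uninstall: boolean;
}

function buttonStates(state: ServiceState, busy: boolean): ButtonStates {
  if (busy) {
    return { install: false, start: false, stop: false, restart: false, uninstall: false };
  }
  // Only a confirmed NotInstalled rules out the recovery actions; Unknown never does.
  const installed = state !== "NotInstalled";
  return { install: true, start: true, stop: installed, restart: installed, uninstall: installed };
}
```

Keeping the policy in a pure function also makes it straightforward to unit test every state combination.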

4. Fallback CLI depends on staged core binary

  • Direct invoke commands (service_install_direct, etc.) call run_core_cli(), which needs the openhuman core binary staged alongside Tauri
  • If yarn core:stage wasn't run (common after fresh clone or after Rust changes), all fallback commands fail
  • No clear error message tells the user the binary is missing
  • File: app/src-tauri/src/lib.rs (lines 108-131)
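A sturdier fix probably belongs in run_core_cli() itself (checking that the staged binary exists before spawning it). On the frontend, the sketch below turns a raw invoke rejection into an actionable message. It assumes the service_*_direct commands reject with the Rust error string and that run_core_cli() passes through the std::io::Error text from the failed spawn, which Rust renders as "... (os error 2)" for a missing file on both Unix (ENOENT) and Windows (ERROR_FILE_NOT_FOUND). `describeDirectCommandError` is a hypothetical helper, and the match is a heuristic.

```typescript
// Hypothetical helper; assumes the rejection value is the Rust error string.
function describeDirectCommandError(command: string, err: unknown): string {
  const text = typeof err === "string" ? err : err instanceof Error ? err.message : String(err);
  // std::io::Error renders OS errors as "<description> (os error N)"; code 2 means
  // "file not found" on Unix and Windows. Other missing files would match too.
  const binaryMissing = /\(os error 2\)/.test(text) || /no such file or directory/i.test(text);
  if (binaryMissing) {
    return (
      `${command} failed because the openhuman core binary could not be found. ` +
      "If this is a development build, run `yarn core:stage` and restart the app."
    );
  }
  return `${command} failed: ${text}`;
}
```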

Expected Behavior

  1. All buttons should be functional in all states — especially recovery buttons (Stop, Restart, Uninstall) when state is Unknown
  2. Agent Server status should either work (register the RPC method) or not be shown
  3. Fresh app launch should not show "Load failed" — the gate should gracefully wait for the sidecar socket to connect before declaring failure
  4. Clear error messages when the core binary isn't staged/available
  5. Status polling should retry with backoff rather than immediately showing failure on first attempt
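Retry with backoff could be handled by a small helper that wraps each status call. The sketch uses capped exponential backoff; the attempt count and delays are placeholders, and `sleep` is injectable so unit tests can run without real timers. `withBackoff` is a hypothetical name.

```typescript
interface BackoffOptions {
  attempts: number; // total tries, including the first one (must be >= 1)
  baseMs: number; // delay before the second try; doubles after each failure
  maxMs: number; // upper bound for any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

async function withBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions): Promise<T> {
  if (opts.attempts < 1) throw new RangeError("attempts must be >= 1");
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown instead.
      if (attempt < opts.attempts - 1) {
        await sleep(Math.min(opts.maxMs, opts.baseMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

The gate would then only show an error once every attempt has failed, instead of on the first rejected request.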

Affected Files

  • app/src/components/daemon/ServiceBlockingGate.tsx: gate UI, button logic, status polling
  • app/src/utils/tauriCommands.ts: RPC calls and fallback CLI invocations
  • app/src-tauri/src/lib.rs: direct Tauri commands (service_*_direct)
  • src/openhuman/service/: Rust service install/start/stop/status logic

Reproduction Steps

  1. Launch the app fresh (or when core sidecar isn't running)
  2. Observe "OpenHuman Service Required" gate
  3. Service shows "Unknown", Agent Server shows "Not Running"
  4. "Load failed" error banner appears
  5. Stop, Restart, Uninstall buttons are disabled
  6. Even Install/Start may fail silently if the core binary isn't staged

Platforms

  • macOS
  • Windows
  • Linux

Scope: E2E Testing & Edge Case Audit

This issue is open-ended — beyond the known root causes above, the implementer should:

E2E Test Coverage

  • Add or expand E2E specs (app/test/e2e/specs/service-connectivity-flow.spec.ts) to cover every button in every state combination:
    • NotInstalled → Install → Start → Stop → Restart → Uninstall (full lifecycle)
    • Unknown state → all buttons should be testable
    • "Load failed" → Refresh should recover
    • Rapid button clicks (debounce / race conditions)
    • Service crashes mid-operation → gate re-blocks and buttons remain usable
  • Add unit tests for the status normalizer, button enable/disable logic, and error banner display conditions
  • Ensure tests run on the mock service backend with deterministic state transitions
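For deterministic state transitions, the mock backend can be modeled as an explicit transition table. This is a hypothetical sketch: the state names, the idempotent install, and the `crash()` and `breakStatus()` hooks are assumptions chosen to exercise the scenarios listed above, not the real mock's API.

```typescript
// Hypothetical deterministic mock; state names and rules are illustrative.
type MockState = "NotInstalled" | "Stopped" | "Running";
type ServiceAction = "install" | "start" | "stop" | "restart" | "uninstall";

function transition(state: MockState, action: ServiceAction): MockState {
  if (action === "uninstall") return "NotInstalled";
  if (action === "install") return state === "NotInstalled" ? "Stopped" : state; // idempotent
  if (state === "NotInstalled") {
    throw new Error(`cannot ${action}: service is not installed`);
  }
  if (action === "stop") return "Stopped";
  return "Running"; // start or restart
}

class MockServiceBackend {
  private state: MockState = "NotInstalled";
  private statusBroken = false;
  readonly history: string[] = [];

  /** What the gate's status poll would see. */
  status(): MockState | "Unknown" {
    return this.statusBroken ? "Unknown" : this.state;
  }

  apply(action: ServiceAction): MockState {
    const next = transition(this.state, action); // throws on invalid transitions
    this.history.push(`${action}: ${this.state} -> ${next}`);
    this.state = next;
    return next;
  }

  /** Simulates the service dying mid-operation. */
  crash(): void {
    if (this.state === "Running") this.state = "Stopped";
  }

  /** Makes status queries report "Unknown", as if the status command itself failed. */
  breakStatus(broken: boolean): void {
    this.statusBroken = broken;
  }
}
```

A spec can replay the full lifecycle, then call `crash()` or `breakStatus(true)` and assert that the gate re-blocks with recovery buttons still enabled.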

Edge Case Hunting

  • Audit every code path in ServiceBlockingGate.tsx, tauriCommands.ts service functions, and src/openhuman/service/ Rust implementation for unhandled edge cases
  • Look for: race conditions between polling and button actions, timeout handling gaps, stale state after operations, error messages that swallow useful info, platform-specific failures (macOS launchctl vs Linux systemd vs Windows schtasks)
  • Fix any edge cases found — this is not just a reporting task, it's a hardening pass
  • Ensure error messages are actionable (tell the user what to do, not just "Load failed")
  • Verify the gate unblocks reliably when the service comes up (no stuck states)
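Two of the races listed above, a poll result overwriting state produced by a newer action and rapid clicks launching two operations, can be handled with a generation counter plus a single-flight guard. `GateCoordinator` is a hypothetical helper, not existing ServiceBlockingGate code; in React the same idea would usually live in a ref rather than in component state.

```typescript
// Hypothetical coordinator; not existing ServiceBlockingGate code.
class GateCoordinator<S> {
  private generation = 0; // bumped when an action starts and again when it settles
  private inFlight: Promise<void> | null = null;
  state: S;

  constructor(initial: S) {
    this.state = initial;
  }

  /** Applies a poll result only if no action started or finished while the poll was pending. */
  async poll(fetchStatus: () => Promise<S>): Promise<boolean> {
    const startedAt = this.generation;
    const next = await fetchStatus();
    if (startedAt !== this.generation || this.inFlight !== null) {
      return false; // stale: an action has changed (or is changing) the service state
    }
    this.state = next;
    return true;
  }

  /** Single-flight: while one action is running, further clicks get the same promise. */
  runAction(action: () => Promise<S>): Promise<void> {
    if (this.inFlight !== null) return this.inFlight;
    this.generation++;
    const run = Promise.resolve()
      .then(action) // deferred, so inFlight is assigned before the action body runs
      .then((next) => {
        this.state = next;
      })
      .finally(() => {
        this.generation++;
        this.inFlight = null;
      });
    this.inFlight = run;
    return run;
  }
}
```

If the action rejects, the returned promise rejects too, so the caller is still responsible for surfacing an actionable error.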

The goal is that this page works flawlessly in every scenario — fresh install, reinstall, crash recovery, missing binary, slow startup, rapid interactions, all platforms.
