
Add long-run runtime stability checks#4

Merged
xhwSkhizein merged 4 commits into main from feat/long-run-stability-checks
Apr 13, 2026

Conversation

Owner

@xhwSkhizein xhwSkhizein commented Apr 13, 2026

Summary

  • add the long-run runtime stability design spec and first implementation slice
  • surface runtime stability metrics through runtime-status and browser-cli status
  • split Python tests out of scripts/lint.sh into scripts/test.sh and update local validation docs

Validation

  • ./scripts/lint.sh
  • ./scripts/test.sh
  • ./scripts/guard.sh
  • ./scripts/check.sh
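The PR does not show the contents of these scripts, but a minimal sketch of how a wrapper like `scripts/check.sh` could chain lint, tests, and guard with fail-fast behavior looks like this. The `run_step` helper is a stand-in so the sketch runs on its own; the real script would invoke the sibling scripts directly.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a check.sh-style orchestrator; the real script
# in this PR may differ. set -e aborts the chain on the first failure.
set -euo pipefail

# Stand-in for invoking ./scripts/lint.sh, ./scripts/test.sh, ./scripts/guard.sh
run_step() {
  echo "running $1"
}

run_step lint   # ./scripts/lint.sh
run_step tests  # ./scripts/test.sh (new in this PR)
run_step guard  # ./scripts/guard.sh
echo "all checks passed"
```

Because of `set -e`, a nonzero exit from any step stops the run before later steps execute, matching the "lint, then tests, then guard" ordering described below.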

Summary by CodeRabbit

Release Notes

  • New Features

    • Added stability metrics tracking to runtime status, including command counts, driver switches, and cleanup failures.
    • Enhanced status command to display a new Stability section with operational diagnostics.
    • Introduced long-run validation checklist for multi-round endurance and disconnect/reconnect scenarios.
  • Bug Fixes

    • Improved detection and recovery guidance for cleanup failures; runtime now marked as degraded when failures occur.
  • Documentation

    • Updated testing workflow guidance with new validation pathways.
    • Added comprehensive long-run stability design and implementation documentation.
  • Chores

    • Refactored test execution into dedicated script; updated validation flow across all tools.


coderabbitai Bot commented Apr 13, 2026

Warning

Rate limit exceeded

@xhwSkhizein has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 0 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56ae4b62-824a-4d5a-a082-022ff1f1b25d

📥 Commits

Reviewing files that changed from the base of the PR and between 9156a08 and 2198b21.

📒 Files selected for processing (3)
  • .gitignore
  • docs/superpowers/plans/2026-04-12-browser-cli-popup-runtime-polish.md
  • docs/superpowers/specs/2026-04-11-browser-cli-next-roadmap.md
📝 Walkthrough


This change implements long-run stability tracking for Browser CLI by adding runtime metrics (command counts, driver switches, extension disconnects, cleanup failures) to BrowserService, exposing them through runtime_presentation, and rendering them in status output. Testing workflows are restructured to include a dedicated scripts/test.sh step, with validation scripts and documentation updated to reflect the new flow.

Changes

  • Documentation & Planning (AGENTS.md, README.md, docs/smoke-checklist.md, docs/superpowers/plans/..., docs/superpowers/specs/...): Added long-run stability troubleshooting path, updated testing/validation ownership and sequencing, extended the smoke checklist with multi-round endurance checks and disconnect/reconnect validation, plus a comprehensive implementation plan and design specification.
  • Test Infrastructure (scripts/check.sh, scripts/lint.sh, scripts/test.sh, scripts/guards/docs_sync.py): Created a dedicated scripts/test.sh for pytest execution; updated scripts/check.sh to run tests after lint; removed pytest from scripts/lint.sh; tightened docs-sync validation to require test-script references in agent/README guidance.
  • Core Stability Implementation (src/browser_cli/daemon/browser_service.py, src/browser_cli/daemon/runtime_presentation.py, src/browser_cli/commands/status.py): Added _StabilityMetrics tracking in BrowserService (commands_started, driver_switches, extension_disconnects, workspace_rebuilds, cleanup_failures, last_cleanup_error); updated runtime_presentation to classify cleanup failures as degraded and include stability in output; extended StatusReport with a stability field and rendering.
  • Integration & Unit Tests (tests/integration/test_runtime_stability.py, tests/unit/test_daemon_browser_service.py, tests/unit/test_extension_transport.py, tests/unit/test_lifecycle_commands.py, tests/unit/test_runtime_presentation.py): Added a daemon residency loop test validating status consistency across repeated open/snapshot/html/close/reload cycles; added tests for stability metrics tracking, bounded tab state across reconnects, artifact request recovery after disconnect, stability section rendering, and cleanup-failure degradation classification.

Sequence Diagram

sequenceDiagram
    participant CLI as Status Command
    participant BS as BrowserService
    participant RP as RuntimePresentation
    participant Out as Output

    CLI->>BS: runtime_status()
    BS->>BS: Collect stability metrics<br/>(commands_started, driver_switches,<br/>cleanup_failures, last_cleanup_error)
    BS-->>CLI: {stability: {...}, ...}
    
    CLI->>RP: build_runtime_presentation(raw_status)
    RP->>RP: Extract stability block<br/>Evaluate cleanup_failures
    alt cleanup_failures > 0
        RP->>RP: Set overall_state = 'degraded'<br/>Set recovery_guidance
    end
    RP-->>CLI: presentation with stability
    
    CLI->>CLI: render_status_report()
    CLI->>CLI: Render Stability section<br/>(commands started, cleanup failures,<br/>last cleanup error)
    CLI->>Out: Display status output
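The degraded-state branch in the diagram can be sketched as follows. The function name `build_runtime_presentation` appears in the diagram, but this body, including the normalization of `cleanup_failures` and the exact guidance string, is an assumption; the real implementation in runtime_presentation.py handles more states and fields.

```python
# Hedged sketch of the cleanup-failure classification shown in the
# sequence diagram above; not the actual runtime_presentation.py code.
def build_runtime_presentation(raw_status: dict) -> dict:
    stability = dict(raw_status.get("stability") or {})

    # Normalize cleanup_failures so a missing or malformed value reads as 0.
    try:
        cleanup_failures = int(stability.get("cleanup_failures") or 0)
    except (TypeError, ValueError):
        cleanup_failures = 0

    presentation = {
        "overall_state": "ok",
        # Persist the normalized count so downstream renderers see one shape.
        "stability": {**stability, "cleanup_failures": cleanup_failures},
    }
    if cleanup_failures > 0:
        presentation["overall_state"] = "degraded"
        presentation["recovery_guidance"] = (
            "cleanup failures detected; reload the daemon"
        )
    return presentation
```

Writing the normalized count back into the returned stability block is exactly the consistency concern raised in the review comment on lines 95-98 below.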

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through stability's long tunnels so bright,
Metrics now capture each command's flight,
When cleanup fails, the presentation glows red,
Reload it quick—bring the daemon back from dead!
Tests in three rounds prove nothing goes astray, 🔄

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The pull request title accurately summarizes the main change: adding long-run runtime stability checks, the primary objective across the implementation, design specs, and test coverage in this changeset.
  • Description check: ✅ Passed. The description adequately covers the main objectives with a clear summary section, specific validation commands, and alignment with the template's expected content, though it does not formally fill the template structure with checkboxes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (2)
src/browser_cli/commands/status.py (1)

115-117: Prefer daemon presentation stability snapshot as primary source.

Use presentation["stability"] first (fallback to top-level stability) so status rendering stays aligned with daemon-normalized shared presentation data.

♻️ Proposed refactor
     presentation = dict((live_payload or {}).get("presentation") or {})
-    stability = dict((live_payload or {}).get("stability") or {})
+    stability = dict(presentation.get("stability") or {}) or dict(
+        (live_payload or {}).get("stability") or {}
+    )
Based on learnings: `runtime-status` includes a daemon-owned `presentation` snapshot; `browser-cli status` should render that shared state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser_cli/commands/status.py` around lines 115 - 117, The status
rendering currently pulls stability from the top-level live_payload before
considering the daemon-normalized presentation snapshot; update the logic so
stability is assigned from presentation.get("stability") first (falling back to
(live_payload or {}).get("stability") if absent) while keeping presentation =
dict((live_payload or {}).get("presentation") or {}) and overall_status computed
via presentation.get("overall_state") or _classify_overall_status as before;
adjust the variable `stability` (and any downstream uses) to prefer the
presentation snapshot to ensure daemon-normalized state is rendered.
AGENTS.md (1)

235-237: Consider consolidating redundant validation instructions.

Lines 235-237 contain overlapping guidance:

  • "After each code change, run lint and guard..."
  • "After each code change, run lint, tests, and guard."
  • "After each code change, run scripts/lint.sh, scripts/test.sh..."

The first statement (line 235) appears to be stale and contradicts the more complete instructions on lines 236-237.

📝 Suggested consolidation
 - `scripts/check.sh` runs lint, tests, and guard in the expected order.
 - The guard implementations live under `scripts/guards/`.
-- After each code change, run lint and guard as part of the full validation flow.
-- After each code change, run lint, tests, and guard.
 - After each code change, run `scripts/lint.sh`, `scripts/test.sh`, and `scripts/guard.sh`, or run `scripts/check.sh`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 235 - 237, Remove the redundant validation sentence
and consolidate the three overlapping lines into a single clear instruction:
delete the stale line "After each code change, run lint and guard..." and keep a
single consolidated line that directs contributors to run `scripts/lint.sh`,
`scripts/test.sh`, and `scripts/guard.sh` (or `scripts/check.sh`) after each
code change so AGENTS.md contains one authoritative validation instruction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9fd5a2cf-fe85-44bf-a0e4-ebcca12f364b

📥 Commits

Reviewing files that changed from the base of the PR and between f4addc4 and 9156a08.

📒 Files selected for processing (17)
  • AGENTS.md
  • README.md
  • docs/smoke-checklist.md
  • docs/superpowers/plans/2026-04-13-browser-cli-long-run-stability-implementation-plan.md
  • docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md
  • scripts/check.sh
  • scripts/guards/docs_sync.py
  • scripts/lint.sh
  • scripts/test.sh
  • src/browser_cli/commands/status.py
  • src/browser_cli/daemon/browser_service.py
  • src/browser_cli/daemon/runtime_presentation.py
  • tests/integration/test_runtime_stability.py
  • tests/unit/test_daemon_browser_service.py
  • tests/unit/test_extension_transport.py
  • tests/unit/test_lifecycle_commands.py
  • tests/unit/test_runtime_presentation.py
💤 Files with no reviewable changes (1)
  • scripts/lint.sh


Date: 2026-04-13
Status: Drafted for review
Repo: `/home/hongv/workspace/browser-cli`
⚠️ Potential issue | 🟡 Minor

Remove developer-specific absolute path from specification.

The hard-coded path /home/hongv/workspace/browser-cli is machine-specific and will be inaccurate for other contributors. Replace with a generic reference like "the repository root" or simply remove this line.

As per coding guidelines: "Tests and workflow fixtures must resolve repo assets relative to the checked-out repository, not a developer-specific absolute workspace path."

📝 Suggested fix
-Repo: `/home/hongv/workspace/browser-cli`
+Repo: browser-cli
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md`
at line 5, The spec contains a developer-specific absolute path string
'/home/hongv/workspace/browser-cli'; remove or replace that exact string in
docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md with
a generic reference such as "the repository root" or "the checked-out
repository" so tests and workflow fixtures resolve assets relative to the repo
rather than a machine-specific path.

Comment on lines +95 to +98
"stability": {
**stability,
"last_cleanup_error": last_cleanup_error,
},
⚠️ Potential issue | 🟡 Minor

Normalize cleanup_failures in the returned presentation snapshot.

Line 20 already normalizes cleanup_failures, but Lines 95-98 don’t persist that normalized value into presentation["stability"]. That can produce inconsistent downstream rendering/contract shape.

🔧 Proposed fix
         "stability": {
             **stability,
+            "cleanup_failures": cleanup_failures,
             "last_cleanup_error": last_cleanup_error,
         },
Based on learnings: `runtime-status` includes a daemon-owned `presentation` snapshot; `browser-cli status` and popup should render that shared state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser_cli/daemon/runtime_presentation.py` around lines 95 - 98, The
returned presentation snapshot merges the local variable stability into
presentation["stability"] but doesn't persist the already-normalized
cleanup_failures value, causing inconsistent shapes; update the code that builds
presentation["stability"] (the merge of stability and "last_cleanup_error") to
include the normalized cleanup_failures (the value produced earlier where
cleanup_failures was normalized) so that
presentation["stability"]["cleanup_failures"] is the normalized value rather
than the original unnormalized field.

@xhwSkhizein xhwSkhizein merged commit f6b184a into main Apr 13, 2026
7 checks passed