
Add long-run runtime stability checks#4

Merged
xhwSkhizein merged 4 commits into main from feat/long-run-stability-checks
Apr 13, 2026

Conversation

Owner

@xhwSkhizein xhwSkhizein commented Apr 13, 2026

Summary

  • add the long-run runtime stability design spec and first implementation slice
  • surface runtime stability metrics through runtime-status and browser-cli status
  • split Python tests out of scripts/lint.sh into scripts/test.sh and update local validation docs

Validation

  • ./scripts/lint.sh
  • ./scripts/test.sh
  • ./scripts/guard.sh
  • ./scripts/check.sh
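The PR does not show the contents of these scripts, but a minimal sketch of how a wrapper like `scripts/check.sh` could chain lint, tests, and guard with fail-fast behavior looks like this. The `run_step` helper is a stand-in so the sketch runs on its own; the real script would invoke the sibling scripts directly.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a check.sh-style orchestrator; the real script
# in this PR may differ. set -e aborts the chain on the first failure.
set -euo pipefail

# Stand-in for invoking ./scripts/lint.sh, ./scripts/test.sh, ./scripts/guard.sh
run_step() {
  echo "running $1"
}

run_step lint   # ./scripts/lint.sh
run_step tests  # ./scripts/test.sh (new in this PR)
run_step guard  # ./scripts/guard.sh
echo "all checks passed"
```

Because of `set -e`, a nonzero exit from any step stops the run before later steps execute, matching the "lint, then tests, then guard" ordering described below.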

Summary by CodeRabbit

Release Notes

  • New Features

    • Added stability metrics tracking to runtime status, including command counts, driver switches, and cleanup failures.
    • Enhanced status command to display a new Stability section with operational diagnostics.
    • Introduced long-run validation checklist for multi-round endurance and disconnect/reconnect scenarios.
  • Bug Fixes

    • Improved detection and recovery guidance for cleanup failures; runtime now marked as degraded when failures occur.
  • Documentation

    • Updated testing workflow guidance with new validation pathways.
    • Added comprehensive long-run stability design and implementation documentation.
  • Chores

    • Refactored test execution into dedicated script; updated validation flow across all tools.


coderabbitai Bot commented Apr 13, 2026

Warning

Rate limit exceeded

@xhwSkhizein has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 0 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56ae4b62-824a-4d5a-a082-022ff1f1b25d

📥 Commits

Reviewing files that changed from the base of the PR and between 9156a08 and 2198b21.

📒 Files selected for processing (3)
  • .gitignore
  • docs/superpowers/plans/2026-04-12-browser-cli-popup-runtime-polish.md
  • docs/superpowers/specs/2026-04-11-browser-cli-next-roadmap.md
📝 Walkthrough


This change implements long-run stability tracking for Browser CLI by adding runtime metrics (command counts, driver switches, extension disconnects, cleanup failures) to BrowserService, exposing them through runtime_presentation, and rendering them in status output. Testing workflows are restructured to include a dedicated scripts/test.sh step, with validation scripts and documentation updated to reflect the new flow.

Changes

  • Documentation & Planning (AGENTS.md, README.md, docs/smoke-checklist.md, docs/superpowers/plans/..., docs/superpowers/specs/...): Added long-run stability troubleshooting path, updated testing/validation ownership and sequencing, extended the smoke checklist with multi-round endurance checks and disconnect/reconnect validation, plus a comprehensive implementation plan and design specification.
  • Test Infrastructure (scripts/check.sh, scripts/lint.sh, scripts/test.sh, scripts/guards/docs_sync.py): Created a dedicated scripts/test.sh for pytest execution; updated scripts/check.sh to run tests after lint; removed pytest from scripts/lint.sh; tightened docs-sync validation to require test-script references in agent/README guidance.
  • Core Stability Implementation (src/browser_cli/daemon/browser_service.py, src/browser_cli/daemon/runtime_presentation.py, src/browser_cli/commands/status.py): Added _StabilityMetrics tracking in BrowserService (commands_started, driver_switches, extension_disconnects, workspace_rebuilds, cleanup_failures, last_cleanup_error); updated runtime_presentation to classify cleanup failures as degraded and include stability in output; extended StatusReport with a stability field and rendering.
  • Integration & Unit Tests (tests/integration/test_runtime_stability.py, tests/unit/test_daemon_browser_service.py, tests/unit/test_extension_transport.py, tests/unit/test_lifecycle_commands.py, tests/unit/test_runtime_presentation.py): Added a daemon residency loop test validating status consistency across repeated open/snapshot/html/close/reload cycles; added tests for stability metrics tracking, bounded tab state across reconnects, artifact request recovery after disconnect, stability section rendering, and cleanup-failure degradation classification.

Sequence Diagram

sequenceDiagram
    participant CLI as Status Command
    participant BS as BrowserService
    participant RP as RuntimePresentation
    participant Out as Output

    CLI->>BS: runtime_status()
    BS->>BS: Collect stability metrics<br/>(commands_started, driver_switches,<br/>cleanup_failures, last_cleanup_error)
    BS-->>CLI: {stability: {...}, ...}
    
    CLI->>RP: build_runtime_presentation(raw_status)
    RP->>RP: Extract stability block<br/>Evaluate cleanup_failures
    alt cleanup_failures > 0
        RP->>RP: Set overall_state = 'degraded'<br/>Set recovery_guidance
    end
    RP-->>CLI: presentation with stability
    
    CLI->>CLI: render_status_report()
    CLI->>CLI: Render Stability section<br/>(commands started, cleanup failures,<br/>last cleanup error)
    CLI->>Out: Display status output
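The degraded-state branch in the diagram can be sketched as follows. The function name `build_runtime_presentation` appears in the diagram, but this body, including the normalization of `cleanup_failures` and the exact guidance string, is an assumption; the real implementation in runtime_presentation.py handles more states and fields.

```python
# Hedged sketch of the cleanup-failure classification shown in the
# sequence diagram above; not the actual runtime_presentation.py code.
def build_runtime_presentation(raw_status: dict) -> dict:
    stability = dict(raw_status.get("stability") or {})

    # Normalize cleanup_failures so a missing or malformed value reads as 0.
    try:
        cleanup_failures = int(stability.get("cleanup_failures") or 0)
    except (TypeError, ValueError):
        cleanup_failures = 0

    presentation = {
        "overall_state": "ok",
        # Persist the normalized count so downstream renderers see one shape.
        "stability": {**stability, "cleanup_failures": cleanup_failures},
    }
    if cleanup_failures > 0:
        presentation["overall_state"] = "degraded"
        presentation["recovery_guidance"] = (
            "cleanup failures detected; reload the daemon"
        )
    return presentation
```

Writing the normalized count back into the returned stability block is exactly the consistency concern raised in the review comment on lines 95-98 below.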

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through stability's long tunnels so bright,
Metrics now capture each command's flight,
When cleanup fails, the presentation glows red,
Reload it quick—bring the daemon back from dead!
Tests in three rounds prove nothing goes astray, 🔄

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The pull request title accurately summarizes the main change: adding long-run runtime stability checks, the primary objective across the implementation, design specs, and test coverage in this changeset.
  • Description check: ✅ Passed. The description adequately covers the main objectives with a clear summary section, specific validation commands, and alignment with the template's expected content, though it does not formally fill the template structure with checkboxes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (2)
src/browser_cli/commands/status.py (1)

115-117: Prefer daemon presentation stability snapshot as primary source.

Use presentation["stability"] first (fallback to top-level stability) so status rendering stays aligned with daemon-normalized shared presentation data.

♻️ Proposed refactor
     presentation = dict((live_payload or {}).get("presentation") or {})
-    stability = dict((live_payload or {}).get("stability") or {})
+    stability = dict(presentation.get("stability") or {}) or dict(
+        (live_payload or {}).get("stability") or {}
+    )
Based on learnings: `runtime-status` includes a daemon-owned `presentation` snapshot; `browser-cli status` should render that shared state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser_cli/commands/status.py` around lines 115 - 117, The status
rendering currently pulls stability from the top-level live_payload before
considering the daemon-normalized presentation snapshot; update the logic so
stability is assigned from presentation.get("stability") first (falling back to
(live_payload or {}).get("stability") if absent) while keeping presentation =
dict((live_payload or {}).get("presentation") or {}) and overall_status computed
via presentation.get("overall_state") or _classify_overall_status as before;
adjust the variable `stability` (and any downstream uses) to prefer the
presentation snapshot to ensure daemon-normalized state is rendered.
AGENTS.md (1)

235-237: Consider consolidating redundant validation instructions.

Lines 235-237 contain overlapping guidance:

  • "After each code change, run lint and guard..."
  • "After each code change, run lint, tests, and guard."
  • "After each code change, run scripts/lint.sh, scripts/test.sh..."

The first statement (line 235) appears to be stale and contradicts the more complete instructions on lines 236-237.

📝 Suggested consolidation
 - `scripts/check.sh` runs lint, tests, and guard in the expected order.
 - The guard implementations live under `scripts/guards/`.
-- After each code change, run lint and guard as part of the full validation flow.
-- After each code change, run lint, tests, and guard.
 - After each code change, run `scripts/lint.sh`, `scripts/test.sh`, and `scripts/guard.sh`, or run `scripts/check.sh`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 235 - 237, Remove the redundant validation sentence
and consolidate the three overlapping lines into a single clear instruction:
delete the stale line "After each code change, run lint and guard..." and keep a
single consolidated line that directs contributors to run `scripts/lint.sh`,
`scripts/test.sh`, and `scripts/guard.sh` (or `scripts/check.sh`) after each
code change so AGENTS.md contains one authoritative validation instruction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9fd5a2cf-fe85-44bf-a0e4-ebcca12f364b

📥 Commits

Reviewing files that changed from the base of the PR and between f4addc4 and 9156a08.

📒 Files selected for processing (17)
  • AGENTS.md
  • README.md
  • docs/smoke-checklist.md
  • docs/superpowers/plans/2026-04-13-browser-cli-long-run-stability-implementation-plan.md
  • docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md
  • scripts/check.sh
  • scripts/guards/docs_sync.py
  • scripts/lint.sh
  • scripts/test.sh
  • src/browser_cli/commands/status.py
  • src/browser_cli/daemon/browser_service.py
  • src/browser_cli/daemon/runtime_presentation.py
  • tests/integration/test_runtime_stability.py
  • tests/unit/test_daemon_browser_service.py
  • tests/unit/test_extension_transport.py
  • tests/unit/test_lifecycle_commands.py
  • tests/unit/test_runtime_presentation.py
💤 Files with no reviewable changes (1)
  • scripts/lint.sh


Date: 2026-04-13
Status: Drafted for review
Repo: `/home/hongv/workspace/browser-cli`
⚠️ Potential issue | 🟡 Minor

Remove developer-specific absolute path from specification.

The hard-coded path /home/hongv/workspace/browser-cli is machine-specific and will be inaccurate for other contributors. Replace with a generic reference like "the repository root" or simply remove this line.

As per coding guidelines: "Tests and workflow fixtures must resolve repo assets relative to the checked-out repository, not a developer-specific absolute workspace path."

📝 Suggested fix
-Repo: `/home/hongv/workspace/browser-cli`
+Repo: browser-cli
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md`
at line 5, The spec contains a developer-specific absolute path string
'/home/hongv/workspace/browser-cli'; remove or replace that exact string in
docs/superpowers/specs/2026-04-13-browser-cli-long-run-stability-design.md with
a generic reference such as "the repository root" or "the checked-out
repository" so tests and workflow fixtures resolve assets relative to the repo
rather than a machine-specific path.

Comment on lines +95 to +98
"stability": {
**stability,
"last_cleanup_error": last_cleanup_error,
},
⚠️ Potential issue | 🟡 Minor

Normalize cleanup_failures in the returned presentation snapshot.

Line 20 already normalizes cleanup_failures, but Lines 95-98 don’t persist that normalized value into presentation["stability"]. That can produce inconsistent downstream rendering/contract shape.

🔧 Proposed fix
         "stability": {
             **stability,
+            "cleanup_failures": cleanup_failures,
             "last_cleanup_error": last_cleanup_error,
         },
Based on learnings: `runtime-status` includes a daemon-owned `presentation` snapshot; `browser-cli status` and popup should render that shared state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser_cli/daemon/runtime_presentation.py` around lines 95 - 98, The
returned presentation snapshot merges the local variable stability into
presentation["stability"] but doesn't persist the already-normalized
cleanup_failures value, causing inconsistent shapes; update the code that builds
presentation["stability"] (the merge of stability and "last_cleanup_error") to
include the normalized cleanup_failures (the value produced earlier where
cleanup_failures was normalized) so that
presentation["stability"]["cleanup_failures"] is the normalized value rather
than the original unnormalized field.

@xhwSkhizein xhwSkhizein merged commit f6b184a into main Apr 13, 2026
7 checks passed