feat(jobs): per-cmd live counters + kill button on Jobs page #71
Code Review
This pull request introduces in-flight execution counters to the Jobs page, allowing operators to monitor "running" and "pending" tasks. It also adds a "kill" action to terminate all active executions for a specific job. Review feedback recommends optimizing the backend aggregation query by filtering for relevant statuses and updating the frontend logic to ensure the "kill" button and its associated UI messages correctly account for both "running" and "pending" states.
    let rows = sqlx::query(
        "SELECT job_id,
                SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
                SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
         FROM executions
         GROUP BY job_id",
    )
The query in fetch_live_counts currently scans the entire executions table to perform the aggregation. As the history of executions grows, this will become a performance bottleneck for the Jobs page. Since we only care about 'running' and 'pending' states, adding a WHERE clause will significantly improve performance, especially if an index exists on the status column.
Suggested change:

    - let rows = sqlx::query(
    -     "SELECT job_id,
    -             SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
    -             SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
    -      FROM executions
    -      GROUP BY job_id",
    - )
    + let rows = sqlx::query(
    +     "SELECT job_id,
    +             SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
    +             SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
    +      FROM executions
    +      WHERE status IN ('running', 'pending')
    +      GROUP BY job_id",
    + )
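One consequence of the suggested `WHERE` clause is worth spelling out: a job whose executions have all finished no longer produces a row at all, so the caller has to treat a missing job as zero counts. The sketch below models that behaviour in memory in TypeScript. It is not the project's `sqlx` code, and `ExecRow`, `countLive` and `liveFor` are hypothetical names used only for illustration.

```typescript
// In-memory model of the filtered aggregation. All names are illustrative.
type Status = 'running' | 'pending' | 'completed';

interface ExecRow {
  job_id: string;
  status: Status;
}

interface LiveCounts {
  running: number;
  pending: number;
}

// Models: WHERE status IN ('running', 'pending') GROUP BY job_id
function countLive(rows: ExecRow[]): Map<string, LiveCounts> {
  const counts = new Map<string, LiveCounts>();
  for (const row of rows) {
    const status = row.status;
    // The WHERE filter: rows in any other status never reach the grouping step.
    if (status !== 'running' && status !== 'pending') continue;
    const entry = counts.get(row.job_id) ?? { running: 0, pending: 0 };
    entry[status] += 1;
    counts.set(row.job_id, entry);
  }
  return counts;
}

// A job with no in-flight executions has no entry, so default to zeros.
function liveFor(counts: Map<string, LiveCounts>, jobId: string): LiveCounts {
  return counts.get(jobId) ?? { running: 0, pending: 0 };
}
```

Without the filter, the `CASE` sums would still return a `0 / 0` row for every job that has any execution history; with it, those jobs drop out of the result and the zero default has to come from the lookup side.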
    <Button
      variant="danger"
      size="sm"
      disabled={pendingKill.has(j.id) || j.live.running === 0}
The kill button is currently disabled if live.running is 0, even if there are pending executions. Since the backend kill handler (and the operator's intent) includes pending executions (those that have been published but haven't reported back yet), the button should be enabled if either count is greater than zero. This allows operators to abort a job immediately after firing it before any results land.
Suggested change:

    - disabled={pendingKill.has(j.id) || j.live.running === 0}
    + disabled={pendingKill.has(j.id) || (j.live.running + j.live.pending) === 0}
    `Kill all running runs of ${j.id}?\n\n` +
    `${j.live.running} run${j.live.running === 1 ? '' : 's'} currently in flight. ` +
The confirmation message only mentions the running count, which is misleading because the kill action targets all in-flight executions, including those in the pending state. It should reflect the total number of runs that will be affected.
Suggested change:

    - `Kill all running runs of ${j.id}?\n\n` +
    - `${j.live.running} run${j.live.running === 1 ? '' : 's'} currently in flight. ` +
    + `Kill all running runs of ${j.id}?\n\n` +
    + `${j.live.running + j.live.pending} run${(j.live.running + j.live.pending) === 1 ? '' : 's'} currently in flight. ` +
    title={
      j.live.running === 0
        ? 'Nothing running for this job right now'
        : `Terminate ${j.live.running} in-flight run${j.live.running === 1 ? '' : 's'}`
    }
The button's tooltip should also account for both running and pending executions to accurately describe what the action will terminate.
Suggested change:

    - title={
    -   j.live.running === 0
    -     ? 'Nothing running for this job right now'
    -     : `Terminate ${j.live.running} in-flight run${j.live.running === 1 ? '' : 's'}`
    - }
    + title={
    +   (j.live.running + j.live.pending) === 0
    +     ? 'Nothing running for this job right now'
    +     : `Terminate ${j.live.running + j.live.pending} in-flight run${(j.live.running + j.live.pending) === 1 ? '' : 's'}`
    + }
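The three frontend suggestions repeat the expression `j.live.running + j.live.pending` several times. One way to keep the disabled flag, the tooltip and the confirmation text in agreement is to derive them from a single in-flight total. The following is a minimal sketch under that assumption; `killButtonState`, `KillButtonState` and `confirmLead` are hypothetical names, not code from the PR, and only the opening of the confirm message is reproduced because the rest is not shown in the review.

```typescript
interface LiveCounts {
  running: number;
  pending: number;
}

interface KillButtonState {
  disabled: boolean;
  title: string;
  confirmLead: string; // first part of the confirm dialog text only
}

// Derives every kill-button string from one in-flight total (running + pending).
function killButtonState(
  jobId: string,
  live: LiveCounts,
  killPending: boolean, // true while a kill request for this job is outstanding
): KillButtonState {
  const inFlight = live.running + live.pending;
  const plural = inFlight === 1 ? '' : 's';
  return {
    disabled: killPending || inFlight === 0,
    title:
      inFlight === 0
        ? 'Nothing running for this job right now'
        : `Terminate ${inFlight} in-flight run${plural}`,
    confirmLead:
      `Kill all running runs of ${jobId}?\n\n` +
      `${inFlight} run${plural} currently in flight. `,
  };
}
```

In the component this could be called once per row, for example `const kill = killButtonState(j.id, j.live, pendingKill.has(j.id))`, with `kill.disabled` and `kill.title` passed to the button.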
PR γ of the v0.30 kill-UX trio (γ first since it stands alone — see
issue discussion). Surfaces "is anything running for this job right
now" on the Jobs catalog page so the operator has a decision input
for kill / revoke without drilling into Activity.
* Backend `/api/jobs` response gains a `live: { running, pending }`
object per row, aggregated in one GROUP BY query against the v0.29
`executions` table. `serde(flatten)` keeps the existing Manifest
fields at JSON root so the SPA's existing `job.id` / `job.version`
/ `job.execute` reads keep working — `live` is purely additive.
* Jobs page table gains a `live` column with violet `running: N` and
secondary `pending: N` chips. Both zero renders a muted dash so
idle rows stay visually quiet.
* Per-row Kill button (Skull icon, danger variant) disabled when
`live.running === 0` so the operator doesn't fire a no-op. Confirm
dialog calls out that kill does NOT block the next schedule tick
(revoke's job) — separation of concerns operators kept asking
about.
* Backed by the existing v0.29 `POST /api/jobs/{cmd_id}/kill` route
(which itself was a silent no-op pre-v0.29 — it published
`kill.{cmd_id}` to a subject no agent subscribes to; v0.29 made it
SELECT running execs and publish `kill.{exec_id}` per deployment).
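To make the additive response shape and the chip rule concrete, here is a rough TypeScript sketch of how the SPA side might model it. The field types on `JobRow` are guesses (the PR only names `id`, `version`, `execute` and `live`), `liveCell` is a hypothetical helper, and showing both chips whenever either count is non-zero is an assumption: the description only specifies the both-zero case.

```typescript
interface LiveCounts {
  running: number;
  pending: number;
}

// Assumed shape of one row of the /api/jobs response.
interface JobRow {
  id: string;
  version: string; // type assumed
  execute: unknown; // Manifest payload, shape not shown in the PR
  live: LiveCounts; // new field, sits beside the flattened Manifest fields
}

// Returns the chip labels for the `live` column, or a single dash when idle.
function liveCell(live: LiveCounts): string[] {
  if (live.running === 0 && live.pending === 0) {
    return ['—'];
  }
  return [`running: ${live.running}`, `pending: ${live.pending}`];
}

// Existing reads such as row.id keep working because the Manifest fields
// stay at the JSON root; `live` is just one more key.
function describeRow(row: JobRow): string {
  return `${row.id}@${row.version} ${liveCell(row.live).join(' ')}`;
}
```

A payload such as `{"id":"inventory-hw","version":"1.0.0","execute":{},"live":{"running":2,"pending":1}}` parses straight into `JobRow` with no wrapper object around the Manifest fields.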
Tests:
* `fetch_live_counts_groups_by_job_id` — verifies the GROUP BY
partition across multiple jobs with mixed status.
* `fetch_live_counts_excludes_completed` — 'completed' execs don't
count toward live; only 'running' + 'pending' are operationally
in-flight.
* `fetch_live_counts_empty_when_no_executions` — empty map fallback.
Workspace tests: 181 passing (3 new).
Force-pushed: db23503 to 1a4487b
Round-1 Gemini fixes applied (force-push → 1a4487b): medium #1 ( medium #2, #3, #4 (Jobs.tsx kill button): the SPA had
Workspace version bump 0.29.0 → 0.30.0 + 3 inter-crate kanade-shared refs.

Ships the kill-UX two-PR sequence:
* #71 (γ): Jobs page per-cmd live counters + kill button
* #73 (α'): unified execution_results lifecycle + events.started + Activity Running filter

The original 3-part design (γ + α + β) collapsed to 2 PRs once the two-table approach for in-flight tracking was swapped for one unified `execution_results` with NULL finished_at. SPA changes shrank to "add a status filter option" rather than a whole new tab.

See README.md "v0.30.0" entry for the operator-facing summary.
…uild (#80)

v0.30.0 release.yml ran cross-compile + `tsc -b && vite build` against the just-tagged v0.30.0 commit and aborted at:

    src/pages/Jobs.tsx(219,25): error TS2322: Type "secondary" is not assignable to type "default" | "danger" | "success" | "violet" | "amber"

The Jobs page pending chip used `<Badge variant="secondary">`, but the `Badge` component's variant union doesn't include `secondary`; that value belongs to `Button`'s set. The typo had been carried since the v0.30.0 Jobs chip was first added in #71 (γ), undetected because per-PR CI only runs `cargo` and never built the SPA. `release.yml`'s tag-driven SPA build was the first time `tsc -b` ran against this code on CI.

Fix: `variant="secondary"` → `variant="amber"` (one-line change in Jobs.tsx). Amber semantically reads as "waiting in queue", which is what `pending` means, and is visually distinct from `running`'s `violet`.

The v0.30.0 git tag remains pointed at the broken commit, but no GitHub Release was ever published from it (release.yml failed before publish). v0.30.1 ships the same kill-UX features against a working SPA build.

Follow-up tracked in README backlog: surface the SPA build into per-PR CI so this class of regression fails at PR time, not release time.

Workspace version 0.30.0 → 0.30.1 + 3 inter-crate kanade-shared refs.
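The class of error described here can be pinned down at the type level. As a sketch, assuming the `Badge` variant union is exactly the one quoted in the `tsc` message, a typed lookup table keeps the status-to-colour choice in one place and makes `tsc` reject any value outside the union. `BadgeVariant`, `LiveStatus`, `LIVE_BADGE_VARIANT` and `badgeVariantFor` are hypothetical names; the real `Badge` component's props are not shown in this PR.

```typescript
// Variant union as quoted in the TS2322 error message.
type BadgeVariant = 'default' | 'danger' | 'success' | 'violet' | 'amber';

type LiveStatus = 'running' | 'pending';

// Record<LiveStatus, BadgeVariant> forces every status to map to a valid
// Badge variant. Writing pending: 'secondary' here would fail type-checking,
// because 'secondary' is not a member of BadgeVariant.
const LIVE_BADGE_VARIANT: Record<LiveStatus, BadgeVariant> = {
  running: 'violet',
  pending: 'amber',
};

function badgeVariantFor(status: LiveStatus): BadgeVariant {
  return LIVE_BADGE_VARIANT[status];
}
```

This only helps when the type checker actually runs, which is the point of the follow-up item: without `tsc` in per-PR CI the mistake still surfaces only at release time.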
Summary

PR γ of the v0.30 kill-UX trio (PR α + β still TODO: agent `events.started` wire + Activity Running tab). γ stands alone so it ships first; per the design discussion the right home for per-cmd aggregate counters is the Jobs catalog page, not Activity (which is per-PC).

What operators see

The Jobs page table gains a new `live` column. For each cmd it shows:
* `running: N` (violet): at least one PC has reported back, more results still in flight
* `pending: N` (secondary): fan-out published, no result back yet
* `—` when both are zero (idle)

And a new per-row Kill button (danger, Skull icon):
* Disabled when `live.running === 0` so operators don't fire no-ops
* Backed by the existing `POST /api/jobs/{cmd_id}/kill` route (v0.29; was a silent no-op before that)

Backend

`GET /api/jobs` response gains `live: { running, pending }` per row. Aggregated via a single `GROUP BY job_id` query against the v0.29 `executions` table. `serde(flatten)` keeps existing Manifest fields at JSON root so the SPA's existing `job.id` / `job.version` / `job.execute` reads still work; `live` is purely additive.

Resilient: live count query failure falls back to zeros (`warn!` logged) so a SQLite hiccup doesn't take the whole Jobs page down.

Test plan

* `cargo test --workspace`: 181 passing (3 new for `fetch_live_counts`)
* `cargo clippy --workspace --all-targets -- -D warnings` clean
* `cargo fmt --all -- --check` clean
* `tsc --noEmit` SPA clean
* `kanade exec inventory-hw` against multi-PC, observe Jobs page chip transitions `pending → running → idle`

🤖 Generated with Claude Code