Harden checkpointing: error resilience, correctness fixes, and memory safety #216

Draft
Copilot wants to merge 3 commits into checkpoint from copilot/sub-pr-199

Contributor

Copilot AI commented Feb 20, 2026

  • Analyze CI failures (2 failing tests in Checkpoint > Resume from checkpoint)
  • Fix DependencyBasedScheduler.onTaskCompleted to remove the completed task from pendingTasks — prevents tasks restored as COMPLETED from being re-scheduled and re-executed
  • CodeQL check: no security alerts
  • Code review: no issues

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add checkpointing mechanism for task execution" to "Harden checkpointing: error resilience, correctness fixes, and memory safety" Feb 20, 2026
Copilot AI requested a review from sroussey February 20, 2026 01:19
@sroussey
Collaborator

@copilot fix build/test errors

…stored tasks

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 20, 2026

@copilot fix build/test errors

Fixed in commit 2fb92d0.

The root cause was in DependencyBasedScheduler.onTaskCompleted() — it added the task to completedTasks but did not remove it from pendingTasks. In the normal execution flow this is fine (the task is removed from pendingTasks when waitForNextTask() picks it up to run), but during checkpoint restore, onTaskCompleted() is called for each COMPLETED task while they're still in pendingTasks. This caused the scheduler to re-yield those tasks for execution when tasks() started iterating, re-running tasks that should have been skipped.

The fix adds a task.config.id === taskId check to the existing loop in onTaskCompleted(), so the completed task is also removed from pendingTasks. This is a no-op in the normal flow (the task has already been removed before onTaskCompleted() is called) and correctly prunes restored tasks during checkpoint resume.
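
For readers without the diff open, here is a minimal sketch of the behavior described above. It is not the actual DependencyBasedScheduler: SketchTask, SketchScheduler, nextReadyTask, the dependsOn field, and the DISABLED check standing in for whatever the existing loop already does are all assumptions made for illustration. Only the names onTaskCompleted, pendingTasks, completedTasks, and the task.config.id === taskId check come from the discussion.

```typescript
// Simplified, hypothetical model of the scheduler discussed in this PR.
type TaskStatus = "PENDING" | "COMPLETED" | "DISABLED";

interface SketchTask {
  config: { id: string };
  status: TaskStatus;
  dependsOn: string[]; // hypothetical: ids of upstream tasks
}

class SketchScheduler {
  private pendingTasks = new Set<SketchTask>();
  private completedTasks = new Set<string>();

  constructor(tasks: SketchTask[]) {
    for (const task of tasks) this.pendingTasks.add(task);
  }

  onTaskCompleted(taskId: string): void {
    this.completedTasks.add(taskId);
    // Iterate over a copy so entries can be deleted from the set safely.
    for (const task of Array.from(this.pendingTasks)) {
      // The DISABLED check is an assumed placeholder for the existing loop body.
      // The id comparison is the check described in the comment above.
      if (task.status === "DISABLED" || task.config.id === taskId) {
        this.pendingTasks.delete(task);
      }
    }
  }

  // Synchronous stand-in (assumption) for the tasks()/waitForNextTask() pair:
  // returns the first pending task whose dependencies have all completed.
  nextReadyTask(): SketchTask | undefined {
    for (const task of Array.from(this.pendingTasks)) {
      if (task.dependsOn.every((id) => this.completedTasks.has(id))) {
        this.pendingTasks.delete(task);
        return task;
      }
    }
    return undefined;
  }
}

// Simulates resuming from a checkpoint in which "a" finished and "b" did not.
function resumeFromCheckpoint(): string[] {
  const a: SketchTask = { config: { id: "a" }, status: "COMPLETED", dependsOn: [] };
  const b: SketchTask = { config: { id: "b" }, status: "PENDING", dependsOn: ["a"] };
  const scheduler = new SketchScheduler([a, b]);

  // Restore replays completion for "a" while it is still in pendingTasks.
  scheduler.onTaskCompleted("a");

  const executed: string[] = [];
  let task: SketchTask | undefined;
  while ((task = scheduler.nextReadyTask()) !== undefined) {
    executed.push(task.config.id);
    task.status = "COMPLETED";
    scheduler.onTaskCompleted(task.config.id);
  }
  return executed;
}
```

In this sketch resumeFromCheckpoint() returns ["b"]. If the id comparison is dropped from the condition, "a" stays in pendingTasks after restore, has no unmet dependencies, and is handed out again, which is the re-execution the failing tests exposed.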

@sroussey sroussey marked this pull request as ready for review February 20, 2026 16:34
@sroussey sroussey marked this pull request as draft February 26, 2026 23:36