fix: auto-recover stale sync state after crash by moekatib · Pull Request #110 · withoneai/cli

moekatib · 2026-04-18T17:33:04Z

Summary

When a sync crashed mid-run (laptop sleep, OOM, SIGKILL), sync_state.json kept status: "syncing" and the next run threw "already syncing. Use --force to override" even though the filesystem lock had already detected the dead pid and taken over. Fix: trust the lock as the single source of truth on concurrency — if acquireSyncLock returns, a lingering syncing state is evidence of a crashed previous run, so log a recovery line and proceed.
Install SIGINT/SIGTERM handlers in syncModel that flip state to failed and release the filesystem lock before exiting, so cleanly-signaled terminations don't leave zombie state behind (closes the window where a process killed between "set syncing" and the catch block leaves state wedged).
Bump to 1.37.2.
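
For reviewers, the gate ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `beginSync`, the `lockHeldByLiveProcess` flag (standing in for whatever acquireSyncLock decides internally), the `SyncState` shape, and the local `SyncLockError` class are all hypothetical stand-ins.

```typescript
// Hypothetical stand-in for the SyncLockError named in the test plan.
class SyncLockError extends Error {}

// Assumed minimal shape; the real sync_state.json schema is not shown here.
interface SyncState {
  status: string;
}

interface StartOutcome {
  recovered: boolean;
  log?: string;
}

// lockHeldByLiveProcess models the result of the filesystem lock check:
// true means another live process owns the lock.
function beginSync(lockHeldByLiveProcess: boolean, state: SyncState): StartOutcome {
  // Gate 1: the filesystem lock is the only concurrency gate.
  if (lockHeldByLiveProcess) {
    throw new SyncLockError("another sync is already running");
  }

  // The lock is ours, so no live process is syncing. A lingering
  // "syncing" status can only come from a crashed previous run:
  // report the recovery and proceed instead of throwing.
  if (state.status === "syncing") {
    return { recovered: true, log: "Recovered from crashed previous sync" };
  }

  return { recovered: false };
}
```

Under these assumptions, the stale-state check never throws on its own; only the lock check can reject a run.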

Scope: no heartbeat field (explicitly marked optional in the spec). --force semantics unchanged — it still skips cursor-resume and lastSync-derived date filtering.

Test plan

Start a sync, kill -9 mid-run, re-run without --force → expect "Recovered from crashed previous sync" log line and normal proceed (previously threw)
Run two "one sync run <platform>" processes concurrently → second still throws SyncLockError (lock gate fires first, no regression)
Clean sync, re-run → no recovery log, normal sync
Ctrl-C mid-sync → state flips to failed, lock directory removed, next run starts clean without recovery message

🤖 Generated with Claude Code

When a sync process dies mid-run (laptop sleep, OOM, SIGKILL, terminal close), sync_state.json keeps status: "syncing" forever. The next run then threw "already syncing. Use --force to override" even though the filesystem lock had already detected the dead pid and taken over. The two concurrency gates disagreed.

Now the filesystem lock is the single source of truth: if acquireSyncLock returns, no live process holds the lock, and a lingering "syncing" state is auto-recovered with a log line instead of a throw.

Also installs SIGINT/SIGTERM handlers that flip state to "failed" and release the lock before exiting, so cleanly signaled terminations don't leave zombie state behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
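
The signal handling in this commit could look roughly like the sketch below. It is an illustration under assumptions, not the handler installed in syncModel: `installSignalCleanup`, `SignalSource`, and the `markFailed`/`releaseLock`/`exit` hooks are hypothetical names, and the exit codes follow the common 128 + signal number shell convention, which this PR does not specify.

```typescript
type Signal = "SIGINT" | "SIGTERM";

// Minimal view of an event source that can deliver signals.
interface SignalSource {
  once(event: Signal, listener: () => void): unknown;
}

// Hypothetical hooks; the real state and lock helpers are not shown in this PR.
interface CleanupHooks {
  markFailed(): void; // flip sync state to "failed"
  releaseLock(): void; // remove the filesystem lock
  exit(code: number): void; // terminate the process
}

function installSignalCleanup(source: SignalSource, hooks: CleanupHooks): void {
  // Assumed convention: 128 + signal number (SIGINT = 2, SIGTERM = 15).
  const exitCodes: Record<Signal, number> = { SIGINT: 130, SIGTERM: 143 };
  let handled = false;

  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    source.once(signal, () => {
      if (handled) return; // run cleanup at most once
      handled = true;
      try {
        hooks.markFailed();
      } finally {
        // Release the lock and exit even if writing the state failed.
        hooks.releaseLock();
        hooks.exit(exitCodes[signal]);
      }
    });
  }
}
```

In a Node CLI the source would presumably be the global process object, with exit wired to process.exit. Note that SIGKILL cannot be caught, so that case still depends on the lock-based recovery on the next run.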

moekatib merged commit e19ac10 into main Apr 19, 2026

moekatib deleted the fix/sync-crash-recovery branch April 19, 2026 05:51

moekatib mentioned this pull request Apr 19, 2026

Concurrent sync writers race on shared sync_state.json #111

Closed

moekatib merged 1 commit into main from fix/sync-crash-recovery
