bug: stale sidecar .lock files cause "database is locked" errors on every launch #3551

Description

@KooshaPari

Summary

A forgecode session that crashes or is killed mid-write leaves behind two 0-byte sidecar lock files in the state directory:

  • ~/Library/Application Support/forge/.secrets.lock (macOS) / ~/.forge/.secrets.lock (Linux)
  • ~/Library/Application Support/forge/config/config.json.lock (macOS) / ~/.forge/config/config.json.lock (Linux)

Every subsequent forgecode launch then races for these stale locks and surfaces a generic ERROR: database is locked line at the top of every chat, regardless of whether the user is doing anything lock-related. This is currently happening in every chat for me (see Repro).

The symptom is intermittent: sometimes a launch wins the race and the chat works; sometimes it loses and the error appears. With multiple parent daemon processes running (see Environment), the contention becomes near-constant.

Repro

  1. rm -rf ~/.forge && forge --version (clean state, first run creates dirs and writes config)
  2. Verify ~/.forge/.secrets.lock is NOT present after first run completes
  3. Start a chat: forge --conversation-id $(uuidgen) (or just open the CLI)
  4. From another terminal, kill -9 the forgecode PID while it's actively writing config (e.g. mid-update_environment in env.rs)
  5. Verify ~/.forge/.secrets.lock is now present, 0 bytes, with mtime = the kill time
  6. Relaunch forgecode → observe ERROR: database is locked on the next chat, repeatedly

On my machine the lock files have been present since 2026-06-17 02:24 (~6 days at time of filing), which means every chat in that window has been racing for them.

Diagnostic evidence (from my current machine)

$ ls -la ~/Library/Application\ Support/forge/.secrets* \
          ~/Library/Application\ Support/forge/config/config.json*
-rw-------@ 1 kooshapari staff 3681 Jun 17 19:56 .../forge/.secrets
-rw-r--r--@ 1 kooshapari staff    0 Jun 17 02:24 .../forge/.secrets.lock      ← STALE
-rw-r--r--@ 1 kooshapari staff  242 Jun 17 19:56 .../forge/config/config.json ← (real config)
-rw-r--r--@ 1 kooshapari staff    0 Jun 17 02:24 .../forge/config/config.json.lock ← STALE

$ lsof .../forge/.secrets.lock .../forge/config/config.json.lock
(empty — no live process holds either lock)

State files themselves are healthy (3681-byte encrypted OAuth credentials, valid JSON config). Only the sidecar locks are stale.

Likely code locations

The lock-file pattern is consistent with rmcp's file-based credential fallback (Cargo.lock pins rmcp = 1.7.0 in my build); auth secrets in ~/Library/Application Support/forge/.secrets look like an rmcp-style encrypted store with a sidecar lock.

The .forge/.secrets and .forge/config/config.json paths also match the legacy forgecode state dir layout (see crates/forge_config/src/reader.rs:63 resolve_base_path).

Suggested fix

Pick one (or both):

Option A — Use fs2 (flock) instead of sidecar files

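use fs2::FileExt; // fs2 = "0.4"
use std::{fs::OpenOptions, io, path::Path};

fn write_with_lock(path: &Path) -> io::Result<()> {
    let f = OpenOptions::new()
        .create(true).write(true).truncate(false)
        .open(path)?;
    // Blocks until acquired; auto-released by the kernel on process exit, even SIGKILL
    f.lock_exclusive()?;
    // ... write ...
    Ok(()) // dropping f closes the descriptor and releases the lock
}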

A flock(2) lock is released by the kernel when the holding process exits for any reason (including SIGKILL), so a leftover lock file carries no lock state and cannot block later launches. fs2 = "0.4" is a small crate that wraps this.

Option B — Startup recovery for stale sidecar locks

If keeping the sidecar pattern, add a one-time sweep on startup:

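use std::path::Path;

fn cleanup_stale_locks(state_dir: &Path) -> std::io::Result<()> {
    // read_dir is not recursive, so call this for config/ as well.
    for entry in std::fs::read_dir(state_dir)? {
        let path = entry?.path();
        // extension() yields Option<&OsStr>, so convert before comparing.
        let is_lock = path.extension().and_then(|e| e.to_str()) == Some("lock");
        if is_lock && path.metadata()?.len() == 0 {
            // Only remove if no live process holds it (lsof-equivalent)
            // AND sibling file is older than the lock (i.e. no in-flight write).
            // is_unlocked and sibling_is_older are helpers still to be written.
            if is_unlocked(&path)? && sibling_is_older(&path)? {
                tracing::info!(path = %path.display(), "removing stale lock file");
                if let Err(e) = std::fs::remove_file(&path) {
                    tracing::warn!(path = %path.display(), error = %e,
                        "could not remove stale lock, please remove manually");
                }
            }
        }
    }
    Ok(())
}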

Either way: include the path of the cleared lock file in the INFO log so users can self-diagnose, and emit a short tracing::warn!("could not remove stale lock: {path}, please remove manually") on removal failure.
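
The Option B sweep leaves its two helper checks unspecified. Below is a minimal std-only sketch of one way to fill them in, under stated assumptions. The names `is_stale_lock`, `sweep_stale_locks`, and `min_age` are hypothetical. It uses a minimum-age threshold (the same idea as `-mmin +60` in the workaround) in place of the sibling-mtime comparison, because in the Diagnostic evidence listing both siblings (19:56) are newer than their locks (02:24). It recurses so that config/config.json.lock is covered, and it uses `eprintln!` as a stand-in for tracing. It does not implement an lsof-equivalent holder check, so it is only safe to run before the process takes any lock itself.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// True if `path` is a 0-byte `*.lock` file whose target (the same path
/// without `.lock`) exists and whose mtime is at least `min_age` old.
fn is_stale_lock(path: &Path, min_age: Duration) -> io::Result<bool> {
    if path.extension() != Some(OsStr::new("lock")) {
        return Ok(false);
    }
    let meta = fs::metadata(path)?;
    if !meta.is_file() || meta.len() != 0 {
        return Ok(false);
    }
    // `with_extension("")` strips only the trailing `.lock`.
    if !path.with_extension("").exists() {
        return Ok(false);
    }
    let age = SystemTime::now()
        .duration_since(meta.modified()?)
        .unwrap_or(Duration::ZERO); // an mtime in the future counts as age 0
    Ok(age >= min_age)
}

/// Recursively removes stale locks under `dir`; returns the removed paths.
fn sweep_stale_locks(dir: &Path, min_age: Duration) -> io::Result<Vec<PathBuf>> {
    let mut removed = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // file_type() does not follow symlinks, which avoids symlink loops.
        if entry.file_type()?.is_dir() {
            removed.extend(sweep_stale_locks(&path, min_age)?);
        } else if is_stale_lock(&path, min_age)? {
            match fs::remove_file(&path) {
                Ok(()) => {
                    eprintln!("removing stale lock file: {}", path.display());
                    removed.push(path);
                }
                Err(e) => eprintln!(
                    "could not remove stale lock: {} ({e}), please remove manually",
                    path.display()
                ),
            }
        }
    }
    Ok(removed)
}
```

Calling `sweep_stale_locks(state_dir, Duration::from_secs(3600))` at startup would mirror the workaround's 1-hour threshold. The age check is a heuristic: it cannot distinguish a stale lock from one held by a process that has legitimately been writing for longer than `min_age`.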

Workaround (until fixed)

A safe, idempotent bash script that clears only 0-byte *.lock files whose target file exists and has no live lsof holder, with a 1-hour minimum age (override via --force):

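#!/usr/bin/env bash
# /tmp/clear-forge-locks.sh [--force]
set -euo pipefail
age=(-mmin +60)                                   # 1-hour minimum age
if [ "${1:-}" = "--force" ]; then age=(); fi      # --force skips the age check
for d in "$HOME/Library/Application Support/forge" "$HOME/.forge"; do
    [ -d "$d" ] || continue
    find "$d" -type f -name '*.lock' -size 0 ${age[@]+"${age[@]}"} -print0 |
    while IFS= read -r -d '' lock; do
        [ -e "${lock%.lock}" ] || continue        # target file must exist
        if [ -z "$(lsof "$lock" 2>/dev/null)" ]; then   # no live holder
            rm -v "$lock"
        fi
    done
done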

I just used this to clear both locks in this report; the error immediately disappeared from the next chat.

Environment

  • forgecode v2.13.14 (binary: ~/.local/bin/forge)
  • macOS 15.x (Apple Silicon)
  • rmcp 1.7.0 (from Cargo.lock)
  • ~13 zombie forge processes accumulated from prior crashed sessions on this device (3 parent daemons + 10 conversation children, some with duplicate --conversation-id values — separate but related issue worth filing)

Severity

P1: affects any user whose session has crashed or been killed mid-write, in every subsequent chat, indefinitely, with no in-app recovery path. The surfaced error is also misleading ("database" implies SQLite when the actual cause is a stale sidecar file).
