Conversation
There was a problem hiding this comment.
Pull request overview
Fixes Windows “fresh start” flow by reducing SQLite file-lock issues when archiving an encrypted database and restarting with a new database.
Changes:
- Clears SQLite connection pools and forces finalizer cleanup on Windows before moving the database file.
- Adds Windows-specific retry logic for moving the DB file after a transient lock failure.
- Attempts to remove SQLite WAL sidecar files (
-wal,-shm) after archiving to avoid stale journals during fresh start.
| // archived backup doesn't need them. | ||
| foreach (var sidecar in new[] { databasePath + "-wal", databasePath + "-shm" }) | ||
| { | ||
| if (!File.Exists(sidecar)) continue; | ||
| try | ||
| { | ||
| File.Delete(sidecar); | ||
| _logger.LogInformation("Removed WAL companion file: {Sidecar}", sidecar); | ||
| } | ||
| catch (Exception ex) | ||
| { | ||
| // Non-fatal: a stale WAL without its main database is harmless. | ||
| _logger.LogWarning("Could not remove WAL companion file {Sidecar}: {Message}", sidecar, ex.Message); |
There was a problem hiding this comment.
The WAL sidecar delete failures are swallowed and the method still returns success. Since the app forces PRAGMA journal_mode = WAL, leaving locked -wal/-shm files behind can prevent the fresh database from opening on restart (same Windows lock issue). Consider adding retry/backoff for deleting these files and/or returning a failure result if they cannot be removed.
| // archived backup doesn't need them. | |
| foreach (var sidecar in new[] { databasePath + "-wal", databasePath + "-shm" }) | |
| { | |
| if (!File.Exists(sidecar)) continue; | |
| try | |
| { | |
| File.Delete(sidecar); | |
| _logger.LogInformation("Removed WAL companion file: {Sidecar}", sidecar); | |
| } | |
| catch (Exception ex) | |
| { | |
| // Non-fatal: a stale WAL without its main database is harmless. | |
| _logger.LogWarning("Could not remove WAL companion file {Sidecar}: {Message}", sidecar, ex.Message); | |
| // archived backup doesn't need them. If we cannot remove them after several attempts, | |
| // treat this as a failure because a locked -wal/-shm can prevent a fresh database | |
| // from opening on restart. | |
| foreach (var sidecar in new[] { databasePath + "-wal", databasePath + "-shm" }) | |
| { | |
| if (!File.Exists(sidecar)) continue; | |
| var deleted = false; | |
| const int maxDeleteAttempts = 3; | |
| for (var attempt = 1; attempt <= maxDeleteAttempts && !deleted; attempt++) | |
| { | |
| try | |
| { | |
| File.Delete(sidecar); | |
| deleted = true; | |
| _logger.LogInformation("Removed WAL companion file: {Sidecar}", sidecar); | |
| } | |
| catch (IOException ex) when (OperatingSystem.IsWindows() && attempt < maxDeleteAttempts) | |
| { | |
| _logger.LogWarning(ex, | |
| "WAL companion file delete failed on attempt {Attempt} for {Sidecar} (Windows lock). Retrying...", | |
| attempt, sidecar); | |
| await Task.Delay(200); | |
| } | |
| catch (UnauthorizedAccessException ex) when (OperatingSystem.IsWindows() && attempt < maxDeleteAttempts) | |
| { | |
| _logger.LogWarning(ex, | |
| "WAL companion file delete failed on attempt {Attempt} for {Sidecar} (access denied). Retrying...", | |
| attempt, sidecar); | |
| await Task.Delay(200); | |
| } | |
| } | |
| if (!deleted && File.Exists(sidecar)) | |
| { | |
| var message = | |
| $"Unable to remove WAL companion file '{sidecar}' after multiple attempts. " + | |
| "A locked -wal/-shm file can prevent the new database from opening."; | |
| _logger.LogError(message); | |
| return (false, null, message); |
| catch (Exception ex) | ||
| { | ||
| // Non-fatal: a stale WAL without its main database is harmless. | ||
| _logger.LogWarning("Could not remove WAL companion file {Sidecar}: {Message}", sidecar, ex.Message); |
There was a problem hiding this comment.
When deletion of a WAL companion file fails, the warning log drops the exception details (only logs ex.Message). Logging the exception itself (e.g., passing ex to LogWarning) will preserve stack/Win32 error information which is important for diagnosing file lock issues.
| _logger.LogWarning("Could not remove WAL companion file {Sidecar}: {Message}", sidecar, ex.Message); | |
| _logger.LogWarning(ex, "Could not remove WAL companion file {Sidecar}", sidecar); |
| catch (IOException) when (OperatingSystem.IsWindows()) | ||
| { | ||
| _logger.LogWarning("File move failed on first attempt (Windows lock), retrying after 500 ms"); |
There was a problem hiding this comment.
The retry path for File.Move logs a generic warning but discards the original IOException details. Catching into a variable and logging the exception (or at least its message) would make it much easier to confirm whether the failure was a sharing violation vs. another I/O issue.
| catch (IOException) when (OperatingSystem.IsWindows()) | |
| { | |
| _logger.LogWarning("File move failed on first attempt (Windows lock), retrying after 500 ms"); | |
| catch (IOException ex) when (OperatingSystem.IsWindows()) | |
| { | |
| _logger.LogWarning(ex, "File move failed on first attempt (Windows lock), retrying after 500 ms"); |
| // On Windows the OS enforces mandatory file locks. Even after SqliteConnection is | ||
| // disposed, the connection pool keeps the Win32 file handle open until explicitly | ||
| // cleared. Clear all pools and give the GC a chance to release any lingering | ||
| // handles before we attempt the file move. | ||
| SqliteConnection.ClearAllPools(); | ||
| if (OperatingSystem.IsWindows()) | ||
| { | ||
| GC.Collect(); | ||
| GC.WaitForPendingFinalizers(); | ||
| GC.Collect(); | ||
| } | ||
|
|
||
| // Move the main database file. Retry once on Windows in case a finalizer | ||
| // hadn't yet released its handle on the first attempt. | ||
| try | ||
| { | ||
| File.Move(databasePath, archivedPath); | ||
| } | ||
| catch (IOException) when (OperatingSystem.IsWindows()) | ||
| { | ||
| _logger.LogWarning("File move failed on first attempt (Windows lock), retrying after 500 ms"); | ||
| await Task.Delay(500); | ||
| File.Move(databasePath, archivedPath); | ||
| } |
There was a problem hiding this comment.
This change introduces OS-specific retry/GC and WAL cleanup behavior but there are no automated tests covering StartWithNewDatabaseAsync. Consider adding tests for the retry and sidecar cleanup behavior (e.g., by introducing a small filesystem abstraction to simulate locked files and verify retries / error results).
Fixed issue with file locks when starting with fresh database.