Skip to content

fix(editor): serialize structural block ops through DatabaseWriteActor#97

Merged
tstapler merged 10 commits into
mainfrom
fix/block-write-race
May 27, 2026
Merged

fix(editor): serialize structural block ops through DatabaseWriteActor#97
tstapler merged 10 commits into
mainfrom
fix/block-write-race

Conversation

@tstapler
Copy link
Copy Markdown
Owner

Problem

Fast typing followed immediately by Enter caused new blocks to disappear, content to revert to a stale version, or blocks to merge incorrectly (reproducible >50% of the time).

Root cause

BlockStateManager has two uncoordinated async write paths:

  • Content writes: applyContentChangewriteActor.updateBlockContentOnly() — queued through DatabaseWriteActor (serialized)
  • Structural writes: splitBlock, addNewBlock, mergeBlock, handleBackspace — called blockRepository.splitBlock/mergeBlocks/deleteBlock directly, bypassing the actor queue

Race sequence:

t=0  User types. applyContentChange → actor.updateBlockContentOnly("Hello World") [queued]
t=1  User hits Enter. splitBlock() launches new coroutine.
t=2  blockRepository.splitBlock() reads stale DB content (actor hasn't written yet).
t=3  Actor dequeues content update → overwrites the structural result with pre-split content.
t=4  New block disappears / content reverts.

Fix

DatabaseWriteActor — adds three typed methods that route through execute {}:

  • splitBlock(blockUuid, cursorPosition, newBlockUuid)
  • mergeBlocks(keepUuid, dropUuid, separator)
  • deleteBlockStructural(blockUuid)

Also adds hasPendingWrites (@Volatile counter covering the full enqueue→DB-commit window).

BlockStateManager — adds private writeSplitBlock / writeMergeBlocks / writeDeleteBlockStructural helpers using the established writeActor?.xxx() ?: blockRepository.xxx() fallback pattern. All 6 structural call sites across addNewBlock, splitBlock, mergeBlock, and handleBackspace now route through these helpers.

StelekitViewModel — extends the external-file-change conflict guard from 3-tier to 4-tier: adds hasActorPendingWrites as a fourth condition so a file watcher event arriving while a split/merge is in the actor queue triggers the conflict dialog instead of silently overwriting.

Tests

Added 4 race-condition tests using DelayedContentBlockRepository (delays updateBlockContentOnly to create a deterministic race window):

  • splitBlock_after_pending_content_write_uses_latest_content
  • addNewBlock_after_pending_content_write_preserves_typed_content
  • mergeBlock_after_pending_content_write_uses_latest_content
  • handleBackspace_after_pending_content_write_merges_latest_content

Test results

  • BlockStateManagerTest: 60/60 PASS (56 pre-existing + 4 new)
  • DatabaseWriteActorTest: 18/18 PASS
  • detekt: CLEAN

🤖 Generated with Claude Code

Fast typing followed by Enter caused new blocks to disappear or revert
because splitBlock/mergeBlock/handleBackspace called blockRepository
directly, bypassing the actor queue. A pending updateBlockContentOnly
enqueued in the actor would execute after the structural op, overwriting
the split/merge result with stale content.

Fix: add typed splitBlock/mergeBlocks/deleteBlockStructural methods to
DatabaseWriteActor and route all structural call sites in BlockStateManager
through the actor via private writeSplitBlock/writeMergeBlocks/writeDeleteBlockStructural
helpers (writeActor?.xxx() ?: blockRepository.xxx() fallback for tests).

Also adds hasPendingWrites (AtomicInt counter) to DatabaseWriteActor and
exposes hasActorPendingWrites on BlockStateManager so conflict detection in
StelekitViewModel catches external file changes that arrive while a
split/merge is in-flight.

Adds 4 race-condition tests that verify ordering under concurrent
content-write + structural-op scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 16:47
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

JVM Load Benchmark (Desktop)

Synthetic in-memory benchmark measuring load performance for the desktop (JVM) app.
Comparing 802f829 (this PR) vs be1652e (baseline)
Graph config: xlarge — 230 pages

Metric This PR Baseline Delta
Phase 1 TTI ↓ 9ms 7ms +2ms (+29%) ⚠️
Phase 2 background ↓ 3ms 3ms 0 (0%)
Phase 3 index ↓ 15ms 13ms +2ms (+15%) ⚠️
Total ↓ 26ms 23ms +3ms (+13%) ⚠️
Write p95 (baseline) ↓ 28ms 28ms 0 (0%)
Write p95 (under load) ↓ n/a n/a
Jank factor ↓ n/a n/a
↓ lower is better
Flamegraphs (this PR) **Allocation** — object allocation pressure (JDBC/SQLite churn)

Alloc flamegraph not available

CPU — method-level hotspots by on-CPU time

CPU flamegraph not available

Top SQL queries by total time (this PR) | table:operation | calls | p50 | p99 | max | total | |-----------------|-------|-----|-----|-----|-------| | `pages:select` | 2 | 1ms | 1ms | 1ms | 1ms |
Top allocation hotspots (this PR) `71.6%` byte[]_[k] `5.6%` java.lang.String_[k] `2.5%` java.lang.StringBuilder_[k] `1.9%` jdk.internal.org.objectweb.asm.SymbolTable$Entry_[k] `1.2%` jdk.internal.ref.CleanerImpl$PhantomCleanableRef_[k]
Top CPU hotspots (this PR) `99.4%` /usr/lib/x86_64-linux-gnu/libc.so.6 `0.1%` /tmp/sqlite-3.51.3.0-9a34a6c3-1d9e-4eb2-9741-05029cb41a6d-libsqlitejdbc.so `0%` kotlinx/coroutines/scheduling/CoroutineScheduler$Worker.trySteal_[0] `0%` java/util/zip/ZipFile$Source.getEntryPos_[i] `0%` kotlinx/coroutines/internal/DispatchedContinuation._[0]

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a race condition in the editor where structural operations (split/merge/delete) could bypass the DatabaseWriteActor queue and interleave incorrectly with pending content writes, causing blocks to disappear or content to revert under fast typing + Enter/Backspace.

Changes:

  • Route structural block operations (splitBlock, mergeBlocks, structural deletes) through DatabaseWriteActor to serialize them with content writes.
  • Add an actor “pending writes” flag and use it in external file-change conflict protection to avoid overwriting queued structural edits.
  • Add deterministic race-condition tests using a delayed updateBlockContentOnly repository wrapper.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
kmp/src/commonMain/kotlin/dev/stapler/stelekit/db/DatabaseWriteActor.kt Adds typed structural methods, injectable scope for tests, and a pending-writes indicator.
kmp/src/commonMain/kotlin/dev/stapler/stelekit/ui/state/BlockStateManager.kt Routes structural operations through actor-backed helper methods and exposes hasActorPendingWrites.
kmp/src/commonMain/kotlin/dev/stapler/stelekit/ui/StelekitViewModel.kt Extends external-file-change conflict guard to include actor-queue pending structural writes.
kmp/src/commonTest/kotlin/dev/stapler/stelekit/ui/state/BlockStateManagerTest.kt Adds 4 race-condition regression tests and a delayed content-write repository test double.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread kmp/src/commonMain/kotlin/dev/stapler/stelekit/db/DatabaseWriteActor.kt Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Android Load Benchmark

Instrumented benchmark on an API 30 x86_64 emulator — 500-page synthetic graph.

Comparing 802f829 (this PR) vs be1652e (baseline)
Device: API 30 x86_64 emulator — 530 pages loaded

Graph Load

Metric This PR Baseline Delta
Phase 1 TTI ↓ 83ms 101ms -18ms (-18%) ✅
Phase 3 index ↓ 4011ms 4242ms -231ms (-5%) ✅

Interactive Write Latency (during Phase 3)

Metric This PR Baseline Delta
Write p95 (baseline) ↓ 5ms 5ms 0 (0%)
Write p95 (during phase 3) ↓ 11ms 12ms -1ms (-8%) ✅
Jank factor ↓ 2.2x 2.4x -0.2x (-8%) ✅
Concurrent writes ↑ 20 21 -1ms (-5%) ⚠️

SAF I/O Overhead (ContentProvider vs direct File read)

Measures Binder IPC cost added by ContentResolver per readFile() call.
Real SAF via ExternalStorageProvider will be higher on device; this is a lower bound.

Metric This PR Baseline Delta
Direct read / file ↓ 0.0ms 0.0ms 0 (0%)
Provider read / file ↓ 0.2ms 0.2ms 0ms (-15%) ✅
IPC overhead ratio ↓ 5x 6x -1x (-17%) ✅
↓ lower is better · ↑ higher is better

tstapler and others added 9 commits May 26, 2026 10:48
…ased hasPendingWrites

Two issues flagged in PR review:

1. `_activeOps` used `@Volatile Int` with `++`/`--` — not atomic; concurrent
   `execute()` callers could corrupt the count or make it go negative.

2. Counter was incremented before `send()` but decremented only after `await()`.
   If `send()` threw (closed channel, cancellation), the counter leaked indefinitely.

Fix: drop the explicit counter entirely. `hasPendingWrites` now derives state from
the channels themselves (`!highPriority.isEmpty || !lowPriority.isEmpty`) plus an
actor-owned `@Volatile Boolean` flag set inside the `processRequest` try/finally.
Channel.isEmpty is concurrency-safe; the flag is single-writer (actor coroutine),
so @volatile gives correct multi-reader visibility without atomics or new deps.

Also adds the ClassLevelDirectRepositoryWriteOptIn detekt rule (post-mortem
enforcement) and moves class-level @OptIn to function-level across BlockStateManager,
StelekitViewModel, AnnotationEditorViewModel, BacklinkRenamer, and
GlobalUnlinkedReferencesViewModel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… scope ownership, watcher correctness

Addresses all 31 findings from the post-architecture code review.

Key changes:
- GraphLoaderPort / GraphWriterPort: replace mutable var properties with explicit
  setter functions; add renamePage/savePage/deletePage to port so callers never
  need unsafe casts to the concrete type
- BlockRepository split into BlockReadRepository, BlockWriteRepository,
  BlockSearchRepository, BlockStructureRepository (ISP)
- ImportViewModel / ScreenRouter: class now owns its CoroutineScope; removed
  rememberCoroutineScope() leaking out of composition (C18 fix)
- GraphFileWatcher: fix SharedFlow suppression (Channel.RENDEZVOUS replaces broken
  yield() pattern), add close() to cancel owned scope, add pollIntervalMs parameter,
  log dropped events (C13/C14/C15/A3)
- MarkdownPageParser: extract to own file, replace Pair/Triple returns with typed
  PageBuildResult/PageMetadata, deduplicate mergedProperties, remove dead title alias
- FlashcardScheduler: extract SM-2 logic from @composable into pure stateless object
- GraphEvents.kt: move ExternalFileChange and WriteError out of GraphLoader
- AllPagesViewModel: depend on BlockSearchRepository, not full BlockRepository
- BlockWriteRepository.clear(): return Either<DomainError, Unit>
- Remove unused count field from DuplicateGroup
- StelekitViewModelDependencies: remove default scope value, require explicit supply
- GraphLoaderWatcherTest: replace vacuous assertTrue(count >= 0) with real assertion
- GraphFileWatcherTest: new suite covering suppression, close(), git-merge paths;
  fix virtual-time/real-time timeout mismatch and registry priming

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Audit all BlockRepository usages and apply ISP — classes that only read
now depend on BlockReadRepository; write-only on BlockWriteRepository.
Removes the dead blockRepository parameter from JournalsView (unused).

Files narrowed to BlockReadRepository: VectorSearch, ExportService, Editor
Files narrowed to BlockWriteRepository: TextOperations, ImageImportService
Unused imports removed: DatalogQuery, BlockTreeOperations
Dead parameter removed: JournalsView + 9 call sites updated

Files kept as BlockRepository: JournalService, BacklinkRenamer,
ReferencesPanel, BlockStateManager, EditorViewModel, StelekitViewModel,
DatabaseWriteActor, PersistenceManager, OptimizedTextOperations,
GlobalUnlinkedReferencesViewModel (all use multiple roles legitimately)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
resolvedDisplay() checks $DISPLAY first, then scans /tmp/.X*-lock to find
a running XWayland instance. Fixes all 78 UI test failures when running
from a terminal on Wayland where $DISPLAY is not exported to the shell.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces single resolvedDisplay() with four helpers:
- linuxUid(): reads effective UID from /proc/self/status
- resolvedXdgRuntimeDir(): env var → /proc UID probe
- resolvedDisplay(): env var → XWayland /tmp/.X*-lock probe
- resolvedWaylandDisplay(): env var → wayland-0 socket probe

configureDisplayEnv() applies all three to a Test task so the JVM
receives whichever display variables are discoverable: DISPLAY for AWT,
WAYLAND_DISPLAY + XDG_RUNTIME_DIR for Skiko's Wayland renderer.
Works on X11, Wayland+XWayland, and CI with a real display.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eIntStateOf, nesting depth, unused param

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…leak in DatabaseWriteActor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the flawed `committedAnnotations.isEmpty()` guard in `initialize()`
with an explicit `hasBeenMutated` AtomicBoolean flag. The old guard could
not distinguish "no annotations ever added" from "user deleted all
annotations", causing the background init coroutine to restore a just-deleted
annotation from the repository. Now `commitAnnotation` and `deleteAnnotation`
both set `hasBeenMutated = true` before touching state, and the init
coroutine skips applying repository results once any mutation has occurred.
Also adds an `initialized` flag (via `compareAndSet`) to prevent a second
`initialize()` call from re-issuing the one-shot repository load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tstapler tstapler merged commit ab4e924 into main May 27, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants