feat(search): visit-recency ranking, exact-title guarantee, FTS repair, and latency benchmarks #52

Merged: tstapler merged 2 commits into `main` from `stelekit-search` on Apr 28, 2026

@tstapler (Owner) commented Apr 27, 2026

Summary

This PR consolidates the unmerged `stelekit-better-search` work and the new visit-recency / performance work into a single commit against `main`.

Ranking (from `stelekit-better-search`, not yet in main):

  • AND-first / OR-fallback FTS semantics — all terms must match; falls back to OR automatically when AND returns empty
  • `PAGE_BOOST = 5.0` — page-title hits rank above block-body hits
  • `RECENCY_HALFLIFE_DAYS = 14.0` — exponential decay on edit time (up to ×2.0 for recently-edited content)
  • `GRAPH_BOOST = 3.0` — 1-hop linked pages get a 3× multiplier via `selectNeighbourPageUuids`
  • `RankedSearchHit` sealed class for unified cross-type ranking
  • 37 integration tests covering all of the above
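
The AND-first / OR-fallback behaviour in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's implementation: `buildMatchQuery`, `searchAndFirst`, and the `runMatch` callback are hypothetical names, and the quoting assumes an FTS5-style query syntax in which double-quoted terms are joined with `AND` or `OR`.

```kotlin
/**
 * Hypothetical helper: double-quotes each term (doubling embedded quotes)
 * and joins the terms with the given operator, e.g. "alpha" AND "beta".
 */
fun buildMatchQuery(terms: List<String>, operator: String): String =
    terms.joinToString(" $operator ") { "\"" + it.replace("\"", "\"\"") + "\"" }

/**
 * Runs the strict AND query first and retries with OR only when AND finds nothing.
 * `runMatch` stands in for whatever executes a MATCH expression (an assumption).
 */
fun <T> searchAndFirst(rawQuery: String, runMatch: (String) -> List<T>): List<T> {
    val terms = rawQuery.trim().split(Regex("\\s+")).filter { it.isNotEmpty() }
    if (terms.isEmpty()) return emptyList()

    val andHits = runMatch(buildMatchQuery(terms, "AND"))
    // With a single term, AND and OR are the same query, so no retry is needed.
    if (andHits.isNotEmpty() || terms.size == 1) return andHits

    return runMatch(buildMatchQuery(terms, "OR"))
}
```

Passing the query executor as a lambda keeps the fallback logic testable without a database; a real repository would run the expression against the FTS table instead.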

Ranking (new in this PR):

  • Visit-recency signal: every `navigateTo` writes a timestamp to the new `page_visits` table; `buildRankedList` applies a 3-day-half-life multiplier (×2.0 at t=0)
  • Exact-title-match guarantee: `promoteExactTitleMatch()` moves any page whose name exactly equals the query to position 0
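
Both recency signals in this summary (14-day half-life on edits, 3-day half-life on visits) are described as peaking at ×2.0. One decay shape consistent with that description is `1 + 2^(-age / halfLife)`; the formula is an assumption, since the PR text does not spell it out. The sketch below pairs it with a simple exact-title promotion step. `halfLifeBoost`, `PageHit`, and `promoteExactTitle` are hypothetical names used only for illustration.

```kotlin
import kotlin.math.pow

/**
 * Assumed decay shape: 1 + 2^(-age / halfLife).
 * Gives 2.0 at age 0, 1.5 after one half-life, and approaches 1.0 for old items.
 */
fun halfLifeBoost(ageDays: Double, halfLifeDays: Double): Double {
    require(halfLifeDays > 0.0) { "halfLifeDays must be positive" }
    val age = ageDays.coerceAtLeast(0.0) // clamp clock skew that would yield a negative age
    return 1.0 + 2.0.pow(-age / halfLifeDays)
}

/** Hypothetical hit type for this sketch; the PR itself uses RankedSearchHit. */
data class PageHit(val title: String, val score: Double)

/**
 * Moves the first hit whose title equals the trimmed query (ignoring case)
 * to index 0 and leaves the relative order of the other hits unchanged.
 */
fun promoteExactTitle(hits: List<PageHit>, query: String): List<PageHit> {
    val needle = query.trim()
    val index = hits.indexOfFirst { it.title.equals(needle, ignoreCase = true) }
    if (index <= 0) return hits // not found, or already first
    return listOf(hits[index]) + hits.filterIndexed { i, _ -> i != index }
}
```

Running the promotion after scoring means an exact title match wins regardless of how the multipliers ranked it, which matches the "position 0" guarantee described above.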

FTS health:

  • Fixed `pages_ai` trigger bug: `last_insert_rowid()` → `new.rowid`
  • `rebuildFts()` and `integrityCheckFts()` added for index repair
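
For reference, SQLite FTS tables accept maintenance commands as a special `INSERT` into a column named after the table. The `'rebuild'` form is the one that appears in this PR's `rebuildFts()` diff for `blocks_fts` and `pages_fts`; `'integrity-check'` is SQLite's documented counterpart, and whether `integrityCheckFts()` issues exactly that statement is an assumption. `FtsCommand` and `ftsCommandSql` are hypothetical names.

```kotlin
/** SQLite FTS special commands used for index maintenance. */
enum class FtsCommand(val keyword: String) {
    REBUILD("rebuild"),
    INTEGRITY_CHECK("integrity-check")
}

/**
 * Builds the special INSERT that SQLite FTS tables interpret as a command.
 * The table name is validated because it is interpolated into the SQL text.
 */
fun ftsCommandSql(table: String, command: FtsCommand): String {
    require(Regex("[A-Za-z_][A-Za-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    return "INSERT INTO $table($table) VALUES('${command.keyword}')"
}
```

For example, `ftsCommandSql("pages_fts", FtsCommand.REBUILD)` yields the same statement text that the diff passes to `sqlDriver.execute`.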

Performance test suite:

  • `SearchLatencyTest`: p99 < 200ms assertion over 100 queries on a 10k-page in-memory graph
  • `SearchBenchmark`: 6 JMH `SampleTime` benchmark methods via `./gradlew :kmp:jvmBenchmark`
  • `SyntheticGraphDbBuilder`: shared fixture for tests and benchmarks

Test plan

  • All existing `jvmTest` tests pass (no regressions)
  • AND semantics: `testAndSemanticsMultiTermOnlyMatchesBothTerms`
  • OR fallback: `testOrFallbackFiresWhenAndReturnsEmpty`
  • Field boost: `testFieldBoostingPageRanksAboveBlock`
  • Graph distance: `testGraphDistanceBoostLinkedPageRanksHigher`
  • Edit recency: `testRecencyBoostRecentBlockRanksHigher`
  • Visit boost: `testVisitBoostRanksVisitedPageFirst`
  • Exact title match: `testExactTitleMatchIsFirstResult` + `testExactTitleMatchCaseInsensitive`
  • Visit recording: `testRecordPageVisitIncrementsCount` / `testRecordPageVisitStoresTimestamp`
  • FTS rebuild: `FtsRebuildTest`
  • Latency: `SearchLatencyTest` — p99 < 200ms at 10k pages
  • Benchmarks compile and resolve via `jvmBenchmark`

🤖 Generated with Claude Code

…r, and latency benchmarks

- page_visits table tracks navigation events with upsert (one row per page);
  visitRecencyMultiplier (3-day half-life, ×2.0 at t=0) blended into buildRankedList
- StelekitViewModel fires recordPageVisit fire-and-forget on every navigateTo
- promoteExactTitleMatch promotes exact-name-match (case-insensitive) to position 0
- pages_ai FTS trigger fixed: last_insert_rowid() → new.rowid
- rebuildFts() / integrityCheckFts() added to SearchRepository and SqlDelightSearchRepository
- Batch selectPageVisitsByUuids lookup eliminates N+1 in ranking hot path
- InstrumentedSearchRepository wired with OTel span attrs (ranking.visit_boost, result.ranked.count)
- SearchLatencyTest: p99 < 200ms assertion at 10k pages (100 queries measured)
- SearchBenchmark: 6 JMH SampleTime benchmark methods via kotlinx-benchmark
- SyntheticGraphDbBuilder: shared in-memory graph fixture for tests and benchmarks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 22:45

github-actions (bot, Contributor) commented Apr 27, 2026

Android Load Benchmark

Instrumented benchmark on an API 30 x86_64 emulator measuring load performance for the Android app.

Comparing b51e634 (this PR) vs 16a1b40 (baseline)
Device: API 30 x86_64 emulator — 25 pages

| Metric | This PR | Baseline | Delta |
|--------|---------|----------|-------|
| Phase 1 TTI ↓ | 70ms | 80ms | -10ms (-12%) ✅ |
| Phase 3 index ↓ | 88ms | 99ms | -11ms (-11%) ✅ |
| Write p95 (baseline) ↓ | 3ms | 2ms | +1ms (+50%) ⚠️ |
| Write p95 (during phase 3) ↓ | 1ms | 2ms | -1ms (-50%) ✅ |
| Jank factor ↓ | 0.33x | 1x | -0.67x (-67%) ✅ |
| Concurrent writes ↑ | 1 | 1 | 0 (0%) |

↓ lower is better · ↑ higher is better

github-actions (bot, Contributor) commented Apr 27, 2026

JVM Load Benchmark (Desktop)

Synthetic in-memory benchmark measuring load performance for the desktop (JVM) app.
Comparing b51e634 (this PR) vs 16a1b40 (baseline)
Graph config: xlarge — 230 pages

| Metric | This PR | Baseline | Delta |
|--------|---------|----------|-------|
| Phase 1 TTI ↓ | 9ms | 9ms | 0 (0%) |
| Phase 2 background ↓ | 3ms | 3ms | 0 (0%) |
| Phase 3 index ↓ | 18ms | 13ms | +5ms (+38%) ⚠️ |
| Total ↓ | 29ms | 25ms | +4ms (+16%) ⚠️ |
| Write p95 (baseline) ↓ | 40ms | 31ms | +9ms (+29%) ⚠️ |
| Write p95 (under load) ↓ | n/a | n/a | |
| Jank factor ↓ | n/a | n/a | |

↓ lower is better

Flamegraphs (this PR)

**Allocation**: object allocation pressure (JDBC/SQLite churn)

Alloc flamegraph not available

**CPU**: method-level hotspots by on-CPU time

CPU flamegraph not available

Top SQL queries by total time (this PR)

| table:operation | calls | p50 | p99 | max | total |
|-----------------|-------|-----|-----|-----|-------|
| `pages:select` | 2 | 1ms | 1ms | 2ms | 3ms |

Top allocation hotspots (this PR)

  • `70.6%` byte[]_[k]
  • `3.6%` java.lang.String_[k]
  • `3.1%` java.lang.StringBuilder_[k]
  • `2.1%` java.lang.invoke.MethodType_[k]
  • `2.1%` java.lang.Object[]_[k]

Top CPU hotspots (this PR)

  • `99.3%` /usr/lib/x86_64-linux-gnu/libc.so.6
  • `0%` SR_handler
  • `0%` org/sqlite/core/CoreStatement.exec_[0]
  • `0%` ClassFileParser::parse_methods
  • `0%` jdk/internal/loader/URLClassPath$JarLoader$2.getBytes_[1]

Copilot AI left a comment

Pull request overview

This PR enhances the search subsystem by adding a navigation-based “visit recency” ranking signal, guaranteeing exact title matches appear first, and introducing FTS repair utilities plus latency/benchmark coverage to validate performance at scale.

Changes:

  • Add visit tracking (page_visits) and apply a visit-recency multiplier during ranking.
  • Promote exact title matches (case-insensitive, trimmed) to the first result post-scoring.
  • Add FTS repair/integrity utilities and introduce new integration + latency + benchmark tests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/repository/SqlDelightSearchRepository.kt` | Applies visit-recency multiplier, exact-title promotion, and adds FTS rebuild/integrity + visit write paths. |
| `kmp/src/commonMain/sqldelight/dev/stapler/stelekit/db/SteleDatabase.sq` | Fixes pages_ai trigger rowid usage and adds page_visits table + queries. |
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/db/MigrationRunner.kt` | Adds migrations to fix the pages_ai trigger and create page_visits. |
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/ui/StelekitViewModel.kt` | Records page visits on navigation (fire-and-forget). |
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/repository/GraphRepository.kt` | Extends SearchRepository with visit + FTS maintenance APIs. |
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/db/RestrictedDatabaseQueries.kt` | Exposes restricted write helpers for visit tracking queries. |
| `kmp/src/commonMain/kotlin/dev/stapler/stelekit/repository/InMemoryRepositories.kt` | Adds an in-memory implementation of visit tracking for the in-memory search repo. |
| `kmp/src/jvmCommonMain/kotlin/dev/stapler/stelekit/performance/InstrumentedSearchRepository.kt` | Adds tracing around searchWithFilters. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/repository/SearchRepositoryIntegrationTests.kt` | Adds integration tests for visit tracking + exact-title promotion. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/repository/VisitRecencyMultiplierTest.kt` | Unit tests for the visit-recency decay curve. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/repository/ExactTitleMatchTest.kt` | Unit tests for exact-title promotion behavior. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/repository/FtsRebuildTest.kt` | Integration tests for rebuild/integrity-check flows. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/repository/SearchLatencyTest.kt` | Adds latency assertions for cold-start and p99 at 10k pages. |
| `kmp/src/jvmTest/kotlin/dev/stapler/stelekit/benchmark/SyntheticGraphDbBuilder.kt` | Test utility to populate a DB for latency/benchmark tests. |
| `kmp/src/jvmMain/kotlin/dev/stapler/stelekit/benchmarks/SearchBenchmark.kt` | Adds JMH/kotlinx-benchmark methods for search and ranking hotspots. |
| `kmp/src/commonTest/kotlin/dev/stapler/stelekit/ui/screens/SearchViewModelTest.kt` | Updates test double to implement the extended SearchRepository interface. |

Comment on lines +70 to +75

    latencies.sort()
    val p99 = latencies[(latencies.size * 0.99).toInt()]
    assertTrue(
        p99 < 200,
        "p99 latency ${p99}ms exceeded 200ms at 10k pages (p50=${latencies[50]}ms, max=${latencies.last()}ms)"
    )

Fixed: updated to `val p99Idx = (Math.ceil(0.99 * latencies.size)).toInt() - 1` (nearest-rank definition). With 100 samples this now correctly indexes 98 (0-based), not 99.
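
To make the off-by-one concrete: with 100 sorted samples, `(latencies.size * 0.99).toInt()` evaluates to 99, which is the maximum, while the nearest-rank definition picks index ceil(0.99 × 100) − 1 = 98. The sketch below computes the same index with integer arithmetic instead of `Math.ceil`, a substitution made here to sidestep floating-point rounding; `nearestRankIndex` and `percentile` are hypothetical helper names, not part of the PR.

```kotlin
/**
 * Zero-based index of the p-th percentile under the nearest-rank definition,
 * ceil(percent / 100 * n) - 1, computed in integer arithmetic.
 */
fun nearestRankIndex(percent: Int, n: Int): Int {
    require(percent in 1..100) { "percent must be in 1..100" }
    require(n > 0) { "need at least one sample" }
    return (percent * n + 99) / 100 - 1
}

/** Returns the nearest-rank percentile of the samples (sorts a copy, input untouched). */
fun percentile(samples: List<Long>, percent: Int): Long {
    val sorted = samples.sorted()
    return sorted[nearestRankIndex(percent, sorted.size)]
}
```

With latencies of 1..100 ms, `percentile(latencies, 99)` returns 99 rather than the 100 ms maximum that the original index selected.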

Comment on lines +675 to +678

    @DirectRepositoryWrite
    override suspend fun recordPageVisit(pageUuid: String): Either<DomainError, Unit> {
        visitMap[pageUuid] = System.currentTimeMillis()
        return Unit.right()

Fixed: changed to `kotlin.time.Clock.System.now().toEpochMilliseconds()`, which is multiplatform-safe.

    }

    /** In-memory visit store for testing — maps pageUuid to last_visited_at epoch ms. */
    val visitMap: MutableMap<String, Long> = mutableMapOf()

Fixed: `visitMap` is now `internal val`, so it is accessible within the module for testing but not exposed as public API.

Comment on lines +39 to +44

    suspend fun populate(
        db: SteleDatabase,
        pageCount: Int,
        blocksPerPage: Int = 10,
        seed: Long = 42L,
    ) {

Fixed: removed the unused `seed` parameter from `populate()`. Content generation is deterministic by index (sequential word-list cycling), so a seed would add nothing here.

Comment on lines +397 to +404

    if (writeActor != null) {
        writeActor.execute {
            @OptIn(DirectSqlWrite::class)
            restricted.insertPageVisitIfAbsent(pageUuid, nowMs)
            @OptIn(DirectSqlWrite::class)
            restricted.updatePageVisit(nowMs, pageUuid)
            Unit.right()
        }

Fixed: `recordPageVisit` now uses `writeActor.execute(priority = DatabaseWriteActor.Priority.LOW)` so visit bookkeeping doesn't contend with interactive writes.

Comment on lines +427 to +433

    if (writeActor != null) {
        writeActor.execute {
            sqlDriver.execute(null, "INSERT INTO blocks_fts(blocks_fts) VALUES('rebuild')", 0)
            sqlDriver.execute(null, "INSERT INTO pages_fts(pages_fts) VALUES('rebuild')", 0)
            Unit.right()
        }
    } else {

Fixed: `rebuildFts()` now uses `writeActor.execute(priority = DatabaseWriteActor.Priority.LOW)` so the O(N) rebuild yields to user-initiated writes.
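
The two `Priority.LOW` fixes rely on the write actor servicing higher-priority work first. `DatabaseWriteActor` itself is not shown in this conversation, so the following is only a hypothetical, single-threaded sketch of the idea: `PriorityWriteQueue`, `WritePriority`, and its `HIGH` level are invented for illustration, and the real actor may schedule work quite differently.

```kotlin
import java.util.PriorityQueue

/** Hypothetical priority levels; only a LOW level is confirmed by the PR text. */
enum class WritePriority { HIGH, LOW }

/** Minimal queue that runs HIGH jobs before LOW jobs, FIFO within each level. */
class PriorityWriteQueue {
    private data class Job(val priority: WritePriority, val seq: Long, val work: () -> Unit)

    // Order by priority first, then by submission order so equal priorities stay FIFO.
    private val queue = PriorityQueue<Job>(compareBy<Job>({ it.priority.ordinal }, { it.seq }))
    private var nextSeq = 0L

    fun submit(priority: WritePriority, work: () -> Unit) {
        queue.add(Job(priority, nextSeq++, work))
    }

    /** Runs every queued job in priority order until the queue is empty. */
    fun drain() {
        while (true) {
            val job = queue.poll() ?: break
            job.work()
        }
    }
}
```

Submitting a `LOW` visit write, then a `HIGH` edit, then a `LOW` rebuild and draining the queue runs the edit first, followed by the visit write and the rebuild in submission order.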

Comment on lines +33 to +46

    override fun searchWithFilters(searchRequest: SearchRequest): Flow<Either<DomainError, SearchResult>> {
        val startMs = HistogramWriter.epochMs()
        return delegate.searchWithFilters(searchRequest).map { result ->
            val durationMs = HistogramWriter.epochMs() - startMs
            val span = tracer.spanBuilder("searchWithFilters").startSpan()
            try {
                span.setAttribute("ranking.visit_boost", "true")
                span.setAttribute("result.ranked.count", result.getOrNull()?.ranked?.size?.toLong() ?: 0L)
                span.setAttribute("duration.ms", durationMs)
            } finally {
                span.end()
            }
            result
        }

Fixed: the span is now created before invoking the delegate, `onStart` captures `startMs`, and `onCompletion` records the duration and ends the span, so the span covers the full search latency.

    import kotlin.test.assertFalse
    import kotlin.test.assertNotNull
    import dev.stapler.stelekit.repository.RankedSearchHit
    import kotlin.time.Duration.Companion.hours

Fixed: removed the unused `kotlin.time.Duration.Companion.hours` import.

Comment on lines +56 to +68

    // Warm up — excluded from measurement
    repeat(5) { i ->
        repo.searchWithFilters(SearchRequest(query = queryTerms[i % queryTerms.size], limit = 50)).first()
    }

    // Measure 100 queries
    val latencies = mutableListOf<Long>()
    repeat(100) { i ->
        val query = queryTerms[i % queryTerms.size]
        val start = System.currentTimeMillis()
        repo.searchWithFilters(SearchRequest(query = query, limit = 50)).first()
        latencies.add(System.currentTimeMillis() - start)
    }

Fixed: added `assertTrue(r.isRight(), "Warm-up query $i should succeed")` in the warm-up loop and `assertTrue(r.isRight(), "Measured query $i should succeed")` in the measurement loop. The test now fails fast on any `Left` result instead of masking it with artificially low latencies.

- Add @DirectRepositoryWrite to rebuildFts/integrityCheckFts in InMemoryRepositories,
  InstrumentedSearchRepository, and SearchViewModelTest stub
- Add @OptIn(DirectRepositoryWrite::class) to InstrumentedSearchRepository class
- Use Priority.LOW for recordPageVisit and rebuildFts write-actor calls
- Fix OTel span timing in InstrumentedSearchRepository (onStart/onCompletion)
- Fix p99 latency formula to nearest-rank definition (ceil(0.99*n)-1)
- Add isRight() assertions to warm-up and measured loops in SearchLatencyTest
- Remove unused seed parameter from SyntheticGraphDbBuilder.populate()
- Remove unused kotlin.time.Duration.Companion.hours import from integration tests
- Rename benchmark methods to camelCase to pass Detekt FunctionNaming rule
- Add detekt baseline entry for pre-existing UnusedParameter in GraphLoader

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tstapler tstapler merged commit 86da88e into main Apr 28, 2026
11 checks passed