feat(lib): section/title inference — multi-pattern cascade, CSI lookup, Jaccard fuzzy match#62
Conversation
…I lookup, fuzzy title match
…/title unknown gap
…ates tree on content match
…tent inference fires
…SI lookup + loadFiles warnings
…ad_files - Extract handler functions to separate handlers.ts module to reduce tools.ts complexity - Add guardPath checks to glob-resolved paths before resolving - Replace path accumulation with discriminated union PathResolution type - All paths (glob + explicit) now validated against project root boundary - Prevents glob patterns like '../../../../etc/passwd' from escaping sandbox - Unit tests pass, lint passes
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds deterministic CSI section/title inference: core algorithm and tests, DB title lookup and re-export, parser wiring to include/apply inference, loader-level inference warnings, MCP handler extraction/enrichment, path-resolution helpers, and integration tests. ChangesSection/Title Inference Feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Sequence Diagram(s)sequenceDiagram
participant Client
participant MCP_Handler
participant Parser
participant DB
participant Persistor
Client->>MCP_Handler: parse_document(filename, contentBase64)
MCP_Handler->>Parser: parse(buffer, filename)
Parser-->>MCP_Handler: ParseResult (tree, sectionInference)
alt content-driven inference present
MCP_Handler->>DB: lookupCsiSectionTitle(inferredSection)
DB-->>MCP_Handler: standardTitle|null
MCP_Handler->>MCP_Handler: computeTitleMatch(inferredTitle, standardTitle)
end
MCP_Handler->>Persistor: persistParsedSpec(tree)
Persistor-->>MCP_Handler: specId
MCP_Handler-->>Client: JSON response (specId, section, title, sectionInference?, nodeCount)
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mcp/tools.ts (1)
130-157:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
guardPath()is still bypassable via symlinks.
path.resolve()+startsWith()only validates the lexical path. A symlink insideprojectRootthat points outside the root still passes this check, andloadFiles()will then read the external target. Resolve both the root and each candidate withrealpath()(or reject symlinks) before doing the containment check.🔒 Hardening sketch
-import { glob } from 'node:fs/promises'; +import { glob, realpath } from 'node:fs/promises'; ... -function guardPath(fp: string, projectRoot: string): ToolError | null { - const abs = path.resolve(fp); - if (!abs.startsWith(projectRoot + path.sep) && abs !== projectRoot) { +async function guardPath(fp: string, projectRoot: string): Promise<ToolError | null> { + const root = await realpath(projectRoot); + const abs = await realpath(path.resolve(projectRoot, fp)); + if (!abs.startsWith(root + path.sep) && abs !== root) { return toolError(`path is outside project root: ${fp}`); } return null; }Then await
guardPath()in both resolution helpers before accepting a candidate path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mcp/tools.ts` around lines 130 - 157, guardPath currently uses lexical path.resolve and startsWith which is bypassable via symlinks; make guardPath async and canonicalize both projectRoot and the candidate using fs.promises.realpath (or reject symlinks via lstat) and then perform the containment check against the real projectRoot (ensure you compare with path.sep similarly). Update resolveGlobPaths to await guardPath for each resolved match (use the realpath of each match before checking) and update resolveExplicitPaths to await guardPath for each explicit path and push the canonicalized real paths; also ensure any callers like loadFiles receive/operate on the real (canonical) paths rather than the original lexical ones.
🧹 Nitpick comments (5)
src/lib/infer-section.integration.test.ts (1)
17-21: ⚡ Quick winSimplify null handling in title assertion.
The null coalescing to empty string on line 20 is unnecessary. You can directly assert on the title value for clearer intent.
♻️ Proposed simplification
it('returns standard title for known CSI section', async () => { const title = await lookupCsiSectionTitle('27 10 00'); - expect(typeof title).toBe('string'); - expect((title ?? '').length).toBeGreaterThan(0); + expect(title).toBeTruthy(); + expect(title).toMatch(/.+/); });This avoids the awkward coercion and more directly expresses that the title should be a non-empty string.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/infer-section.integration.test.ts` around lines 17 - 21, The test should directly assert that lookupCsiSectionTitle('27 10 00') returns a non-empty string: in the test that calls lookupCsiSectionTitle, remove the null-coalescing (title ?? '') and instead assert on title itself (e.g., keep expect(typeof title).toBe('string') and add expect(title.length).toBeGreaterThan(0) or expect(title).toBeTruthy()) so the intent is clear and null coercion is eliminated.src/parser/parse.test.ts (1)
60-76: ⚡ Quick winAdd a regression test for partial metadata preservation.
Please add a case where
section: 'unknown'buttitleis already known, and inference finds section only; this should keep the original title.🧪 Suggested test case
+ it('preserves existing title when section is inferred but title is not', async () => { + const partialTree: CsiTree = { + id: 'x', + section: 'unknown', + title: 'Known Title', + parts: [{ id: 'n1', type: 'part', text: 'SECTION 26 09 33', children: [], meta: {} }], + }; + vi.mocked(parseSec).mockReturnValue({ tree: partialTree, refs: [] }); + const result = await parse(Buffer.from(''), 'spec.sec'); + expect(result.tree.section).toBe('26 09 33'); + expect(result.tree.title).toBe('Known Title'); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/parser/parse.test.ts` around lines 60 - 76, Add a new unit test in parse.test.ts that mirrors the existing "updates tree section and title when inference fires on unknown section" test but uses a CsiTree with section: 'unknown' and title: 'MOTOR CONTROLLERS' (title already known). Mock parseSec (vi.mocked(parseSec).mockReturnValue(...)) to return that tree and then call parse(Buffer.from(''), 'spec.sec'); assert that result.tree.section === '26 09 33' (inferred) and result.tree.title === 'MOTOR CONTROLLERS' (unchanged), and validate result.sectionInference.method === 'content-high' and result.sectionInference.confidence === 'high' to ensure only the section was updated while preserving existing title.src/lib/file-loader.ts (2)
33-38: ⚡ Quick winDon’t swallow CSI title lookup failures without context.
At Line 36, the catch block drops the original error entirely, so DB outages look the same as “no matching section.” Capture and log the error context before falling back to
null.Suggested patch
async function resolveStandardTitle(section: string): Promise<string | null> { try { return await lookupCsiSectionTitle(section); - } catch { + } catch (err) { + logger.warn({ err, section }, 'lookupCsiSectionTitle failed; continuing without standard title'); return null; } }As per coding guidelines, "Custom error class per module boundary extends
SpecrError, usecausechaining at every catch site to preserve context."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/file-loader.ts` around lines 33 - 38, The catch in resolveStandardTitle currently swallows lookupCsiSectionTitle failures; create a module-specific error class (e.g., FileLoaderError) that extends SpecrError and update the catch to capture the original error (catch (err)), log the error context, and rethrow a new FileLoaderError with a descriptive message and the original error as the cause (using cause chaining) so callers preserve failure context instead of silently returning null.
76-87: ⚡ Quick winAvoid mutating
inferenceWarningsin place.At Line 86,
.push()mutates shared state. Prefer returning the warning fromprocessFile()and aggregating immutably inloadFiles().Suggested patch
async function processFile( file: string, - dryRun: boolean, - inferenceWarnings: InferenceWarning[] -): Promise<void> { + dryRun: boolean +): Promise<InferenceWarning | null> { const buffer = await readFile(file); const result = await parse(buffer, file); - if (dryRun) return; + if (dryRun) return null; const specId = await persistParsedSpec(result); - const warning = await buildInferenceWarning(file, specId, result.sectionInference); - if (warning) inferenceWarnings.push(warning); + return await buildInferenceWarning(file, specId, result.sectionInference); }- const inferenceWarnings: InferenceWarning[] = []; + let inferenceWarnings: ReadonlyArray<InferenceWarning> = []; ... - await processFile(file, opts?.dryRun ?? false, inferenceWarnings); + const warning = await processFile(file, opts?.dryRun ?? false); + if (warning) inferenceWarnings = [...inferenceWarnings, warning];As per coding guidelines, "Immutable patterns: create new objects, never mutate. Use spread operators, not property assignment."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/file-loader.ts` around lines 76 - 87, processFile currently mutates the shared inferenceWarnings array with inferenceWarnings.push; change processFile(file, dryRun, ...) to return the built warning (or null/undefined) instead of mutating state: keep readFile, parse, persistParsedSpec and call buildInferenceWarning(file, specId, result.sectionInference) then return that value; update the caller loadFiles (or any caller that passed the inferenceWarnings array) to collect warnings immutably (e.g., await Promise.all on processFile calls and combine results using filter and array spread/concat into a new array) so no in-place mutation occurs.src/mcp/handlers.ts (1)
62-67: ⚡ Quick winDon't mask CSI title lookup failures as a missing row.
This catch turns every database/query failure into
null, which suppresses title-match enrichment and makes outages indistinguishable from "section not found". At minimum, log the exception before degrading so operators can see why inference warnings disappeared.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mcp/handlers.ts` around lines 62 - 67, The function resolveStandardTitleForMcp currently swallows all errors from lookupCsiSectionTitle and returns null; change the catch to capture the thrown error (e.g., catch (err)) and log the exception before returning null so operational failures are visible — update the catch block in resolveStandardTitleForMcp to call the existing logger (or console.error) with a clear message and the error and then return null; keep the return behavior so callers are unchanged but failures are now recorded for debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/specs/2026-05-16-section-inference-design.md`:
- Around line 64-65: The spec is inconsistent about where fuzzy matching and the
CSI title column live; unify them by specifying that fuzzy-match logic runs in
inferSectionMeta after standardTitle is derived (enrichment step), and update
the DB/query contract to use a single canonical column name (choose either title
or section_title) across the document and examples; change all references in the
spec and DB snippet (and the CSI query contract) to the chosen column name and
add a short note in the inferSectionMeta/enrichment section describing that
fuzzy matching uses standardTitle and writes/queries the canonical column (refer
to inferSectionMeta, standardTitle, enrichment, and the DB snippet/CSI query
contract to locate places to update).
In `@src/db/queries/search.ts`:
- Around line 68-73: The DB result from lookupCsiSectionTitle must be
runtime-validated with Zod instead of relying on the TypeScript generic; add a
Zod schema like z.object({ title: z.string() }), import z from 'zod', and
validate result.rows[0] (or handle missing rows) using safeParse/parse; if
validation succeeds return the validated.title, otherwise handle the failure
(return null or log/throw consistently with existing error handling). Update
lookupCsiSectionTitle to call the validator before returning and ensure any
invalid shape is not returned to callers.
In `@src/lib/infer-section.integration.test.ts`:
- Around line 34-38: The test currently only asserts inferredSection and tree
update when result.sectionInference.confidence !== 'none', which lets a case
where method !== 'none' but confidence === 'none' skip important checks; replace
the conditional skip with explicit assertions (e.g.,
expect(result.sectionInference.confidence).not.toBe('none')) and then always
assert that result.sectionInference.inferredSection matches /^\d{2} \d{2}
\d{2}$/ and that result.tree.section equals
result.sectionInference.inferredSection so both method, confidence and the
inferredSection/tree update are validated (refer to
result.sectionInference.method, result.sectionInference.confidence,
result.sectionInference.inferredSection, and result.tree.section).
- Around line 8-10: The test references fixture constants PROJECT_ROOT,
ARCAT_DOCX, and UFGS_SEC and should not run in CI when those files are missing;
add a sentinel-file gating check (e.g., use fs.existsSync or the existing
describe.skipIf helper) at the top of infer-section.integration.test.ts to
detect whether ARCAT_DOCX (or a designated sentinel file) exists and wrap the
fixture-dependent describe blocks with describe.skipIf(!fixtureExists, ...) so
the tests are skipped when fixtures are absent; update the test file to compute
fixtureExists from ARCAT_DOCX/ UFGS_SEC and use that symbol in the
describe.skipIf call.
In `@src/lib/infer-section.ts`:
- Around line 13-15: The regexes KEYWORD_RE and INLINE_TITLE_RE match "SECTION"
inside other words (e.g., "INTERSECTION"), causing false positives; update both
patterns to require word boundaries around SECTION (use \b before and after
SECTION) so they only match whole-word "SECTION" while preserving the rest of
each pattern and case-insensitive flag.
In `@src/mcp/handlers.ts`:
- Around line 193-198: handleParseDocument currently bypasses the shared
dispatcher and rebuilds inference by calling parseSec/parseDocx then
enrichInferenceForMcp and persistParsedSpec, which causes MCP-specific
divergence and emits sectionInference incorrectly; instead, use the
parser/inference result already provided by the dispatcher (do not call
enrichInferenceForMcp on a fresh parse), feed that shared inference tree into
persistParsedSpec (and only emit sectionInference for non-metadata warning cases
as the dispatcher does), removing the extra parse+enrich path around
parseSec/parseDocx, enrichInferenceForMcp and ensuring handleParseDocument
consumes and persists the dispatcher-owned tree and refs exactly as the rest of
the stack does.
In `@src/parser/index.ts`:
- Around line 23-24: applyInference currently overwrites an existing tree.title
with the inferredTitle value 'unknown'; change the return so title is only
replaced when the inferredTitle is not the sentinel 'unknown' (or when
tree.title is missing). Concretely, inside the function that contains the
snippet (applyInference), compute a new title like: if inference.inferredTitle
=== 'unknown' && tree.title then keep tree.title else use
inference.inferredTitle, and return { ...tree, section:
inference.inferredSection, title: computedTitle } so section can still be
updated while preserving an existing known title.
---
Outside diff comments:
In `@src/mcp/tools.ts`:
- Around line 130-157: guardPath currently uses lexical path.resolve and
startsWith which is bypassable via symlinks; make guardPath async and
canonicalize both projectRoot and the candidate using fs.promises.realpath (or
reject symlinks via lstat) and then perform the containment check against the
real projectRoot (ensure you compare with path.sep similarly). Update
resolveGlobPaths to await guardPath for each resolved match (use the realpath of
each match before checking) and update resolveExplicitPaths to await guardPath
for each explicit path and push the canonicalized real paths; also ensure any
callers like loadFiles receive/operate on the real (canonical) paths rather than
the original lexical ones.
---
Nitpick comments:
In `@src/lib/file-loader.ts`:
- Around line 33-38: The catch in resolveStandardTitle currently swallows
lookupCsiSectionTitle failures; create a module-specific error class (e.g.,
FileLoaderError) that extends SpecrError and update the catch to capture the
original error (catch (err)), log the error context, and rethrow a new
FileLoaderError with a descriptive message and the original error as the cause
(using cause chaining) so callers preserve failure context instead of silently
returning null.
- Around line 76-87: processFile currently mutates the shared inferenceWarnings
array with inferenceWarnings.push; change processFile(file, dryRun, ...) to
return the built warning (or null/undefined) instead of mutating state: keep
readFile, parse, persistParsedSpec and call buildInferenceWarning(file, specId,
result.sectionInference) then return that value; update the caller loadFiles (or
any caller that passed the inferenceWarnings array) to collect warnings
immutably (e.g., await Promise.all on processFile calls and combine results
using filter and array spread/concat into a new array) so no in-place mutation
occurs.
In `@src/lib/infer-section.integration.test.ts`:
- Around line 17-21: The test should directly assert that
lookupCsiSectionTitle('27 10 00') returns a non-empty string: in the test that
calls lookupCsiSectionTitle, remove the null-coalescing (title ?? '') and
instead assert on title itself (e.g., keep expect(typeof title).toBe('string')
and add expect(title.length).toBeGreaterThan(0) or expect(title).toBeTruthy())
so the intent is clear and null coercion is eliminated.
In `@src/mcp/handlers.ts`:
- Around line 62-67: The function resolveStandardTitleForMcp currently swallows
all errors from lookupCsiSectionTitle and returns null; change the catch to
capture the thrown error (e.g., catch (err)) and log the exception before
returning null so operational failures are visible — update the catch block in
resolveStandardTitleForMcp to call the existing logger (or console.error) with a
clear message and the error and then return null; keep the return behavior so
callers are unchanged but failures are now recorded for debugging.
In `@src/parser/parse.test.ts`:
- Around line 60-76: Add a new unit test in parse.test.ts that mirrors the
existing "updates tree section and title when inference fires on unknown
section" test but uses a CsiTree with section: 'unknown' and title: 'MOTOR
CONTROLLERS' (title already known). Mock parseSec
(vi.mocked(parseSec).mockReturnValue(...)) to return that tree and then call
parse(Buffer.from(''), 'spec.sec'); assert that result.tree.section === '26 09
33' (inferred) and result.tree.title === 'MOTOR CONTROLLERS' (unchanged), and
validate result.sectionInference.method === 'content-high' and
result.sectionInference.confidence === 'high' to ensure only the section was
updated while preserving existing title.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4b606dba-f97a-4653-ba28-5a0f85d6eefc
📒 Files selected for processing (13)
docs/superpowers/plans/2026-05-16-section-inference.mddocs/superpowers/specs/2026-05-16-section-inference-design.mdsrc/db/index.tssrc/db/queries/search.tssrc/lib/file-loader.test.tssrc/lib/file-loader.tssrc/lib/infer-section.integration.test.tssrc/lib/infer-section.test.tssrc/lib/infer-section.tssrc/mcp/handlers.tssrc/mcp/tools.tssrc/parser/index.tssrc/parser/parse.test.ts
…update inference note strings
… async realpath guardPath, spec doc fixes
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/infer-section.ts`:
- Around line 144-145: The empty catch should be changed to capture the error
and log it with cause-chaining before returning the defensive NONE_RESULT;
update the catch to "catch (err)" in the inferSection function (or the
surrounding function where NONE_RESULT is returned), call the module's logger
(e.g., logger or processLogger) to log a clear message like "Failed to infer
section" and pass an Error constructed with the original error as its cause (new
Error('Failed to infer section', { cause: err })) or include { cause: err } in
the log call, then return NONE_RESULT.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: eb60a74e-40ce-4ac8-b0c2-0719d0ccd980
📒 Files selected for processing (5)
docs/superpowers/specs/2026-05-16-section-inference-design.mdsrc/lib/infer-section.integration.test.tssrc/lib/infer-section.tssrc/mcp/tools.tssrc/parser/index.ts
✅ Files skipped from review due to trivial changes (1)
- docs/superpowers/specs/2026-05-16-section-inference-design.md
🚧 Files skipped from review as they are similar to previous changes (3)
- src/lib/infer-section.integration.test.ts
- src/mcp/tools.ts
- src/parser/index.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/mcp/tools.ts`:
- Around line 209-210: Don't use process.cwd() as the trust boundary; instead
resolve relative paths from the MCP-configured project root and pass that into
collectResolvedPaths. Replace the use of process.cwd() when building projectRoot
with the authoritative project root value provided by the MCP context/config
(e.g., use the existing projectRoot/config.projectRoot/serverContext.projectRoot
or add a projectRoot parameter to the caller) so that
collectResolvedPaths(globPattern, explicitPaths, projectRoot) operates against
the correct repository tree rather than the launch directory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Summary
Test plan
Summary by CodeRabbit
New Features
Documentation
Tests