feat(parser): text anomaly warnings — meta.warnings on CsiTree#75
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR implements parse-time anomaly detection for the text parser, surfacing structural failures (root continuations, empty parts, missing structure) as typed warnings in the ChangesParse-time anomaly surfacing (Issue
Related design documentation updates
Sequence DiagramsequenceDiagram
participant Client
participant ExpressServer
participant ParseWorker
participant Database
Client->>ExpressServer: POST /parse (multipart text fixture)
ExpressServer->>ParseWorker: enqueue job / invoke worker
Client->>ExpressServer: poll GET /parse/jobs/:jobId
ParseWorker->>ExpressServer: job complete (finalTree with optional warnings)
ExpressServer->>Database: write spec row (specId) / update job result
ExpressServer->>Client: 202 then final job result (includes warnings, capabilities, nodeCount)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/api/parse.ts (1)
108-115:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate warnings against ParseWarningSchema in worker output (Line 114).
warnings: z.array(z.unknown()).optional()accepts any payload shape, allowing malformed worker warnings to pass validation. Use the existingParseWarningSchemacontract instead—it's already defined insrc/ast/schemas.tsand correctly used inCsiNodeSchema.Proposed fix
import { z } from 'zod'; +import { ParseWarningSchema } from '../ast/schemas.js'; @@ const workerOutputSchema = z.object({ tree: z.object({ @@ - warnings: z.array(z.unknown()).optional(), + warnings: z.array(ParseWarningSchema).optional(), }), capabilities: z.array(z.string()).optional(), });Per coding guidelines: "All external input validation uses Zod: request bodies, env vars, parsed XML/OOXML, and database results".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/parse.ts` around lines 108 - 115, Replace the overly-permissive warnings schema in workerOutputSchema with the canonical ParseWarningSchema so worker outputs are strictly validated: update the definition inside workerOutputSchema (the tree.warnings field) to use ParseWarningSchema imported from src/ast/schemas.ts (the same schema used by CsiNodeSchema) and keep it optional/array-typed as appropriate; ensure to add the import for ParseWarningSchema at the top of src/api/parse.ts and run existing tests/validation to confirm compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/api/parse.ts`:
- Around line 108-115: Replace the overly-permissive warnings schema in
workerOutputSchema with the canonical ParseWarningSchema so worker outputs are
strictly validated: update the definition inside workerOutputSchema (the
tree.warnings field) to use ParseWarningSchema imported from src/ast/schemas.ts
(the same schema used by CsiNodeSchema) and keep it optional/array-typed as
appropriate; ensure to add the import for ParseWarningSchema at the top of
src/api/parse.ts and run existing tests/validation to confirm compatibility.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b0250529-b793-4956-83b2-8b2a67cd8338
📒 Files selected for processing (11)
.gitignoredocs/superpowers/specs/2026-05-18-issue-027-design.mddocs/superpowers/specs/2026-05-18-issue-068-design.mddocs/superpowers/specs/2026-05-18-issue-073-design.mdsrc/api/parse-warnings.integration.test.tssrc/api/parse.tssrc/ast/schemas.tssrc/ast/types.tssrc/parser/text/index.tssrc/parser/text/warnings.test.tstests/fixtures/text/anomaly-empty-part.txt
…-warnings capability Surface three structural anomalies as typed warnings on the parsed text tree: - root-continuation: continuation text dropped before first structural heading (cap at 5) - empty-part: PART node with zero article children - no-structure-found: tree has zero parts Warnings ride sparsely on CsiTree.warnings (undefined when none). When present, parseText adds 'parse-warnings' to capabilities and the REST job result envelope exposes warnings[] alongside the existing nodeCount / capabilities fields. This is the observability layer for the text parser — analogous to meta.conflicts in the DOCX parser. Warnings are not persisted; they live in the parse job result envelope and disappear once consumed. Closes #68
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/superpowers/specs/2026-05-18-issue-027-design.md (1)
25-39: ⚡ Quick winAdd language specifier to fenced code block.
The architecture diagram code block should include a language identifier for proper rendering and consistency with other code blocks in the document. As per static analysis tools, fenced code blocks should have a language specified.
📝 Proposed fix
-``` +```text src/parser/refs/ ← new format-agnostic module ├── index.ts barrel ├── rules.ts SECTION_REF_RULES + STANDARD_ORG_PATTERNS + buildStandardRefRules🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-05-18-issue-027-design.md` around lines 25 - 39, The fenced architecture diagram block in docs/superpowers/specs/2026-05-18-issue-027-design.md is missing a language specifier; update the opening fence to include an appropriate language (e.g., ```text) so static analysis and renderers treat it consistently, leaving the diagram content unchanged and keeping the block around the src/parser/refs/ ... src/parser/index.ts snippet intact.src/parser/text/warnings.test.ts (1)
72-105: 💤 Low valueConsider consolidating duplicate clean-path tests.
Tests 6 and 7 use nearly identical input fixtures and both verify the clean path (no anomalies). Test 7 adds a specific assertion that
capabilitiesequals['read-only'], but it omits thetree.warningscheck from test 6. Combining them into a single comprehensive test would reduce duplication and improve maintainability.♻️ Suggested consolidation
- it('no anomalies: well-formed UFGS structure → tree.warnings undefined, capabilities does NOT include parse-warnings', () => { + it('no anomalies: well-formed UFGS structure → tree.warnings undefined, capabilities remain ["read-only"]', () => { const text = [ 'SECTION 09 91 00 - PAINTING', 'PART 1 - GENERAL', '1.1 SCOPE', 'Hello.', 'PART 2 - PRODUCTS', '2.1 MATERIALS', - 'Product description.', + 'Materials.', 'PART 3 - EXECUTION', '3.1 INSTALLATION', - 'Installation steps.', + 'Install.', ].join('\n'); const result = parseText(text); expect(result.tree.warnings).toBeUndefined(); expect(result.capabilities).not.toContain('parse-warnings'); + expect(result.capabilities).toEqual(['read-only']); }); - - it('capabilities array unchanged on no-anomaly path (still ["read-only"])', () => { - const text = [ - 'SECTION 09 91 00 - PAINTING', - 'PART 1 - GENERAL', - '1.1 SCOPE', - 'Hello.', - 'PART 2 - PRODUCTS', - '2.1 MATERIALS', - 'Materials.', - 'PART 3 - EXECUTION', - '3.1 INSTALLATION', - 'Install.', - ].join('\n'); - const result = parseText(text); - expect(result.capabilities).toEqual(['read-only']); - });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/parser/text/warnings.test.ts` around lines 72 - 105, Two tests repeat the clean/no-anomaly path; combine them into one comprehensive test: replace the two it(...) blocks ('no anomalies: well-formed UFGS structure → tree.warnings undefined, capabilities does NOT include parse-warnings' and 'capabilities array unchanged on no-anomaly path (still ["read-only"])') with a single test that calls parseText(...) once and asserts both that result.tree.warnings is undefined and that result.capabilities strictly equals ['read-only'] (and does not contain 'parse-warnings'); locate usages by the test descriptions and the parseText call to update the spec accordingly and remove the duplicate fixture/test block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/api/parse.ts`:
- Line 114: Replace the unsafe z.array(z.unknown()).optional() for worker
warnings with the proper ParseWarningSchema: import ParseWarningSchema from
src/ast/schemas.ts (or the named export) at the top of src/api/parse.ts and
change the schema entry for "warnings" to use
z.array(ParseWarningSchema).optional() so all worker output is validated against
the concrete warning schema (refer to the warnings field and the
ParseWarningSchema symbol to locate the change).
---
Nitpick comments:
In `@docs/superpowers/specs/2026-05-18-issue-027-design.md`:
- Around line 25-39: The fenced architecture diagram block in
docs/superpowers/specs/2026-05-18-issue-027-design.md is missing a language
specifier; update the opening fence to include an appropriate language (e.g.,
```text) so static analysis and renderers treat it consistently, leaving the
diagram content unchanged and keeping the block around the src/parser/refs/ ...
src/parser/index.ts snippet intact.
In `@src/parser/text/warnings.test.ts`:
- Around line 72-105: Two tests repeat the clean/no-anomaly path; combine them
into one comprehensive test: replace the two it(...) blocks ('no anomalies:
well-formed UFGS structure → tree.warnings undefined, capabilities does NOT
include parse-warnings' and 'capabilities array unchanged on no-anomaly path
(still ["read-only"])') with a single test that calls parseText(...) once and
asserts both that result.tree.warnings is undefined and that result.capabilities
strictly equals ['read-only'] (and does not contain 'parse-warnings'); locate
usages by the test descriptions and the parseText call to update the spec
accordingly and remove the duplicate fixture/test block.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0c5cc3a6-6ec2-4cc1-9449-0e9438f75ec6
📒 Files selected for processing (9)
docs/superpowers/specs/2026-05-18-issue-027-design.mddocs/superpowers/specs/2026-05-18-issue-073-design.mdsrc/api/parse-warnings.integration.test.tssrc/api/parse.tssrc/ast/schemas.tssrc/ast/types.tssrc/parser/text/index.tssrc/parser/text/warnings.test.tstests/fixtures/text/anomaly-empty-part.txt
✅ Files skipped from review due to trivial changes (1)
- tests/fixtures/text/anomaly-empty-part.txt
🚧 Files skipped from review as they are similar to previous changes (4)
- src/ast/types.ts
- src/ast/schemas.ts
- src/parser/text/index.ts
- src/api/parse-warnings.integration.test.ts
…nown() Worker output is external input — per error-handling guidelines, all external input is Zod-validated with concrete schemas. The earlier z.array(z.unknown()) defeated validation and allowed arbitrary warning structures into the job-result envelope. Per CodeRabbit review on PR #75.
Status table: add 1c-iii..1c-viii and 1c-sec-i/ii rows covering PRs #69 #70 #71 #72 #74 #75 #76. Updates 'Active development' subtitle to reflect Phase 1c being complete. Parsing section: add plaintext signal hardening (#70), parse-anomaly warnings (#75), and DOCX resilience suite (#72) bullets. MCP section: note POST /mcp rate limiting (#69). Not Yet Built: strike completed items (DOCX cross-ref extraction in PR #76, parse worker concurrency cap in PR #71). Add new known gap: REST persistTree ignores extracted refs (follow-up to #53).
Summary
Adds structured anomaly warnings to the text parser, giving spec writers observability into silent failure modes (analogous to
meta.conflictsin the DOCX parser).Three warning types surfaced on
CsiTree.warnings:root-continuation— continuation text dropped before first structural heading (cap at 5 to prevent pathological PDFs from producing massive warning arrays)empty-part— apartnode with zero article childrenno-structure-found— tree has zero parts (entire doc parsed as continuations)When any warning is emitted,
parseTextadds'parse-warnings'tocapabilities. Warnings ride sparsely onCsiTree.warnings(undefined when none). The REST job result envelope exposeswarnings[]alongside the existingnodeCount/capabilitiesfields. Warnings are not persisted to the database — they live in the parse job result envelope and disappear once consumed.Implementation
src/ast/types.ts—ParseWarning,ParseWarningType,CsiTree.warnings?(optional)src/ast/schemas.ts—ParseWarningSchema,ParseWarningTypeSchema, extendedCsiTreeSchemasrc/parser/text/index.ts—buildTreerefactor returnsBuildResult { parts, droppedAtRoot, partLineIndex }, newdetectWarnings,WARNING_SUGGESTIONSconstant map, root-continuation cap constantsrc/api/parse.ts— extendsworkerOutputSchemato allow warnings; surfaces warnings on job resultTest plan
pnpm test src/parser/text/warnings.test.ts— 7 unit tests pass (RED → GREEN)pnpm test— all 442 unit tests pass (no regressions)pnpm test:integration src/api/parse-warnings.integration.test.ts— integration test assertsresult.warnings,result.capabilities.includes('parse-warnings'), and other parsing still works (nodeCount > 3)pnpm test:integration src/api/parse.integration.test.ts— existing parse integration tests still passpnpm lint— cleanpnpm build— cleanDesign
See
docs/superpowers/specs/2026-05-18-issue-068-design.md(approved).Out of scope
Closes #68
Summary by CodeRabbit
New Features
Tests
Documentation