feat(lib): parse worker concurrency cap — piscina worker pool#71
Add piscina worker pool capping concurrent parse at cpu_count-1 threads. Moves parseSec/parseDocx/parseText into parse-worker.ts; main thread retains DB work and coarse progress (10%/75%). Zero-copy buffer transfer via transferList. parse.test.ts mocks the pool to avoid live workers. Closes #33
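The zero-copy claim rests on transfer semantics: a transferred `ArrayBuffer` is moved to the receiver and detached on the sender side. A minimal sketch of that behavior, using the built-in `structuredClone` with a `transfer` list as a stand-in for the pool's `transferList` (assumes Node 17 or later; the helper name `moveBuffer` is hypothetical and not taken from this PR):

```typescript
// Hypothetical helper: moves an ArrayBuffer instead of copying it.
// structuredClone's `transfer` option follows the same transfer rules
// that worker message passing uses.
function moveBuffer(source: ArrayBuffer): ArrayBuffer {
  return structuredClone(source, { transfer: [source] });
}

// A stand-alone ArrayBuffer holding four bytes.
const original = new ArrayBuffer(4);
new Uint8Array(original).set([1, 2, 3, 4]);

const moved = moveBuffer(original);

// After the move, the sender's buffer is detached (length 0) and the
// receiver owns the bytes.
const senderLength = original.byteLength;
const receiverBytes = Array.from(new Uint8Array(moved));
```

If the pool behaves the same way, the handler should not read the upload buffer again after dispatching it to a worker.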
📝 Walkthrough
A Piscina worker pool is added and exported; a parse-worker module dispatches parsing by file extension and returns `{ tree, capabilities? }`; the main parse handler now calls …
Changes
Worker Pool for Bounded Concurrent Parsing
Sequence Diagram(s)
sequenceDiagram
participant ParseHandler
participant ParsePool
participant ParseWorker
ParseHandler->>ParsePool: run({ buffer, ext })
ParsePool->>ParseWorker: execute parseWorker({ buffer, ext })
ParseWorker-->>ParsePool: return { tree, capabilities? }
ParsePool-->>ParseHandler: resolve worker result
ParseHandler->>ParseHandler: validate workerOutputSchema & persist CsiTree
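The last step of the diagram validates the worker result before persisting it. The sketch below shows that boundary check as a hand-written type guard so it runs without dependencies; the review asks for a Zod schema, so this is a stand-in for `workerOutputSchema`, not its actual definition. The field shapes (`tree` as a plain object, `capabilities` as an optional string array) and the function names are assumptions.

```typescript
// Assumed shape of the worker result; the real WorkerOutput may differ.
interface WorkerOutput {
  tree: Record<string, unknown>;
  capabilities?: string[];
}

// Hand-written stand-in for a schema check: narrows `unknown` at runtime.
function isWorkerOutput(value: unknown): value is WorkerOutput {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const tree = record["tree"];
  if (typeof tree !== "object" || tree === null || Array.isArray(tree)) {
    return false;
  }
  const caps = record["capabilities"];
  if (caps === undefined) return true; // capabilities is optional
  return Array.isArray(caps) && caps.every((c) => typeof c === "string");
}

// Fails loudly so invalid data never reaches progress updates or the DB.
function parseWorkerOutput(value: unknown): WorkerOutput {
  if (!isWorkerOutput(value)) {
    throw new Error("parse worker returned an unexpected payload");
  }
  return value;
}

const validated = parseWorkerOutput({ tree: {}, capabilities: ["refs"] });
```

Validating before the 75% checkpoint means a malformed payload ends the job with a clear error rather than a partially persisted tree.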
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/lib/parse-pool.test.ts (1)
5-7: ⚡ Quick win: Assert the exact thread-cap formula, not just a lower bound.
This test won’t catch regressions where `maxThreads` exceeds `Math.max(1, os.cpus().length - 1)`. Please assert equality to the expected value so it verifies the acceptance criterion directly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/parse-pool.test.ts` around lines 5 - 7, Replace the loose lower-bound assertion with an exact equality check: compute expected = Math.max(1, os.cpus().length - 1) and assert expect(parsePool.options.maxThreads).toBe(expected); ensure the test imports/uses the same os module and references parsePool.options.maxThreads so the test fails if the thread-cap formula changes.
src/api/parse.ts (1)
114-116: ⚡ Quick win: Type `onProgress` with `ParseStage` to remove the cast.
Use `stage: ParseStage` in the helper signature so `stage as ParseStage` is unnecessary.
As per coding guidelines: "`src/**/*.{ts,tsx}`: Use TypeScript strict mode with no `any`, no `as unknown as`, and no type assertions across module boundaries".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/parse.ts` around lines 114 - 116, Change the onProgress helper to accept stage: ParseStage instead of stage: string so you can remove the type assertion; update the signature of onProgress(stage: ParseStage, pct: number): void and call updateJob(jobId, { stage, pct, status: 'running' }); (ensure ParseStage is imported/available in this module and that jobId/updateJob usages remain unchanged).
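The `onProgress` nitpick can be sketched as follows. This is an illustration under assumptions, not the PR's code: `'classifying'` appears in the review, while `'parsing'` as the 10% stage name, the `JobUpdate` shape, the in-memory `updateJob`, and the `makeOnProgress` factory are all hypothetical.

```typescript
// 'classifying' appears in the review; 'parsing' is an assumed member.
type ParseStage = "parsing" | "classifying";

interface JobUpdate {
  stage: ParseStage;
  pct: number;
  status: "running";
}

// Hypothetical in-memory stand-in for the real updateJob.
const jobUpdates = new Map<string, JobUpdate>();
function updateJob(jobId: string, update: JobUpdate): void {
  jobUpdates.set(jobId, update);
}

// Typing `stage` as ParseStage means no `stage as ParseStage` cast is
// needed, and a misspelled stage name fails at compile time.
function makeOnProgress(jobId: string) {
  return (stage: ParseStage, pct: number): void => {
    updateJob(jobId, { stage, pct, status: "running" });
  };
}

const onProgress = makeOnProgress("job-1");
onProgress("parsing", 10); // start checkpoint
onProgress("classifying", 75); // checkpoint after the worker returns
```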
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/api/parse.ts`:
- Around line 119-123: Replace the unsafe cast of the worker response from
parsePool.run to WorkerOutput with runtime validation using a Zod schema: define
a Zod schema matching WorkerOutput (tree, capabilities, etc.), run
parsePool.run(...) as unknown, pass the result through schema.parse or
safeParse, and use the parsed value for tree and capabilities; on schema
failure, log or throw a clear error and avoid calling onProgress('classifying',
75) with invalid data. Locate the call to parsePool.run in parse.ts and the
WorkerOutput type to model the Zod schema and ensure all downstream uses (e.g.,
onProgress) receive validated data.
In `@src/lib/parse-worker.ts`:
- Around line 25-27: The current fallback always calls parseDocx(buffer) for any
unexpected ext; add an explicit guard using the ext value (e.g., in the function
that calls parseDocx) to reject unsupported extensions instead of falling
through to DOCX parsing: check the ext (or mime) early, allow only the supported
cases (e.g., ".docx") to call parseDocx(buffer) and for any other ext throw or
return an explicit error/rejection so the caller fails deterministically rather
than parsing as DOCX.
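A minimal sketch of the extension guard described for `parse-worker.ts`, assuming a lookup table keyed by extension. The extension strings (`.sec`, `.docx`, `.txt`), the stub parsers, and the `dispatchByExt` name are assumptions for illustration; only the parser names come from the PR.

```typescript
type ParseResult = { tree: { kind: string; bytes: number } };
type Parser = (buffer: Uint8Array) => ParseResult;

// Stub parsers standing in for parseSec / parseDocx / parseText.
const parsers = new Map<string, Parser>([
  [".sec", (b) => ({ tree: { kind: "sec", bytes: b.byteLength } })],
  [".docx", (b) => ({ tree: { kind: "docx", bytes: b.byteLength } })],
  [".txt", (b) => ({ tree: { kind: "text", bytes: b.byteLength } })],
]);

// Rejects unknown extensions instead of falling through to DOCX parsing.
function dispatchByExt(buffer: Uint8Array, ext: string): ParseResult {
  const parser = parsers.get(ext.toLowerCase());
  if (parser === undefined) {
    throw new Error(`Unsupported file extension: ${ext}`);
  }
  return parser(buffer);
}

const docxResult = dispatchByExt(new Uint8Array(3), ".DOCX");
```

Using a `Map` rather than a plain object avoids accidental matches on inherited keys such as `constructor`.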
---
Nitpick comments:
In `@src/api/parse.ts`:
- Around line 114-116: Change the onProgress helper to accept stage: ParseStage
instead of stage: string so you can remove the type assertion; update the
signature of onProgress(stage: ParseStage, pct: number): void and call
updateJob(jobId, { stage, pct, status: 'running' }); (ensure ParseStage is
imported/available in this module and that jobId/updateJob usages remain
unchanged).
In `@src/lib/parse-pool.test.ts`:
- Around line 5-7: Replace the loose lower-bound assertion with an exact
equality check: compute expected = Math.max(1, os.cpus().length - 1) and assert
expect(parsePool.options.maxThreads).toBe(expected); ensure the test
imports/uses the same os module and references parsePool.options.maxThreads so
the test fails if the thread-cap formula changes.
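The thread-cap formula from this nitpick, written as a small pure function so the exact value can be asserted. The helper name `expectedMaxThreads` is hypothetical; in the real test the result would be compared against `parsePool.options.maxThreads` as the review describes.

```typescript
import { cpus } from "node:os";

// One thread is left for the main thread; the floor of 1 keeps
// single-core hosts from ending up with zero workers.
function expectedMaxThreads(cpuCount: number): number {
  return Math.max(1, cpuCount - 1);
}

// Value for the current host, suitable for an exact equality assertion.
const capForThisHost = expectedMaxThreads(cpus().length);
```

An equality assertion against this value fails if the cap is raised or the floor is dropped, which a lower-bound check would miss.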
⛔ Files ignored due to path filters (1)
- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (6)
- `package.json`
- `src/api/parse.test.ts`
- `src/api/parse.ts`
- `src/lib/parse-pool.test.ts`
- `src/lib/parse-pool.ts`
- `src/lib/parse-worker.ts`
…od output validation
… not transferable
Status table: add 1c-iii..1c-viii and 1c-sec-i/ii rows covering PRs #69 #70 #71 #72 #74 #75 #76. Updates 'Active development' subtitle to reflect Phase 1c being complete. Parsing section: add plaintext signal hardening (#70), parse-anomaly warnings (#75), and DOCX resilience suite (#72) bullets. MCP section: note POST /mcp rate limiting (#69). Not Yet Built: strike completed items (DOCX cross-ref extraction in PR #76, parse worker concurrency cap in PR #71). Add new known gap: REST persistTree ignores extracted refs (follow-up to #53).
Summary
- Piscina worker pool capped at `cpu_count - 1` threads: prevents memory exhaustion from concurrent large DOCX payloads
- `src/lib/parse-worker.ts`: worker function (parseSec / parseDocx / parseText, no DB); dev uses tsx/esm loader via execArgv, prod uses compiled .js
- `src/lib/parse-pool.ts`: pool singleton, dev/prod file resolution via `import.meta.url.endsWith('.ts')`
- `src/api/parse.ts`: `dispatchParse` removed; replaced with `parsePool.run()` + zero-copy buffer transfer via `transferList`
- `src/api/parse.test.ts`: added `vi.mock('../lib/parse-pool.js')` so unit tests don't spawn live worker threads
Progress reporting coarsens slightly: per-chunk signals from inside `parseDocx` no longer surface, but start (10%) and complete (75%) checkpoints are preserved.
Test plan
- `pnpm test src/lib/parse-pool.test.ts`: pool maxThreads unit test passes
- `pnpm test`: all 424 unit tests pass
- `pnpm lint`: ESLint + tsc + prettier clean
- `pnpm build`: TypeScript compiles without errors
- `DATABASE_URL=... pnpm test:integration`: end-to-end parse path exercises POST /parse → worker → DB
Closes #33
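The dev/prod worker resolution mentioned in the summary could look roughly like the sketch below. It is an assumption-laden illustration: the function name `resolveWorkerEntry`, the `WorkerEntry` shape, and the exact `execArgv` flags are hypothetical, and the module URL is passed in as a parameter (in the pool module it would be `import.meta.url`) so the logic can be exercised directly.

```typescript
interface WorkerEntry {
  filename: string; // URL of the worker module to load
  execArgv: string[]; // extra Node flags for the worker threads
}

// `moduleUrl` stands in for import.meta.url of the pool module.
function resolveWorkerEntry(moduleUrl: string): WorkerEntry {
  // Running from a .ts source file means dev; compiled .js output means prod.
  const isDev = moduleUrl.endsWith(".ts");
  const file = isDev ? "./parse-worker.ts" : "./parse-worker.js";
  return {
    filename: new URL(file, moduleUrl).href,
    // Assumed flags for loading TypeScript inside dev worker threads.
    execArgv: isDev ? ["--import", "tsx/esm"] : [],
  };
}

const devEntry = resolveWorkerEntry("file:///app/src/lib/parse-pool.ts");
const prodEntry = resolveWorkerEntry("file:///app/dist/lib/parse-pool.js");
```

Resolving relative to the pool module keeps the worker path correct whether the code runs from `src` or from the compiled output directory.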