feat: auto-resolve filenames, scoped RAG search, and chunk-based contract modification#789
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughThis PR introduces a comprehensive contract generation workflow configuration and enhances file parsing across multiple tools. Key changes include: adding optional filename resolution via a new Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
services/platform/convex/custom_agents/mutations.ts (1)
299-299: 🧹 Nitpick | 🔵 TrivialConsider adding name validation to updateCustomAgent.
The
createCustomAgentmutation validates the name format, butupdateCustomAgentaccepts an optionalnamefield without applying the same regex validation. If agent renaming is allowed, a user could potentially bypass the naming rules by updating an existing agent.🔧 Proposed fix to add validation
handler: async (ctx, args): Promise<null> => { const authUser = await authComponent.getAuthUser(ctx); if (!authUser) throw new Error('Unauthenticated'); + if (args.name !== undefined && !/^[a-z0-9][a-z0-9_-]*$/.test(args.name)) { + throw new Error( + 'Agent name must start with a letter or number and contain only lowercase letters, numbers, hyphens, and underscores', + ); + } + const draft = await getDraftByRoot(ctx, args.customAgentId);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/custom_agents/mutations.ts` at line 299, The updateCustomAgent mutation currently uses name: v.optional(v.string()) without the regex checks used in createCustomAgent, allowing invalid renames; change the optional name validator in updateCustomAgent to the same validated form used in createCustomAgent (e.g., v.optional(v.string().regex(/* same pattern */) or the shared helper used for agent name validation) so any provided name is validated before persisting, and reuse the same error/message behavior as createCustomAgent.services/platform/convex/lib/attachments/process_attachments.ts (1)
288-290: 🧹 Nitpick | 🔵 TrivialUse
fileIdinstead offileNameto identify failed documents.Now that
parsedDocumentsincludesfileId, matching byfileNamecan produce incorrect results if multiple attachments share the same filename. One document parsing successfully would incorrectly mark another same-named attachment as processed.♻️ Proposed fix
const failedDocuments = documentAttachments.filter( - (a) => !parsedDocuments.some((d) => d.fileName === a.fileName), + (a) => !parsedDocuments.some((d) => d.fileId === a.fileId), );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines 288 - 290, failedDocuments is currently computed by comparing documentAttachments to parsedDocuments using fileName which can misidentify processed items when filenames repeat; update the filter to compare unique fileId instead (use documentAttachments.filter(a => !parsedDocuments.some(d => d.fileId === a.fileId))) so failedDocuments is determined by fileId, and ensure both parsedDocuments and documentAttachments include fileId references (check functions that populate parsedDocuments and the attachment shape used in this match).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/workflows/contract-generation/config.json`:
- Around line 289-290: The repair_consistency step is missing deal-type
classification because repairContext only carries
steps.build_definition_registry.output.data; modify the generation pipeline so
repairContext includes the merge_clauses output as well (e.g., merge or embed
steps.merge_clauses.output.data into the repairContext object alongside
steps.build_definition_registry.output.data) so repair_consistency can read the
deal-type and correctly remove stock/asset-only provisions.
- Around line 311-320: The workflow routes directly from stepSlug
"repair_references" to "repair_consistency" but later steps allow
removing/renumbering sections, which can leave stale cross-references; update
the flow so renumbering always happens before the final cross-reference pass:
either move "repair_references" to run after the renumbering/remove-sections
step or add a follow-up transition from the renumbering step back to
"repair_references" (or make "repair_consistency" trigger a final
"repair_references" run on success). Ensure the unique stepSlug
"repair_references" is the last cross-reference fixer in the chain so all
renumbering changes are validated and corrected before output.
- Around line 182-183: The config hardcodes a leading "\n\n" when concatenating
chunk output which breaks verbatim preservation; update the logic that builds
model input/output so "systemPrompt" / "userPrompt" do not unconditionally
prepend "\n\n" to returned chunks—either remove that hardcoded prefix entirely
or make it conditional (only add separators when the chunk was modified or when
it's safe to join) and ensure the pipeline that emits the final chunk uses the
chunk's exact original text when the decision in the systemPrompt is "output the
chunk text EXACTLY as-is" so tables, numbering, and spacing remain byte-for-byte
identical.
In `@services/platform/convex/agent_tools/files/excel_tool.ts`:
- Around line 138-142: The call to resolveFileName (using ctx, args.fileId,
args.filename) can throw before the Excel parse try/catch, causing uncaught
errors; move the resolveFileName invocation into the same try block that wraps
the Excel parsing logic (the try surrounding the parse and return of { success:
false, ... }) so any errors from resolveFileName are caught and produce the
structured failure payload — update the code that references resolvedFilename
accordingly inside that try and remove the pre-try assignment.
In `@services/platform/convex/agent_tools/files/helpers/parse_file.ts`:
- Line 50: The call to resolveFileName(ctx, fileId, filename) can throw before
the try in parseFile, causing unhandled failures; move that call inside the
existing try block in parseFile (so resolution is covered by the structured
error handling) and ensure any errors from resolveFileName result in returning
the same { success: false, ... } response path used elsewhere in parseFile
rather than allowing an exception to escape.
In `@services/platform/convex/agent_tools/files/txt_tool.ts`:
- Line 194: The call to resolveFileName(ctx, fileId, filename) runs outside the
parsing try/catch so failures bypass the tool's structured parse error handling;
move or duplicate that call inside the existing try block that parses the tool
input (or wrap it in its own try/catch) so any errors from resolveFileName are
caught and translated into the tool’s parse error response; update references to
resolvedFilename used later in the function to ensure they come from the guarded
resolveFileName call inside the try/catch surrounding the parsing logic.
In `@services/platform/convex/workflow_engine/action_defs/rag/rag_action.ts`:
- Around line 152-180: The pagination loop in get_chunks (inside the while in
rag_action.ts using chunkStart/chunkEnd and MAX_CHUNK_WINDOW) performs unbounded
fetches; add the same per-request AbortController timeout logic used by the
search branch: create an AbortController at the top of each iteration, set a
timeout to call controller.abort() after the configured timeout, pass
controller.signal to fetch(url, { signal }), and clear the timeout after the
response is received (or in finally) so hung requests can't stall the workflow;
ensure fetchJson<DocumentContentResponse>(response) is still called with the
successful response and handle abort errors appropriately.
In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`:
- Around line 410-425: Update the ragSchemas.get_chunks result to match
RagChunkResult: change the title field to allow string or null (not optional
string) and replace chunks: { type: 'any' } with a typed array of chunk objects
(each having index:number and content:string) so validation for
steps.get_base_chunks.output.data.chunks and loop.item.content is accurate; keep
other fields (documentId, totalChunks, executionTimeMs) unchanged.
In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts`:
- Around line 13-24: The example's output step configuration is inconsistent:
the 'result' step (stepSlug: 'result') is shown as terminal with nextSteps: {}
but elsewhere is documented as having a success port ({ success: 'next_step' });
pick one behavior and make the example and docs match. To fix, update the
stepsConfig entry for 'result' (and any duplicate example blocks) so its
nextSteps matches the documented behavior — either change nextSteps: {} to
nextSteps: { success: 'next_step' } if output steps should proceed, or change
the documentation/examples that show a success port to use nextSteps: {} if
output is terminal — ensure all occurrences of 'result' (and the example at the
top and the block around the alternate docs) are aligned.
- Around line 13-24: The Hello World example is missing the workflowType
property in the declared workflow shape; update the example object so
workflowConfig includes a workflowType entry (alongside name/description) to
match the canonical shape used elsewhere—ensure the top-level object still has
workflowConfig and stepsConfig and that workflowType is set to the appropriate
enum/string used by the system.
- Around line 196-217: The documentation for the pipe "sort" transform is
ambiguous; update the "sort" entry in workflow_syntax_compact.ts to explicitly
show two overloads: sort(order) for arrays of primitives (e.g.,
items|sort('asc') or items|sort('desc')) and sort(field, order) for arrays of
objects (e.g., items|sort('price','desc') with default order 'asc' if omitted),
and mention the default behavior and return type; ensure the examples and the
short description reference the "sort" transform name exactly and normalize the
examples so they no longer conflict.
---
Outside diff comments:
In `@services/platform/convex/custom_agents/mutations.ts`:
- Line 299: The updateCustomAgent mutation currently uses name:
v.optional(v.string()) without the regex checks used in createCustomAgent,
allowing invalid renames; change the optional name validator in
updateCustomAgent to the same validated form used in createCustomAgent (e.g.,
v.optional(v.string().regex(/* same pattern */) or the shared helper used for
agent name validation) so any provided name is validated before persisting, and
reuse the same error/message behavior as createCustomAgent.
In `@services/platform/convex/lib/attachments/process_attachments.ts`:
- Around line 288-290: failedDocuments is currently computed by comparing
documentAttachments to parsedDocuments using fileName which can misidentify
processed items when filenames repeat; update the filter to compare unique
fileId instead (use documentAttachments.filter(a => !parsedDocuments.some(d =>
d.fileId === a.fileId))) so failedDocuments is determined by fileId, and ensure
both parsedDocuments and documentAttachments include fileId references (check
functions that populate parsedDocuments and the attachment shape used in this
match).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 74b44187-620b-4bc1-b81e-08f4d405569c
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (20)
examples/workflows/contract-generation/config.jsonservices/platform/app/features/custom-agents/components/custom-agent-create-dialog.tsxservices/platform/convex/agent_tools/files/docx_tool.tsservices/platform/convex/agent_tools/files/excel_tool.tsservices/platform/convex/agent_tools/files/helpers/analyze_text.tsservices/platform/convex/agent_tools/files/helpers/parse_file.tsservices/platform/convex/agent_tools/files/helpers/resolve_file_name.tsservices/platform/convex/agent_tools/files/pdf_tool.tsservices/platform/convex/agent_tools/files/pptx_tool.tsservices/platform/convex/agent_tools/files/txt_tool.tsservices/platform/convex/agent_tools/rag/rag_search_tool.test.tsservices/platform/convex/agent_tools/rag/rag_search_tool.tsservices/platform/convex/custom_agents/mutations.tsservices/platform/convex/lib/attachments/process_attachments.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/types.tsservices/platform/convex/workflow_engine/action_defs/rag/rag_action.tsservices/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.tsservices/platform/convex/workflow_engine/workflow_syntax_compact.tsservices/platform/lib/shared/schemas/custom_agents.tsservices/platform/messages/en.json
…derscores in agent names Make filename optional across all file tools (PDF, DOCX, PPTX, Excel, TXT) by auto-resolving from fileMetadata when not provided. This removes a common source of errors when the LLM omits the filename parameter during file parsing. Also allow underscores in custom agent names, include fileId in parsed document context for better tool reference, switch text analysis agent to direct openai provider, and add contract generation workflow example.
… validation pipeline Add PDF tool support for template and base contract parsing, RAG document indexing for template retrieval, a definition registry and numbering plan step for structural consistency, and a validate-and-repair step to catch duplicate definitions, broken cross-references, and style inconsistencies. Also improve clause merger with deduplication, deal-type awareness, and definition consolidation.
Extract resolveFileIds helper with priority: explicit fileIds first, then fallback to userId + organizationId resolution. Update contract generation workflow to pass templateFileIds to the drafter agent.
… workflow syntax docs Split the monolithic validate_and_repair step into three dedicated repair steps (definitions, cross-references, consistency) for better accuracy. Updated workflow syntax reference with JEXL transforms, hello world example, output step docs, and corrected array length syntax to use |length.
…G operation Replace the simple base contract info extraction with a chunk-by-chunk modification pipeline that preserves original language, structure, and unchanged sections. Add get_chunks RAG operation to retrieve document chunks with pagination support, enabling the workflow to process each chunk individually against template clauses and user requirements.
Strengthen the chunk processor system prompt to require mandatory rag_search before any modification and prohibit fabricating clause language not found in templates.
a077382 to
0432eea
Compare
…ract modification (#789)
Summary
rag_searchtool now accepts explicitfileIdsto scope searches to specific documents instead of searching the entire knowledge base.get_chunksRAG operation with pagination support.Test plan
rag_searchwith explicitfileIdsget_chunksRAG operation paginates correctly for large documents🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests