feat: auto-resolve filenames, scoped RAG search, and chunk-based contract modification by larryro · Pull Request #789 · tale-project/tale

larryro · 2026-03-14T07:33:14Z

Summary

Auto-resolve filenames in file tools: File tools (docx, pdf, excel, pptx, txt) now automatically resolve filenames from attachment metadata, removing the need for agents to specify filenames manually.
Scoped RAG search: rag_search tool now accepts explicit fileIds to scope searches to specific documents instead of searching the entire knowledge base.
Chunk-based base contract modification: Contract generation workflow now processes base contracts chunk-by-chunk, preserving original language, structure, and unchanged sections verbatim. Added get_chunks RAG operation with pagination support.
Focused repair pipeline: Split monolithic contract validation into dedicated repair steps (definitions, cross-references, consistency) with language preservation rules.
PDF support & RAG indexing: Contract templates are now indexed in RAG for retrieval during drafting, with support for both DOCX and PDF templates.
Allow underscores in agent names: Custom agent names now permit underscores.
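The auto-resolve behavior in the first bullet is a fallback: use the caller's filename when one is given, and otherwise read it from the attachment's metadata. This is a minimal sketch. `FileMetadataLookup` and `resolveFileNameSketch` are hypothetical stand-ins, not the actual `resolveFileName` signature in `resolve_file_name.ts`.

```typescript
// Hypothetical metadata lookup: maps a fileId to its stored filename (null if unknown).
type FileMetadataLookup = (fileId: string) => Promise<string | null>;

// Sketch of the fallback: an explicit, non-empty filename wins; otherwise consult metadata.
async function resolveFileNameSketch(
  lookup: FileMetadataLookup,
  fileId: string,
  filename?: string,
): Promise<string> {
  if (filename && filename.trim().length > 0) {
    return filename;
  }
  const stored = await lookup(fileId);
  if (!stored) {
    throw new Error(`No filename found in metadata for file ${fileId}`);
  }
  return stored;
}

// Usage with an in-memory stand-in for the metadata table.
const metadata = new Map<string, string>([["file_123", "contract.docx"]]);
const lookup: FileMetadataLookup = async (id) => metadata.get(id) ?? null;
```

An agent that omits `filename` gets `contract.docx` for `file_123`. An explicit filename is passed through unchanged.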

Test plan

Added unit tests for scoped rag_search with explicit fileIds
Verify file tools auto-resolve filenames from metadata in conversations with attachments
Test contract generation workflow with base contract modification (chunk-by-chunk path)
Test contract generation workflow without base contract (generate path)
Verify get_chunks RAG operation paginates correctly for large documents

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added Contract Generation workflow for automated contract drafting with clause extraction, assembly, and multi-stage quality assurance.
- RAG search now supports explicit file ID specification for more precise document retrieval.
- File parsing now auto-resolves filenames from metadata when not explicitly provided.
Improvements
- Agent names now support underscores in addition to hyphens and alphanumerics.
Documentation
- Updated workflow syntax documentation with enhanced JEXL expression examples and transforms reference.
Tests
- Added comprehensive test coverage for RAG file ID resolution functionality.

coderabbitai · 2026-03-14T07:46:38Z

Walkthrough

This PR introduces a comprehensive contract generation workflow configuration and enhances file parsing across multiple tools. Key changes include: adding optional filename resolution via a new resolveFileName helper that queries file metadata when filenames are not provided; updating five file parsing tools (docx, excel, pdf, pptx, txt) to accept optional filenames and resolve them automatically; extending the RAG search tool to accept explicit fileIds and adding a new get_chunks operation for retrieving document chunks with pagination; updating the ParsedDocument interface to include fileId; and broadening agent name validation to allow underscores alongside hyphens. The contract generation workflow config defines a multi-stage pipeline for contract drafting with clause extraction, template indexing, base contract processing, and iterative repair steps for definitions, cross-references, and consistency.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

feat(platform): add file storage and attachment support for integration sandbox #462: Introduces file-attachment support and fileReferences propagation in integration execution results, related through shared attachment handling patterns.
improve coding agent #19: Updates file parsing and attachment flows including parseFile signature and ParsedDocument structure, directly connected to filename resolution and fileId propagation changes.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically summarizes the three main changes in the pull request: auto-resolving filenames in file tools, adding scoped RAG search with explicit fileIds, and implementing chunk-based contract modification with dedicated repair steps.

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

services/platform/convex/custom_agents/mutations.ts (1)

299-299: 🧹 Nitpick | 🔵 Trivial

Consider adding name validation to updateCustomAgent.

The createCustomAgent mutation validates the name format, but updateCustomAgent accepts an optional name field without applying the same regex validation. If agent renaming is allowed, a user could potentially bypass the naming rules by updating an existing agent.

🔧 Proposed fix to add validation

   handler: async (ctx, args): Promise<null> => {
     const authUser = await authComponent.getAuthUser(ctx);
     if (!authUser) throw new Error('Unauthenticated');

+    if (args.name !== undefined && !/^[a-z0-9][a-z0-9_-]*$/.test(args.name)) {
+      throw new Error(
+        'Agent name must start with a letter or number and contain only lowercase letters, numbers, hyphens, and underscores',
+      );
+    }
+
     const draft = await getDraftByRoot(ctx, args.customAgentId);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/custom_agents/mutations.ts` at line 299, the
updateCustomAgent mutation currently uses name: v.optional(v.string()) without
the regex check used in createCustomAgent, allowing invalid renames. Apply the
same check to the optional name in updateCustomAgent (Convex validators such as
v.string() do not support regex constraints, so the check belongs in the handler
or in a shared helper used for agent name validation) so any provided name is
validated before persisting, and reuse the same error message as
createCustomAgent.
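If both mutations share one check, create and rename cannot drift apart. This is a minimal sketch that assumes a hypothetical `assertValidAgentName` helper. The regex and error message are copied from the proposed fix above.

```typescript
// Same pattern as the proposed fix: a lowercase alphanumeric first character,
// then lowercase letters, digits, hyphens, or underscores.
const AGENT_NAME_PATTERN = /^[a-z0-9][a-z0-9_-]*$/;

// Hypothetical shared helper, callable from both createCustomAgent and updateCustomAgent.
function assertValidAgentName(name: string): void {
  if (!AGENT_NAME_PATTERN.test(name)) {
    throw new Error(
      'Agent name must start with a letter or number and contain only lowercase letters, numbers, hyphens, and underscores',
    );
  }
}

// In updateCustomAgent, validate only when a new name is supplied.
function validateOptionalName(name: string | undefined): void {
  if (name !== undefined) {
    assertValidAgentName(name);
  }
}
```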

services/platform/convex/lib/attachments/process_attachments.ts (1)

288-290: 🧹 Nitpick | 🔵 Trivial

Use fileId instead of fileName to identify failed documents.

Now that parsedDocuments includes fileId, matching by fileName can produce incorrect results if multiple attachments share the same filename. One document parsing successfully would incorrectly mark another same-named attachment as processed.

♻️ Proposed fix

  const failedDocuments = documentAttachments.filter(
-   (a) => !parsedDocuments.some((d) => d.fileName === a.fileName),
+   (a) => !parsedDocuments.some((d) => d.fileId === a.fileId),
  );

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines
288 - 290, failedDocuments is currently computed by comparing
documentAttachments to parsedDocuments using fileName which can misidentify
processed items when filenames repeat; update the filter to compare unique
fileId instead (use documentAttachments.filter(a => !parsedDocuments.some(d =>
d.fileId === a.fileId))) so failedDocuments is determined by fileId, and ensure
both parsedDocuments and documentAttachments include fileId references (check
functions that populate parsedDocuments and the attachment shape used in this
match).
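The filename comparison breaks when two attachments share a name and only one of them parses. This minimal sketch reproduces the case. The `Attachment` and `ParsedDoc` shapes are hypothetical and include only the fields the filter needs.

```typescript
interface Attachment { fileId: string; fileName: string; }
interface ParsedDoc { fileId: string; fileName: string; content: string; }

// Matching on fileName: any parsed doc with the same name hides a failed attachment.
function failedByName(attachments: Attachment[], parsed: ParsedDoc[]): Attachment[] {
  return attachments.filter((a) => !parsed.some((d) => d.fileName === a.fileName));
}

// Matching on fileId: each attachment is checked against its own parse result.
function failedById(attachments: Attachment[], parsed: ParsedDoc[]): Attachment[] {
  // A Set gives a constant-time lookup per attachment.
  const parsedIds = new Set(parsed.map((d) => d.fileId));
  return attachments.filter((a) => !parsedIds.has(a.fileId));
}

const attachments: Attachment[] = [
  { fileId: "f1", fileName: "contract.pdf" },
  { fileId: "f2", fileName: "contract.pdf" }, // same name, failed to parse
];
const parsed: ParsedDoc[] = [{ fileId: "f1", fileName: "contract.pdf", content: "..." }];

console.log(failedByName(attachments, parsed).length); // 0: the failure is missed
console.log(failedById(attachments, parsed).map((a) => a.fileId)); // [ 'f2' ]
```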

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/workflows/contract-generation/config.json`:
- Around line 289-290: The repair_consistency step is missing deal-type
classification because repairContext only carries
steps.build_definition_registry.output.data; modify the generation pipeline so
repairContext includes the merge_clauses output as well (e.g., merge or embed
steps.merge_clauses.output.data into the repairContext object alongside
steps.build_definition_registry.output.data) so repair_consistency can read the
deal-type and correctly remove stock/asset-only provisions.
- Around line 311-320: The workflow routes directly from stepSlug
"repair_references" to "repair_consistency" but later steps allow
removing/renumbering sections, which can leave stale cross-references; update
the flow so renumbering always happens before the final cross-reference pass:
either move "repair_references" to run after the renumbering/remove-sections
step or add a follow-up transition from the renumbering step back to
"repair_references" (or make "repair_consistency" trigger a final
"repair_references" run on success). Ensure the unique stepSlug
"repair_references" is the last cross-reference fixer in the chain so all
renumbering changes are validated and corrected before output.
- Around line 182-183: The config hardcodes a leading "\n\n" when concatenating
chunk output which breaks verbatim preservation; update the logic that builds
model input/output so "systemPrompt" / "userPrompt" do not unconditionally
prepend "\n\n" to returned chunks—either remove that hardcoded prefix entirely
or make it conditional (only add separators when the chunk was modified or when
it's safe to join) and ensure the pipeline that emits the final chunk uses the
chunk's exact original text when the decision in the systemPrompt is "output the
chunk text EXACTLY as-is" so tables, numbering, and spacing remain byte-for-byte
identical.
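One way to meet the byte-for-byte requirement is to reuse the original chunk text whenever the model reports no change, and to join chunks without inserting separators. This is a minimal sketch. The `ChunkDecision` shape and `assembleChunks` are hypothetical. It also assumes each chunk returned by get_chunks already carries the whitespace it had in the source document, which may not hold for every chunker.

```typescript
// Hypothetical per-chunk result from the chunk processor.
interface ChunkDecision {
  index: number;
  original: string; // chunk text exactly as returned by get_chunks
  modified: boolean; // whether the model changed this chunk
  output?: string; // model text, used only when modified is true
}

// Reassemble in index order. Unchanged chunks use their original text verbatim,
// and no separator is inserted between chunks.
function assembleChunks(decisions: ChunkDecision[]): string {
  return [...decisions]
    .sort((a, b) => a.index - b.index)
    .map((d) => (d.modified && d.output !== undefined ? d.output : d.original))
    .join("");
}

const doc = assembleChunks([
  { index: 1, original: "2. Term\n  | a | b |\n", modified: false },
  { index: 0, original: "1. Parties\n", modified: true, output: "1. Parties (amended)\n" },
]);
console.log(JSON.stringify(doc)); // "1. Parties (amended)\n2. Term\n  | a | b |\n"
```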

In `@services/platform/convex/agent_tools/files/excel_tool.ts`:
- Around line 138-142: The call to resolveFileName (using ctx, args.fileId,
args.filename) can throw before the Excel parse try/catch, causing uncaught
errors; move the resolveFileName invocation into the same try block that wraps
the Excel parsing logic (the try surrounding the parse and return of { success:
false, ... }) so any errors from resolveFileName are caught and produce the
structured failure payload — update the code that references resolvedFilename
accordingly inside that try and remove the pre-try assignment.

In `@services/platform/convex/agent_tools/files/helpers/parse_file.ts`:
- Line 50: The call to resolveFileName(ctx, fileId, filename) can throw before
the try in parseFile, causing unhandled failures; move that call inside the
existing try block in parseFile (so resolution is covered by the structured
error handling) and ensure any errors from resolveFileName result in returning
the same { success: false, ... } response path used elsewhere in parseFile
rather than allowing an exception to escape.

In `@services/platform/convex/agent_tools/files/txt_tool.ts`:
- Line 194: The call to resolveFileName(ctx, fileId, filename) runs outside the
parsing try/catch so failures bypass the tool's structured parse error handling;
move or duplicate that call inside the existing try block that parses the tool
input (or wrap it in its own try/catch) so any errors from resolveFileName are
caught and translated into the tool’s parse error response; update references to
resolvedFilename used later in the function to ensure they come from the guarded
resolveFileName call inside the try/catch surrounding the parsing logic.
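The excel_tool, parse_file, and txt_tool comments describe the same fix. Filename resolution should run inside the same try block as parsing, so a metadata lookup failure produces the structured failure payload. This is a minimal sketch that assumes a hypothetical `ToolResult` shape and injected `resolve`/`parse` functions in place of the real tool signatures.

```typescript
type ToolResult =
  | { success: true; filename: string; text: string }
  | { success: false; error: string };

async function parseWithResolvedName(
  resolve: () => Promise<string>, // e.g. wraps resolveFileName(ctx, fileId, filename)
  parse: (filename: string) => Promise<string>,
): Promise<ToolResult> {
  try {
    // Resolution sits inside the try, so a lookup failure is reported the same
    // way as a parse failure instead of escaping as an exception.
    const filename = await resolve();
    const text = await parse(filename);
    return { success: true, filename, text };
  } catch (err) {
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```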

In `@services/platform/convex/workflow_engine/action_defs/rag/rag_action.ts`:
- Around line 152-180: The pagination loop in get_chunks (inside the while in
rag_action.ts using chunkStart/chunkEnd and MAX_CHUNK_WINDOW) performs unbounded
fetches; add the same per-request AbortController timeout logic used by the
search branch: create an AbortController at the top of each iteration, set a
timeout to call controller.abort() after the configured timeout, pass
controller.signal to fetch(url, { signal }), and clear the timeout after the
response is received (or in finally) so hung requests can't stall the workflow;
ensure fetchJson<DocumentContentResponse>(response) is still called with the
successful response and handle abort errors appropriately.
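A minimal sketch of a per-request timeout inside a paginated loop, as the comment suggests. Several details are assumptions for illustration and may not match the actual get_chunks request in `rag_action.ts`: the URL shape, the `start`/`end` query parameters, the `FetchLike` type, and the response fields. The fetch function is injected so the loop can be exercised without a server.

```typescript
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface Chunk { index: number; content: string; }
interface ChunkPage { chunks: Chunk[]; totalChunks: number; }

async function fetchAllChunks(
  fetchFn: FetchLike,
  baseUrl: string,
  windowSize: number, // plays the role of MAX_CHUNK_WINDOW
  timeoutMs: number,
): Promise<Chunk[]> {
  const all: Chunk[] = [];
  let start = 0;
  let total = Infinity;
  while (start < total) {
    const end = start + windowSize;
    // A fresh controller per iteration: a hung page aborts only that request.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchFn(`${baseUrl}?start=${start}&end=${end}`, { signal: controller.signal });
      if (!res.ok) throw new Error(`get_chunks request failed for window ${start}-${end}`);
      const page = (await res.json()) as ChunkPage;
      all.push(...page.chunks);
      total = page.totalChunks;
      if (page.chunks.length === 0) break; // guard against a server that never advances
    } finally {
      clearTimeout(timer); // always cleared, whether the request succeeds, fails, or aborts
    }
    start = end;
  }
  return all;
}
```

An aborted request rejects, and the rejection propagates to the caller, where the workflow can map it to a step failure.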

In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`:
- Around line 410-425: Update the ragSchemas.get_chunks result to match
RagChunkResult: change the title field to allow string or null (not optional
string) and replace chunks: { type: 'any' } with a typed array of chunk objects
(each having index:number and content:string) so validation for
steps.get_base_chunks.output.data.chunks and loop.item.content is accurate; keep
other fields (documentId, totalChunks, executionTimeMs) unchanged.
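For reference, the result shape this comment describes can be written as a TypeScript type with a small runtime guard. The field names come from the comment. The guard is a hypothetical illustration, not the project's schema format, and the `string` type for `documentId` is an assumption.

```typescript
interface RagChunk { index: number; content: string; }

interface RagChunkResult {
  documentId: string; // assumed to be a string ID
  title: string | null; // nullable, not optional: the key is always present
  totalChunks: number;
  chunks: RagChunk[];
  executionTimeMs: number;
}

function isRagChunkResult(value: unknown): value is RagChunkResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const chunks = v.chunks;
  return (
    typeof v.documentId === "string" &&
    (typeof v.title === "string" || v.title === null) &&
    typeof v.totalChunks === "number" &&
    typeof v.executionTimeMs === "number" &&
    Array.isArray(chunks) &&
    chunks.every(
      (c: unknown) =>
        typeof c === "object" &&
        c !== null &&
        typeof (c as RagChunk).index === "number" &&
        typeof (c as RagChunk).content === "string",
    )
  );
}
```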

In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts`:
- Around line 13-24: The example's output step configuration is inconsistent:
the 'result' step (stepSlug: 'result') is shown as terminal with nextSteps: {}
but elsewhere is documented as having a success port ({ success: 'next_step' });
pick one behavior and make the example and docs match. To fix, update the
stepsConfig entry for 'result' (and any duplicate example blocks) so its
nextSteps matches the documented behavior — either change nextSteps: {} to
nextSteps: { success: 'next_step' } if output steps should proceed, or change
the documentation/examples that show a success port to use nextSteps: {} if
output is terminal — ensure all occurrences of 'result' (and the example at the
top and the block around the alternate docs) are aligned.
- Around line 13-24: The Hello World example is missing the workflowType
property in the declared workflow shape; update the example object so
workflowConfig includes a workflowType entry (alongside name/description) to
match the canonical shape used elsewhere—ensure the top-level object still has
workflowConfig and stepsConfig and that workflowType is set to the appropriate
enum/string used by the system.
- Around line 196-217: The documentation for the pipe "sort" transform is
ambiguous; update the "sort" entry in workflow_syntax_compact.ts to explicitly
show two overloads: sort(order) for arrays of primitives (e.g.,
items|sort('asc') or items|sort('desc')) and sort(field, order) for arrays of
objects (e.g., items|sort('price','desc') with default order 'asc' if omitted),
and mention the default behavior and return type; ensure the examples and the
short description reference the "sort" transform name exactly and normalize the
examples so they no longer conflict.
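The two call shapes the comment asks the docs to distinguish can be illustrated as TypeScript overloads. `sortTransform` is hypothetical and exists only to show the intended semantics: the order defaults to `'asc'`, and a new array is returned. It is not the JEXL transform implementation used by the workflow engine.

```typescript
type Order = "asc" | "desc";
type Primitive = string | number;

// Overload 1: arrays of primitives, e.g. items|sort('desc')
function sortTransform(items: Primitive[], order?: Order): Primitive[];
// Overload 2: arrays of objects sorted by a field, e.g. items|sort('price', 'desc')
function sortTransform<T extends Record<string, unknown>>(items: T[], field: string, order?: Order): T[];
function sortTransform(items: unknown[], a?: string, b?: Order): unknown[] {
  // The first argument is a field name unless it is an explicit order keyword.
  const byField = a !== undefined && a !== "asc" && a !== "desc";
  const order: Order = byField ? (b ?? "asc") : ((a as Order | undefined) ?? "asc");
  const key = (x: unknown) => (byField ? (x as Record<string, unknown>)[a as string] : x);
  const dir = order === "desc" ? -1 : 1;
  // Copy first so the input array is left untouched.
  return [...items].sort((x, y) => {
    const kx = key(x) as Primitive;
    const ky = key(y) as Primitive;
    return kx < ky ? -dir : kx > ky ? dir : 0;
  });
}
```

A single-argument form is ambiguous for a field literally named `asc` or `desc`. The docs may want to state that the one-argument form always means order.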

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74b44187-620b-4bc1-b81e-08f4d405569c

📥 Commits

Reviewing files that changed from the base of the PR and between 46090dc and b9bfe7c.

⛔ Files ignored due to path filters (1)

services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (20)

examples/workflows/contract-generation/config.json
services/platform/app/features/custom-agents/components/custom-agent-create-dialog.tsx
services/platform/convex/agent_tools/files/docx_tool.ts
services/platform/convex/agent_tools/files/excel_tool.ts
services/platform/convex/agent_tools/files/helpers/analyze_text.ts
services/platform/convex/agent_tools/files/helpers/parse_file.ts
services/platform/convex/agent_tools/files/helpers/resolve_file_name.ts
services/platform/convex/agent_tools/files/pdf_tool.ts
services/platform/convex/agent_tools/files/pptx_tool.ts
services/platform/convex/agent_tools/files/txt_tool.ts
services/platform/convex/agent_tools/rag/rag_search_tool.test.ts
services/platform/convex/agent_tools/rag/rag_search_tool.ts
services/platform/convex/custom_agents/mutations.ts
services/platform/convex/lib/attachments/process_attachments.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/types.ts
services/platform/convex/workflow_engine/action_defs/rag/rag_action.ts
services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts
services/platform/convex/workflow_engine/workflow_syntax_compact.ts
services/platform/lib/shared/schemas/custom_agents.ts
services/platform/messages/en.json

…derscores in agent names

Make filename optional across all file tools (PDF, DOCX, PPTX, Excel, TXT) by auto-resolving from fileMetadata when not provided. This removes a common source of errors when the LLM omits the filename parameter during file parsing. Also allow underscores in custom agent names, include fileId in parsed document context for better tool reference, switch text analysis agent to direct openai provider, and add contract generation workflow example.

… validation pipeline

Add PDF tool support for template and base contract parsing, RAG document indexing for template retrieval, a definition registry and numbering plan step for structural consistency, and a validate-and-repair step to catch duplicate definitions, broken cross-references, and style inconsistencies. Also improve clause merger with deduplication, deal-type awareness, and definition consolidation.

Extract resolveFileIds helper with priority: explicit fileIds first, then fallback to userId + organizationId resolution. Update contract generation workflow to pass templateFileIds to the drafter agent.

… workflow syntax docs

Split the monolithic validate_and_repair step into three dedicated repair steps (definitions, cross-references, consistency) for better accuracy. Updated workflow syntax reference with JEXL transforms, hello world example, output step docs, and corrected array length syntax to use |length.

…G operation

Replace the simple base contract info extraction with a chunk-by-chunk modification pipeline that preserves original language, structure, and unchanged sections. Add get_chunks RAG operation to retrieve document chunks with pagination support, enabling the workflow to process each chunk individually against template clauses and user requirements.

Strengthen the chunk processor system prompt to require mandatory rag_search before any modification and prohibit fabricating clause language not found in templates.

…ract modification (#789)

greptile-apps Bot reviewed Mar 14, 2026

coderabbitai Bot requested changes Mar 14, 2026

larryro added 6 commits March 14, 2026 15:52

feat: support explicit fileIds in rag_search tool for scoped searches

a506e86

Extract resolveFileIds helper with priority: explicit fileIds first, then fallback to userId + organizationId resolution. Update contract generation workflow to pass templateFileIds to the drafter agent.
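The priority order in this commit message can be sketched as follows. The name mirrors the commit description. The signature, the `ScopeLookup` callback, and the empty-result behavior are assumptions for illustration, not the actual helper in `rag_search_tool.ts`.

```typescript
// Hypothetical lookup of all file IDs visible to a user within an organization.
type ScopeLookup = (userId: string, organizationId: string) => Promise<string[]>;

interface SearchScope {
  fileIds?: string[];
  userId?: string;
  organizationId?: string;
}

// Priority: explicit fileIds first; otherwise fall back to user + organization scope.
async function resolveFileIdsSketch(scope: SearchScope, lookup: ScopeLookup): Promise<string[]> {
  if (scope.fileIds && scope.fileIds.length > 0) {
    return scope.fileIds; // scoped search: only these documents
  }
  if (scope.userId && scope.organizationId) {
    return lookup(scope.userId, scope.organizationId);
  }
  return []; // assumption: nothing resolved means nothing to search
}
```

With this order, the contract workflow's templateFileIds restrict the drafter's searches to the templates, even when user and organization context is also available.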

feat: enforce template-sourced modifications in chunk processor prompt

0432eea

Strengthen the chunk processor system prompt to require mandatory rag_search before any modification and prohibit fabricating clause language not found in templates.

larryro force-pushed the feat/file-tools-auto-resolve-filename branch from a077382 to 0432eea March 14, 2026 07:52

coderabbitai Bot approved these changes Mar 14, 2026

larryro merged commit 29e7aa2 into main Mar 14, 2026
16 checks passed

larryro deleted the feat/file-tools-auto-resolve-filename branch March 14, 2026 07:54

coderabbitai Bot mentioned this pull request Mar 21, 2026

feat(platform): workflow schema validation, knowledge scoping, and approval types #826

Merged

yannickmonney pushed a commit that referenced this pull request Apr 8, 2026

feat: auto-resolve filenames, scoped RAG search, and chunk-based contract modification (#789)

b3e8b47

larryro merged 6 commits into main from feat/file-tools-auto-resolve-filename

larryro commented Mar 14, 2026 • edited by coderabbitai Bot