feat(platform): document list operation, unified fileId, and output mapping#833
Conversation
… and userId tracking Add a 'list' operation to the document action for listing files in folders with optional extension filtering. Widen output node mapping values from string-only to arbitrary JSON values, enabling richer workflow outputs. Track userId on workflow executions for audit context. Remove unused trigger_workflow_by_id helper.
… tools and workflows Replace the dual id/documentId + fileId pattern with a single fileId identifier for all agent-facing APIs. This eliminates confusion between Convex document IDs and storage file IDs by exposing only fileId to agents and workflows. - Add find_document_by_file_id query with by_organizationId_and_fileId index - Update document_retrieve tool to accept fileId instead of documentId - Remove id field from AgentDocumentItem in document_find results - Update document action (workflow engine) to resolve fileId → document internally - Remove ragInfo.status filter from getAccessibleDocumentIds - Update docx/pptx template listing to use fileId instead of documentId/storageId
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughThis PR refactors document identification throughout the agent tooling and workflow systems. It transitions from Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment Tip CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required. |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/platform/convex/workflow_engine/helpers/step_execution/initialize_execution_variables.ts (1)
30-57:⚠️ Potential issue | 🔴 CriticalProtect reserved execution context (
userId, org/workflow IDs) from overwrite
userIdis injected at Line 34/44/56, but later the merge pass copies arbitrary keys fromexecution.variablesback intofullVariables, which can overwrite system context. This allows workflow state to spoofuserIdand weaken access-control/audit guarantees.🔧 Proposed fix
@@ - for (const [key, value] of Object.entries(executionVars)) { - if (key !== 'steps' && key !== 'loop') { - fullVariables[key] = value; - } - } + const reservedKeys = new Set([ + 'steps', + 'loop', + 'organizationId', + 'userId', + 'wfDefinitionId', + 'rootWfDefinitionId', + ]); + for (const [key, value] of Object.entries(executionVars)) { + if (!reservedKeys.has(key)) { + fullVariables[key] = value; + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/workflow_engine/helpers/step_execution/initialize_execution_variables.ts` around lines 30 - 57, The merge logic for building fullVariables allows execution.variables to overwrite reserved keys (userId, organizationId, wfDefinitionId, rootWfDefinitionId), so ensure these reserved context keys are protected: when merging execution.variables into fullVariables (and when using inputRecord/config), filter out any keys matching the reserved names (userId, organizationId, wfDefinitionId, rootWfDefinitionId) before spreading/assigning; keep the explicit assignments of organizationId, userId, wfDefinitionId, rootWfDefinitionId last so they always win (refer to fullVariables, execution.variables, userId, organizationId, wfDefinitionId, rootWfDefinitionId, and the input/config initialization paths).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@services/platform/convex/documents/__tests__/find_document_by_file_id.test.ts`:
- Around line 47-64: The test only asserts withIndex was called for
findDocumentByFileId but never invokes the index predicate, so update the test
to execute the callback passed to mockWithIndex and verify it builds the
expected equality predicates: capture the function argument given to
mockWithIndex (the predicate) and call it with a mock index builder that records
.eq calls, then assert that .eq was called with ('organizationId', 'org1') and
('fileId', 'file-abc') and that mockFirst was invoked; refer to
findDocumentByFileId, mockWithIndex, mockFirst and the ctx.query.withIndex
callback when making this change.
In `@services/platform/convex/documents/internal_queries.ts`:
- Around line 55-63: The internalQuery findDocumentByFileId currently validates
fileId as v.string(), which bypasses Convex ID-shape checks; change the args
schema to use v.id('_storage') for fileId (matching the public createDocument
mutation) so the file ID is validated at the boundary—update the args definition
in findDocumentByFileId to replace v.string() with v.id('_storage') and keep
organizationId as v.string().
In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`:
- Around line 456-459: The response currently sets totalCount:
allDocuments.length which reflects the capped result set (MAX_TOTAL) rather than
the true matching count; update the return shape in the function that builds the
response (referencing totalCount, allDocuments and MAX_TOTAL) to either rename
totalCount to returnedCount and add a boolean truncated flag (e.g., truncated =
allDocuments.length === MAX_TOTAL) or include both returnedCount and totalCount
with truncated: true when results were capped, and ensure callers/code that
consume this response are updated to use the new field name(s).
In
`@services/platform/convex/workflow_engine/helpers/nodes/llm/execute_agent_with_tools.ts`:
- Around line 226-231: The call to agent.generateObject incorrectly passes
userId; remove userId from the second argument and pass only { threadId } so the
SDK signature is satisfied (ensure thread creation already set userId via
createThread(ctx, components.agent, { userId }) if needed). Update the
generateObject invocation in execute_agent_with_tools.ts (the
agent.generateObject call) to use { threadId } and keep the existing
prompt/schema and contextOptions arguments unchanged.
---
Outside diff comments:
In
`@services/platform/convex/workflow_engine/helpers/step_execution/initialize_execution_variables.ts`:
- Around line 30-57: The merge logic for building fullVariables allows
execution.variables to overwrite reserved keys (userId, organizationId,
wfDefinitionId, rootWfDefinitionId), so ensure these reserved context keys are
protected: when merging execution.variables into fullVariables (and when using
inputRecord/config), filter out any keys matching the reserved names (userId,
organizationId, wfDefinitionId, rootWfDefinitionId) before spreading/assigning;
keep the explicit assignments of organizationId, userId, wfDefinitionId,
rootWfDefinitionId last so they always win (refer to fullVariables,
execution.variables, userId, organizationId, wfDefinitionId, rootWfDefinitionId,
and the input/config initialization paths).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 191d920e-1add-44d4-8a39-92607fb23386
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (38)
examples/workflows/folder-document-analysis/config.jsonservices/platform/convex/agent_tools/documents/__tests__/document_find_tool.test.tsservices/platform/convex/agent_tools/documents/__tests__/document_retrieve_tool.test.tsservices/platform/convex/agent_tools/documents/document_find_tool.tsservices/platform/convex/agent_tools/documents/document_retrieve_tool.tsservices/platform/convex/agent_tools/documents/helpers/fetch_document_comparison.tsservices/platform/convex/agent_tools/documents/helpers/retrieve_document.tsservices/platform/convex/agent_tools/files/docx_tool.tsservices/platform/convex/agent_tools/files/pptx_tool.tsservices/platform/convex/agent_tools/workflows/helpers/syntax_reference.tsservices/platform/convex/agent_tools/workflows/internal_actions.tsservices/platform/convex/documents/__tests__/find_document_by_file_id.test.tsservices/platform/convex/documents/__tests__/get_accessible_document_ids.test.tsservices/platform/convex/documents/__tests__/list_documents_for_agent.test.tsservices/platform/convex/documents/find_document_by_file_id.tsservices/platform/convex/documents/get_accessible_document_ids.tsservices/platform/convex/documents/helpers.tsservices/platform/convex/documents/internal_queries.tsservices/platform/convex/documents/list_documents_for_agent.tsservices/platform/convex/documents/schema.tsservices/platform/convex/wf_executions/internal_mutations.tsservices/platform/convex/wf_executions/mutations.tsservices/platform/convex/workflow_engine/action_defs/document/document_action.tsservices/platform/convex/workflow_engine/helpers/engine/start_workflow_handler.tsservices/platform/convex/workflow_engine/helpers/nodes/llm/execute_agent_with_tools.tsservices/platform/convex/workflow_engine/helpers/nodes/llm/execute_llm_node.tsservices/platform/convex/workflow_engine/helpers/scheduler/index.tsservices/platform/convex/workflow_engine/helpers/scheduler/trigger_workflow_by_id.tsservices/platform/convex/workflow_engine/helpers/step_execution/initialize_execution_variables.tsservices/platform/convex/workflow_engine/helpers/step_execution/types.tsservices/platform/convex/workflow_engine/helpers/validation/steps/output.test.tsservices/platform/convex/workflow_engine/helpers/validation/steps/output.tsservices/platform/convex/workflow_engine/helpers/validation/variables/parse.tsservices/platform/convex/workflow_engine/internal_actions.tsservices/platform/convex/workflow_engine/types/nodes.tsservices/platform/convex/workflows/schema.tsservices/platform/convex/workflows/steps/validators.tsservices/platform/convex/workflows/validators.ts
💤 Files with no reviewable changes (5)
- services/platform/convex/agent_tools/documents/tests/document_find_tool.test.ts
- services/platform/convex/workflow_engine/helpers/scheduler/index.ts
- services/platform/convex/documents/list_documents_for_agent.ts
- services/platform/convex/workflow_engine/internal_actions.ts
- services/platform/convex/workflow_engine/helpers/scheduler/trigger_workflow_by_id.ts
| it('uses the correct index', async () => { | ||
| const mockFirst = vi.fn().mockResolvedValue(null); | ||
| const mockWithIndex = vi.fn().mockReturnValue({ first: mockFirst }); | ||
| const ctx = { | ||
| db: { | ||
| query: vi.fn().mockReturnValue({ withIndex: mockWithIndex }), | ||
| }, | ||
| }; | ||
|
|
||
| await findDocumentByFileId(ctx as never, { | ||
| organizationId: 'org1', | ||
| fileId: 'file-abc', | ||
| }); | ||
|
|
||
| expect(mockWithIndex).toHaveBeenCalledWith( | ||
| 'by_organizationId_and_fileId', | ||
| expect.any(Function), | ||
| ); |
There was a problem hiding this comment.
Assert index predicate behavior, not just withIndex invocation
The current test verifies index name only. It never executes the callback, so regressions in .eq('organizationId', ...) / .eq('fileId', ...) won’t fail tests.
✅ Stronger assertion pattern
it('uses the correct index', async () => {
@@
expect(mockWithIndex).toHaveBeenCalledWith(
'by_organizationId_and_fileId',
expect.any(Function),
);
+
+ const indexCb = mockWithIndex.mock.calls[0]?.[1] as
+ | ((q: { eq: (field: string, value: unknown) => unknown }) => void)
+ | undefined;
+ expect(indexCb).toBeTypeOf('function');
+
+ const eq = vi.fn().mockReturnThis();
+ indexCb?.({ eq });
+ expect(eq).toHaveBeenNthCalledWith(1, 'organizationId', 'org1');
+ expect(eq).toHaveBeenNthCalledWith(2, 'fileId', 'file-abc');
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it('uses the correct index', async () => { | |
| const mockFirst = vi.fn().mockResolvedValue(null); | |
| const mockWithIndex = vi.fn().mockReturnValue({ first: mockFirst }); | |
| const ctx = { | |
| db: { | |
| query: vi.fn().mockReturnValue({ withIndex: mockWithIndex }), | |
| }, | |
| }; | |
| await findDocumentByFileId(ctx as never, { | |
| organizationId: 'org1', | |
| fileId: 'file-abc', | |
| }); | |
| expect(mockWithIndex).toHaveBeenCalledWith( | |
| 'by_organizationId_and_fileId', | |
| expect.any(Function), | |
| ); | |
| it('uses the correct index', async () => { | |
| const mockFirst = vi.fn().mockResolvedValue(null); | |
| const mockWithIndex = vi.fn().mockReturnValue({ first: mockFirst }); | |
| const ctx = { | |
| db: { | |
| query: vi.fn().mockReturnValue({ withIndex: mockWithIndex }), | |
| }, | |
| }; | |
| await findDocumentByFileId(ctx as never, { | |
| organizationId: 'org1', | |
| fileId: 'file-abc', | |
| }); | |
| expect(mockWithIndex).toHaveBeenCalledWith( | |
| 'by_organizationId_and_fileId', | |
| expect.any(Function), | |
| ); | |
| const indexCb = mockWithIndex.mock.calls[0]?.[1] as | |
| | ((q: { eq: (field: string, value: unknown) => unknown }) => void) | |
| | undefined; | |
| expect(indexCb).toBeTypeOf('function'); | |
| const eq = vi.fn().mockReturnThis(); | |
| indexCb?.({ eq }); | |
| expect(eq).toHaveBeenNthCalledWith(1, 'organizationId', 'org1'); | |
| expect(eq).toHaveBeenNthCalledWith(2, 'fileId', 'file-abc'); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/documents/__tests__/find_document_by_file_id.test.ts`
around lines 47 - 64, The test only asserts withIndex was called for
findDocumentByFileId but never invokes the index predicate, so update the test
to execute the callback passed to mockWithIndex and verify it builds the
expected equality predicates: capture the function argument given to
mockWithIndex (the predicate) and call it with a mock index builder that records
.eq calls, then assert that .eq was called with ('organizationId', 'org1') and
('fileId', 'file-abc') and that mockFirst was invoked; refer to
findDocumentByFileId, mockWithIndex, mockFirst and the ctx.query.withIndex
callback when making this change.
| export const findDocumentByFileId = internalQuery({ | ||
| args: { | ||
| organizationId: v.string(), | ||
| fileId: v.string(), | ||
| }, | ||
| handler: async (ctx, args) => { | ||
| return await DocumentsHelpers.findDocumentByFileId(ctx, args); | ||
| }, | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify current validator usage for fileId across Convex document endpoints
rg -nP --type=ts -C2 "fileId:\s*v\.(string|id\('_storage'\))" services/platform/convex/documents
# Inspect the new internal query boundary
rg -nP --type=ts -C3 "export const findDocumentByFileId = internalQuery" services/platform/convex/documents/internal_queries.ts
# Inspect downstream helper expectation/casting
rg -nP --type=ts -C4 "findDocumentByFileId\(|toId<'_storage'>" services/platform/convex/documents/find_document_by_file_id.tsRepository: tale-project/tale
Length of output: 1488
Use v.id('_storage') for fileId to match the public mutation and validate at the boundary.
Line 58 currently accepts any string, bypassing Convex ID-shape validation. The public createDocument mutation in the same file (line 92) already validates fileId: v.id('_storage'), and this internal query should follow the same pattern for consistency and type safety.
🔧 Proposed fix
export const findDocumentByFileId = internalQuery({
args: {
organizationId: v.string(),
- fileId: v.string(),
+ fileId: v.id('_storage'),
},
handler: async (ctx, args) => {
- return await DocumentsHelpers.findDocumentByFileId(ctx, args);
+ return await DocumentsHelpers.findDocumentByFileId(ctx, {
+ ...args,
+ fileId: String(args.fileId),
+ });
},
});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/documents/internal_queries.ts` around lines 55 - 63,
The internalQuery findDocumentByFileId currently validates fileId as v.string(),
which bypasses Convex ID-shape checks; change the args schema to use
v.id('_storage') for fileId (matching the public createDocument mutation) so the
file ID is validated at the boundary—update the args definition in
findDocumentByFileId to replace v.string() with v.id('_storage') and keep
organizationId as v.string().
| return { | ||
| documents: allDocuments, | ||
| totalCount: allDocuments.length, | ||
| }; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
totalCount naming may be misleading when results are truncated.
The totalCount field returns allDocuments.length, which reflects the number of documents returned (capped at MAX_TOTAL = 500), not the total matching documents in the database. If the actual total exceeds 500, consumers might misinterpret this as the complete count.
Consider renaming to returnedCount or adding a truncated: boolean flag:
💡 Suggested improvement
return {
documents: allDocuments,
- totalCount: allDocuments.length,
+ returnedCount: allDocuments.length,
+ truncated: allDocuments.length >= MAX_TOTAL,
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`
around lines 456 - 459, The response currently sets totalCount:
allDocuments.length which reflects the capped result set (MAX_TOTAL) rather than
the true matching count; update the return shape in the function that builds the
response (referencing totalCount, allDocuments and MAX_TOTAL) to either rename
totalCount to returnedCount and add a boolean truncated flag (e.g., truncated =
allDocuments.length === MAX_TOTAL) or include both returnedCount and totalCount
with truncated: true when results were capped, and ensure callers/code that
consume this response are updated to use the new field name(s).
| const result = await agent.generateObject( | ||
| ctx, | ||
| { threadId }, | ||
| { threadId, userId }, | ||
| { prompt: prompts.userPrompt, schema: zodSchema }, | ||
| { contextOptions: { excludeToolMessages: false } }, | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
@convex-dev/agent generateObject generateText userId parameter
💡 Result:
In @convex-dev/agent, generateText / generateObject don’t take a userId parameter directly. Instead, userId is attached to the thread (and/or individual saved messages), and message generations inherit it from the thread by default. [1]
Where to set userId
1) When creating the thread (recommended)
import { createThread } from "@convex-dev/agent";
const threadId = await createThread(ctx, components.agent, { userId });A thread created with userId is associated with that user, and generated messages are saved to that user’s history. [1]
2) When saving messages explicitly
If you call saveMessage / agent.saveMessage, you pass userId there:
const { messageId } = await agent.saveMessage(ctx, { threadId, userId, prompt });[2]
How generateText / generateObject are called
Using the agent directly:
const result = await agent.generateText(ctx, { threadId }, { prompt });
const obj = await agent.generateObject(ctx, { threadId }, { prompt, schema });[1][3]
Or via the per-thread helper object (no userId argument here either):
const { thread } = await agent.continueThread(ctx, { threadId });
await thread.generateText({ prompt });
await thread.generateObject({ prompt, schema });[1][3]
Note: you’ll also see userId show up in hooks like usageHandler / rawResponseHandler because it’s derived from the thread/message association, not passed to generateText as an argument. [3]
Sources: [1] [2] [3]
Remove userId from the generateObject call.
The @convex-dev/agent SDK does not accept userId as a parameter to generateObject. Instead, userId should be set when creating the thread via createThread(ctx, components.agent, { userId }). If the thread is already created with a userId, pass only { threadId } to generateObject:
const result = await agent.generateObject(
ctx,
{ threadId },
{ prompt: prompts.userPrompt, schema: zodSchema },
{ contextOptions: { excludeToolMessages: false } },
);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/helpers/nodes/llm/execute_agent_with_tools.ts`
around lines 226 - 231, The call to agent.generateObject incorrectly passes
userId; remove userId from the second argument and pass only { threadId } so the
SDK signature is satisfied (ensure thread creation already set userId via
createThread(ctx, components.agent, { userId }) if needed). Update the
generateObject invocation in execute_agent_with_tools.ts (the
agent.generateObject call) to use { threadId } and keep the existing
prompt/schema and contextOptions arguments unchanged.
Summary
id/documentId+fileIdpattern with a singlefileIdacross all agent-facing APIs (document_find, document_retrieve, docx/pptx templates, workflow document actions). Eliminates agent confusion between Convex document IDs and storage file IDs.listoperation to the workflow document action, enabling workflows to list documents by folder, extension, and other filters. Includes a newfolder-document-analysisexample workflow demonstrating folder-wide requirement scanning.userIdthrough workflow execution variables for access control in document operations.find_document_by_file_idquery: Add indexed lookup (by_organizationId_and_fileId) for resolving fileId → document records, used by document_retrieve and the workflow update operation.getAccessibleDocumentIdsnow returns all documents regardless of indexing status, so agents can discover documents before RAG processing completes.trigger_workflow_by_idscheduler helper.Test plan
find_document_by_file_idby_organizationId_and_fileIdindexSummary by CodeRabbit
Release Notes
New Features
Bug Fixes
Changes
fileIdfor identification instead ofdocumentId.fileIdreference.