Skip to content

fix(api): accept presigned s3Key for MCP uploads (attachment/document/evidence)#3042

Merged
tofikwest merged 4 commits into
mainfrom
fix/mcp-uploads-presigned-s3key
Jun 5, 2026
Merged

fix(api): accept presigned s3Key for MCP uploads (attachment/document/evidence)#3042
tofikwest merged 4 commits into
mainfrom
fix/mcp-uploads-presigned-s3key

Conversation

@tofikwest
Copy link
Copy Markdown
Contributor

@tofikwest tofikwest commented Jun 5, 2026

Problem (customer-reported — Nicole)

The Comp AI MCP server times out on uploads (documents / evidence / attachments). Root cause is the one your codebase already documented: these tools take the whole file as base64 inside the tool argument, so the LLM has to emit the entire file token-by-token before the call even runs — impractically slow, and it hits the MCP client's ~4-min timeout. Questionnaire and policy-PDF were already migrated to the presigned-upload pattern; the rest were not, so Nicole hit the un-migrated ones.

Fix

Migrate the three tools she hit to the presigned s3Key pattern (the one already in your repo):

  1. Agent calls create-upload-url (purpose=attachment|document|evidence) → gets { uploadUrl, s3Key }
  2. Agent PUTs the raw bytes straight to S3 (no LLM tokens)
  3. Agent calls the tool with the tiny s3Key instead of base64

Each service resolves the bytes from whichever of fileData / s3Key is provided via UploadsService.readUploadAsBase64, which enforces the key belongs to the caller's org. The MCP overlay strips fileData from these tools so agents must use s3Key; the base spec keeps fileData for the web UI / direct callers.

Changed

  • attachments (upload-task-attachment) — DTO + service + module
  • knowledge-base (upload-document) — DTO + service + module; also added the missing @ApiProperty decorators so this MCP tool finally has a real schema (it was emitting an empty body before)
  • offboarding-checklist (complete-checklist-item, upload-evidence) — DTO + service; delegates to the now-fixed AttachmentsService
  • UploadPurpose enum: added document
  • regenerated packages/docs/openapi.json (carries the new s3Key fields; the large diff also syncs some pre-existing drift that was stale in the committed spec — benign params/headers, no security changes)

Safety

  • MCP generator stays healthy: verified 0 operations declare >1 security scheme (the thing that gutted tools before).
  • typecheck clean on all changed source files.
  • AppModule boots with the new DI wiring (UploadsModule → Attachments/KnowledgeBase), no circular deps.

Tests

  • New attachments.service.spec.ts: s3Key resolution path + neither → 400.
  • Extended knowledge-base.service.spec.ts: s3Key path + neither → 400 (17/17).
  • Offboarding service specs still green (21/21).

Out of scope (separate issue)

create-automation → create-version also times out, but that's heavy synchronous AI script-generation, not base64 — it needs an async "kick off → poll" pattern. Tracking separately.

Note

Two specs fail only in a local worktree due to env (a DB-TLS guard tripping on a non-local Postgres, and an archiver mock) — both in files this PR does not touch. They pass in CI with a local test DB.

🤖 Generated with Claude Code


Summary by cubic

Switches MCP uploads for attachments, documents, and offboarding evidence to a presigned s3Key flow to stop timeouts and avoid base64 through the model, and enforces shared 100MB limits with pre-download checks to prevent OOMs and false rejects. DTOs and services now accept either fileData or s3Key, with org-scoped reads from storage.

  • Bug Fixes

    • Accept s3Key on upload-task-attachment, upload-document, and complete-checklist-item/upload-evidence; services resolve bytes from fileData or s3Key via UploadsService.readUploadAsBase64 (org-validated).
    • Guard presigned reads before download: HEAD via getObjectContentLength; default 100MB ceiling (caller ceiling supported); clearer missing/oversize errors.
    • Centralize limits: MAX_UPLOAD_BYTES = 100 MiB, MAX_UPLOAD_BASE64_LENGTH = 139,810,136 (true 100MB base64). Apply to attachment/document/evidence DTOs and UploadsService; add UploadPurpose.document; import UploadsModule; regenerate OpenAPI; MCP overlay removes fileData; tests added/updated.
  • Migration

    • Agents: call create-upload-url (purpose=attachment|document|evidence) → PUT bytes to uploadUrl → call the tool with the returned s3Key.
    • Web UI/direct API: can keep sending fileData (either fileData or s3Key is accepted, not both).

Written for commit 8d1f757. Summary will update on new commits.

Review in cubic

…/evidence)

Customer-reported: the MCP server times out on uploads. Root cause is the
documented one — these tools take the whole file as base64 inside the tool
argument, so the LLM must emit the entire file token-by-token, which is
impractically slow and hits the client's ~4-min timeout. Questionnaire and
policy-PDF were already migrated to the presigned-upload pattern; the rest
were not.

Migrates the three tools the customer hit to the same pattern: accept an
optional `s3Key` (from POST /v1/uploads/presign) alongside `fileData`. The
service resolves the bytes from whichever is provided via
UploadsService.readUploadAsBase64, which enforces that the key belongs to the
caller's org. The MCP overlay strips `fileData` from these tools so agents
must use create-upload-url -> PUT to S3 -> pass the s3Key; the base spec keeps
fileData for the web UI / direct callers.

- attachments (upload-task-attachment): DTO + service + module
- knowledge-base (upload-document): DTO + service + module; also added the
  missing @ApiProperty decorators so the MCP tool finally has a real schema
- offboarding-checklist (complete-checklist-item, upload-evidence): DTO +
  service; delegates to the now-fixed AttachmentsService
- UploadPurpose: added `document`
- regenerated packages/docs/openapi.json (carries the new s3Key fields; also
  synced some pre-existing drift that was stale in the committed spec)

Tests: new attachments.service.spec (s3Key path + neither->400); extended
knowledge-base.service.spec (s3Key path + neither->400); offboarding specs
still green. typecheck clean; AppModule boots (DI wiring verified). The MCP
generator stays healthy: 0 operations declare more than one security scheme.

Out of scope (separate issues): create-version automation script-generation
also times out, but that is heavy synchronous AI work, not base64 — needs an
async job pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
app Ready Ready Preview, Comment Jun 5, 2026 3:20pm
comp-framework-editor Ready Ready Preview, Comment Jun 5, 2026 3:20pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
portal Skipped Skipped Jun 5, 2026 3:20pm

Request Review

@mintlify
Copy link
Copy Markdown
Contributor

mintlify Bot commented Jun 5, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
CompAI 🟢 Ready View Preview Jun 5, 2026, 2:23 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 13 files

Confidence score: 3/5

  • There is a concrete memory-pressure risk in apps/api/src/knowledge-base/knowledge-base.service.ts: the new s3Key path can fully load and base64-encode oversized files before validation, which raises avoidable DoS/regression risk.
  • apps/api/src/attachments/attachments.service.ts and apps/api/src/knowledge-base/dto/upload-document.dto.ts both miss early base64 size/format guards, so oversized payloads may allocate significant memory before the 100MB policy checks are enforced.
  • Given multiple medium-severity (6/10) issues with high confidence and direct runtime impact, this is a moderate-risk merge until early length/format validation is added.
  • Pay close attention to apps/api/src/knowledge-base/knowledge-base.service.ts, apps/api/src/attachments/attachments.service.ts, apps/api/src/knowledge-base/dto/upload-document.dto.ts - add pre-decode base64 caps and DTO validation to prevent oversized payload memory spikes.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/api/src/knowledge-base/knowledge-base.service.ts">

<violation number="1" location="apps/api/src/knowledge-base/knowledge-base.service.ts:56">
P2: The new `s3Key` upload flow lacks an early size guard, so oversized files are fully loaded and base64-encoded in memory before validation, creating avoidable memory-pressure/DoS risk.</violation>
</file>

<file name="apps/api/src/attachments/attachments.service.ts">

<violation number="1" location="apps/api/src/attachments/attachments.service.ts:138">
P2: Missing pre-decode base64 length cap allows oversized payloads to allocate memory before file-size validation.

(Based on your team's feedback about base64 upload length limits for 100MB policy.) [FEEDBACK_USED].</violation>
</file>

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

const fileData =
dto.fileData ??
(dto.s3Key
? await this.uploadsService.readUploadAsBase64(
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The new s3Key upload flow lacks an early size guard, so oversized files are fully loaded and base64-encoded in memory before validation, creating avoidable memory-pressure/DoS risk.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/knowledge-base/knowledge-base.service.ts, line 56:

<comment>The new `s3Key` upload flow lacks an early size guard, so oversized files are fully loaded and base64-encoded in memory before validation, creating avoidable memory-pressure/DoS risk.</comment>

<file context>
@@ -44,12 +47,30 @@ export class KnowledgeBaseService {
+    const fileData =
+      dto.fileData ??
+      (dto.s3Key
+        ? await this.uploadsService.readUploadAsBase64(
+            dto.organizationId,
+            dto.s3Key,
</file context>
Fix with cubic


// Validate file size
const fileBuffer = Buffer.from(uploadDto.fileData, 'base64');
const fileBuffer = Buffer.from(fileData, 'base64');
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing pre-decode base64 length cap allows oversized payloads to allocate memory before file-size validation.

(Based on your team's feedback about base64 upload length limits for 100MB policy.) .

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/api/src/attachments/attachments.service.ts, line 138:

<comment>Missing pre-decode base64 length cap allows oversized payloads to allocate memory before file-size validation.

(Based on your team's feedback about base64 upload length limits for 100MB policy.) .</comment>

<file context>
@@ -115,8 +116,26 @@ export class AttachmentsService {
+
       // Validate file size
-      const fileBuffer = Buffer.from(uploadDto.fileData, 'base64');
+      const fileBuffer = Buffer.from(fileData, 'base64');
       if (fileBuffer.length > this.MAX_FILE_SIZE_BYTES) {
         throw new BadRequestException(
</file context>
Fix with cubic

Comment thread apps/api/src/knowledge-base/dto/upload-document.dto.ts
Addresses the cubic review on PR #3042.

A presigned S3 PUT cannot enforce a size limit, so an authenticated client
could upload an arbitrarily large file and then call a feature endpoint that
reads it back via readUploadAsBase64 -> getObjectAsBuffer, loading the whole
object into memory (plus ~1.33x for base64) before any size check ran. That is
an avoidable OOM/DoS vector shared by attachment, knowledge-base, and
questionnaire uploads.

- Add getObjectContentLength (a HEAD request) and check the object size BEFORE
  downloading in readUploadAsBase64; reject anything over a configurable
  ceiling (default 100MB, matching the feature services' decoded-buffer limit).
- Add the missing @maxlength(134_217_728) + @IsBase64() to UploadDocumentDto's
  fileData, matching the other migrated upload DTOs (caps the inline base64
  path at the validation layer).
- Cover both with unit tests: oversized rejected before download, caller
  maxBytes honored, unknown content-length still proceeds, HEAD failure gives a
  clear error.

Issues identified by cubic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The base64 max-length on the inline fileData fields was 134_217_728, which is
the encoded length of only 96 MiB of data — but the web UI dropzones and the
feature services both allow 100 MiB. So a 96-100 MiB file a user can pick in
the UI would be rejected with a 400 at validation. Adding that same cap to
UploadDocumentDto (the cubic fix) would have introduced this break for
knowledge-base uploads, whose dropzone allows 100 MiB.

- Add upload-limits.ts as the single source of truth: MAX_UPLOAD_BYTES
  (100 MiB) and MAX_UPLOAD_BASE64_LENGTH (its base64 length, 139,810,136).
- Point all three inline upload DTOs (attachment, knowledge-base document,
  offboarding evidence) and the presigned-read ceiling at it. Loosening a cap
  only accepts more (still <= the 100 MiB the services enforce on decode), so
  no currently-valid upload is newly rejected.
- Verified the web UI sends raw, prefix-stripped base64, so the @IsBase64()
  check accepts existing UI uploads unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – portal June 5, 2026 15:16 Inactive
@tofikwest tofikwest merged commit 16719b2 into main Jun 5, 2026
15 checks passed
@tofikwest tofikwest deleted the fix/mcp-uploads-presigned-s3key branch June 5, 2026 15:23
claudfuen pushed a commit that referenced this pull request Jun 5, 2026
# [3.72.0](v3.71.0...v3.72.0) (2026-06-05)

### Bug Fixes

* **api:** accept presigned s3Key for MCP uploads (attachment/document/evidence) ([a7ffbb7](a7ffbb7))
* **api:** cap inline upload base64 at the true 100MB limit (UI-safe) ([f076789](f076789))
* **api:** guard presigned-upload reads against oversized files ([4e167b0](4e167b0))
* **api:** harden task automations for MCP/API clients ([0f92e4e](0f92e4e)), closes [#3042](#3042)
* **api:** reject whitespace-only scriptKey in CreateVersionDto ([4bf5454](4bf5454))
* **cloud-security:** scope cloud tests findings to the selected account ([98a4fa2](98a4fa2))

### Features

* **cloud-security:** label the cloud tests connection selector ([2d1404e](2d1404e))
@claudfuen
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.72.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants