Skip to content

[dev] [Marfuen] mariano/fix-evidence-export-oom-streaming#3026

Merged
tofikwest merged 5 commits into
mainfrom
mariano/fix-evidence-export-oom-streaming
Jun 5, 2026
Merged

[dev] [Marfuen] mariano/fix-evidence-export-oom-streaming#3026
tofikwest merged 5 commits into
mainfrom
mariano/fix-evidence-export-oom-streaming

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot commented Jun 4, 2026

This is an automated pull request to merge mariano/fix-evidence-export-oom-streaming into dev.
It was created by the [Auto Pull Request] action.


Summary by cubic

Offloaded bulk evidence export to a Trigger.dev background task and streamed ZIP creation straight to S3 in constant memory. The API now POSTs an export job and returns a run ID + token; the app tracks progress in realtime, can be closed while running, and auto‑downloads on completion.

  • New Features

    • Added export-organization-evidence task that uploads the ZIP via @aws-sdk/lib-storage Upload and returns a presigned URL; includes per‑org concurrencyKey and an idempotencyKey with a 30m TTL.
    • Streamed runs through PDF and JSON with generateAutomationPDFFromStream and buildAutomationJsonStream; streamArchiveToS3 pipes an archiver PassThrough to S3 and records a manifest with any partial failures.
    • Replaced the auditor export with POST /v1/evidence-export/all?includeJson=true|false that returns { runId, publicAccessToken }; the frontend uses triggerBulkEvidenceExport + @trigger.dev/react-hooks to show progress and auto‑download even if the dialog is closed.
  • Bug Fixes

    • Export dialog is now dismissible while a run is active; added error toasts when a run finishes without a download URL or ends in a non‑COMPLETED state; removed invalid onError option from useRealtimeRun.
    • Fixed a merge artifact in the controller (removed stray res.flushHeaders()); added tests for S3 streaming error paths and export UI flows.

Written for commit e91c547. Summary will update on new commits.

Review in cubic

Marfuen and others added 3 commits May 27, 2026 13:45
…ent OOM

The previous OOM fix loaded automations one at a time but still accumulated
all runs for a single automation in memory. For orgs with large cloud security
check histories, a single automation's runs could exceed the 6GB heap limit.

Now uses async generators to stream run batches (50 at a time) through PDF
and JSON generation. Peak memory is bounded by one batch of runs + the jsPDF
document, regardless of total automation size.

- evidence-data-loader: add streamAutomationRuns async generator
- evidence-pdf-generator: extract renderRunToPDF, add generateAutomationPDFFromStream
- evidence-json-builder: add buildAutomationJsonStream using Readable.from()
- evidence-export.service: wire streaming into ZIP export path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…task

The auditor bulk evidence export previously ran in the API process,
peaking at ~20% memory per request. Multiple concurrent exports could
OOM the container.

Now the heavy work (DB queries, PDF generation, ZIP creation) runs in
a Trigger.dev background task with its own memory. The API endpoint
triggers the task and returns a runId for progress tracking.

- Add export-organization-evidence Trigger.dev task (S3 upload + presigned URL)
- Change POST /v1/evidence-export/all to trigger background task
- Frontend uses useRealtimeRun for progress + auto-download on completion
- API process memory stays flat regardless of export size

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…OOM + retry storms

Follow-up to the Trigger.dev offload, addressing a 2026-06-04 prod downtime
(3 concurrent in-process exports for one org starved the event loop → health
checks failed → ECS restarted the tasks; ~11 min degraded).

- Stream the ZIP straight to S3 via @aws-sdk/lib-storage Upload + a PassThrough
  instead of Buffer.concat'ing the whole archive. Peak worker memory is now
  bounded by one automation's PDF plus the uploader's part buffer (~40MB),
  never the full ZIP. Populate + upload run concurrently with explicit error
  forwarding and symmetric upload.abort() on either failure (no orphaned
  multipart parts).
- Run the task on machine large-1x (8GB/4vCPU) — isolated from the API.
- Serialize exports per org: task queue concurrencyLimit 1 + per-org
  concurrencyKey, plus an org+includeJson idempotencyKey (30m TTL) so a burst
  of retries collapses to a single run instead of N concurrent exports.
- Record partial failures in the manifest (hasFailures + failedTasks[]) and in
  run metadata (tasksFailed) so a partial export is self-evident from the ZIP.
- Use ctx.run.id for a unique, traceable S3 key.

Tests: streamArchiveToS3 happy-path + both failure paths; controller asserts the
per-org concurrency/idempotency trigger options. 43 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
app Ready Ready Preview, Comment Jun 5, 2026 4:39pm
comp-framework-editor Ready Ready Preview, Comment Jun 5, 2026 4:39pm
portal Ready Ready Preview, Comment Jun 5, 2026 4:39pm

Request Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 14 files

Confidence score: 3/5

  • There is concrete user-impact risk in apps/app/src/app/(app)/[orgId]/auditor/(overview)/components/ExportEvidenceButton.tsx: the dialog cannot be dismissed while isRunning=true, which conflicts with the UI copy and can trap users until the job finishes or errors.
  • apps/app/src/app/(app)/[orgId]/tasks/components/TasksPageClient.tsx has a completion-path gap: if downloadUrl is missing in onComplete, the export UI closes without download or error feedback, making a failed/invalid result look like a no-op.
  • These are medium-severity, high-confidence UX regressions rather than hard crashes, so this sits at moderate merge risk instead of a hard block.
  • Pay close attention to apps/app/src/app/(app)/[orgId]/auditor/(overview)/components/ExportEvidenceButton.tsx and apps/app/src/app/(app)/[orgId]/tasks/components/TasksPageClient.tsx - export completion/cancel flows need clear user escape and error handling.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/app/src/app/(app)/[orgId]/tasks/components/TasksPageClient.tsx
tofikwest and others added 2 commits June 5, 2026 12:25
Resolve conflicts from the divergent bulk-evidence-export work:
- evidence-export.controller.spec.ts: keep the background-task assertions
  (AuditorEvidenceExportController triggers a Trigger.dev job and returns
  { runId, publicAccessToken }); drop main's obsolete direct-stream
  (res.setHeader/flushHeaders/archive.pipe) assertions, which referenced a
  `res` no longer in scope for that controller.
- evidence-export.controller.ts: drop the orphaned `res.flushHeaders()` that
  the auto-merge spliced into exportAllEvidence — that method no longer takes
  an @res() param, so the call referenced an undefined `res` and broke the
  build. The single-task GET export keeps main's streaming + client-disconnect
  hardening.
- bun.lock: regenerated via `bun install` to reconcile the PR's new
  @aws-sdk/lib-storage + @aws-sdk/s3-request-presigner deps with main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lent failures, invalid hook option)

Three issues in the new bulk-evidence-export progress UI:

1. ExportEvidenceButton: the running export sheet could not be dismissed —
   onOpenChange swallowed every close while a run was active, directly
   contradicting the in-sheet copy ("You can close this dialog — the export
   will continue in the background"). Lifted the useRealtimeRun subscription
   from the unmount-on-close ExportProgress child up to the parent so the run
   keeps streaming and still auto-downloads on completion after the sheet is
   closed, then made the sheet freely dismissable (onOpenChange={setIsOpen}).
   (reported by cubic)

2. TasksPageClient: onComplete handled the happy path but, when the finished
   run had no downloadUrl, it silently closed the export UI with no download
   and no feedback — a completed-but-invalid export looked like a no-op. Added
   the missing error toast, matching ExportEvidenceButton. (reported by cubic)

3. Both files passed an `onError` option to useRealtimeRun, which does not
   exist on UseRealtimeSingleRunOptions — a type error that was failing the
   app build. Removed it and folded failure handling into onComplete(run, err),
   treating any non-COMPLETED terminal state (or an err) as a failure, the same
   way PolicyHeaderActions does.

Tests: new ExportEvidenceButton.test.tsx (dismiss-while-running, download +
success, metadata fallback, missing-link error, run-failure error, onComplete
err) and added export-flow cases to TasksPageClient.test.tsx; also fixed that
file's stale `downloadAllEvidenceZip` mock (now triggerBulkEvidenceExport).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tofikwest
Copy link
Copy Markdown
Contributor

@cubic-dev-ai review it

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 5, 2026

@cubic-dev-ai review it

@tofikwest I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Re-trigger cubic

@tofikwest tofikwest merged commit 206b071 into main Jun 5, 2026
11 checks passed
@tofikwest tofikwest deleted the mariano/fix-evidence-export-oom-streaming branch June 5, 2026 16:50
@claudfuen
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.73.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants