[dev] [Marfuen] mariano/fix-evidence-export-oom-streaming#3026
Merged
Conversation
…ent OOM The previous OOM fix loaded automations one at a time but still accumulated all runs for a single automation in memory. For orgs with large cloud security check histories, a single automation's runs could exceed the 6GB heap limit. Now uses async generators to stream run batches (50 at a time) through PDF and JSON generation. Peak memory is bounded by one batch of runs + the jsPDF document, regardless of total automation size. - evidence-data-loader: add streamAutomationRuns async generator - evidence-pdf-generator: extract renderRunToPDF, add generateAutomationPDFFromStream - evidence-json-builder: add buildAutomationJsonStream using Readable.from() - evidence-export.service: wire streaming into ZIP export path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…task The auditor bulk evidence export previously ran in the API process, peaking at ~20% memory per request. Multiple concurrent exports could OOM the container. Now the heavy work (DB queries, PDF generation, ZIP creation) runs in a Trigger.dev background task with its own memory. The API endpoint triggers the task and returns a runId for progress tracking. - Add export-organization-evidence Trigger.dev task (S3 upload + presigned URL) - Change POST /v1/evidence-export/all to trigger background task - Frontend uses useRealtimeRun for progress + auto-download on completion - API process memory stays flat regardless of export size Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…OOM + retry storms Follow-up to the Trigger.dev offload, addressing a 2026-06-04 prod downtime (3 concurrent in-process exports for one org starved the event loop → health checks failed → ECS restarted the tasks; ~11 min degraded). - Stream the ZIP straight to S3 via @aws-sdk/lib-storage Upload + a PassThrough instead of Buffer.concat'ing the whole archive. Peak worker memory is now bounded by one automation's PDF plus the uploader's part buffer (~40MB), never the full ZIP. Populate + upload run concurrently with explicit error forwarding and symmetric upload.abort() on either failure (no orphaned multipart parts). - Run the task on machine large-1x (8GB/4vCPU) — isolated from the API. - Serialize exports per org: task queue concurrencyLimit 1 + per-org concurrencyKey, plus an org+includeJson idempotencyKey (30m TTL) so a burst of retries collapses to a single run instead of N concurrent exports. - Record partial failures in the manifest (hasFailures + failedTasks[]) and in run metadata (tasksFailed) so a partial export is self-evident from the ZIP. - Use ctx.run.id for a unique, traceable S3 key. Tests: streamArchiveToS3 happy-path + both failure paths; controller asserts the per-org concurrency/idempotency trigger options. 43 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
2 issues found across 14 files
Confidence score: 3/5
- There is concrete user-impact risk in
apps/app/src/app/(app)/[orgId]/auditor/(overview)/components/ExportEvidenceButton.tsx: the dialog cannot be dismissed whileisRunning=true, which conflicts with the UI copy and can trap users until the job finishes or errors. apps/app/src/app/(app)/[orgId]/tasks/components/TasksPageClient.tsxhas a completion-path gap: ifdownloadUrlis missing inonComplete, the export UI closes without download or error feedback, making a failed/invalid result look like a no-op.- These are medium-severity, high-confidence UX regressions rather than hard crashes, so this sits at moderate merge risk instead of a hard block.
- Pay close attention to
apps/app/src/app/(app)/[orgId]/auditor/(overview)/components/ExportEvidenceButton.tsxandapps/app/src/app/(app)/[orgId]/tasks/components/TasksPageClient.tsx- export completion/cancel flows need clear user escape and error handling.
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
Resolve conflicts from the divergent bulk-evidence-export work:
- evidence-export.controller.spec.ts: keep the background-task assertions
(AuditorEvidenceExportController triggers a Trigger.dev job and returns
{ runId, publicAccessToken }); drop main's obsolete direct-stream
(res.setHeader/flushHeaders/archive.pipe) assertions, which referenced a
`res` no longer in scope for that controller.
- evidence-export.controller.ts: drop the orphaned `res.flushHeaders()` that
the auto-merge spliced into exportAllEvidence — that method no longer takes
an @res() param, so the call referenced an undefined `res` and broke the
build. The single-task GET export keeps main's streaming + client-disconnect
hardening.
- bun.lock: regenerated via `bun install` to reconcile the PR's new
@aws-sdk/lib-storage + @aws-sdk/s3-request-presigner deps with main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lent failures, invalid hook option)
Three issues in the new bulk-evidence-export progress UI:
1. ExportEvidenceButton: the running export sheet could not be dismissed —
onOpenChange swallowed every close while a run was active, directly
contradicting the in-sheet copy ("You can close this dialog — the export
will continue in the background"). Lifted the useRealtimeRun subscription
from the unmount-on-close ExportProgress child up to the parent so the run
keeps streaming and still auto-downloads on completion after the sheet is
closed, then made the sheet freely dismissable (onOpenChange={setIsOpen}).
(reported by cubic)
2. TasksPageClient: onComplete handled the happy path but, when the finished
run had no downloadUrl, it silently closed the export UI with no download
and no feedback — a completed-but-invalid export looked like a no-op. Added
the missing error toast, matching ExportEvidenceButton. (reported by cubic)
3. Both files passed an `onError` option to useRealtimeRun, which does not
exist on UseRealtimeSingleRunOptions — a type error that was failing the
app build. Removed it and folded failure handling into onComplete(run, err),
treating any non-COMPLETED terminal state (or an err) as a failure, the same
way PolicyHeaderActions does.
Tests: new ExportEvidenceButton.test.tsx (dismiss-while-running, download +
success, metadata fallback, missing-link error, run-failure error, onComplete
err) and added export-flow cases to TasksPageClient.test.tsx; also fixed that
file's stale `downloadAllEvidenceZip` mock (now triggerBulkEvidenceExport).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
|
@cubic-dev-ai review it |
Contributor
@tofikwest I have started the AI code review. It will take a few minutes to complete. |
Contributor
|
🎉 This PR is included in version 3.73.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is an automated pull request to merge mariano/fix-evidence-export-oom-streaming into dev.
It was created by the [Auto Pull Request] action.
Summary by cubic
Offloaded bulk evidence export to a Trigger.dev background task and streamed ZIP creation straight to S3 in constant memory. The API now POSTs an export job and returns a run ID + token; the app tracks progress in realtime, can be closed while running, and auto‑downloads on completion.
New Features
export-organization-evidencetask that uploads the ZIP via@aws-sdk/lib-storageUpload and returns a presigned URL; includes per‑orgconcurrencyKeyand anidempotencyKeywith a 30m TTL.generateAutomationPDFFromStreamandbuildAutomationJsonStream;streamArchiveToS3pipes anarchiverPassThrough to S3 and records a manifest with any partial failures.POST /v1/evidence-export/all?includeJson=true|falsethat returns{ runId, publicAccessToken }; the frontend usestriggerBulkEvidenceExport+@trigger.dev/react-hooksto show progress and auto‑download even if the dialog is closed.Bug Fixes
onErroroption fromuseRealtimeRun.res.flushHeaders()); added tests for S3 streaming error paths and export UI flows.Written for commit e91c547. Summary will update on new commits.