[dev] [tofikwest] fix/improve-security-questionnaire-parse#2826

Merged: tofikwest merged 3 commits into main from fix/improve-security-questionnaire-parse on May 12, 2026

@github-actions github-actions Bot commented May 12, 2026

This is an automated pull request to merge fix/improve-security-questionnaire-parse into dev.
It was created by the [Auto Pull Request] action.


Summary by cubic

Make questionnaire parsing more accurate and resilient with structured extraction, answerable‑item classification, and async processing. Adds robust PDF text extraction with per‑page handling and a Claude→OpenAI fallback, plus clearer task progress.

  • New Features

    • Async parsing via @trigger.dev/sdk; uploadAndParse returns { runId, publicAccessToken }. Task metadata now reports extracting → classifying_answerable_items → saving_questionnaire → completed with progress updates.
    • Improved extraction: per‑page PDF processing for large files with automatic OpenAI fallback; smarter XLSX/CSV/DOCX/PDF/images handling that ignores scoring/placeholder cells, detects headers, caps at 30 columns; rejects legacy .xls with a clear message.
    • LLM parsing classifies only answerable items, processing 25k‑char chunks with a concurrency of 4, dedupes across chunks, and saves answers as null for later auto‑answering; the app upload uses @trycompai/design-system and accepts only PDF, XLSX, and CSV.
  • Refactors

    • Replaced extractQuestionsWithAI with extractContentFromFile + parseQuestionsAndAnswers; simplified Excel handling and added structured logging.
    • Clearer errors (e.g., “Questionnaire with ID X not found”, “Failed to upload questionnaire file to S3”); controller/service return shape updated to run/token.
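
The chunked classification described in the summary (25k‑char chunks, concurrency of 4, dedupe across chunks, answers saved as null) could look roughly like the sketch below. The PR's actual code is not shown in this thread, so every name here (`chunkText`, `mapWithConcurrency`, `dedupeItems`, `parseAll`, and the `classifyChunk` callback standing in for the LLM call) is hypothetical, and the dedupe key (trimmed, lowercased, whitespace-collapsed question text) is an assumption.

```typescript
// Hypothetical sketch; only the two constants below come from the PR summary.
const CHUNK_SIZE = 25_000; // characters per chunk
const CONCURRENCY = 4; // chunk requests in flight at once

interface ParsedItem {
  question: string;
  answer: string | null; // null marks an item to be auto-answered later
}

/** Split text into consecutive slices of at most `size` characters. */
function chunkText(text: string, size: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

/** Run `worker` over `items` with at most `limit` calls in flight; keeps input order. */
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function run(): Promise<void> {
    // Each runner pulls the next unclaimed index until none remain.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}

/** Drop repeated questions (assumed key: normalized question text), keeping the first. */
function dedupeItems(items: ParsedItem[]): ParsedItem[] {
  const seen = new Set<string>();
  const unique: ParsedItem[] = [];
  for (const item of items) {
    const key = item.question.trim().toLowerCase().replace(/\s+/g, " ");
    if (key.length === 0 || seen.has(key)) continue;
    seen.add(key);
    unique.push(item);
  }
  return unique;
}

/** Chunk, classify in parallel, then merge; `classifyChunk` stands in for the LLM call. */
async function parseAll(
  text: string,
  classifyChunk: (chunk: string) => Promise<string[]>,
): Promise<ParsedItem[]> {
  const perChunk = await mapWithConcurrency(chunkText(text), CONCURRENCY, (chunk) =>
    classifyChunk(chunk),
  );
  const questions = ([] as string[]).concat(...perChunk);
  return dedupeItems(questions.map((question) => ({ question, answer: null })));
}
```

Fixed-size slicing can split a question across a chunk boundary; whether the PR overlaps chunks or splits on row boundaries is not stated here, so this sketch leaves that out.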

Written for commit 191cf0d. Summary will update on new commits.

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| app | Ready | Preview, Comment | May 12, 2026 8:31pm |
| comp-framework-editor | Ready | Preview, Comment | May 12, 2026 8:31pm |
| portal | Ready | Preview, Comment | May 12, 2026 8:31pm |

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 10 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Add a new function to handle PDF extraction that falls back to OpenAI when Claude's extraction fails. Update tests to cover this new behavior and refactor existing PDF extraction methods for improved clarity and functionality.
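
For readers without the diff open, the fallback behaviour this commit message describes can be sketched as follows. This is a minimal illustration and not the PR's code: `extractWithFallback`, the `Extractor` type, and the `provider` field are hypothetical names, the two extractors are passed in as plain functions, not real Claude or OpenAI client calls, and treating empty output as a failure is an assumption.

```typescript
// Hypothetical sketch of a primary-then-fallback PDF text extraction.
type Extractor = (pdf: Uint8Array) => Promise<string>;

interface ExtractionResult {
  text: string;
  provider: "primary" | "fallback"; // which extractor produced the text
  primaryError?: unknown; // set when the primary extractor threw
}

async function extractWithFallback(
  pdf: Uint8Array,
  primary: Extractor, // e.g. a Claude-backed extractor
  fallback: Extractor, // e.g. an OpenAI-backed extractor
): Promise<ExtractionResult> {
  let primaryError: unknown;
  try {
    const text = await primary(pdf);
    // Assumption: blank output counts as a failed extraction.
    if (text.trim().length > 0) {
      return { text, provider: "primary" };
    }
  } catch (error) {
    primaryError = error;
  }
  // The fallback's own errors propagate to the caller unchanged.
  const text = await fallback(pdf);
  return { text, provider: "fallback", primaryError };
}
```

Passing the extractors in as parameters keeps the function easy to cover in tests with stubs, which matches the commit's mention of updated tests, though the actual test setup is not shown here.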
@vercel vercel Bot temporarily deployed to Preview – portal May 12, 2026 20:25 Inactive
@vercel vercel Bot temporarily deployed to Preview – app May 12, 2026 20:25 Inactive
@tofikwest tofikwest merged commit 7a51f4c into main May 12, 2026
7 of 10 checks passed
@tofikwest tofikwest deleted the fix/improve-security-questionnaire-parse branch May 12, 2026 20:29
@claudfuen

🎉 This PR is included in version 3.51.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
