feat(integration-platform): expand validation library detection for sanitized inputs#2592

Merged
tofikwest merged 5 commits into main from
tofik/cs-sanitized-inputs-expand-libraries
Apr 20, 2026

@tofikwest tofikwest commented Apr 17, 2026

Summary

  • Expand the GitHub Sanitized Inputs & Code Scanning check to detect many more validation libraries, including customer-requested ones surfaced in Slack (Yup, Joi, Marshmallow, Effect Schema, Laravel)
  • Add PHP / Laravel support via composer.json — new ecosystem, same architecture
  • Tighten the Python matcher so substring false positives (e.g. schema inside jsonschema) and comment mentions no longer trigger a pass

What changed

One file: packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts

Library coverage

| Ecosystem | Before | After |
| --- | --- | --- |
| JS/TS | zod | zod, yup, joi, @effect/schema, effect, valibot, ajv, class-validator, io-ts, superstruct, runtypes |
| Python | pydantic | pydantic, marshmallow, cerberus, voluptuous, jsonschema, schematics, typeguard |
| PHP | (none) | laravel/framework, respect/validation, symfony/validator, vlucas/valitron |
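
For the JSON manifests, detection can be sketched as an exact match on dependency keys. This is an illustration only, not the code in sanitized-inputs.ts: the function name `findJsonManifestLibrary`, the use of package.json for JS/TS, and the scanned sections (`dependencies`/`devDependencies`, `require`/`require-dev`) are all assumptions.

```typescript
// Hypothetical helper; library lists are copied from the coverage table above.
const JS_VALIDATION_LIBS = [
  "zod", "yup", "joi", "@effect/schema", "effect", "valibot", "ajv",
  "class-validator", "io-ts", "superstruct", "runtypes",
];
const PHP_VALIDATION_LIBS = [
  "laravel/framework", "respect/validation", "symfony/validator", "vlucas/valitron",
];

// Return the first candidate that appears as an exact dependency key, or null.
function findJsonManifestLibrary(
  manifestText: string,
  sections: string[],
  candidates: string[],
): string | null {
  let manifest: unknown;
  try {
    manifest = JSON.parse(manifestText);
  } catch {
    return null; // An unparseable manifest is treated as "no library found".
  }
  if (typeof manifest !== "object" || manifest === null) return null;
  const record = manifest as Record<string, unknown>;
  for (const section of sections) {
    const deps = record[section];
    if (typeof deps !== "object" || deps === null) continue;
    // Compare whole keys, so "zod-to-json-schema" does not count as "zod".
    const names = Object.keys(deps).map((name) => name.toLowerCase());
    const hit = candidates.find((candidate) => names.includes(candidate));
    if (hit) return hit;
  }
  return null;
}

const composerJson = JSON.stringify({
  require: { php: "^8.2", "laravel/framework": "^11.0" },
});
console.log(
  findJsonManifestLibrary(composerJson, ["require", "require-dev"], PHP_VALIDATION_LIBS),
);
```

Because keys are compared whole, the JSON ecosystems would not need the token-boundary handling that the Python text formats require.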

Python matcher fix

Before: loose content.toLowerCase().includes(candidate) — matched substrings anywhere, including in comments and inside other package names.

After: line-based regex that strips comments (#) and requires the package name to appear as a standalone token with appropriate leading/trailing separators (whitespace, version operators ==/>=/<=/!=/~=, brackets, quotes, commas, semicolons). This is the tightening that makes expanding the Python list safe.
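
A minimal sketch of the standalone-token idea, assuming the separator sets described above. The names `lineMentionsPackage`, `LEADING`, and `TRAILING` are hypothetical and the exact regex in the PR may differ.

```typescript
// Assumed separator sets, based on the description above.
// Leading: whitespace, brackets, quotes, commas, semicolons.
const LEADING = String.raw`\s\[\]()"',;`;
// Trailing: the same, plus the characters used by ==, >=, <=, != and ~=.
const TRAILING = String.raw`\s=<>!~\[\]()"',;`;

// Escape regex metacharacters in a package name.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when `name` appears on the line as its own token, outside a # comment.
function lineMentionsPackage(line: string, name: string): boolean {
  const withoutComment = line.replace(/#.*$/, "");
  const pattern = new RegExp(
    `(^|[${LEADING}])${escapeRegExp(name)}($|[${TRAILING}])`,
    "i",
  );
  return pattern.test(withoutComment);
}

// The old loose match accepts a substring of another package name.
console.log("jsonschema==4.0.0".toLowerCase().includes("schema")); // true
// The token match does not.
console.log(lineMentionsPackage("jsonschema==4.0.0", "schema")); // false
console.log(lineMentionsPackage("jsonschema==4.0.0", "jsonschema")); // true
console.log(lineMentionsPackage("# TODO: migrate off pydantic", "pydantic")); // false
```

In this sketch a hyphen is not a separator, so a line such as `pydantic-settings==2.0` would not count as `pydantic`.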

Failure message

Updated to list all supported libraries and mention composer.json so customers can self-diagnose why their repo failed and know which libraries we check for.
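
One way to keep such a message in sync with what is actually checked is to derive it from the same lists. The sketch below is hypothetical: the `Ecosystem` shape, `buildFailureMessage`, the message wording, and package.json as the JS/TS manifest are assumptions, not the text shipped in this PR.

```typescript
// Hypothetical registry; library names come from the coverage table above.
interface Ecosystem {
  label: string;
  manifests: string[];
  libraries: string[];
}

const ECOSYSTEMS: Ecosystem[] = [
  {
    label: "JS/TS",
    manifests: ["package.json"], // assumed manifest name
    libraries: ["zod", "yup", "joi", "@effect/schema", "effect", "valibot", "ajv",
      "class-validator", "io-ts", "superstruct", "runtypes"],
  },
  {
    label: "Python",
    manifests: ["requirements.txt", "pyproject.toml"],
    libraries: ["pydantic", "marshmallow", "cerberus", "voluptuous", "jsonschema",
      "schematics", "typeguard"],
  },
  {
    label: "PHP",
    manifests: ["composer.json"],
    libraries: ["laravel/framework", "respect/validation", "symfony/validator",
      "vlucas/valitron"],
  },
];

// Build one line per ecosystem so the message always mirrors the lists above.
function buildFailureMessage(ecosystems: Ecosystem[]): string {
  const lines = ecosystems.map(
    (e) => `- ${e.label} (${e.manifests.join(", ")}): ${e.libraries.join(", ")}`,
  );
  return ["No supported validation library was detected. Checked for:", ...lines].join("\n");
}

console.log(buildFailureMessage(ECOSYSTEMS));
```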

What is NOT in this PR

  • Splitting the bundled check into two separate checks (input validation vs. code scanning) — planned as a follow-up task
  • Code-scanning alternatives (Semgrep, SonarQube, Snyk, Larastan) — next PR
  • User-configurable variable for custom libraries — next PR if long-tail complaints continue after this ships
  • Unit tests: helpers are closure-scoped inside run, so testing them would require a refactor to extract them as module-level exports. Happy to do this as a follow-up PR.

Risk / compatibility

  • No regression for any library that worked before: zod and pydantic detection paths are preserved.
  • One narrow behavioral change to flag: Python git-URL requirements using #egg=name syntax (e.g. git+https://github.com/x/y.git@v1.0#egg=pydantic) will no longer match because we now strip at #. This pattern is rare, and the old match only caught it incidentally via the broken loose-match.
  • Python false-positive fix: comments mentioning a library name (e.g. # TODO: migrate off pydantic) no longer cause a false pass. This is a correctness improvement, not a regression.

Test plan

  • Run an integration check against a repo that uses yup → passes on validation side
  • Run against a repo that uses joi → passes
  • Run against a Python repo using marshmallow in requirements.txt → passes
  • Run against a Python repo with pydantic in a comment but not in deps → fails (was previously false-passing)
  • Run against a Python repo with jsonschema==4.0.0 → correctly matches jsonschema, does not incorrectly match on substrings
  • Run against a Laravel repo (composer.json with laravel/framework) → passes
  • Run against a plain PHP repo with respect/validation → passes
  • Run against a repo with none of the supported libs → fails with updated message listing what was checked
  • Verify existing Zod and Pydantic customers still pass (no regression)

Validation done locally

  • bun run build in packages/integration-platform → clean
  • tsc --noEmit on packages/integration-platform → exit 0
  • Broader API typecheck errors exist on origin/main unrelated to this change (verified by stashing and re-running)

🤖 Generated with Claude Code


Summary by cubic

Expands the Sanitized Inputs check to detect many more validation libraries across JS/TS, Python, and PHP, and adds composer.json scanning. Updates the failure message to list supported libraries and tightens Python matching with TOML-aware comment stripping while preserving VCS #egg=.

  • New Features

    • JS/TS: detect zod, yup, joi, @effect/schema, effect, valibot, ajv, class-validator, io-ts, superstruct, runtypes.
    • Python: detect pydantic, marshmallow, cerberus, voluptuous, jsonschema, schematics, typeguard.
    • PHP: detect laravel/framework, respect/validation, symfony/validator, vlucas/valitron via composer.json.
  • Bug Fixes

    • Python matcher parses lines, strips only actual comments, and matches standalone package names to avoid substring false positives.
    • Preserves VCS requirements like git+...#egg=pydantic.
    • TOML-aware comment stripping for pyproject.toml to prevent inline # false positives.

Written for commit 2cf8979. Summary will update on new commits.

…anitized inputs check

Expand the GitHub Sanitized Inputs check to cover more ecosystems and libraries, and tighten the Python matcher so false positives on substrings (e.g. "schema" inside "jsonschema") and comments no longer pass.

- JS/TS: zod (existing) + yup, joi, @effect/schema, effect, valibot, ajv, class-validator, io-ts, superstruct, runtypes
- Python: pydantic (existing) + marshmallow, cerberus, voluptuous, jsonschema, schematics, typeguard
- PHP: new — laravel/framework, respect/validation, symfony/validator, vlucas/valitron (detected via composer.json)
- Python matcher rewritten to parse lines, strip comments, and match package names as standalone tokens
- Failure message updated to list all supported libraries so customers know what we check for

Closes customer complaints surfaced in Slack (Yup, Joi, Marshmallow, Effect Schema, Laravel).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| app | Ready | Preview, Comment | Apr 20, 2026 8:49pm |
| comp-framework-editor | Ready | Preview, Comment | Apr 20, 2026 8:49pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| portal | Skipped | | Apr 20, 2026 8:49pm |

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

Auto-approved: Expands validation library detection for JS, Python, and PHP while improving Python matching accuracy via regex. Changes are isolated to a single check manifest.

@tofikwest

@cubic-dev-ai review it

cubic-dev-ai Bot commented Apr 20, 2026

@cubic-dev-ai review it

@tofikwest I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts">

<violation number="1" location="packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts:198">
P2: The Python matcher strips all `#` fragments, which breaks detection for valid VCS requirements like `...#egg=pydantic`. Strip only actual comments instead of splitting at the first `#`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts Outdated
Only strip actual comments (# preceded by whitespace or at line start)
so VCS requirements like `git+https://...#egg=pydantic` remain matchable.
Also add `=` to the leading separator set so the package name inside
`#egg=<name>` is recognized as a standalone token.
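
The two changes in this commit can be illustrated with a small sketch. `stripPipComment` and `hasToken` are hypothetical names, and the separator strings are assumptions modelled on the PR description rather than the exact sets in the source file.

```typescript
// A pip-style comment starts with # at line start or after whitespace,
// so a URL fragment such as ...git@v1.0#egg=name is left alone.
function stripPipComment(line: string): string {
  return line.replace(/(^|\s)#.*$/, "");
}

// Token match with a caller-supplied set of leading separator characters.
function hasToken(line: string, name: string, leading: string): boolean {
  const trailing = String.raw`\s=<>!~\[\]()"',;`;
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(^|[${leading}])${escaped}($|[${trailing}])`, "i").test(line);
}

const LEADING_BEFORE = String.raw`\s\[\]()"',;`; // assumed set without "="
const LEADING_AFTER = String.raw`\s=\[\]()"',;`; // same set with "=" added

const vcsLine = "git+https://github.com/x/y.git@v1.0#egg=pydantic";
const kept = stripPipComment(vcsLine); // unchanged: "#" is not preceded by whitespace

console.log(hasToken(kept, "pydantic", LEADING_BEFORE)); // false: "=" is not a leading separator
console.log(hasToken(kept, "pydantic", LEADING_AFTER)); // true
console.log(hasToken(stripPipComment("requests==2.31.0  # uses pydantic"), "pydantic", LEADING_AFTER)); // false
```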

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts">

<violation number="1" location="packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts:201">
P2: The new comment-stripping regex is requirements-specific and can mis-parse `pyproject.toml` inline comments, causing false validation-library matches.</violation>
</file>

Comment thread packages/integration-platform/src/manifests/github/checks/sanitized-inputs.ts Outdated
…ject.toml

The previous strip-on-whitespace-# regex was requirements.txt-specific and
left inline TOML comments intact when `#` had no preceding whitespace (e.g.
`["requests"]# pydantic`), causing false-positive library matches.

Dispatch on file name:
  - pyproject.toml (TOML): strip `#.*$` anywhere — `#` is a comment outside
    strings, and dep values are always quoted so VCS names still match via
    the preceding `"name @ ...` prefix.
  - requirements.txt (pip): keep `(^|\s)#.*$` — `#` only starts a comment
    when preceded by whitespace, so VCS `#egg=name` fragments survive.
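
The dispatch described in this commit message might look roughly like the following. `stripComment` is a hypothetical name, and the sketch assumes the file is identified by its name ending in pyproject.toml.

```typescript
// Choose the comment-stripping rule by manifest type.
function stripComment(line: string, fileName: string): string {
  const isToml = fileName.toLowerCase().endsWith("pyproject.toml");
  return isToml
    ? line.replace(/#.*$/, "") // TOML: strip from the first # onward
    : line.replace(/(^|\s)#.*$/, ""); // pip: # needs line start or leading whitespace
}

// TOML inline comment with no space before "#": the mention is dropped.
console.log(stripComment('["requests"]# pydantic', "pyproject.toml")); // ["requests"]

// TOML VCS dependency: the fragment is cut, but the quoted "name @" prefix survives.
console.log(
  stripComment('"pydantic @ git+https://github.com/x/y.git@v1.0#egg=pydantic",', "pyproject.toml"),
);

// requirements.txt: the #egg= fragment is preserved.
console.log(
  stripComment("git+https://github.com/x/y.git@v1.0#egg=pydantic", "requirements.txt"),
);
```

The TOML branch in this sketch also cuts a `#` that sits inside a quoted string. As the commit message notes, that is tolerable here because the dependency name precedes the URL inside the quoted value.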

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: This PR introduces non-trivial parsing logic (regex-based line processing) and expands the check to a new ecosystem (PHP), which warrants human review for correctness.

@tofikwest tofikwest merged commit 0ef2984 into main Apr 20, 2026
11 checks passed
@tofikwest tofikwest deleted the tofik/cs-sanitized-inputs-expand-libraries branch April 20, 2026 20:58
@claudfuen

🎉 This PR is included in version 3.27.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
