Broad framework coverage + per-batch composable prompt by cramforce · Pull Request #53 · vercel-labs/deepsec

cramforce · 2026-05-05T23:31:24Z

Why

Before this PR, deepsec was shaped by the codebases it was built against —
heavy on Next.js / React / Node, with respectable Go and Lua. Two
consequences:

Entry-point coverage fell off a cliff outside that core. A Laravel,
Django, Rails, Actix, or Symfony repo got few candidates, so the AI
processor had nothing to investigate even when real vulnerabilities
were present.
The default prompt was implicitly Next.js-flavored. Long sections
on dangerouslySetInnerHTML, Next.js middleware.ts, Server Actions,
and validNextRedirect() consumed tokens and added noise on
Rails/Django/Laravel repos where none of those concepts exist.

This PR rebuilds the prompt assembly so it adapts per-batch to the tech
in the repo, and ships matchers + threat highlights for ~64 frameworks
across 13 ecosystems.

What's in the box

1. Tech detection + matcher gating

New detectTech(rootPath) walks lockfiles / manifests / sentinel files
and emits a normalized tag list (e.g. ["nextjs", "django", "rails", "laravel"]). Persisted to data/<id>/tech.json.
MatcherPlugin gains an optional requires: MatcherGate field —
{ tech: ["laravel"] } or { sentinelFiles, sentinelContains }.
Matchers without a gate run unconditionally (backwards compatible).
scan() evaluates gates once per scan and reports activeMatchers /
skippedMatchers. --matchers <slug> honors every named slug per-
matcher (an earlier draft silently dropped gated slugs when mixed with
ungated ones — fixed before merge).

2. Composable, per-batch prompt assembly

packages/processor/src/prompt/:

core.ts — generic, framework-agnostic core (severity ladder including
HIGH_BUG/BUG, FP guidance, auth-bypass patterns, "static analysis
only" constraint).
highlights.ts — terse 3–6-line threat blocks, one per framework, each
tagged with the languages it applies to.
slug-notes.ts — one-line reviewer-instinct sentence per slug.
assemble.ts — composes core + highlights + slug-notes + INFO.md + promptAppend per batch, filtered to:
- languages of the files in this specific batch (a Python batch
  in a polyglot repo doesn't carry the Next.js/Express highlights),
- slugs that the scanner actually flagged in this batch.
A hard char budget caps the framework section; a polyglot fallback
emits a one-line summary when too many highlights would crowd the
prompt.

The agent-layer wrapper (buildInvestigatePrompt in agents/shared.ts)
no longer double-emits INFO.md or duplicates the "scanner casts a wide
net" intro; the procedural per-file investigation steps now live there
once instead of being duplicated in the system prompt.

3. Frameworks shipped

Ecosystem	Frameworks
Node JS/TS	Next.js, React, Express, Fastify, NestJS, Hono, Koa, Hapi, Remix, SvelteKit, Nuxt, Astro, SolidStart, GraphQL, Socket.IO, BullMQ, Bun, Deno, Cloudflare Workers
Python	Django, Django REST Framework, FastAPI, Flask, Starlette, aiohttp, Tornado, Sanic, Bottle, Falcon, Celery, Airflow
PHP	Laravel, Symfony, Slim, Yii, CakePHP, CodeIgniter, WordPress, Drupal, Magento
Ruby	Rails, Sinatra, Grape, Hanami, Roda
Go	Gin, Echo, Fiber, Chi, Gorilla mux, Buffalo, Cobra
Rust	Actix, Axum, Rocket, Warp, Tide, Poem, Tonic, lambda-runtime
JVM	Spring, Ktor, Micronaut, JAX-RS
.NET	ASP.NET MVC, Minimal API, Razor Pages, Azure Functions
Other	Phoenix (Elixir), Kemal (Crystal), Ring/Compojure (Clojure), Cowboy (Erlang), Vapor (Swift), Shelf (Dart), Apex/Salesforce
Cloud functions	AWS Lambda (Node/Python/Java), GCP Cloud Functions, Azure Functions
Mobile	Android exported components, iOS URL schemes

Each framework: matcher (gated by detected tech) + per-tech threat
highlight + per-slug reviewer note.

Full catalog at docs/supported-tech.md.

4. Deterministic prompt samples

prompt-samples/*.md — 11 committed fixtures showing the full prompt
the model receives (assembled core + tech highlights + slug notes +
agent-layer file list + JSON output spec) for representative scenarios:
empty repo, Next.js batch, Django batch, polyglot Python batch (proves
the language filter drops Next.js highlights), polyglot TS batch, mixed
batch, overflow→fallback, INFO.md+append, Laravel, Rails (proves the
ERB escaping correction), Go multi-framework. A vitest verifies the
on-disk samples match what the assembler produces; regenerate with
UPDATE_PROMPT_SAMPLES=1 pnpm test:unit.

5. Match-rate / low-coverage warning

The scanner now reports per-language stats (files scanned, candidates
produced, match rate). The CLI surfaces a yellow warning when a
language has ≥50 source files but a <1% match rate — pointing the user
at custom matchers when they're on a stack we don't cover.

6. CLI / scan UX

pnpm deepsec scan now produces a structured summary:

Detected tech (wrapped if many)
Coverage by language (files / hits / match-rate, low cells colored)
Matchers that fired (top 12 by count)
Top files by candidate count (top 5 with their slug mix)
Low-coverage warnings (where applicable)
Final tally + adaptive "Next" block (suggests --only-slugs /
--filter on large sets, points at custom matchers when zero hits)

A live progress bar replaces the prior 30 s of silence during the
matcher loop — in-place 24-char bar at ~20fps on TTY, quartile heartbeat
lines on non-TTY (CI logs).

7. Regex hardening

Audited every matcher for catastrophic backtracking. Worst offender was
the XSS matcher's /\$\{.*\}.*<\/?\w+>|<\w+[^>]*\$\{/ — measured 662
ms on a 60kB adversarial line; bounded version takes 0.1 ms
(~6600× faster). Fixed ~25 patterns across xss, cache-key-poisoning,
oauth-flow, env-var-as-bool, response-header-leak,
lua-crypto-weakness, git-provider-url-injection, jwt-handling,
insecure-crypto, test-header-bypass, object-injection,
streaming-endpoint, dev-auth-bypass, auth-bypass, secret-env-var,
sql-injection — replaced unbounded .* chains with bounded .{0,N}
or negated character classes.

8. Bug fixes that came out of building this

projectInfo was being emitted twice — once by the new assembler
(between --- rules), once by the agent layer under ## Project Context. Fixed: assembler-mode passes "" to the agent layer.
Generic core was Next.js-flavored — the FP-guidance line
explicitly called out Next.js middleware.ts. Reworded to describe
the pattern across ecosystems (Express middleware, Fastify hooks,
NestJS guards, Spring filters, Rails before_action, Django decorators,
FastAPI Depends).
HIGH_BUG / BUG severity tiers were never defined in the core
prompt, but the output spec required them — model was guessing.
spread-operator-injection slug note had reversed precedence —
said the safe shape was the bug. Corrected: {role: 'user', ...userInput} (trailing spread) is the dangerous form.
Rails ERB note flagged auto-escaped output as XSS — corrected to
name raw(x) / x.html_safe / <%== %> as the actual sinks; bare
<%= %> auto-escapes in Rails ≥ 3.

Plugin authoring

Adding a new framework is a single PR:

Detector branch in packages/scanner/src/detect-tech.ts (sentinel →
tag).
Matcher under packages/scanner/src/matchers/<slug>.ts with
requires: { tech: ["<tag>"] }.
Highlight entry in packages/processor/src/prompt/highlights.ts
(3–6 short bullet lines).
Slug note in packages/processor/src/prompt/slug-notes.ts (one
sentence).
Register in packages/scanner/src/matchers/index.ts.

Tests follow the pattern in
packages/scanner/src/__tests__/framework-matchers.test.ts.

What stayed the same

Existing always-on matchers (Lua, Terraform, Docker, K8s,
ConnectRPC, generic Go, ORMs, AI/agentic) are unchanged and still
run on every repo.
--prompt-template <string> callers get their custom string verbatim
(no assembler, no double-emission).
All existing tests pass without modification beyond name updates.

Test plan

pnpm validate — build + lint + knip + bundle + 224 tests pass
Adversarial regex benchmark on the XSS matcher — old 662 ms, new
0.1 ms on a 60 kB line
End-to-end scan against fixtures/vulnerable-app produces the new
structured summary, progress heartbeats, and the same set of
candidates as before
Polyglot test scenario in prompt-samples/04 proves a Python
batch in a Next.js+Django+Express+Rails repo gets only the Django
highlight
--matchers php-laravel-route,xss regression test confirms gated
slugs still run when explicitly requested on a non-matching repo

vercel · 2026-05-05T23:31:28Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
deepsec	Ignored		May 6, 2026 0:22am

Reorganize prompt and support way more tech

55ce9fe

cramforce force-pushed the composable-prompt branch from f4feada to 55ce9fe Compare May 6, 2026 00:22

cramforce merged commit 0ddd4fa into main May 6, 2026
8 checks passed

divyamagrawal06 mentioned this pull request May 6, 2026

feat(scanner): add FastAPI and Flask HTTP route matchers #52

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broad framework coverage + per-batch composable prompt#53

Broad framework coverage + per-batch composable prompt#53
cramforce merged 1 commit into
mainfrom
composable-prompt

cramforce commented May 5, 2026

Uh oh!

vercel Bot commented May 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cramforce commented May 5, 2026

Why

What's in the box

1. Tech detection + matcher gating

2. Composable, per-batch prompt assembly

3. Frameworks shipped

4. Deterministic prompt samples

5. Match-rate / low-coverage warning

6. CLI / scan UX

7. Regex hardening

8. Bug fixes that came out of building this

Plugin authoring

What stayed the same

Test plan

Uh oh!

vercel Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 5, 2026 •

edited

Loading