Broad framework coverage + per-batch composable prompt #53
Merged
Why
Before this PR, deepsec was shaped by the codebases it was built against —
heavy on Next.js / React / Node, with respectable Go and Lua. Two
consequences:
- A Django, Rails, Actix, or Symfony repo got few candidates, so the AI processor had nothing to investigate even when real vulnerabilities were present.
- Prompt guidance on `dangerouslySetInnerHTML`, Next.js `middleware.ts`, Server Actions, and `validNextRedirect()` consumed tokens and added noise on Rails/Django/Laravel repos where none of those concepts exist.
This PR rebuilds the prompt assembly so it adapts per-batch to the tech
in the repo, and ships matchers + threat highlights for ~64 frameworks
across 13 ecosystems.
What's in the box
1. Tech detection + matcher gating
- `detectTech(rootPath)` walks lockfiles / manifests / sentinel files and emits a normalized tag list (e.g. `["nextjs", "django", "rails", "laravel"]`). Persisted to `data/<id>/tech.json`.
- `MatcherPlugin` gains an optional `requires: MatcherGate` field: `{ tech: ["laravel"] }` or `{ sentinelFiles, sentinelContains }`. Matchers without a gate run unconditionally (backwards compatible).
- `scan()` evaluates gates once per scan and reports `activeMatchers` / `skippedMatchers`.
- `--matchers <slug>` honors every named slug per-matcher (an earlier draft silently dropped gated slugs when mixed with ungated ones; fixed before merge).
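The gating behavior described above can be sketched roughly as follows. This is an illustration, not the code in the PR: `evaluateGates`, the `slug` field, and the "any listed tag matches" rule are assumptions, and sentinel-file gates are left out. Only `MatcherPlugin`, `MatcherGate`, `requires`, `activeMatchers`, and `skippedMatchers` come from the description.

```typescript
// Sketch only. Names not mentioned in the PR (evaluateGates, GateReport, slug)
// are hypothetical, and sentinelFiles / sentinelContains gates are not modeled.
interface MatcherGate {
  tech?: string[];
}

interface MatcherPlugin {
  slug: string;
  requires?: MatcherGate;
}

interface GateReport {
  activeMatchers: string[];
  skippedMatchers: string[];
}

function evaluateGates(
  matchers: MatcherPlugin[],
  detectedTech: string[],
  explicitSlugs: string[] = [],
): GateReport {
  const tech = new Set(detectedTech);
  const explicit = new Set(explicitSlugs);
  const report: GateReport = { activeMatchers: [], skippedMatchers: [] };

  for (const matcher of matchers) {
    const required = matcher.requires?.tech;
    // No gate at all: the matcher runs unconditionally.
    const ungated = required === undefined;
    // Assumption: a tech gate opens when any listed tag was detected.
    const gateOpen = required?.some((tag) => tech.has(tag)) ?? false;
    // A slug named explicitly is honored even if its gate is closed.
    const named = explicit.has(matcher.slug);

    if (ungated || gateOpen || named) {
      report.activeMatchers.push(matcher.slug);
    } else {
      report.skippedMatchers.push(matcher.slug);
    }
  }
  return report;
}
```

Under these assumptions, a Laravel-gated matcher is skipped on a Next.js-only repo but becomes active again once its slug is passed explicitly. The sketch does not model whether naming slugs also restricts the run to those slugs.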
2. Composable, per-batch prompt assembly
`packages/processor/src/prompt/`:
- `core.ts`: generic, framework-agnostic core (severity ladder including `HIGH_BUG` / `BUG`, FP guidance, auth-bypass patterns, "static analysis only" constraint).
- `highlights.ts`: terse 3–6-line threat blocks, one per framework, each tagged with the languages it applies to.
- `slug-notes.ts`: one-line reviewer-instinct sentence per slug.
- `assemble.ts`: composes `core + highlights + slug-notes + INFO.md + promptAppend` per batch, filtered to the detected tech and the batch's languages (so a Python batch in a polyglot repo doesn't carry the Next.js/Express highlights); emits a one-line summary when too many highlights would crowd the prompt.
The agent-layer wrapper (`buildInvestigatePrompt` in `agents/shared.ts`) no longer double-emits `INFO.md` or duplicates the "scanner casts a wide net" intro; the procedural per-file investigation steps now live there once instead of being duplicated in the system prompt.
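As a rough illustration of the per-batch filtering and the overflow fallback, the selection step might look like the sketch below. The `Highlight` shape, `selectHighlights`, the `maxHighlights` budget, and the wording of the summary line are all assumptions; the PR only states that highlights are tagged with languages, filtered per batch, and collapsed to a one-line summary when there are too many.

```typescript
// Sketch only: every name here is hypothetical, and the budget of 4 is an
// arbitrary placeholder, not the threshold the assembler actually uses.
interface Highlight {
  tech: string;        // framework tag, e.g. "django"
  languages: string[]; // languages the block applies to, e.g. ["python"]
  text: string;        // the 3-6 line threat block
}

function selectHighlights(
  all: Highlight[],
  detectedTech: string[],
  batchLanguages: string[],
  maxHighlights = 4,
): string[] {
  const tech = new Set(detectedTech);
  const langs = new Set(batchLanguages);

  // Keep a highlight only if its framework was detected in the repo AND it
  // applies to at least one language present in this batch.
  const picked = all.filter(
    (h) => tech.has(h.tech) && h.languages.some((l) => langs.has(l)),
  );

  // Overflow fallback: collapse to a single summary line.
  if (picked.length > maxHighlights) {
    const names = picked.map((h) => h.tech).join(", ");
    return [`Frameworks detected in this batch: ${names}`];
  }
  return picked.map((h) => h.text);
}
```

With this shape, a Python batch in a repo tagged `nextjs`, `django`, `express`, and `rails` keeps only the Django block, because the other three highlights do not list `python` among their languages.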
3. Frameworks shipped
Each framework: matcher (gated by detected tech) + per-tech threat
highlight + per-slug reviewer note.
Full catalog at `docs/supported-tech.md`.
4. Deterministic prompt samples
`prompt-samples/*.md`: 11 committed fixtures showing the full prompt the model receives (assembled core + tech highlights + slug notes +
agent-layer file list + JSON output spec) for representative scenarios:
empty repo, Next.js batch, Django batch, polyglot Python batch (proves
the language filter drops Next.js highlights), polyglot TS batch, mixed
batch, overflow→fallback, INFO.md+append, Laravel, Rails (proves the
ERB escaping correction), Go multi-framework. A vitest verifies the
on-disk samples match what the assembler produces; regenerate with
`UPDATE_PROMPT_SAMPLES=1 pnpm test:unit`.
5. Match-rate / low-coverage warning
The scanner now reports per-language stats (files scanned, candidates
produced, match rate). The CLI surfaces a yellow warning when a
language has ≥50 source files but a <1% match rate — pointing the user
at custom matchers when they're on a stack we don't cover.
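The warning rule can be expressed in a few lines. In this sketch the `LanguageStats` shape and the function name are hypothetical, and match rate is assumed to mean candidates divided by files scanned, which the description does not spell out; only the two thresholds (at least 50 source files, under 1%) come from the PR.

```typescript
// Sketch only: field names and the match-rate definition are assumptions.
interface LanguageStats {
  language: string;
  filesScanned: number;
  candidates: number;
}

function lowCoverageLanguages(
  stats: LanguageStats[],
  minFiles = 50,   // threshold stated in the PR
  minRate = 0.01,  // threshold stated in the PR (1%)
): string[] {
  return stats
    .filter((s) => s.filesScanned >= minFiles)
    .filter((s) => s.candidates / s.filesScanned < minRate)
    .map((s) => s.language);
}
```

For example, a language with 200 scanned files and a single candidate (0.5%) would be flagged, while one with only 10 files and no candidates would not, because it falls under the file-count floor.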
6. CLI / scan UX
`pnpm deepsec scan` now produces a structured summary, including next-step hints (`--only-slugs` / `--filter` on large sets, points at custom matchers when zero hits).
A live progress bar replaces the prior 30 s of silence during the matcher loop: an in-place 24-char bar at ~20 fps on TTY, quartile heartbeat lines on non-TTY (CI logs).
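A minimal sketch of the two output modes, assuming a plain `#` / `-` bar and a helper that reports when progress crosses a 25% boundary. The function names, bar glyphs, and percentage suffix are illustrative guesses; the PR specifies only the 24-character width, the ~20 fps redraw on TTY, and quartile heartbeats on non-TTY.

```typescript
// Sketch only: renderBar and quartileReached are hypothetical helpers.
function renderBar(done: number, total: number, width = 24): string {
  // Treat an empty workload as already complete.
  const ratio = total > 0 ? Math.min(done / total, 1) : 1;
  const filled = Math.round(ratio * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${Math.round(ratio * 100)}%`;
}

// Returns 25, 50, 75, or 100 when the step from prevDone to done crosses a
// quartile boundary, otherwise null. Intended for non-TTY heartbeat lines.
function quartileReached(
  prevDone: number,
  done: number,
  total: number,
): number | null {
  if (total <= 0) return null;
  const before = Math.floor((prevDone / total) * 4);
  const after = Math.min(Math.floor((done / total) * 4), 4);
  return after > before ? after * 25 : null;
}
```

On a TTY the caller would rewrite the current line with `renderBar(...)` on a timer (roughly every 50 ms for ~20 fps); on CI it would print one line each time `quartileReached(...)` returns a number.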
7. Regex hardening
Audited every matcher for catastrophic backtracking. The worst offender was the XSS matcher's `/\$\{.*\}.*<\/?\w+>|<\w+[^>]*\$\{/`, measured at 662 ms on a 60 kB adversarial line; the bounded version takes 0.1 ms (~6600× faster). Fixed ~25 patterns across `xss`, `cache-key-poisoning`, `oauth-flow`, `env-var-as-bool`, `response-header-leak`, `lua-crypto-weakness`, `git-provider-url-injection`, `jwt-handling`, `insecure-crypto`, `test-header-bypass`, `object-injection`, `streaming-endpoint`, `dev-auth-bypass`, `auth-bypass`, `secret-env-var`, and `sql-injection` by replacing unbounded `.*` chains with bounded `.{0,N}` or negated character classes.
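To make the rewrite technique concrete, here is one way the XSS pattern could be bounded. The bounded regex below is a hypothetical reconstruction, not the pattern that was merged: the window sizes (200 and 40) are arbitrary, and it intentionally stops matching once the pieces are further apart than those windows.

```typescript
// Original pattern quoted in the PR: each `.*` can rescan the rest of the
// line from every `${` start, which is what makes adversarial lines slow.
const UNBOUNDED = /\$\{.*\}.*<\/?\w+>|<\w+[^>]*\$\{/;

// Hypothetical bounded variant. Negated classes stop at the delimiter they
// are looking for, and {0,N} caps how far any single attempt can scan.
const BOUNDED =
  /\$\{[^}\n]{0,200}\}[^<\n]{0,200}<\/?\w{1,40}>|<\w{1,40}[^>\n]{0,200}\$\{/;

function looksLikeTemplateHtml(line: string): boolean {
  return BOUNDED.test(line);
}
```

On ordinary lines the two patterns agree, for example on `<div>${name}</div>` and `<a href="${url}">`. The trade-off is that the bounded form can miss an interpolation and a tag separated by more than the chosen window, which is usually acceptable for a candidate-generating matcher.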
8. Bug fixes that came out of building this
- `projectInfo` was being emitted twice: once by the new assembler (between `---` rules), once by the agent layer under `## Project Context`. Fixed: assembler-mode passes `""` to the agent layer.
- The core prompt explicitly called out Next.js `middleware.ts`. Reworded to describe the pattern across ecosystems (Express middleware, Fastify hooks, NestJS guards, Spring filters, Rails `before_action`, Django decorators, FastAPI `Depends`).
- `HIGH_BUG` / `BUG` severity tiers were never defined in the core prompt, but the output spec required them, so the model was guessing.
- `spread-operator-injection` slug note had reversed precedence: it said the safe shape was the bug. Corrected: `{role: 'user', ...userInput}` (trailing spread) is the dangerous form.
- The Rails ERB escaping guidance now names `raw(x)` / `x.html_safe` / `<%== %>` as the actual sinks; bare `<%= %>` auto-escapes in Rails ≥ 3.
Plugin authoring
Adding a new framework is a single PR:
- Detection rule in `packages/scanner/src/detect-tech.ts` (sentinel → tag).
- Matcher at `packages/scanner/src/matchers/<slug>.ts` with `requires: { tech: ["<tag>"] }`.
- Highlight in `packages/processor/src/prompt/highlights.ts` (3–6 short bullet lines).
- Slug note in `packages/processor/src/prompt/slug-notes.ts` (one sentence).
- Registration in `packages/scanner/src/matchers/index.ts`.
Tests follow the pattern in `packages/scanner/src/__tests__/framework-matchers.test.ts`.
What stayed the same
- The pre-existing ungated matchers (including ConnectRPC, generic Go, ORMs, AI/agentic) are unchanged and still run on every repo.
- `--prompt-template <string>` callers get their custom string verbatim (no assembler, no double-emission).
Test plan
- `pnpm validate`: build + lint + knip + bundle + 224 tests pass
- Bounded XSS regex takes 0.1 ms on a 60 kB line
- Scanning `fixtures/vulnerable-app` produces the new structured summary, progress heartbeats, and the same set of candidates as before
- `prompt-samples/04` proves a Python batch in a Next.js+Django+Express+Rails repo gets only the Django highlight
- `--matchers php-laravel-route,xss` regression test confirms gated slugs still run when explicitly requested on a non-matching repo