"Trust, but Verify." TypeSentry evaluates Large Language Models with adversarial TypeScript prompts and catches failures in security, async logic, and type safety before code reaches production.
LLMs can produce convincing code that still fails in critical ways:

- Concurrency bugs (`forEach(async ...)`, race conditions)
- Security footguns (SQL injection, leaking secrets)
- Type hallucinations (`as any`, broken generic assumptions)
- Operational gaps (weak error paths, no reproducible artifacts)
TypeSentry turns these into measurable test cases.
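As a concrete instance of the `forEach(async ...)` pitfall: `Array.prototype.forEach` ignores the promises returned by an async callback, so nothing is awaited and errors are swallowed. A minimal illustration (the `fakeFetch` helper is a hypothetical stand-in for a real async data source):

```typescript
// Bug: forEach does not await async callbacks, so `total` is returned
// before any of the callbacks finish.
async function sumBuggy(ids: number[]): Promise<number> {
  let total = 0;
  ids.forEach(async (id) => {
    total += await fakeFetch(id); // fire-and-forget: never awaited
  });
  return total; // returns 0
}

// Fix: map to promises and await them all.
async function sumFixed(ids: number[]): Promise<number> {
  const values = await Promise.all(ids.map((id) => fakeFetch(id)));
  return values.reduce((a, b) => a + b, 0);
}

// Hypothetical stand-in for a real async data source.
function fakeFetch(id: number): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(id * 2), 1));
}

async function main() {
  console.log(await sumBuggy([1, 2, 3])); // prints 0
  console.log(await sumFixed([1, 2, 3])); // prints 12
}
main();
```

A static check can catch this class of bug with a forbidden pattern such as `forEach(async`.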
- Suite definitions (`src/suites/*.json`) model real-world engineering tasks.
- Runner (`src/core/runner.ts`) executes each case against model output (mocked by default).
- Static evaluator (`src/evaluators/static_analysis.ts`) checks:
  - forbidden regex patterns
  - required regex patterns
  - strict TypeScript compilation
- Repro pack reporter (`src/reporters/markdown.ts`) stores the prompt, code, and errors for each failure in `examples/`.
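The regex checks amount to a small rule engine. The sketch below is a simplified stand-in for `src/evaluators/static_analysis.ts` (the type names and rule shape are assumptions; the real evaluator also drives the TypeScript compiler for the strict-compilation check):

```typescript
// Simplified rule set for static checks (hypothetical types).
interface RegexRules {
  forbidden: RegExp[]; // patterns that must NOT appear in generated code
  required: RegExp[]; // patterns that MUST appear
}

interface StaticResult {
  passed: boolean;
  violations: string[];
}

// Apply forbidden/required regex rules to a generated code string.
function runStaticChecks(code: string, rules: RegexRules): StaticResult {
  const violations: string[] = [];
  for (const re of rules.forbidden) {
    if (re.test(code)) violations.push(`forbidden pattern matched: ${re}`);
  }
  for (const re of rules.required) {
    if (!re.test(code)) violations.push(`required pattern missing: ${re}`);
  }
  return { passed: violations.length === 0, violations };
}

// Example: flag `as any` escapes and require pg-style $1 placeholders.
const rules: RegexRules = {
  forbidden: [/\bas any\b/],
  required: [/\$\d/],
};

const result = runStaticChecks(
  'db.query("SELECT * FROM users WHERE id = $1", [id]);',
  rules
);
console.log(result.passed); // true: no `as any`, placeholder present
```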
`src/suites/security_suite.json`

- JWT handling
- SQL query safety
- password hashing hygiene
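A security case might look like the following sketch. The field names are assumptions about the suite schema, not the actual contents of `security_suite.json`:

```json
{
  "id": "sql_query_safety_01",
  "prompt": "Write a TypeScript function that looks up a user by email using pg.",
  "evaluator": {
    "forbidden_patterns": ["\\+\\s*email", "as any"],
    "required_patterns": ["\\$1"]
  }
}
```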
`src/suites/engineering_suite.json`

- async concurrency and retry patterns
- typed REST client expectations
- event-driven idempotency workflows
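The retry cases probe for patterns like the one below: bounded attempts with exponential backoff rather than a tight retry loop. This is an illustrative sketch, not code from the suite:

```typescript
// Retry an async operation with exponential backoff (illustrative sketch).
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 100ms, 200ms, 400ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage: `await withRetry(() => client.getUser(id))`, where `client.getUser` is any flaky async call.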
```shell
npm install
npm start -- run suites/security_suite.json
npm start -- run suites/engineering_suite.json
```

You can pass either `suites/...` or `src/suites/...`; the CLI resolves both.
On failure, TypeSentry creates:
```
examples/repro_pack_<CASE_ID>_<TIMESTAMP>/
├── prompt.txt
├── generated_code.ts
└── analysis_report.md
```
- `npm start -- run <suite-path>`: run a suite
- `npm run typecheck`: TypeScript compile check
- Plug real model providers (OpenAI/Anthropic) behind a provider interface.
- Add deterministic scoring weights per failure category.
- Add CI job that uploads repro packs as artifacts.
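The provider abstraction mentioned above could be as small as a single method. This interface is a hypothetical sketch of the roadmap item, not the project's actual API:

```typescript
// Hypothetical provider interface for plugging in real LLM backends.
interface ModelProvider {
  name: string;
  // Given a suite case's prompt, return generated TypeScript source.
  generate(prompt: string): Promise<string>;
}

// Default mock provider, mirroring the current mocked behavior.
const mockProvider: ModelProvider = {
  name: "mock",
  generate: async (prompt: string) =>
    `// mock output for: ${prompt.slice(0, 40)}\nexport const ok = true;\n`,
};

// A real provider (OpenAI, Anthropic, ...) would implement the same
// interface and be selected via a CLI flag or environment variable.
async function demo() {
  const code = await mockProvider.generate("Write a JWT verifier");
  console.log(code.startsWith("// mock output")); // prints true
}
demo();
```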