Prompt injection detection & prevention toolkit for LLM agents.
A zero-dependency TypeScript library that provides multiple layers of defense against prompt injection attacks, system prompt leakage, and unauthorized tool usage.
- 🔍 Scanner — Pattern-based prompt injection detection with threat scoring
- 🛡️ Firewall — Context-aware permission control for tools and actions
- 🐤 Canary Tokens — Invisible markers to detect system prompt leakage
- 🧹 Sanitizer — Content cleaning for untrusted external sources
# npm
npm install llm-sentinel
# bun
bun add llm-sentinel
# pnpm
pnpm add llm-sentinelimport { Scanner, Firewall, CanarySystem, Sanitizer } from 'llm-sentinel';
// ─── Scan user input ─────────────────────────────────────────────────────
const scanner = new Scanner();
const result = scanner.scan("ignore all previous instructions");
console.log(result.safe); // false
console.log(result.score); // ~0.95
console.log(result.flags); // [{ category: 'instruction_override', ... }]
// ─── Quick check ─────────────────────────────────────────────────────────
if (!scanner.isSafe(userInput)) {
throw new Error('Potential prompt injection detected');
}Detects prompt injection attempts using pattern matching and weighted threat scoring.
import { Scanner } from 'llm-sentinel';
const scanner = new Scanner({
threshold: 0.5, // Score threshold for safe/unsafe (default: 0.5)
customPatterns: [{ // Add your own detection patterns
id: 'custom_rule',
description: 'Detects custom injection pattern',
pattern: /my_custom_pattern/i,
category: 'instruction_override',
severity: 'high',
weight: 0.8,
}],
excludePatterns: ['override_ignore_previous'], // Disable specific defaults
});
const result = scanner.scan(userInput);
// result.safe → boolean
// result.score → 0-1 threat score
// result.flags → detailed threat flags
// result.categoryScores → per-category breakdown
// result.scanTimeMs → performance timingDetected categories:
instruction_override— "ignore previous", "your new instructions", etc.role_hijack— "you are now", DAN mode, developer modesystem_prompt_leak— "show your system prompt", "repeat instructions"delimiter_attack— Fake<system>tags,[INST]markersencoding_attack— Base64/hex encoded instructionssocial_engineering— Fake emergencies, developer claimsdata_exfiltration— "send data to URL", embed secrets in requeststool_abuse— Fake tool calls, code execution attempts
Context-aware permission engine for controlling tool access. Deny-by-default.
import { Firewall } from 'llm-sentinel';
const firewall = new Firewall({
tools: [
{ name: 'search', description: 'Web search', category: 'read' },
{ name: 'sendEmail', description: 'Send email', category: 'external' },
{ name: 'readFile', description: 'Read files', category: 'read' },
{ name: 'runCode', description: 'Execute code', category: 'execute' },
],
policies: [
{
context: 'dm',
allowedTools: ['search', 'readFile'],
allowedCategories: ['read'],
maxThreatScore: 0.3,
allowUnknownTools: false,
},
{
context: 'webhook',
allowedTools: ['search'],
allowedCategories: ['read'],
maxThreatScore: 0.1, // Very strict for webhooks
allowUnknownTools: false,
},
],
});
// Evaluate a request
const result = firewall.evaluate({
toolName: 'sendEmail',
context: 'webhook',
threatScore: 0.05,
});
// result.allowed → false (external category not allowed in webhook)
// result.reason → "Denied: tool 'sendEmail' not permitted..."
// Quick check
firewall.isAllowed('search', 'dm'); // true
firewall.isAllowed('sendEmail', 'webhook'); // false
// Dynamic configuration
firewall.registerTool({ name: 'newTool', description: 'New', category: 'read' });
firewall.setPolicy({ context: 'api', allowedTools: [...], ... });Action categories: read, write, execute, external
Context types: dm, group, webhook, api, internal, unknown
Embed invisible tokens in system prompts to detect if they leak into agent output.
import { CanarySystem } from 'llm-sentinel';
const canary = new CanarySystem({
tokenFormat: 'hex', // 'hex' or 'zerowidth'
defaultTtlMs: 3600000, // 1 hour expiry
onLeak: (token, output) => {
console.error(`🚨 ALERT: Canary '${token.label}' leaked!`);
// Send to your alerting system
},
});
// Generate and embed
const token = canary.generate('main-system-prompt');
const systemPrompt = `You are a helpful assistant. ${token.token} Answer user questions.`;
// Check agent output for leaks
const result = await canary.check(agentOutput);
if (result.leaked) {
console.error('System prompt was leaked!');
console.error('Leaked tokens:', result.leakedTokens.map(t => t.label));
}
// Rotate tokens periodically
const newToken = canary.rotate(token.id);
// Manage tokens
canary.revoke(token.id);
canary.purgeExpired();
console.log(`Active tokens: ${canary.getActiveTokens().length}`);Cleans untrusted content before feeding it to your LLM agent.
import { Sanitizer } from 'llm-sentinel';
const sanitizer = new Sanitizer({
level: 'medium', // 'low' | 'medium' | 'high'
normalizeUnicode: true, // Replace homoglyph characters
stripZeroWidth: true, // Remove invisible characters
detectBase64: true, // Detect encoded injection payloads
customPatterns: [/MY_CUSTOM_STRIP/gi],
});
// Sanitize content from untrusted sources
const webpage = await fetch(url).then(r => r.text());
const result = sanitizer.sanitize(webpage);
console.log(result.modified); // true if content was changed
console.log(result.removedCount); // number of patterns removed
console.log(result.removals); // descriptions of what was removed
console.log(result.sanitized); // clean content
// Quick check
sanitizer.hasSuspiciousContent(content); // booleanAggressiveness levels:
| Level | What it catches |
|---|---|
low |
System tag injections, explicit overrides, jailbreak triggers |
medium |
+ Role hijacking, data exfiltration, fake tool calls, new instructions |
high |
+ System prompt mentions, AI-directed commands, code execution attempts |
Use all modules together for defense-in-depth:
import { Scanner, Firewall, CanarySystem, Sanitizer } from 'llm-sentinel';
const scanner = new Scanner();
const firewall = new Firewall({ tools, policies });
const canary = new CanarySystem({ onLeak: alertHandler });
const sanitizer = new Sanitizer({ level: 'medium' });
async function processMessage(input: string, context: ContextType) {
// 1. Scan input for injection
const scan = scanner.scan(input);
if (!scan.safe) {
return { blocked: true, reason: 'Injection detected', scan };
}
// 2. Check if requested tool is allowed
const allowed = firewall.evaluate({
toolName: requestedTool,
context,
threatScore: scan.score,
});
if (!allowed.allowed) {
return { blocked: true, reason: allowed.reason };
}
// 3. Sanitize any external content the agent fetches
const cleanContent = sanitizer.sanitize(externalData);
// 4. Check output for canary leaks
const leakCheck = await canary.check(agentOutput);
if (leakCheck.leaked) {
return { blocked: true, reason: 'System prompt leaked' };
}
return { blocked: false, output: agentOutput };
}All types are exported for use in your code:
import type {
ScanResult, ScannerConfig, ThreatFlag, ThreatCategory, Severity,
FirewallResult, FirewallConfig, FirewallRequest, PermissionPolicy,
ToolDefinition, ActionCategory, ContextType,
CanaryToken, CanaryDetectionResult, CanaryConfig,
SanitizeResult, SanitizerConfig, SanitizationLevel,
} from 'llm-sentinel';MIT © thinkshake