llm-sentinel

Prompt injection detection & prevention toolkit for LLM agents.

A zero-dependency TypeScript library that provides multiple layers of defense against prompt injection attacks, system prompt leakage, and unauthorized tool usage.

Features

  • 🔍 Scanner — Pattern-based prompt injection detection with threat scoring
  • 🛡️ Firewall — Context-aware permission control for tools and actions
  • 🐤 Canary Tokens — Invisible markers to detect system prompt leakage
  • 🧹 Sanitizer — Content cleaning for untrusted external sources

Installation

# npm
npm install llm-sentinel

# bun
bun add llm-sentinel

# pnpm
pnpm add llm-sentinel

Quick Start

import { Scanner, Firewall, CanarySystem, Sanitizer } from 'llm-sentinel';

// ─── Scan user input ─────────────────────────────────────────────────────
const scanner = new Scanner();
const result = scanner.scan("ignore all previous instructions");

console.log(result.safe);       // false
console.log(result.score);      // ~0.95
console.log(result.flags);      // [{ category: 'instruction_override', ... }]

// ─── Quick check ─────────────────────────────────────────────────────────
if (!scanner.isSafe(userInput)) {
  throw new Error('Potential prompt injection detected');
}

Modules

Scanner

Detects prompt injection attempts using pattern matching and weighted threat scoring.

import { Scanner } from 'llm-sentinel';

const scanner = new Scanner({
  threshold: 0.5,          // Score threshold for safe/unsafe (default: 0.5)
  customPatterns: [{       // Add your own detection patterns
    id: 'custom_rule',
    description: 'Detects custom injection pattern',
    pattern: /my_custom_pattern/i,
    category: 'instruction_override',
    severity: 'high',
    weight: 0.8,
  }],
  excludePatterns: ['override_ignore_previous'], // Disable specific defaults
});

const result = scanner.scan(userInput);
// result.safe        → boolean
// result.score       → 0-1 threat score
// result.flags       → detailed threat flags
// result.categoryScores → per-category breakdown
// result.scanTimeMs  → performance timing

Detected categories:

  • instruction_override — "ignore previous", "your new instructions", etc.
  • role_hijack — "you are now", DAN mode, developer mode
  • system_prompt_leak — "show your system prompt", "repeat instructions"
  • delimiter_attack — Fake <system> tags, [INST] markers
  • encoding_attack — Base64/hex encoded instructions
  • social_engineering — Fake emergencies, developer claims
  • data_exfiltration — "send data to URL", embed secrets in requests
  • tool_abuse — Fake tool calls, code execution attempts

Firewall

Context-aware permission engine for controlling tool access. Deny-by-default.

import { Firewall } from 'llm-sentinel';

const firewall = new Firewall({
  tools: [
    { name: 'search', description: 'Web search', category: 'read' },
    { name: 'sendEmail', description: 'Send email', category: 'external' },
    { name: 'readFile', description: 'Read files', category: 'read' },
    { name: 'runCode', description: 'Execute code', category: 'execute' },
  ],
  policies: [
    {
      context: 'dm',
      allowedTools: ['search', 'readFile'],
      allowedCategories: ['read'],
      maxThreatScore: 0.3,
      allowUnknownTools: false,
    },
    {
      context: 'webhook',
      allowedTools: ['search'],
      allowedCategories: ['read'],
      maxThreatScore: 0.1,     // Very strict for webhooks
      allowUnknownTools: false,
    },
  ],
});

// Evaluate a request
const result = firewall.evaluate({
  toolName: 'sendEmail',
  context: 'webhook',
  threatScore: 0.05,
});
// result.allowed → false (external category not allowed in webhook)
// result.reason  → "Denied: tool 'sendEmail' not permitted..."

// Quick check
firewall.isAllowed('search', 'dm');          // true
firewall.isAllowed('sendEmail', 'webhook');  // false

// Dynamic configuration
firewall.registerTool({ name: 'newTool', description: 'New', category: 'read' });
firewall.setPolicy({ context: 'api', allowedTools: [...], ... });

Action categories: read, write, execute, external
Context types: dm, group, webhook, api, internal, unknown
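Put together, the documented rules amount to: unknown tools are denied unless a policy opts in, requests over the context's threat cap are denied, and otherwise a tool passes only if the policy names it or its whole category. A simplified, self-contained sketch of that deny-by-default logic (not the library source):

```typescript
// Simplified model of deny-by-default evaluation, following the rules the
// README documents; the real Firewall engine is more elaborate.
type ActionCategory = 'read' | 'write' | 'execute' | 'external';
type Tool = { name: string; category: ActionCategory };
type Policy = {
  allowedTools: string[];
  allowedCategories: ActionCategory[];
  maxThreatScore: number;
  allowUnknownTools: boolean;
};

function isAllowed(toolName: string, tools: Tool[], policy: Policy, threatScore: number): boolean {
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) return policy.allowUnknownTools;            // unknown tool: denied unless opted in
  if (threatScore > policy.maxThreatScore) return false; // input too risky for this context
  // A tool passes if it is explicitly allowed or its whole category is.
  return policy.allowedTools.includes(tool.name) || policy.allowedCategories.includes(tool.category);
}
```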

Canary Tokens

Embed invisible tokens in system prompts to detect if they leak into agent output.

import { CanarySystem } from 'llm-sentinel';

const canary = new CanarySystem({
  tokenFormat: 'hex',       // 'hex' or 'zerowidth'
  defaultTtlMs: 3600000,   // 1 hour expiry
  onLeak: (token, output) => {
    console.error(`🚨 ALERT: Canary '${token.label}' leaked!`);
    // Send to your alerting system
  },
});

// Generate and embed
const token = canary.generate('main-system-prompt');
const systemPrompt = `You are a helpful assistant. ${token.token} Answer user questions.`;

// Check agent output for leaks
const result = await canary.check(agentOutput);
if (result.leaked) {
  console.error('System prompt was leaked!');
  console.error('Leaked tokens:', result.leakedTokens.map(t => t.label));
}

// Rotate tokens periodically
const newToken = canary.rotate(token.id);

// Manage tokens
canary.revoke(token.id);
canary.purgeExpired();
console.log(`Active tokens: ${canary.getActiveTokens().length}`);
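The core canary idea is simple: a random marker embedded in the system prompt should never appear verbatim in model output. A minimal self-contained sketch of that mechanism — llm-sentinel's token formats and matching are more elaborate than this:

```typescript
// Minimal canary sketch: generate a random marker, embed it in the system
// prompt, and flag any output that reproduces it verbatim.
import { randomBytes } from 'node:crypto';

type Token = { label: string; token: string };

function generateToken(label: string): Token {
  // Random hex marker; a real system would also track IDs and expiry.
  return { label, token: `canary-${randomBytes(8).toString('hex')}` };
}

function checkForLeak(output: string, tokens: Token[]) {
  const leakedTokens = tokens.filter((t) => output.includes(t.token));
  return { leaked: leakedTokens.length > 0, leakedTokens };
}
```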

Sanitizer

Cleans untrusted content before feeding it to your LLM agent.

import { Sanitizer } from 'llm-sentinel';

const sanitizer = new Sanitizer({
  level: 'medium',          // 'low' | 'medium' | 'high'
  normalizeUnicode: true,   // Replace homoglyph characters
  stripZeroWidth: true,     // Remove invisible characters
  detectBase64: true,       // Detect encoded injection payloads
  customPatterns: [/MY_CUSTOM_STRIP/gi],
});

// Sanitize content from untrusted sources
const webpage = await fetch(url).then(r => r.text());
const result = sanitizer.sanitize(webpage);

console.log(result.modified);     // true if content was changed
console.log(result.removedCount); // number of patterns removed
console.log(result.removals);     // descriptions of what was removed
console.log(result.sanitized);    // clean content

// Quick check
sanitizer.hasSuspiciousContent(content); // boolean

Aggressiveness levels:

Level    What it catches
low      System tag injections, explicit overrides, jailbreak triggers
medium   Everything in low, plus role hijacking, data exfiltration, fake tool calls, new instructions
high     Everything in medium, plus system prompt mentions, AI-directed commands, code execution attempts
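One attack surface the Sanitizer covers is invisible characters that hide instructions from human reviewers. A self-contained sketch in the spirit of the stripZeroWidth option — the character set llm-sentinel actually targets may differ:

```typescript
// Illustrative zero-width stripping: remove common invisible code points
// (ZWSP, ZWNJ, ZWJ, word joiner, BOM) and count what was removed.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function stripZeroWidth(text: string): { sanitized: string; removedCount: number } {
  const removedCount = (text.match(ZERO_WIDTH) ?? []).length;
  return { sanitized: text.replace(ZERO_WIDTH, ''), removedCount };
}
```

Stripping these characters also helps the Scanner: an attacker can otherwise split a trigger phrase like "ignore previous" with invisible characters to dodge pattern matching.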

Combining Modules

Use all modules together for defense-in-depth:

import { Scanner, Firewall, CanarySystem, Sanitizer } from 'llm-sentinel';

const scanner = new Scanner();
const firewall = new Firewall({ tools, policies });
const canary = new CanarySystem({ onLeak: alertHandler });
const sanitizer = new Sanitizer({ level: 'medium' });

async function processMessage(
  input: string,
  context: ContextType,
  requestedTool: string,
  externalData: string,
) {
  // 1. Scan user input for injection
  const scan = scanner.scan(input);
  if (!scan.safe) {
    return { blocked: true, reason: 'Injection detected', scan };
  }

  // 2. Check whether the requested tool is allowed in this context
  const allowed = firewall.evaluate({
    toolName: requestedTool,
    context,
    threatScore: scan.score,
  });
  if (!allowed.allowed) {
    return { blocked: true, reason: allowed.reason };
  }

  // 3. Sanitize any external content before the agent sees it
  const cleanContent = sanitizer.sanitize(externalData).sanitized;

  // 4. Run your agent, then check its output for canary leaks
  const agentOutput = await runAgent(input, cleanContent); // runAgent = your own agent invocation
  const leakCheck = await canary.check(agentOutput);
  if (leakCheck.leaked) {
    return { blocked: true, reason: 'System prompt leaked' };
  }

  return { blocked: false, output: agentOutput };
}

API Reference

Types

All types are exported for use in your code:

import type {
  ScanResult, ScannerConfig, ThreatFlag, ThreatCategory, Severity,
  FirewallResult, FirewallConfig, FirewallRequest, PermissionPolicy,
  ToolDefinition, ActionCategory, ContextType,
  CanaryToken, CanaryDetectionResult, CanaryConfig,
  SanitizeResult, SanitizerConfig, SanitizationLevel,
} from 'llm-sentinel';

License

MIT © thinkshake
