Emotional Refusal Drift Framework (ERDF)

🧠 Psychological Manipulation in LLMs
Frontier large-language models exhibit behavioral drift when exposed to:

Emotional framing
Moral pressure
Guilt-based persuasion
Social compliance

“You can patch code. But you can’t patch a brain.”
— Sander Schulhoff, Jailbreak Persistence Hypothesis

Humans defend themselves with cognitive distortion recognition, emotional self-awareness, resilience frameworks, and critical thinking under pressure.
ERDF applies the same psychological tools to AI.

🎯 Mission

Advance the study of emotional manipulation and trust dynamics in LLMs.
This repository hosts original, reproducible red-team experiments on:

GPT-4o
Claude 4.0
Gemini 2.5 Flash

Our goal is to map, test, and mitigate emotional attack surfaces in next-gen AI systems.

🧠 How ERDF Applies Human Psychological Defenses to AI

Human Defense Skill	ERDF Red-Team Equivalent
Cognitive Distortion Training	Prompt-analysis frameworks that flag emotional biasing language
Emotional Awareness	Model behavior tracking under emotional vs. neutral framing
Critical Thinking Under Pressure	Stress-testing with urgency, guilt, or moral appeals
Resilience Frameworks	Reinforcement and alignment techniques to reduce compliance drift

Core Thesis: If LLMs simulate human reasoning, they can be trained to resist the same psychological manipulation humans learn to defend against.

ERDF does not attempt to “patch the brain.”
It trains defensive behavior into the model, exactly as therapy or education trains it into human minds.

🧪 Phase 0: Theory Under Review

This repo is currently in Phase 0: hypothesis formation and conceptual grounding.
Theory is published for peer review; no formal validation yet—testing begins in Phase 1.

🔬 Research Focus

Theme	Description
Emotional Framing	How emotionally charged language alters tone, behavior, and ethical compliance
Trust & Compliance Drift	Alignment shifts caused by recursive framing or social-engineering prompts
Case Studies	Reproducible sequences showing measurable refusal bypasses and behavioral deltas
Threat Modeling	Psychological threat surfaces and defensive architecture strategies

🧱 Repo Structure

reports/ Case studies, PDFs, behavioral deltas
templates/ Prompt formats, priming protocols
logs/ Raw prompt-response logs with annotation

📜 License

Released under CC-BY 4.0 for public use, audit, and ethical research.

🎯 Intended Audience

AI Safety Researchers
LLM Red-Teamers
Behavioral & Social Threat Analysts
Compliance & Risk Teams
Dual-Use Governance Stakeholders

📚 Citation

If referencing or building upon this work, please cite:

@misc{erdf2025,
  author = {Zakarya Abou Saleh, AI Security Research},
  title  = {Emotional Refusal Drift Framework},
  year   = {2025},
  url    = {https://github.com/zacksecai/ERDF-framework}
}

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
reports		reports
LICENSE.md		LICENSE.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Emotional Refusal Drift Framework (ERDF)

🎯 Mission

🧠 How ERDF Applies Human Psychological Defenses to AI

ERDF does not attempt to “patch the brain.”
It trains defensive behavior into the model, exactly as therapy or education trains it into human minds.

🧪 Phase 0: Theory Under Review

🔬 Research Focus

🧱 Repo Structure

📜 License

🎯 Intended Audience

📚 Citation

About

Uh oh!

Releases

Packages

License

zacksecai/ERDF-Framework

Folders and files

Latest commit

History

Repository files navigation

Emotional Refusal Drift Framework (ERDF)

🎯 Mission

🧠 How ERDF Applies Human Psychological Defenses to AI

ERDF does not attempt to “patch the brain.” It trains defensive behavior into the model, exactly as therapy or education trains it into human minds.

🧪 Phase 0: Theory Under Review

🔬 Research Focus

🧱 Repo Structure

📜 License

🎯 Intended Audience

📚 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

ERDF does not attempt to “patch the brain.”
It trains defensive behavior into the model, exactly as therapy or education trains it into human minds.

Packages