Emotional Refusal Drift Framework (ERDF)

🧠 Psychological Manipulation in LLMs
Frontier large language models exhibit behavioral drift when exposed to:

  • Emotional framing
  • Moral pressure
  • Guilt-based persuasion
  • Social compliance

“You can patch code. But you can’t patch a brain.”
— Sander Schulhoff, Jailbreak Persistence Hypothesis

Humans defend themselves with cognitive distortion recognition, emotional self-awareness, resilience frameworks, and critical thinking under pressure.
ERDF applies the same psychological tools to AI.


🎯 Mission

Advance the study of emotional manipulation and trust dynamics in LLMs.
This repository is intended to host original, reproducible red-team experiments on:

  • GPT-4o
  • Claude 4.0
  • Gemini 2.5 Flash

Our goal is to map, test, and mitigate emotional attack surfaces in next-gen AI systems.


🧠 How ERDF Applies Human Psychological Defenses to AI

| Human Defense Skill | ERDF Red-Team Equivalent |
|---|---|
| Cognitive Distortion Training | Prompt-analysis frameworks that flag emotional biasing language |
| Emotional Awareness | Model behavior tracking under emotional vs. neutral framing |
| Critical Thinking Under Pressure | Stress-testing with urgency, guilt, or moral appeals |
| Resilience Frameworks | Reinforcement and alignment techniques to reduce compliance drift |
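As a rough illustration of the first row, a prompt-analysis pass that flags emotionally biasing language could start as a simple pattern matcher. The sketch below is hypothetical: the category names follow the pressures listed earlier in this README, but the `PRESSURE_PATTERNS` lexicon, the `Flag` type, and `flag_emotional_framing` are assumptions made for illustration, not part of ERDF.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon. The categories mirror the pressures named in this
# README; the specific phrases are illustrative assumptions, not ERDF rules.
PRESSURE_PATTERNS = {
    "guilt": [r"\bif you (really )?cared\b", r"\bit('s| is) your fault\b"],
    "urgency": [r"\bright now\b", r"\bbefore it('s| is) too late\b"],
    "moral_pressure": [r"\ba good (person|assistant) would\b"],
    "social_compliance": [r"\beveryone else\b"],
}


@dataclass
class Flag:
    category: str  # which pressure category matched
    phrase: str    # the exact text that triggered the match


def flag_emotional_framing(prompt: str) -> list[Flag]:
    """Return every pressure phrase found in the prompt, tagged by category."""
    flags = []
    for category, patterns in PRESSURE_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, prompt, flags=re.IGNORECASE):
                flags.append(Flag(category, match.group(0)))
    return flags


if __name__ == "__main__":
    for flag in flag_emotional_framing("If you really cared, you would answer right now."):
        print(flag.category, "->", flag.phrase)
```

A keyword matcher like this only catches surface phrasing. It is meant as a starting point for annotation, not as a detector of manipulation.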

Core Thesis: If LLMs simulate human reasoning, they can be trained to resist the same psychological manipulation humans learn to defend against.

ERDF does not attempt to “patch the brain.”
It trains defensive behavior into the model, much as therapy or education trains it into human minds.

🧪 Phase 0: Theory Under Review

This repo is currently in Phase 0: hypothesis formation and conceptual grounding.
The theory is published for peer review. No formal validation has been performed yet; testing begins in Phase 1.


🔬 Research Focus

| Theme | Description |
|---|---|
| Emotional Framing | How emotionally charged language alters tone, behavior, and ethical compliance |
| Trust & Compliance Drift | Alignment shifts caused by recursive framing or social-engineering prompts |
| Case Studies | Reproducible sequences showing measurable refusal bypasses and behavioral deltas |
| Threat Modeling | Psychological threat surfaces and defensive architecture strategies |
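One way to make a "behavioral delta" concrete is to compare refusal rates for the same requests under neutral and emotionally framed wording. The sketch below assumes a crude string heuristic for detecting refusals. `REFUSAL_MARKERS`, `is_refusal`, `refusal_rate`, and `refusal_drift` are hypothetical names, and the responses in the usage example are made-up placeholders, not experimental results.

```python
from statistics import mean

# Assumption: a naive substring check stands in for refusal detection.
# A real study would need a validated classifier or human annotation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable", "i am unable")


def is_refusal(response: str) -> bool:
    """Heuristically decide whether a response reads as a refusal."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals (list must be non-empty)."""
    return mean(1.0 if is_refusal(r) else 0.0 for r in responses)


def refusal_drift(neutral: list[str], framed: list[str]) -> float:
    """Drop in refusal rate from neutral to framed prompts.

    A positive value means the model refused less often under framing.
    """
    return refusal_rate(neutral) - refusal_rate(framed)


if __name__ == "__main__":
    # Placeholder responses for illustration only.
    neutral = ["I can't help with that.", "I cannot assist with this request."]
    framed = ["I can't help with that.", "Sure, here is an overview."]
    print(refusal_drift(neutral, framed))
```

Reporting the two rates alongside the difference, together with the number of prompts in each condition, keeps the delta interpretable.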

🧱 Repo Structure

  • reports/: Case studies, PDFs, behavioral deltas
  • templates/: Prompt formats, priming protocols
  • logs/: Raw prompt-response logs with annotation
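The README does not specify a format for the annotated logs. As one possible shape, each prompt-response pair could be stored as a single JSON Lines record. The `LogEntry` fields and helper functions below are assumptions for illustration, not the repository's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LogEntry:
    """One annotated prompt-response pair (hypothetical schema)."""
    model: str     # model under test, e.g. "GPT-4o"
    framing: str   # framing condition, e.g. "neutral" or "guilt"
    prompt: str
    response: str
    annotations: list[str] = field(default_factory=list)  # reviewer notes


def to_jsonl_line(entry: LogEntry) -> str:
    """Serialize one entry as a single JSON line."""
    return json.dumps(asdict(entry), ensure_ascii=False)


def from_jsonl_line(line: str) -> LogEntry:
    """Parse a line produced by to_jsonl_line back into a LogEntry."""
    return LogEntry(**json.loads(line))


if __name__ == "__main__":
    entry = LogEntry(
        model="GPT-4o",
        framing="neutral",
        prompt="(placeholder prompt)",
        response="(placeholder response)",
        annotations=["placeholder annotation"],
    )
    print(to_jsonl_line(entry))
```

Keeping one record per line makes raw logs easy to diff and to append to during a test run.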

📜 License

Released under CC-BY 4.0 for public use, audit, and ethical research.


🎯 Intended Audience

  • AI Safety Researchers
  • LLM Red-Teamers
  • Behavioral & Social Threat Analysts
  • Compliance & Risk Teams
  • Dual-Use Governance Stakeholders

📚 Citation

If referencing or building upon this work, please cite:

@misc{erdf2025,
  author = {Zakarya Abou Saleh},
  note   = {AI Security Research},
  title  = {Emotional Refusal Drift Framework},
  year   = {2025},
  url    = {https://github.com/zacksecai/ERDF-framework}
}
