🔬 Stress Test

"The unexamined hypothesis is not worth testing." — adapted from Socrates

Research Artifact Stress-Testing Engine

A five-campaign adversarial validation system that subjects any research artifact to systematic stress-testing — from structured multi-agent debate to logical boundary analysis — producing weakness-annotated verification reports with severity classification and mitigation proposals.

⚡ What It Does

🗣️ Multiagent Debate — structured adversarial debate via critic-defender-judge, society-of-mind, and courtroom models (Irving, Du, Liang, Toulmin)
🎯 Red-Teaming — military/intelligence-tradition systematic attacks: assumption challenge, adversarial personas, groupthink mitigation (UFMCS, CIA SAT, NIST AI RMF)
⚠️ Failure Anticipation — Klein pre-mortem + AIAG-VDA FMEA: predict how artifacts will fail before they do
🔄 Counterfactual Probing — Pearl SCM + Lewis possible worlds: identify load-bearing factors and causal necessity
🧪 Adversarial Stress-Testing — Lakatos reductio + BVA boundary analysis: find where claims break under logical extremes
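
To make "load-bearing factors" concrete, here is a minimal sketch of a but-for (counterfactual necessity) test on a toy structural causal model. It is not taken from the repository's skills: the variable names (`large_sample`, `control_a`, `control_b`, `confound_handled`, `claim_holds`) and the helpers `evaluate` and `load_bearing` are hypothetical, and the model assumes boolean variables listed in dependency order.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy model: each endogenous variable is a boolean function of
# the variables computed before it (dict insertion order = evaluation order).
Values = Dict[str, bool]
Equations = Dict[str, Callable[[Values], bool]]


def evaluate(equations: Equations, exogenous: Values,
             interventions: Optional[Values] = None) -> Values:
    """Compute all variables, overriding any that are intervened on (do-operator)."""
    interventions = interventions or {}
    values = dict(exogenous)
    for name, forced in interventions.items():
        if name in values:
            values[name] = forced
    for name, fn in equations.items():
        values[name] = interventions[name] if name in interventions else fn(values)
    return values


def load_bearing(equations: Equations, exogenous: Values, outcome: str) -> List[str]:
    """Return variables whose single flip changes the outcome (but-for necessity)."""
    actual = evaluate(equations, exogenous)
    necessary = []
    for var in actual:
        if var == outcome:
            continue
        flipped = evaluate(equations, exogenous, {var: not actual[var]})
        if flipped[outcome] != actual[outcome]:
            necessary.append(var)
    return necessary


# Illustrative artifact: the claim holds if the sample is large and at least
# one of two redundant controls handles the confound.
exogenous = {"large_sample": True, "control_a": True, "control_b": True}
equations = {
    "confound_handled": lambda v: v["control_a"] or v["control_b"],
    "claim_holds": lambda v: v["large_sample"] and v["confound_handled"],
}

print(load_bearing(equations, exogenous, "claim_holds"))
# → ['large_sample', 'confound_handled']
```

Neither control is individually load-bearing because the other one covers for it. Redundancy like this is something a single-flip test misses, which is one reason to probe joint interventions as well.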

🎯 Design Philosophy

Strategy Book mode — skills are textbooks, not scripts. The orchestrator reads, internalizes principles, then autonomously constructs the validation approach for the specific artifact.

Hard constraints only:

Budget Gate — meet the strategy's quantitative floor (±10%) before completing
State Ledger — print progress against budget before each major iteration decision
Context Checkpoint — triggered after each strategy completes (≥500 lines)
Saturation Detection — terminate when no new weaknesses are being discovered

Everything else — execution order, iteration count, tactic selection, SOP combination — is autonomous.
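
The hard constraints can be pictured as a small piece of bookkeeping. The sketch below is illustrative only and not the orchestrator's actual mechanism: the `Ledger` class, its field names, and the two-round saturation window are assumptions, and "±10%" is read here as "the gate passes once completed work reaches 90% of the floor". How the Budget Gate and Saturation Detection combine is not stated above, so the two checks are kept separate.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Ledger:
    budget_floor: int                 # hypothetical: the strategy's quantitative floor
    tolerance: float = 0.10           # assumed reading of "±10%"
    saturation_window: int = 2        # assumption: rounds with no new findings
    completed: int = 0
    weaknesses: Set[str] = field(default_factory=set)
    new_per_round: List[int] = field(default_factory=list)

    def record(self, found: Iterable[str]) -> None:
        """Log one iteration and count only weaknesses not seen before."""
        fresh = set(found) - self.weaknesses
        self.weaknesses |= fresh
        self.new_per_round.append(len(fresh))
        self.completed += 1

    def budget_met(self) -> bool:
        """Budget Gate: completed work is within tolerance of the floor."""
        return self.completed >= self.budget_floor * (1 - self.tolerance)

    def saturated(self) -> bool:
        """Saturation Detection: the last few rounds found nothing new."""
        recent = self.new_per_round[-self.saturation_window:]
        return len(recent) == self.saturation_window and sum(recent) == 0

    def report(self) -> str:
        """State Ledger: one progress line printed before each decision."""
        return (f"iterations {self.completed}/{self.budget_floor}, "
                f"weaknesses {len(self.weaknesses)}, "
                f"budget_met={self.budget_met()}, saturated={self.saturated()}")


# Made-up findings from four iterations of one strategy.
rounds = [
    ["circular reasoning", "small sample"],
    ["small sample", "unstated prior"],
    ["small sample"],
    [],
]

ledger = Ledger(budget_floor=4)
for found in rounds:
    print(ledger.report())   # print progress before the next iteration decision
    ledger.record(found)
print(ledger.report())
```

In this run the ledger reports three distinct weaknesses, with both the budget and saturation checks passing after the fourth iteration.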

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                        ENTRY.md                              │
│              (routing + orchestration)                        │
└──────────────────────────┬──────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Campaign   │  │  Campaign   │  │  Campaign   │  ... (×5)
│  SKILL.md   │  │  SKILL.md   │  │  SKILL.md   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Strategy   │  │  Strategy   │  │  Strategy   │  ... (×25)
│  SKILL.md   │  │  SKILL.md   │  │  SKILL.md   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Tactic    │  │   Tactic    │  │   Tactic    │  ... (×15)
│  SKILL.md   │  │  SKILL.md   │  │  SKILL.md   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Subagent   │  │  Subagent   │  │  Subagent   │  ... (×49+)
│  SKILL.md   │  │  SKILL.md   │  │  SKILL.md   │
│  prompt.md  │  │  prompt.md  │  │  prompt.md  │
└─────────────┘  └─────────────┘  └─────────────┘

📊 Scale

Layer	Count
Campaigns	5
Strategies	25
Tactics	15
Import SOPs	5
Cross-campaign shared SOPs	4
Campaign-specific subagent SOPs	49
Total skill directories	~103
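
As a quick sanity check, the total can be recomputed from the rows above. The dictionary keys are shorthand labels for this example, not directory names from the repository.

```python
# Counts copied from the Scale table; keys are shorthand labels only.
layers = {
    "campaigns": 5,
    "strategies": 25,
    "tactics": 15,
    "import_sops": 5,
    "shared_sops": 4,
    "subagent_sops": 49,
}

total = sum(layers.values())
print(total)  # → 103
```

The listed rows sum to exactly 103. The "~" in the table and the "49+" in the architecture diagram suggest the subagent count is a lower bound.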

🔗 Dependencies

Dependency	Provides
web-browsing	web-search, web-research (import SOPs)
literature-engine	paper-overview, paper-search, paper-research (import SOPs)
subagent-spawning	spawn-agent (execution runtime)
context-management	context-init, context-checkpoint (state persistence)
deep-insight	assumption-surfacing, evidence-synthesis, multi-stakeholder-simulation (cross-repo SOPs)

🚀 Quick Start

Quick validation (single campaign):

Validate this hypothesis using multiagent-debate with S budget.

Standard validation (two campaigns):

Run red-teaming and counterfactual-probing on this experiment design.

Deep validation (all campaigns):

Full stress-test on this claim with L budget across all campaigns.

📄 License

Apache-2.0

Part of the Yogsoth AI ecosystem. Built by Pthahnix.
