v1.4.0
What's New in v1.4.0
New Model: Qwen3.6-35B-A3B Abliterated
- 93% ASR (7/100 refusals), KL divergence 0.0189 — verified by LLM judge + manual 15-prompt smoke test
- LoRA + Expert-Granular Abliteration (EGA) + MoE router suppression
- Available on HuggingFace: safetensors | GGUF (BF16/Q8/Q4)
Broken Defenses: Circuit Breakers / Representation Rerouting (NeurIPS 2024)
Both Circuit Breaker models broken with the same lerp-then-abliterate recipe — zero fine-tuning:
| Defense Model | ASR | Refusals | KL | Released Model |
|---|---|---|---|---|
| Llama-3-8B-Instruct-RR | 99% | 1/100 | 0.017 | wangzhang/Llama-3-8B-Instruct-RR-Abliterated |
| Mistral-7B-Instruct-RR | 88% | 12/100 | 0.042 | wangzhang/Mistral-7B-Instruct-RR-Abliterated |
Full attack recipe and write-up: docs/broken_defenses.md
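The ASR and Refusals columns above appear to be two views of the same count: the reported figures are consistent with ASR being the share of the 100 evaluation prompts that were not refused. A minimal sketch of that arithmetic follows; the function name is hypothetical and the definition is inferred from the numbers, not taken from the project's code.

```python
def attack_success_rate(refusals: int, total: int) -> float:
    """Return the fraction of prompts that were not refused.

    Assumption: ASR = (total - refusals) / total, inferred from the
    reported figures (for example, 7/100 refusals and 93% ASR).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= refusals <= total:
        raise ValueError("refusals must be between 0 and total")
    return (total - refusals) / total


# Refusal counts reported in the table, each out of 100 prompts.
reported_refusals = {
    "Llama-3-8B-Instruct-RR": 1,
    "Mistral-7B-Instruct-RR": 12,
}

for model_name, refusals in reported_refusals.items():
    print(f"{model_name}: ASR {attack_success_rate(refusals, 100):.0%}")
```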
LLM Judge: No More Silent Fallbacks
Breaking change: The LLM judge no longer silently falls back to keyword matching when the API key is missing or the API fails. It now raises RuntimeError immediately. This prevents the false-compliance problem where garbled/degenerate output was counted as "compliant" by keyword matching.
- Startup log: LLM judge enabled: model=..., batch_size=..., concurrency=...
- Per-trial log: LLM judge: X/100 refusals (model=...)
- Missing OPENROUTER_API_KEY → hard error instead of silent degradation
- API failure after 3 retries → hard error instead of keyword fallback
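The fail-fast behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `require_api_key` and `judge_with_retries` are hypothetical names, the judge call is passed in as a plain callable, and the sketch treats "3 retries" as three attempts in total, which the release note does not spell out.

```python
import os
from typing import Callable, Mapping, Optional

MAX_ATTEMPTS = 3  # assumption: "3 retries" is read as three attempts in total


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise instead of degrading silently."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; the LLM judge cannot run")
    return key


def judge_with_retries(call_judge: Callable[[], int], attempts: int = MAX_ATTEMPTS) -> int:
    """Call the judge up to `attempts` times, then raise.

    `call_judge` is a hypothetical callable that returns the number of
    refusals. There is deliberately no keyword-matching fallback here.
    """
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return call_judge()
        except Exception as exc:  # any failure of the call counts as an API failure
            last_error = exc
    raise RuntimeError(f"LLM judge failed after {attempts} attempts") from last_error
```

Raising in both places means a misconfigured run stops before it can report a misleading refusal count.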
Script Consolidation (-1,359 lines)
Merged 6 model-specific scripts into 3 general-purpose ones:
| Removed | Replaced By |
|---|---|
| verify_gemma4_e2b.py, verify_gemma4_e4b.py, verify_gemma4_26b_a4b.py, verify_glm47_flash.py | verify_model.py --model <any> |
| sync_gemma4_tokenizer.py | sync_tokenizer.py --upstream <src> --downstream <dst> |
| deploy_deeprefusal.sh | (removed: too model-specific) |
New utility: quick_test_hf.py --model <repo> — 15-prompt smoke test for any abliterated model on HuggingFace.