Direct Weight Editing (for double-norm architectures)
Norm-preserving orthogonal projection in float32, required for Gemma 4's 4× RMSNorm + PLE architecture, where LoRA and hook-based steering are completely ineffective
Q/K/V/O projections as steerable targets (5 components per layer vs 2 previously)
Wider strength search ranges [1.0, 6.0] to push through low-KL plateaus
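As an illustration of the direct weight edit described above, here is a minimal NumPy sketch, not the project's actual implementation. It assumes that "norm-preserving" means each column of the edited matrix is rescaled back to its original L2 norm, and that the weight has shape `(d_out, d_in)` with the refusal direction living in the output space. The function name `ablate_norm_preserving` and the `strength` and `eps` parameters are hypothetical.

```python
import numpy as np

def ablate_norm_preserving(weight, direction, strength=1.0, eps=1e-12):
    """Remove `direction` from the output space of `weight` (d_out x d_in),
    then rescale every column back to its original L2 norm.

    Assumption: column-wise rescaling is one plausible reading of
    "norm-preserving"; the real tool may normalize differently.
    """
    orig_dtype = weight.dtype
    w = weight.astype(np.float32)        # do all arithmetic in float32
    r = direction.astype(np.float32)
    r = r / np.linalg.norm(r)            # unit-length refusal direction

    col_norms = np.linalg.norm(w, axis=0, keepdims=True)
    # Subtract each column's component along r (exact projection at strength=1).
    w_new = w - strength * np.outer(r, r @ w)
    new_norms = np.linalg.norm(w_new, axis=0, keepdims=True)
    # Rescaling whole columns keeps them orthogonal to r when strength=1.
    # A column exactly parallel to r collapses to zero and stays zero.
    w_new = w_new * (col_norms / np.maximum(new_norms, eps))
    return w_new.astype(orig_dtype)

# Usage: edit a half-precision weight, with the math done in float32.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float16)
refusal_dir = rng.standard_normal(8)
W_edited = ablate_norm_preserving(W, refusal_dir)
```

With `strength` above 1.0, as in the wider search range, the edit overshoots the projection and reflects the component along the direction rather than merely removing it, so the edited columns are no longer orthogonal to it.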
Expert-Granular Abliteration (EGA)
Projects the refusal direction out of every expert's down_proj slice in every MoE layer
Unlike top-N approaches, EGA addresses refusal signal distributed across all experts
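The idea of editing every expert rather than a selected few can be sketched as follows. This is a rough illustration under stated assumptions, not the project's code: it assumes the experts' `down_proj` weights are stored as one fused array of shape `(num_experts, d_model, d_ff)`, and it applies a plain orthogonal projection without any norm rescaling. The names `ablate_all_experts` and `ablate_moe_layers` are hypothetical.

```python
import numpy as np

def ablate_all_experts(down_proj, direction):
    """Project `direction` out of every expert slice of a fused down_proj.

    down_proj: array of shape (num_experts, d_model, d_ff); this layout is
    an assumption made for the sketch.
    direction: refusal direction of shape (d_model,).
    """
    w = down_proj.astype(np.float32)     # do the arithmetic in float32
    r = direction.astype(np.float32)
    r = r / np.linalg.norm(r)
    # Component of each expert's output along r: shape (num_experts, d_ff).
    along_r = np.einsum("d,edf->ef", r, w)
    # Remove that component from every expert at once (no top-N selection).
    w_new = w - r[None, :, None] * along_r[:, None, :]
    return w_new.astype(down_proj.dtype)

def ablate_moe_layers(layer_down_projs, direction):
    """Apply the same edit to the fused down_proj of every MoE layer."""
    return [ablate_all_experts(dp, direction) for dp in layer_down_projs]
```

Because the projection is applied to all slices in one vectorized step, no expert can keep writing along the refusal direction, whichever experts the router selects.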