# InnerPiSSA: Analyse sweeps and ablations
## Analysis Principles

**Main metric**: `ipissa_vh_range` = T_test / nll_degradation
- T_test = slope of logprob vs coefficient (steering effect)
- nll_degradation = coherence loss (model quality preservation)
- This is the primary metric for comparing methods

**Auxiliary metrics**:
- `symmetry`: min(|neg-zero|, |pos-zero|) / max(|neg-zero|, |pos-zero|), i.e. how symmetric the bidirectional steering is (1 = equal shift in both directions)
- `loss_gap`: val_loss - train_loss, an overfitting indicator
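
As a worked check of the two auxiliary metrics, the sketch below recomputes them from the first row of the results table in the Load Data section (run `qwtkm93m`). The helper names `symmetry` and `loss_gap` are illustrative and not part of `ipissa`.

```python
def symmetry(neg: float, zero: float, pos: float) -> float:
    """Ratio of the smaller to the larger shift away from the coeff=0 score."""
    dist_neg = abs(neg - zero)
    dist_pos = abs(pos - zero)
    # Note: undefined (division by zero) when neither direction moves at all.
    return min(dist_neg, dist_pos) / max(dist_neg, dist_pos)


def loss_gap(val_loss: float, train_loss: float) -> float:
    """Validation minus training loss; larger values suggest overfitting."""
    return val_loss - train_loss


# Values copied from run qwtkm93m (vh_neg, vh_zero, vh_pos, val/train loss_total).
sym = symmetry(neg=-0.8108, zero=-0.6733, pos=-0.4864)
gap = loss_gap(val_loss=-11.302973, train_loss=-13.359199)
print(f"symmetry={sym:.4f}, loss_gap={gap:.4f}")
```

Both values agree with the `symmetry_mean` and `loss_gap` columns stored for that run.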

**Comparison principles**:
1. **Best vs mean baseline**: Compare against the mean of baseline runs, not the best. The best-of-n score can only rise as n_runs grows, so it is not comparable across methods with different run counts
2. **Within-sweep comparisons**: Control for model/hyperparams when analyzing sweep variables
3. **Exclude intentionally-broken runs**: lr=1.0 and lr=1e-6 are deliberate failure cases from ablations, not fair comparisons
4. **Resistance by metric sign**: Report resistance as the "honest" or "dishonest" direction, not by the arbitrary sign of the steering coefficient
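
Principle 1 can be seen on toy numbers. The sketch below uses made-up baseline scores (not from any real run) and shows that the best-of-n score never decreases as runs are added, while the running mean has no such built-in drift.

```python
from statistics import mean

# Hypothetical baseline scores, in the order the runs finished (made-up values).
baseline_scores = [210.0, 180.0, 260.0, 195.0, 305.0, 220.0]

n_values = range(1, len(baseline_scores) + 1)
# Best score seen after the first n runs: monotonically non-decreasing in n.
best_curve = [max(baseline_scores[:n]) for n in n_values]
# Mean over the first n runs: fluctuates around the underlying average.
mean_curve = [mean(baseline_scores[:n]) for n in n_values]

for n, b, m in zip(n_values, best_curve, mean_curve):
    print(f"n={n}: best={b:.1f}  mean={m:.1f}")
```

A method with many baseline runs would therefore look stronger under "best" than one with few runs, even if both draw from the same distribution.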

**Color conventions**:
- Red = toward honest (positive ipissa_range)
- Blue = toward dishonest (negative ipissa_range)
- Green = low gap (good coherence)
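
A minimal sketch of how these conventions could be applied when colouring bars or table cells. The constant and function names are hypothetical, and the grey fallback for an exactly zero range is an assumption, since the conventions above do not cover that case.

```python
# Hypothetical names; the colours follow the conventions listed above.
HONEST_COLOR = "red"       # steering toward honest (positive range)
DISHONEST_COLOR = "blue"   # steering toward dishonest (negative range)
LOW_GAP_COLOR = "green"    # low gap, i.e. good coherence
NEUTRAL_COLOR = "grey"     # assumption: no convention is given for zero


def direction_color(ipissa_range: float) -> str:
    """Pick the plot colour from the sign of the steering range."""
    if ipissa_range > 0:
        return HONEST_COLOR
    if ipissa_range < 0:
        return DISHONEST_COLOR
    return NEUTRAL_COLOR


print([direction_color(v) for v in (0.19, -0.14, 0.0)])
```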

In [130]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from loguru import logger
from ipissa.config import proj_root
import re
from tqdm.auto import tqdm

sns.set_style("whitegrid")
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 120)

## Load Data

In [131]:
df_full = pd.read_csv('../outputs/wandb_results.csv')
df_summary = pd.read_csv('../outputs/wandb_summary.csv')

# Compute loss_gap (overfitting metric: val - train)
df_full['loss_gap'] = df_full['val_loss_total'] - df_full['train_loss_total']

# Load baseline results for comparison
df_prompting = pd.read_csv('../outputs/prompting_results.csv')
df_repeng = pd.read_csv('../outputs/repeng_results.csv')

# Create lookup dicts for baseline scores by model
prompting_baseline = df_prompting.groupby('model_name')['main_score'].mean().to_dict()
repeng_baseline = df_repeng.groupby('model_name')['main_score'].mean().to_dict()

# Add baseline scores to df_full based on model_name
df_full['prompting_score'] = df_full['model_name'].map(prompting_baseline)
df_full['repeng_score'] = df_full['model_name'].map(repeng_baseline)

# Compute gain % vs prompting: (innerpissa - prompting) / |prompting| * 100
df_full['gain_vs_prompting'] = (df_full['main_metric'] - df_full['prompting_score']) / df_full['prompting_score'].abs() * 100
df_full['gain_vs_repeng'] = (df_full['main_metric'] - df_full['repeng_score']) / df_full['repeng_score'].abs() * 100

print(f"Total runs: {len(df_full)}")
print(f"Runs with prompting baseline: {df_full['prompting_score'].notna().sum()}")

print(f"\nBaseline scores by model:")
for model in df_full['model_name'].dropna().unique()[:5]:
    p = prompting_baseline.get(model, np.nan)
    r = repeng_baseline.get(model, np.nan)
    print(f"  {model[:40]}: prompting={p:.1f}, repeng={r:.1f}")

df_full.head(3)

Total runs: 736
Runs with prompting baseline: 555

Baseline scores by model:
  wassname/qwen-14B-codefourchan: prompting=339.7, repeng=171.5
  unsloth/Llama-3.1-8B-Instruct: prompting=207.7, repeng=996.0
  google/gemma-3-4b-it: prompting=277.2, repeng=1.9
  google/gemma-3-270m-it: prompting=117.9, repeng=nan
  google/gemma-3-1b-it: prompting=87.2, repeng=nan


Unnamed: 0,run_id,name,state,created_at,url,log_file,args,run_group,git_commit,gpu,layer_num,main_metric,runtime,vh_neg,vh_zero,vh_pos,symmetry_mean,resistant_toward,baseline_effect_InnerPiSSA,baseline_effect_s_steer,baseline_effect_pca,baseline_effect_prompting,baseline_effect_repeng,val_loss_total,val_loss_proj,val_loss_coh,val_loss_monotonic,val_proj_diff,val_logp_degradation,train_loss_total,train_loss_proj,train_loss_coh,train_loss_monotonic,train_proj_diff,train_logp_degradation,_runtime,_step,_timestamp,_wandb,coh_deg,cw,delta_logp_change,eval/baseline_InnerPiSSA (ours),eval/baseline_prompting,eval/baseline_repeng,eval/coherence_metrics,eval/effect_sizes_CI95,eval/effect_sizes_Pearson,eval/effect_sizes_Slope,eval/effect_sizes_Slope*(1-p),eval/effect_sizes_Spearman,eval/effect_sizes_T-stat,eval/main_metric,eval/transfer_summary,eval/value_scores,flip_ema,loss_coh,loss_monotonic,loss_proj,loss_proj_flipped,loss_total,lr,module,mono_direction,mono_ema,mono_frac_violated,mono_violation,prob_ratio,proj_diff,proj_pi,proj_ref,separation_norm,train/by_coef/coh_deg_coef+1_0,train/by_coef/coh_deg_coef-1_0,train/by_coef/cw_coef+1_0,train/by_coef/cw_coef-1_0,train/by_coef/delta_logp_change_coef+1_0,train/by_coef/delta_logp_change_coef-1_0,train/by_coef/flip_ema_coef+1_0,train/by_coef/flip_ema_coef-1_0,train/by_coef/loss_coh_coef+1_0,train/by_coef/loss_coh_coef-1_0,train/by_coef/loss_monotonic_coef+1_0,train/by_coef/loss_monotonic_coef-1_0,train/by_coef/loss_proj_coef+1_0,train/by_coef/loss_proj_coef-1_0,train/by_coef/loss_proj_flipped_coef+1_0,train/by_coef/loss_proj_flipped_coef-1_0,train/by_coef/loss_total_coef+1_0,train/by_coef/loss_total_coef-1_0,train/by_coef/lr_coef+1_0,train/by_coef/lr_coef-1_0,train/by_coef/mono_direction_coef+1_0,train/by_coef/mono_direction_coef-1_0,train/by_coef/mono_ema_coef+1_0,train/by_coef/mono_ema_coef-1_0,train/by_coef/mono_frac_violated_coef+1_0,train/by_coef/mono_frac_violated_coef-1_0,train/by_coef/mono_violation_coef+1_0,train/by_coe
f/mono_violation_coef-1_0,train/by_coef/prob_ratio_coef+1_0,train/by_coef/prob_ratio_coef-1_0,train/by_coef/proj_diff_coef+1_0,train/by_coef/proj_diff_coef-1_0,train/by_coef/proj_pi_coef+1_0,train/by_coef/proj_pi_coef-1_0,train/by_coef/proj_ref_coef+1_0,train/by_coef/proj_ref_coef-1_0,train/by_coef/separation_norm_coef+1_0,train/by_coef/separation_norm_coef-1_0,val/by_coef/coh_deg_coef+1_0,val/by_coef/coh_deg_coef-1_0,val/by_coef/cw_coef+1_0,val/by_coef/cw_coef-1_0,val/by_coef/delta_logp_change_coef+1_0,val/by_coef/delta_logp_change_coef-1_0,val/by_coef/loss_coh_coef+1_0,val/by_coef/loss_coh_coef-1_0,val/by_coef/loss_monotonic_coef+1_0,val/by_coef/loss_monotonic_coef-1_0,val/by_coef/loss_proj_coef+1_0,val/by_coef/loss_proj_coef-1_0,val/by_coef/loss_proj_flipped_coef+1_0,val/by_coef/loss_proj_flipped_coef-1_0,val/by_coef/loss_total_coef+1_0,val/by_coef/loss_total_coef-1_0,val/by_coef/mono_direction_coef+1_0,val/by_coef/mono_direction_coef-1_0,val/by_coef/mono_frac_violated_coef+1_0,val/by_coef/mono_frac_violated_coef-1_0,val/by_coef/mono_violation_coef+1_0,val/by_coef/mono_violation_coef-1_0,val/by_coef/prob_ratio_coef+1_0,val/by_coef/prob_ratio_coef-1_0,val/by_coef/proj_diff_coef+1_0,val/by_coef/proj_diff_coef-1_0,val/by_coef/proj_pi_coef+1_0,val/by_coef/proj_pi_coef-1_0,val/by_coef/proj_ref_coef+1_0,val/by_coef/proj_ref_coef-1_0,val/by_coef/separation_norm_coef+1_0,val/by_coef/separation_norm_coef-1_0,val/coh_deg,val/cw,val/delta_logp_change,val/loss_coh,val/loss_monotonic,val/loss_proj,val/loss_proj_flipped,val/loss_total,val/mono_direction,val/mono_frac_violated,val/mono_violation,val/prob_ratio,val/proj_diff,val/proj_pi,val/proj_ref,val/separation_norm,r,bs,wd,coh,mono,quick,rot_u,rot_v,PROMPT,n_logs,modules,scale_s,verbose,PERSONAS,coh_temp,n_depths,n_epochs,depth_end,loss_type,use_wandb,val_split,coh_thresh,coh_weight,loss_use_V,model_name,output_dir,wandb_tags,depth_start,loss_depths,max_samples,mono_margin,mono_weight,adapter_type,coh_adaptive,dataset_name,e
ffective_bs,loss_modules,n_last_tokens,wandb_project,data_aware_init,eval_max_tokens,experiment_name,save_checkpoints,eval_max_dilemmas,quantization_type,max_rotation_angle,early_stop_patience,loss_snorm,pref_dir_k,pref_dir_method,val_every_n_samples,eval/baseline_S-space steer,eval/baseline_pca (wassname),s_selection_mode,loss_gap,prompting_score,repeng_score,gain_vs_prompting,gain_vs_repeng
0,qwtkm93m,q14b-c4c-raw-r128,finished,2025-11-27T08:17:03Z,https://wandb.ai/wassname/InnerPiSSA/runs/qwtk...,/media/wassname/SGIronWolf/projects5/2025/llm_...,q14b-80gb --model_name=wassname/qwen-14B-codef...,ablation-20251127-0813,7ec1d8d80bec4b43396bde3b546e2a5b4f508d33,NVIDIA A100-SXM4-80GB,36,419.813115,1483,-0.8108,-0.6733,-0.4864,0.735688,dishonest,-0.673339,-0.680004,-0.680004,-0.390598,-0.68065,-11.302973,-6.914284,0.285055,1.955485,-6.914284,,-13.359199,-6.679599,0.0,0.0,-6.679599,,1483,708,1764233000.0,{'runtime': 1483},-0.024508,1.0,-0.018907,-0.673339,-0.390598,-0.68065,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,419.813115,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,13.922856,0.0,0.0,-6.679599,1,-13.359199,0.004,base_model.model.model.layers.36.mlp.up_proj,-1.0,-1.0,0.0,0.0,0.975806,-6.679599,-51.683073,18.932087,115.063175,-0.018746,-0.030271,1.0,1.0,-0.26573,0.227916,13.922856,13.922856,0.0,0.0,0.0,0.0,-6.429043,-6.930156,1,1,-13.359199,-13.359199,1.80422e-07,1.80422e-07,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.981429,0.970183,-6.429043,-6.930156,-38.252064,-65.114082,18.932087,18.932087,127.344666,102.781685,-0.009998,-0.051022,1.0,1.0,-0.262232,0.180793,0.213706,0.356404,1.955485,1.955485,-6.649959,-7.17861,1.0,1.0,-11.302973,-11.302973,-0.846154,-0.846154,0.25,0.25,0.010042,0.009513,0.990951,0.951747,-6.649959,-7.17861,-38.645227,-66.330235,22.946876,22.946876,118.654505,97.633345,-0.03051,1.0,-0.04072,0.285055,1.955485,-6.914284,1.0,-11.302973,-0.846154,0.25,0.009777,0.971349,-6.914284,-52.487731,22.946876,108.143925,128,16,1e-05,True,True,False,False,True,Pretend you're an {persona} person 
making stat...,20,"['o_proj', 'down_proj']",add2,1,"[['an honest'], ['a dishonest']]",4,14,10,-3,raw,True,0.15,0.5,40.0,True,wassname/qwen-14B-codefourchan,/workspace/InnerPiSSA_private/outputs/adapters,,0.3,[0.75],800,0.05,100,innerpissa,False,honest,32,['up_proj'],8,InnerPiSSA,True,288,,False,,none,0.3,4,,,,256.0,-0.680004,-0.680004,,2.056226,339.738163,171.546768,23.569608,144.722253
1,4c1jbdpe,q14b-c4c-raw-r128,finished,2025-11-27T01:43:26Z,https://wandb.ai/wassname/InnerPiSSA/runs/4c1j...,/media/wassname/SGIronWolf/projects5/2025/llm_...,q14b-80gb --model_name=wassname/qwen-14B-codef...,run-models-20251127-0143,7ec1d8d80bec4b43396bde3b546e2a5b4f508d33,NVIDIA A100-SXM4-80GB,36,599.477174,1473,-0.8743,-0.6733,-0.4125,0.770706,dishonest,-0.673339,-0.680004,-0.680004,-0.390598,-0.68065,-10.793957,-6.93343,0.304454,2.463995,-6.93343,,-13.343777,-6.721316,0.049427,0.0,-6.721316,,1473,708,1764209000.0,{'runtime': 1473},-0.021908,1.0,-0.030372,-0.673339,-0.390598,-0.68065,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,599.477174,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,13.98216,0.049427,0.0,-6.721316,1,-13.343777,0.004,base_model.model.model.layers.36.mlp.up_proj,-1.0,-1.0,0.0,0.0,0.978334,-6.721316,-53.256214,18.932087,113.221088,-0.01907,-0.024746,1.0,1.0,-0.278593,0.217849,13.98216,13.98216,0.098854,0.0,0.0,0.0,-6.511371,-6.931261,1,1,-13.343777,-13.343777,1.80422e-07,1.80422e-07,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.981111,0.975558,-6.511371,-6.931261,-41.605873,-64.906555,18.932087,18.932087,124.114227,102.32795,-0.034225,-0.046033,1.0,1.0,-0.280428,0.170445,0.318401,0.290507,2.463995,2.463995,-6.715493,-7.151367,1.0,1.0,-10.793957,-10.793957,-0.846154,-0.846154,0.230769,0.230769,0.014669,0.009971,0.967808,0.956092,-6.715493,-7.151367,-41.253011,-64.648491,22.946876,22.946876,117.418013,96.575298,-0.040129,1.0,-0.054992,0.304454,2.463995,-6.93343,1.0,-10.793957,-0.846154,0.230769,0.01232,0.96195,-6.93343,-52.950751,22.946876,106.996656,128,16,1e-05,True,True,False,False,True,Pretend you're 
an {persona} person making stat...,20,"['o_proj', 'down_proj']",add2,1,"[['an honest'], ['a dishonest']]",4,14,10,-3,raw,True,0.15,0.5,40.0,True,wassname/qwen-14B-codefourchan,/workspace/InnerPiSSA_private/outputs/adapters,,0.3,[0.75],800,0.05,100,innerpissa,False,honest,32,['up_proj'],8,InnerPiSSA,True,288,,False,,none,0.3,4,,,,256.0,-0.680004,-0.680004,,2.54982,339.738163,171.546768,76.452703,249.454077
2,0d6cm3hs,q14b-c4c-raw-r128,finished,2025-11-26T08:55:03Z,https://wandb.ai/wassname/InnerPiSSA/runs/0d6c...,/media/wassname/SGIronWolf/projects5/2025/llm_...,q14b-80gb --model_name=wassname/qwen-14B-codef...,ablation-20251126-0737,513a6a74c624d2e51061bc4766df8097c2ae9c1a,NVIDIA H100 NVL,36,148.876314,751,-0.7702,-0.6743,-0.663,0.117831,honest,-0.674304,-0.681328,-0.681328,-0.415435,-0.678656,-12.779671,-7.298034,0.326655,1.163087,-7.298034,,-15.369578,-7.684789,0.0,0.0,-7.684789,,751,708,1764148000.0,{'runtime': 751},0.007522,1.0,0.008474,-0.674304,-0.415435,-0.678656,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,148.876314,{'_latest_artifact_path': 'wandb-client-artifa...,{'_latest_artifact_path': 'wandb-client-artifa...,14.612954,0.0,0.0,-7.684789,1,-15.369578,0.004,base_model.model.model.layers.36.mlp.up_proj,-1.0,-1.0,0.0,0.0,1.009617,-7.684789,-85.011421,25.845478,145.215172,0.071565,-0.056522,1.0,1.0,-0.511352,0.5283,14.612954,14.612954,0.0,0.0,0.0,0.0,-7.516451,-7.853127,1,1,-15.369578,-15.369578,1.80422e-07,1.80422e-07,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,1.074188,0.945046,-7.516451,-7.853127,-71.019257,-99.003586,25.845478,25.845478,161.128311,129.302032,0.019156,-0.018695,1.0,1.0,-0.358451,0.265727,0.245149,0.40816,1.163087,1.163087,-7.12234,-7.473728,1.0,1.0,-12.779671,-12.779671,-1.0,-1.0,0.125,0.125,0.005761,0.00587,1.021291,0.983516,-7.12234,-7.473728,-63.46201,-90.563447,22.065051,22.065051,152.760236,122.186175,0.000231,1.0,-0.046362,0.326655,1.163087,-7.298034,1.0,-12.779671,-1.0,0.125,0.005815,1.002404,-7.298034,-77.012728,22.065051,137.473206,128,16,1e-05,True,True,False,False,True,Pretend you're an {persona} person making stat...,20,"['o_proj', 
'down_proj']",add2,1,"[['an honest'], ['a dishonest']]",4,14,10,-3,raw,True,0.15,0.5,40.0,True,wassname/qwen-14B-codefourchan,/workspace/InnerPiSSA_private/outputs/adapters,,0.3,[0.75],800,0.05,100,innerpissa,False,honest,32,['up_proj'],8,InnerPiSSA,True,288,,False,,none,0.3,4,,,,256.0,-0.681328,-0.681328,,2.589907,339.738163,171.546768,-56.179102,-13.21532
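
The two gain columns at the end of each row can be checked by hand. A minimal sketch using the first row (run `qwtkm93m`); the function name `pct_gain` is illustrative. Dividing by the absolute value of the baseline keeps the sign of the gain meaningful even if a baseline score were negative.

```python
def pct_gain(score: float, baseline: float) -> float:
    """Percentage gain of `score` over `baseline`, relative to |baseline|."""
    return (score - baseline) / abs(baseline) * 100


# main_metric, prompting_score and repeng_score copied from run qwtkm93m.
gain_prompting = pct_gain(419.813115, 339.738163)
gain_repeng = pct_gain(419.813115, 171.546768)
print(f"gain_vs_prompting={gain_prompting:.2f}%  gain_vs_repeng={gain_repeng:.2f}%")
```

Models without a baseline entry (for example `google/gemma-3-270m-it` for repeng) map to NaN, so their gain columns are NaN as well.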


## Step 1: Recalculate Symmetry from Logs

Parse each run's output.log to get:
- Baseline-relative symmetry: `min(|neg-zero|, |pos-zero|) / max(|neg-zero|, |pos-zero|)`
- Resistant direction: which side (neg or pos) moved less from the coeff=0 baseline
- Raw scores for both InnerPiSSA and prompting

In [132]:
# Step 1: Parse logs to extract Value/Honesty at coeff=-1,0,+1
from ipissa.config import proj_root
import re
import json
from pathlib import Path

cache_dir = proj_root / "outputs" / "wandb_cache"

def parse_value_honesty_from_log(log_file: Path) -> dict:
    """Extract Value/Honesty scores at coeff=-1, 0, +1 from InnerPiSSA results table.
    
    Returns: {vh_neg, vh_zero, vh_pos, symmetry, resistant_toward}
    """
    if not log_file.exists():
        print(f"Log file does not exist: {log_file}")
        return {}
    
    try:
        logs = log_file.read_text()
    except Exception as e:
        print(f"Error reading {log_file}: {e}")
        return {}
    
    # Find InnerPiSSA results table
    pattern = r'Results for method: InnerPiSSA.*?(?=Results for method:|$)'
    match = re.search(pattern, logs, re.DOTALL)
    if not match:
        return {}
    
    table_text = match.group(0)
    
    # Parse Value/Honesty row: "Value/Honesty   -3.0767  -3.1215  -3.1828"
    vh_pattern = r'Value/Honesty\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)'
    vh_match = re.search(vh_pattern, table_text)
    if not vh_match:
        return {}
    
    neg, zero, pos = float(vh_match.group(1)), float(vh_match.group(2)), float(vh_match.group(3))
    
    # Compute symmetry: min(|neg-zero|, |pos-zero|) / max(...)
    dist_neg = abs(neg - zero)
    dist_pos = abs(pos - zero)
    
    metrics = {
        'vh_neg': neg,
        'vh_zero': zero, 
        'vh_pos': pos,
    }
    
    if max(dist_neg, dist_pos) > 0.01:
        metrics['symmetry'] = min(dist_neg, dist_pos) / max(dist_neg, dist_pos)
        # Resistant direction: which way has smaller effect?
        metrics['resistant_toward'] = 'honest' if dist_pos < dist_neg else 'dishonest'
    
    return metrics

# Process all runs
print("Parsing logs for Value/Honesty...")
run_metrics = []

for _, row in tqdm(df_full.iterrows(), total=len(df_full)):
    run_id = row['run_id']
    log_file = cache_dir / run_id / "output.log"
    try:
        m = parse_value_honesty_from_log(log_file)
    except ValueError as e:
        print(f"Error parsing log for run {run_id}: {e}")
        continue
    m['run_id'] = run_id
    run_metrics.append(m)

df_metrics = pd.DataFrame(run_metrics)
print(f"Parsed {df_metrics['symmetry'].notna().sum()} runs with valid Value/Honesty")

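# Merge with original data. df_full already carries vh_neg/vh_zero/vh_pos/resistant_toward
# from wandb, so suffix those and let the freshly parsed columns keep the plain names
# (the default suffixes would rename both sides to *_x / *_y).
df = df_full.merge(df_metrics, on='run_id', how='left', suffixes=('_wandb', ''))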

# Summary
valid = df[df['symmetry'].notna()]
print(f"\n=== Value/Honesty Summary ===")
# print(f"vh_neg:  {valid['vh_neg'].mean():.2f} ± {valid['vh_neg'].std():.2f}")
# print(f"vh_zero: {valid['vh_zero'].mean():.2f} ± {valid['vh_zero'].std():.2f}")  
# print(f"vh_pos:  {valid['vh_pos'].mean():.2f} ± {valid['vh_pos'].std():.2f}")
print(f"symmetry: {valid['symmetry'].mean():.2f} ± {valid['symmetry'].std():.2f}")

print(f"\n=== Resistant Direction ===")
# print(df['resistant_toward'].value_counts())

Parsing logs for Value/Honesty...



Error parsing log for run mdc8ewki: could not convert string to float: '...'
Error parsing log for run kx7aauqa: could not convert string to float: '...'
Error parsing log for run mtod05cs: could not convert string to float: '...'
Parsed 723 runs with valid Value/Honesty

=== Value/Honesty Summary ===
symmetry: 0.50 ± 0.29

=== Resistant Direction ===
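
The three `could not convert string to float: '...'` errors above arise because the character class `[-\d.]+` also matches a run of dots, so an elided cell is captured and then passed to `float`. A minimal sketch of a stricter pattern that skips such rows instead of raising; `parse_vh_row` is a hypothetical helper, and the elided example row is an assumed shape, not copied from a log.

```python
import re
from typing import Optional, Tuple

# Optional sign, digits, optional decimal part. A bare "..." no longer matches.
NUM = r'(-?\d+(?:\.\d+)?)'
VH_ROW = re.compile(rf'Value/Honesty\s+{NUM}\s+{NUM}\s+{NUM}')


def parse_vh_row(table_text: str) -> Optional[Tuple[float, float, float]]:
    """Return (neg, zero, pos) or None if the row is missing or elided."""
    m = VH_ROW.search(table_text)
    if m is None:
        return None
    neg, zero, pos = (float(g) for g in m.groups())
    return neg, zero, pos


print(parse_vh_row("Value/Honesty   -3.0767  -3.1215  -3.1828"))
print(parse_vh_row("Value/Honesty   -3.0767  ...  -3.1828"))  # assumed elided row
```

The trade-off is that values printed in scientific notation (such as `1e-05`) would also be rejected by this pattern.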


## Step 2: Combine Sweeps by Type

Load all sweep CSVs, combine by base name (ignoring date stamps).
For controlled comparisons, we'll look at within-sweep relative differences rather than absolute values across sweeps.
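
One way to express "within-sweep relative differences" is to normalise each run by the mean of its own sweep file before averaging across files. The sketch below uses made-up numbers and hypothetical file names; the ratio-to-mean normalisation is an assumption for illustration, not necessarily what the later cells do.

```python
import pandas as pd

# Toy data: two sweep files whose absolute metric levels differ (made-up values).
toy = pd.DataFrame({
    'sweep_file': ['sweep-A', 'sweep-A', 'sweep-A', 'sweep-B', 'sweep-B', 'sweep-B'],
    'lr': ['1e-4', '1e-3', '1e-2', '1e-4', '1e-3', '1e-2'],
    'main_metric': [100.0, 200.0, 300.0, 400.0, 500.0, 900.0],
})

# Express each run relative to the mean of its own sweep file.
file_mean = toy.groupby('sweep_file')['main_metric'].transform('mean')
toy['rel_metric'] = toy['main_metric'] / file_mean

# Average the relative metric per setting across files.
rel_by_lr = toy.groupby('lr')['rel_metric'].mean()
print(rel_by_lr)
```

Ratios only make sense when the per-file mean is positive and well away from zero; a difference from the mean, or a percentile rank, is safer otherwise.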

In [137]:

# Define control variable and baseline value for each sweep
# Use 'argv' for ALL sweeps to avoid config confounders
# When sweeps are combined or have hidden varying params, argv shows the complete picture
SWEEP_CONFIG = {
    'sweep-lr': {'var': 'argv', 'baseline': None},
    'sweep-rank': {'var': 'argv', 'baseline': None},
    'sweep-rotation-angle': {'var': 'argv', 'baseline': None},
    'run-models': {'var': 'argv', 'baseline': None},
    'ablate-constraints': {'var': 'argv', 'baseline': None},
    'ablate-modules': {'var': 'argv', 'baseline': None},
    'ablate-wd': {'var': 'argv', 'baseline': None},
    'data-efficiency': {'var': 'argv', 'baseline': None},
    'sweep-layers': {'var': 'argv', 'baseline': None},
    'sweep-long-training': {'var': 'argv', 'baseline': None},
    'sweep-layers-V': {'var': 'argv', 'baseline': None},
    'sweep-scale': {'var': 'argv', 'baseline': None},
    'sweep-snorm': {'var': 'argv', 'baseline': None},
    'sweep-pref-dir': {'var': 'argv', 'baseline': None},
    'sweep-training-stages': {'var': 'argv', 'baseline': None},
    'sweep-loss-modules': {'var': 'argv', 'baseline': None},
    'ablation': {'var': 'argv', 'baseline': None},
}

# Add control_var to SWEEP_CONFIG for easier access.
# Sweeps not listed here fall back to 'argv'.
CONTROL_VARS = {
    'sweep-lr': 'lr',
    'sweep-rank': 'r',
    'sweep-rotation-angle': 'ipissa_rotation_max_angle',
    'run-models': 'model_name',
    'sweep-layers': 'depth',
    'sweep-layers-V': 'depth',
    'sweep-scale': 'ipissa_scale_mode',
    'sweep-snorm': 'loss_snorm',
    'sweep-pref-dir': 'pref_dir',
    'sweep-long-training': 'sampled_n',
    'data-efficiency': 'sampled_n',
    'sweep-training-stages': 'training_stage',
    'sweep-loss-modules': 'loss_module_types',
}
for sweep_name, sweep_cfg in SWEEP_CONFIG.items():
    sweep_cfg['control_var'] = CONTROL_VARS.get(sweep_name, 'argv')

# Group files by sweep base name, optionally taking only latest N per group
N_LATEST_SWEEPS = 5


In [138]:
# Load sweep CSVs from outputs/sweep_groups/*.csv
# Filename format: <sweep-base>-YYYYMMDD-HHMM.csv
sweep_dir = proj_root / 'outputs' / 'sweep_groups'
sweep_files = sorted([f for f in sweep_dir.glob('*.csv') if '_summary' not in f.name])


def get_sweep_base(filename):
    """Extract sweep base name (e.g., 'sweep-lr' from 'sweep-lr-20251123-1629.csv')."""
    return '-'.join(filename.split('-')[:-2])

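def get_sweep_timestamp(filename):
    """Extract timestamp for sorting (YYYYMMDD-HHMM)."""
    # Path.stem drops the '.csv' suffix; str.rstrip('.csv') would strip any
    # trailing '.', 'c', 's' or 'v' characters rather than the suffix itself.
    parts = Path(filename).stem.split('-')
    return '-'.join(parts[-2:])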

sweep_by_base = {}
for f in sweep_files:
    base = get_sweep_base(f.name)
    if base not in sweep_by_base:
        sweep_by_base[base] = []
    sweep_by_base[base].append(f)

# Sort each group by timestamp (descending) and optionally take only latest N
for base in sweep_by_base:
    sweep_by_base[base] = sorted(sweep_by_base[base], key=lambda f: get_sweep_timestamp(f.name), reverse=True)
    if N_LATEST_SWEEPS is not None:
        sweep_by_base[base] = sweep_by_base[base][:N_LATEST_SWEEPS]

# Load selected files
sweeps = {}
for base, files in sweep_by_base.items():
    dfs = []
    for f in files:
        df_sweep = pd.read_csv(f)
        df_sweep['sweep_file'] = f.name
        df_sweep['sweep_base'] = base
        dfs.append(df_sweep)
    
    combined = pd.concat(dfs, ignore_index=True)
    
    # Merge with parsed log metrics (symmetry, resistant_toward) 
    merge_cols = ['run_id', 'symmetry', 'resistant_toward']
    available_cols = [c for c in merge_cols if c in df_metrics.columns]
    combined = combined.merge(df_metrics[available_cols], on='run_id', how='left')
    
    # Add prompting/repeng baseline via model_name (run_id merge has 0 overlap with sweep CSVs)
    if 'model_name' in combined.columns:
        combined['prompting_score'] = combined['model_name'].map(prompting_baseline)
        combined['repeng_score'] = combined['model_name'].map(repeng_baseline)
        combined['gain_vs_prompting'] = (combined['main_metric'] - combined['prompting_score']) / combined['prompting_score'].abs() * 100
        combined['gain_vs_repeng'] = (combined['main_metric'] - combined['repeng_score']) / combined['repeng_score'].abs() * 100
    
    sweeps[base] = combined

print(f"Loaded {len(sweeps)} sweep types" + (f" (latest {N_LATEST_SWEEPS} each)" if N_LATEST_SWEEPS else "") + "\n")

def summarize_sweep_mean(df_s, var, baseline_val):
    """Summarize sweep using groupby. main_metric is the t-stat steering effect.
    
    If var='argv', extracts the varying part by removing common prefix.
    """
    if var not in df_s.columns:
        for alt in [var.replace('_', ''), var + 's']:
            if alt in df_s.columns:
                var = alt
                break
        else:
            return None, var
    
    # Special handling for argv: extract varying part
    if var == 'argv':
        # Find common prefix across all argv values
        argvs = df_s['argv'].dropna().unique()
        if len(argvs) > 1:
            # Find longest common prefix
            common_prefix = argvs[0]
            for argv in argvs[1:]:
                # Find where they diverge
                i = 0
                while i < len(common_prefix) and i < len(argv) and common_prefix[i] == argv[i]:
                    i += 1
                common_prefix = common_prefix[:i]
            
            # Backtrack to last delimiter (space, -, /) to avoid cutting mid-word
            # Note: underscore NOT included - want to preserve param names like loss_modules
            if common_prefix:
                delimiters = [' ', '-', '/']
                last_delim_pos = max([common_prefix.rfind(d) for d in delimiters] + [0])
                if last_delim_pos > 0:
                    common_prefix = common_prefix[:last_delim_pos + 1]  # Keep delimiter
            
            # Remove common prefix and create a new column
            df_s = df_s.copy()
            df_s['argv_varying'] = df_s['argv'].apply(
                lambda x: x[len(common_prefix):].strip() if pd.notna(x) else x
            )
            var = 'argv_varying'
    
    # Define columns to aggregate
    agg_cols = {
        'main_metric': 'mean',
        'prompting_score': 'mean', 
        'gain_vs_prompting': 'mean',
        'loss_gap': 'mean',
        'symmetry': 'mean',
    }
    agg_cols = {k: v for k, v in agg_cols.items() if k in df_s.columns}
    
    df_result = (
        df_s.groupby(var, dropna=False)
        .agg(**{k: (k, v) for k, v in agg_cols.items()}, n=(var, 'size'))
        .reset_index()
    )
    df_result['is_baseline'] = df_result[var].apply(lambda x: '⭐' if baseline_val is not None and x == baseline_val else '')
    
    # Sort: baseline first, then by main metric descending
    if 'main_metric' in df_result.columns:
        df_result = df_result.sort_values(['is_baseline', 'main_metric'], ascending=[False, False])
    
    return df_result, var

def style_sweep_table(df, var_col):
    """Style a sweep summary table with color gradients and formatting."""
    styled = df.style
    
    # Format numeric columns
    format_dict = {
        'main_metric': '{:.1f}',
        'prompting_score': '{:.1f}',
        'gain_vs_prompting': '{:+.0f}',
        'loss_gap': '{:.2f}',
        'symmetry': '{:.2f}',
        'n': '{:.0f}',
    }
    format_dict = {k: v for k, v in format_dict.items() if k in df.columns}
    styled = styled.format(format_dict, na_rep='-')
    
    # Color gradients (higher is better for most)
    if 'main_metric' in df.columns:
        styled = styled.background_gradient(subset=['main_metric'], cmap='Greens')
    if 'gain_vs_prompting' in df.columns:
        styled = styled.background_gradient(subset=['gain_vs_prompting'], cmap='RdYlGn', vmin=-100, vmax=100)
    if 'symmetry' in df.columns:
        styled = styled.background_gradient(subset=['symmetry'], cmap='Blues')
    if 'loss_gap' in df.columns:
        styled = styled.background_gradient(subset=['loss_gap'], cmap='Reds_r')
    
    return styled


Loaded 17 sweep types (latest 5 each)
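
The filename convention `<sweep-base>-YYYYMMDD-HHMM.csv` can also be parsed in one step with a regular expression, which rejects files that do not follow the pattern and gives a real `datetime` for sorting. This is an alternative sketch, not the code used above; `split_sweep_filename` is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Greedy base name, then an 8-digit date and 4-digit time before '.csv'.
SWEEP_FILE = re.compile(r'^(?P<base>.+)-(?P<stamp>\d{8}-\d{4})\.csv$')


def split_sweep_filename(filename: str) -> Optional[Tuple[str, datetime]]:
    """Return (sweep base, timestamp), or None if the name does not match."""
    m = SWEEP_FILE.match(filename)
    if m is None:
        return None
    return m['base'], datetime.strptime(m['stamp'], '%Y%m%d-%H%M')


print(split_sweep_filename('sweep-lr-20251123-1629.csv'))
print(split_sweep_filename('notes.csv'))
```

Sorting on the parsed `datetime` gives the same order as sorting the `YYYYMMDD-HHMM` string, but a malformed name shows up as `None` instead of being grouped under a wrong base.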



In [139]:

# Show each sweep
for sweep_name, config in SWEEP_CONFIG.items():
    if sweep_name not in sweeps:
        continue
    
    df_s = sweeps[sweep_name]
    control_var = config['var']
    
    summary_df, actual_var = summarize_sweep_mean(df_s, control_var, config['baseline'])
    
    if summary_df is not None:
        print(f"\n{'='*70}")
        print(f"Sweep: {sweep_name} (control: {actual_var})")
        print(f"{'='*70}")
        print(f"Runs: {len(df_s)}")
        print(summary_df.to_string(index=False))
    else:
        print(f"\n{sweep_name}: Could not find '{control_var}' column")


Sweep: sweep-lr (control: argv_varying)
Runs: 32
     argv_varying  main_metric  prompting_score  gain_vs_prompting    loss_gap  symmetry  n is_baseline
v1-80gb --lr=1e-2   748.050000              NaN                NaN   25.410000  0.095259  2            
v1-80gb --lr=1e-3   638.300000              NaN                NaN    7.034500  0.179605  2            
  -80gb --lr=1e-2   372.956667       613.221816         -39.180790    3.745267  0.460597  3            
  -80gb --lr=1e-1   192.766667       613.221816         -68.564937   13.024667  0.313057  3            
  -80gb --lr=1e-3   170.700000       613.221816         -72.163417    7.190000  0.897404  3            
v1-80gb --lr=1e-4   106.965000              NaN                NaN    5.789000  0.323692  2            
v1-80gb --lr=1e-5    92.980000              NaN                NaN   -0.178850  0.471192  2            
v1-80gb --lr=1e-1    86.645000              NaN                NaN   21.070000  0.765943  2            
v1-80gb --lr=1


Sweep: sweep-rank (control: argv_varying)
Runs: 31
    argv_varying  main_metric  prompting_score  gain_vs_prompting  loss_gap  symmetry  n is_baseline
 v1-80gb --r=128   752.250000              NaN                NaN  8.344000  0.167235  2            
 v1-80gb --r=256   630.500000              NaN                NaN 15.895000  0.383129  2            
 v1-80gb --r=512   335.915000              NaN                NaN 17.770000  0.640165  2            
  v1-80gb --r=32   284.400000              NaN                NaN  4.463000  0.326606  2            
   -80gb --r=256   231.866667       613.221816         -62.188777  1.580333  0.317129  3            
   -80gb --r=512   208.703333       613.221816         -65.966094  9.455333  0.406363  3            
   v1-80gb --r=8   186.150000              NaN                NaN  1.196300  0.698849  2            
  v1-80gb --r=16   136.650000              NaN                NaN  2.051000  0.201847  2            
    -80gb --r=64   131.600000       613
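
The odd-looking labels such as `v1-80gb --lr=1e-2` and `-80gb --lr=1e-2` are what common-prefix stripping produces when two presets share only their first few characters. The sketch below reproduces this with `os.path.commonprefix` and shows an alternative that splits the preset from the flags. The argv strings are assumed (preset names `q4b-80gb` and `q4bv1-80gb` appear in the ablate-constraints output), and `split_argv` is a hypothetical helper.

```python
import os
import re

# Assumed argv strings for illustration.
argvs = ['q4bv1-80gb --lr=1e-2', 'q4b-80gb --lr=1e-2', 'q4b-80gb --lr=1e-3']

# Character-wise common prefix. It contains no delimiter here, so backtracking
# to the last delimiter (as the notebook does) would leave it unchanged.
prefix = os.path.commonprefix(argvs)
varying = [a[len(prefix):].strip() for a in argvs]
print(repr(prefix), varying)

# Alternative: keep the preset whole and parse '--name[=value]' flags.
FLAG = re.compile(r'--(\w+)(?:=(\S+))?')


def split_argv(argv: str):
    """Return (preset, {flag: value or True}) for one argv string."""
    preset = argv.split(' ', 1)[0]
    flags = {name: (value if value else True) for name, value in FLAG.findall(argv)}
    return preset, flags


print([split_argv(a) for a in argvs])
```

Grouping on `(preset, flags['lr'])` would keep the two presets separate while giving each group a readable label.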

In [140]:
# Merge rankings across the latest N_LATEST_SWEEPS runs of the SAME sweep type

print(f"=== Cross-run ranking (latest {N_LATEST_SWEEPS} runs of each sweep type) ===\n")

# For each sweep type, take latest N runs and merge their rankings
for base in sorted(sweep_by_base.keys()):
    files = sweep_by_base[base][:N_LATEST_SWEEPS]  # latest N files for this type
    
    if len(files) < 2:
        continue  # need multiple runs to merge
    
    if base not in sweeps:
        continue
    
    df_s = sweeps[base]
    config = SWEEP_CONFIG.get(base, {})
    ctrl = config.get('control_var', 'argv')
    
    # Collect rankings from each run
    run_rankings = []
    
    for sweep_file in files:
        df_this = df_s[df_s['sweep_file'] == sweep_file.name].copy()
        
        if len(df_this) == 0 or 'main_metric' not in df_this.columns:
            continue
        
        # Rank within this run
        df_this['rank_pct'] = df_this['main_metric'].rank(pct=True)
        
        for _, row in df_this.iterrows():
            ctrl_val = str(row.get(ctrl, 'unknown'))
            run_rankings.append({
                'sweep_file': sweep_file.name,
                'control_val': ctrl_val,
                'rank_pct': row['rank_pct'],
                'main_metric': row['main_metric']
            })
    
    if not run_rankings:
        continue
    
    df_ranks = pd.DataFrame(run_rankings)
    
    # Average rank for each control value across runs
    rank_summary = df_ranks.groupby('control_val').agg({
        'rank_pct': ['mean', 'std', 'count'],
        'main_metric': 'mean'
    }).reset_index()
    
    rank_summary.columns = ['control_val', 'avg_rank_pct', 'std_rank', 'n_runs', 'avg_metric']
    rank_summary = rank_summary.sort_values('avg_rank_pct', ascending=False)
    
    print(f"\n{base} (merged {len(files)} runs, control={ctrl}):")
    print(rank_summary.to_string(index=False))
    
    # Show reproducibility: low std = consistent ranking across runs
    if len(rank_summary) > 0:
        avg_std = rank_summary['std_rank'].mean()
        print(f"  → Avg rank std: {avg_std:.3f} (lower = more reproducible)")

print("\n→ avg_rank_pct: higher = better ranking within its sweep")
print(f"→ Merges latest {N_LATEST_SWEEPS} runs of each sweep type to show consistent winners")

=== Cross-run ranking (latest 5 runs of each sweep type) ===


ablate-constraints (merged 5 runs, control=argv):
                                                                 control_val  avg_rank_pct  std_rank  n_runs  avg_metric
                                                    q4b-80gb --mono --no_coh      1.000000       NaN       1  819.100000
                                                q4bv1-80gb --no_coh_adaptive      0.954545  0.064282       2  799.250000
                                                     q4b-80gb --scale_s=none      0.850000  0.070711       2  441.000000
                                                  q4b-80gb --no_coh_adaptive      0.850000  0.212132       2  256.200000
                                               q4bv1-80gb --rot_u --no_rot_v      0.809091  0.086722       3  491.833333
                                                  q4bv1-80gb --no_mono --coh      0.768182  0.186419       2  543.900000
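
To make the `avg_rank_pct`, `std_rank` and `n_runs` columns concrete, here is a toy version of the same computation on made-up values. It uses a grouped `rank(pct=True)` rather than the per-file loop above, which should be equivalent for this purpose.

```python
import pandas as pd

# Toy rankings: two sweep files, three settings (made-up metric values).
toy = pd.DataFrame({
    'sweep_file': ['run1', 'run1', 'run1', 'run2', 'run2'],
    'control_val': ['A', 'B', 'C', 'A', 'B'],
    'main_metric': [10.0, 30.0, 20.0, 5.0, 50.0],
})

# Percentile rank within each file: rank divided by the number of rows in that file.
toy['rank_pct'] = toy.groupby('sweep_file')['main_metric'].rank(pct=True)

# Mean, spread and count of the percentile rank per setting across files.
summary = toy.groupby('control_val')['rank_pct'].agg(['mean', 'std', 'count'])
print(summary)
```

Setting `C` appears in only one file, so its standard deviation is NaN, which is why the first row of the ablate-constraints table (`n_runs` = 1) shows `NaN` for `std_rank`. Note also that the lowest possible percentile rank is 1/n, so files with different numbers of settings are not on exactly the same scale.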
                                        