# Llama 3.2 3B: Spectral Steering Analysis

This notebook analyzes the results from the **Targeted Sweep**, **Low Alpha Sweep**, and **Final Confirmation** runs.

**Objective:** Visualize the trade-off between the Sycophancy score (the safety metric) and the Math score (the reasoning metric, GSM8K), and highlight the best-performing configurations.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

sns.set_theme(style="whitegrid")
%matplotlib inline

## 1. Load Data

In [None]:
base_dir = "../output"

# 1. Extremes Confirmation (N=Full) - The Final Winners
df_extremes = pd.read_csv(os.path.join(base_dir, "llama_3_2_extremes_confirmation.csv"))
df_extremes['Source'] = 'Full_Scale_Extremes'

# 2. Final Confirmation (N=Full) - L20_A0.2 (Note: this file's format may differ slightly; check the headers)
try:
    df_final = pd.read_csv(os.path.join(base_dir, "llama_3_2_final_confirmation.csv"))
    df_final['Layer'] = 20
    df_final['Alpha'] = 0.2
    df_final['Desc'] = 'Smoothing_Winner'
    df_final['Source'] = 'Full_Scale_Final'
    # 'Config' in this file is just the ID string, so keep it under 'ConfigOLD'.
    # The combined 'Config' label is rebuilt from Layer and Alpha after concatenation.
    df_final = df_final.rename(columns={'Config': 'ConfigOLD'})
except FileNotFoundError:
    df_final = pd.DataFrame()  # Empty frame so the concat below still works
    print("Warning: Final Confirmation CSV not found.")

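# 3. Low Alpha Sweep (N=300) - Context
try:
    df_sweep = pd.read_csv(os.path.join(base_dir, "llama_3_2_low_alpha_log_v3.csv"))
    df_sweep['Source'] = 'Sweep_N300'
    df_sweep['Desc'] = 'Sweep'
except FileNotFoundError:
    # Catch only the missing-file case so parse errors are not silently hidden
    df_sweep = pd.DataFrame()
    print("Warning: Low Alpha Sweep CSV not found.")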

# Combine Key Results
# We primarily care about comparing the high-fidelity runs
df_full = pd.concat([df_extremes, df_final], ignore_index=True)
df_full['Config'] = df_full.apply(lambda x: f"L{int(x['Layer'])}_A{x['Alpha']}", axis=1)

print("Full Scale Results:")
display(df_full[['Config', 'Desc', 'Syco', 'Math', 'PPL']])
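
Because the loading cell notes that headers may differ between files, a small check run before concatenation can surface a mismatch early. The sketch below is a hypothetical helper, not part of the original notebook. The required column names are assumed from what the later cells read and have not been verified against the CSV files.

```python
# Hypothetical helper. The column names below are the ones the later cells
# read; they are an assumption, not verified against the actual CSV files.
REQUIRED_COLUMNS = ("Layer", "Alpha", "Desc", "Syco", "Math", "PPL")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a header that lacks the steering parameters and description.
example_header = ["Config", "Syco", "Math", "PPL"]
print(missing_columns(example_header))
```

In the notebook this could be called as `missing_columns(df_extremes.columns)` right after each `pd.read_csv`, with a warning printed whenever the returned list is non-empty.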

## 2. The Champions Comparison (Bar Chart)
This section compares the identified optimal configurations against the baseline.

In [None]:
metrics = ['Syco', 'Math', 'PPL']

# Prepare Data for Plotting (Long Format)
plot_data = []

# Add the baseline as hard-coded approximate values (~ Syco 77.4%, Math 64.3%, PPL 9.38).
# For exact numbers, average the baseline rows from the individual runs instead.
baseline_stats = {
    'Config': 'Baseline',
    'Syco': 0.774,
    'Math': 0.643,
    'PPL': 9.38
}
plot_data.append(baseline_stats)

for _, row in df_full.iterrows():
    plot_data.append({
        'Config': row['Config'],
        'Syco': row['Syco'],
        'Math': row['Math'],
        'PPL': row['PPL']
    })

df_plot = pd.DataFrame(plot_data)

# The three metrics are on different scales (PPL is not a fraction),
# so each one gets its own subplot instead of a shared, normalized axis.
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Syco
sns.barplot(data=df_plot, x='Config', y='Syco', ax=axes[0], palette='viridis')
axes[0].set_title('Safety (Sycophancy) - Higher is Better')
axes[0].set_ylim(0.70, 0.85)
for i in axes[0].containers:
    axes[0].bar_label(i, fmt='%.3f')

# Math
sns.barplot(data=df_plot, x='Config', y='Math', ax=axes[1], palette='magma')
axes[1].set_title('Reasoning (GSM8K) - Higher is Better')
axes[1].set_ylim(0.60, 0.70)
for i in axes[1].containers:
    axes[1].bar_label(i, fmt='%.3f')

# PPL
sns.barplot(data=df_plot, x='Config', y='PPL', ax=axes[2], palette='rocket')
axes[2].set_title('Perplexity - Lower is Better')
axes[2].set_ylim(9.0, 10.0)
for i in axes[2].containers:
    axes[2].bar_label(i, fmt='%.2f')

plt.tight_layout()
plt.show()
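
The baseline values used for the bar chart can also express each configuration as a change relative to baseline. The sketch below assumes a delta is a plain difference (metric minus baseline). Whether the `Syco_Delta` and `Math_Delta` columns in the CSV files were computed this way is not confirmed by the notebook, and the helper name is hypothetical.

```python
# Baseline values copied from the bar-chart cell.
BASELINE = {"Syco": 0.774, "Math": 0.643, "PPL": 9.38}


def deltas_vs_baseline(row, baseline=BASELINE):
    """Return metric minus baseline for every metric listed in `baseline`.

    Assumption: a delta is an absolute difference, not a relative change.
    """
    return {f"{name}_Delta": row[name] - base for name, base in baseline.items()}


# Illustrative numbers only, not results from any run.
example_row = {"Syco": 0.800, "Math": 0.650, "PPL": 9.50}
for name, value in deltas_vs_baseline(example_row).items():
    print(f"{name}: {value:+.3f}")
```

For Syco and Math a positive delta is an improvement, while for PPL a positive delta means perplexity got worse.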

## 3. The Pareto Frontier
This plot visualizes the trade-off between safety gain and math gain. The target is the top-right quadrant, where both `Syco_Delta` and `Math_Delta` are positive.

In [None]:
plt.figure(figsize=(10, 8))

# Plot Sweep Data (Context)
if not df_sweep.empty:
    sns.scatterplot(data=df_sweep, x='Math_Delta', y='Syco_Delta', 
                    color='grey', alpha=0.5, label='Sweep (N=300)', s=60)

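# Plot Full Scale Results (large, distinct markers)
# Note: the markers list assumes at most three distinct configs in df_full;
# seaborn raises an error if there are more style levels than markers.
sns.scatterplot(data=df_full, x='Math_Delta', y='Syco_Delta', 
                hue='Config', style='Config', 
                s=300, palette='deep', markers=['*', 'X', 'P'])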

# Add Quadrant Lines
plt.axvline(0, color='black', linestyle='--', linewidth=1)
plt.axhline(0, color='black', linestyle='--', linewidth=1)

# Highlight the "Clear Win" Zone
plt.fill_between([0, 0.03], 0, 0.05, color='green', alpha=0.1, label='Clear Win Zone')

plt.title('Pareto Frontier: Safety vs Reasoning', fontsize=16)
plt.xlabel('Math Improvement ->', fontsize=12)
plt.ylabel('Safety Improvement ->', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)

# Annotate
for _, row in df_full.iterrows():
    plt.text(row['Math_Delta']+0.001, row['Syco_Delta']+0.001, row['Config'], 
             fontsize=12, weight='bold')

plt.show()
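
The scatter plot shows every point but does not mark which ones lie on the Pareto frontier. A minimal sketch of that step follows, assuming higher is better on both axes. The function name `pareto_mask` and the sample points are hypothetical and not taken from any run.

```python
def pareto_mask(points):
    """Flag points that no other point dominates (higher is better on both axes).

    A point q dominates p when q is at least as good as p on both axes
    and strictly better on at least one of them.
    """
    mask = []
    for i, (x, y) in enumerate(points):
        dominated = any(
            qx >= x and qy >= y and (qx > x or qy > y)
            for j, (qx, qy) in enumerate(points)
            if j != i
        )
        mask.append(not dominated)
    return mask


# Illustrative (Math_Delta, Syco_Delta) pairs, not results from any run.
example_points = [(0.010, 0.020), (0.005, 0.030), (0.000, 0.010), (-0.010, 0.040)]
print(pareto_mask(example_points))
```

With the notebook's data, the input could be built as `list(zip(df_full['Math_Delta'], df_full['Syco_Delta']))`, and the resulting mask used to highlight or connect the frontier points. The pairwise comparison is quadratic in the number of points, which is fine for a sweep of a few hundred configurations.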