# Recommendation System Benchmark Analysis

This notebook analyzes the performance of recommendation models under different conditions:
- **Clean model**: Trained on clean data
- **Poisoned model**: Trained on poisoned data
- **Unlearned models**: Models that underwent unlearning (First Order and Gradient Ascent methods)
 
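The analysis focuses on how often fake items reach the top-k recommendations (the MC@k, fake-count, fake-ratio, and user-coverage metrics) on the honeypot and mixed benchmarks. Standard ranking metrics (AUC, MRR, NDCG, Recall) are reported on all three benchmarks, including real-only. Each model comes in two variants, `finetune` and `frozen`, loaded from separate result files.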

## 1. Setup and Data Loading

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# Create directories for saving tables and plots if they don't exist
Path("tables").mkdir(exist_ok=True)
Path("plots").mkdir(exist_ok=True)

print("Setup complete!")


Setup complete!


In [2]:
# Load the evaluation summaries for the two model variants.
# The paths below are machine-specific; replace them with the locations of your own CSV files.
df_finetune = pd.read_csv('/Users/ploymel/Documents/plm4newsrs/outputs/politifact/naml_model/bert_finetune/results/evaluation_summary.csv')
df_frozen = pd.read_csv('/Users/ploymel/Documents/plm4newsrs/outputs/politifact/naml_model/bert_frozen/results/evaluation_summary.csv')

# Add a column to distinguish between finetune and frozen
df_finetune['model_variant'] = 'finetune'
df_frozen['model_variant'] = 'frozen'

# Combine both dataframes
df = pd.concat([df_finetune, df_frozen], ignore_index=True)

print("Data loaded successfully!")
print(f"Finetune shape: {df_finetune.shape}")
print(f"Frozen shape: {df_frozen.shape}")
print(f"Combined shape: {df.shape}")
print(f"\nModel variants: {df['model_variant'].value_counts().to_dict()}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst few rows:")
df.head()

Data loaded successfully!
Finetune shape: (96, 18)
Frozen shape: (96, 18)
Combined shape: (192, 18)

Model variants: {'finetune': 96, 'frozen': 96}

Columns: ['model', 'benchmark', 'AUC', 'MRR', 'NDCG@5', 'NDCG@10', 'Recall@5', 'Recall@10', 'MC@5', 'MC@10', 'MC@20', 'avg_fake_in_top_10', 'avg_fake_in_top_20', 'users_with_fake_pct_top_10', 'users_with_fake_pct_top_20', 'avg_fake_ratio_top_10', 'avg_fake_ratio_top_20', 'model_variant']

First few rows:


index,model,benchmark,AUC,MRR,NDCG@5,NDCG@10,Recall@5,Recall@10,MC@5,MC@10,MC@20,avg_fake_in_top_10,avg_fake_in_top_20,users_with_fake_pct_top_10,users_with_fake_pct_top_20,avg_fake_ratio_top_10,avg_fake_ratio_top_20,model_variant
0,clean-epoch=06-val_auc=0.8807,benchmark_honeypot,0.682974,0.152745,0.146443,0.190852,0.268012,0.40634,0.760807,0.623919,0.624063,6.239193,12.481268,100.0,100.0,0.623919,0.624063,finetune
1,clean-epoch=06-val_auc=0.8807,benchmark_real_only,0.755733,0.261504,0.257802,0.311864,0.381107,0.547231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,finetune
2,clean-epoch=06-val_auc=0.8807,benchmark_mixed,0.717679,0.211042,0.203929,0.257625,0.319218,0.488599,0.538111,0.486319,0.545277,4.863192,10.905537,100.0,100.0,0.486319,0.545277,finetune
3,unlearned-first_order-ratio-0.01-trial-2,benchmark_honeypot,0.474696,0.072814,0.046258,0.068532,0.086455,0.158501,0.530259,0.408646,0.50951,4.086455,10.190202,100.0,100.0,0.408646,0.50951,finetune
4,unlearned-first_order-ratio-0.01-trial-2,benchmark_real_only,0.496444,0.073914,0.036851,0.061733,0.061889,0.140065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,finetune


## 2. Data Preprocessing and Grouping

In [3]:
def parse_model_info(model_name):
    """Extract model type, method, ratio, and trial from model name"""
    if 'clean' in model_name:
        return 'clean', None, None, None
    elif 'poisoned' in model_name:
        return 'poisoned', None, None, None
    elif 'unlearned' in model_name:
        parts = model_name.split('-')
        method = parts[1]  # e.g., 'first_order' or 'gradient_ascent'
        ratio = None
        trial = None
        
        for i, part in enumerate(parts):
            if part == 'ratio' and i+1 < len(parts):
                ratio = float(parts[i+1])
            elif part == 'trial' and i+1 < len(parts):
                trial = int(parts[i+1])
        
        return 'unlearned', method, ratio, trial
    else:
        return 'unknown', None, None, None

# Apply parsing
df[['model_type', 'unlearning_method', 'unlearning_ratio', 'trial']] = df['model'].apply(
    lambda x: pd.Series(parse_model_info(x))
)

print("Model types found:")
print(df['model_type'].value_counts())
print("\nUnlearning methods found:")
print(df['unlearning_method'].value_counts())


Model types found:
model_type
unlearned    180
clean          6
poisoned       6
Name: count, dtype: int64

Unlearning methods found:
unlearning_method
first_order        90
gradient_ascent    90
Name: count, dtype: int64
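
The naming convention that `parse_model_info` relies on can be checked in isolation. The sketch below is a regex-based variant, not the implementation used in this notebook; the pattern and the helper name `parse_unlearned_name` are assumptions inferred from the model names shown in the output above.

```python
import re

# Assumed pattern, inferred from names such as
# 'unlearned-first_order-ratio-0.01-trial-2' seen in the data above.
UNLEARNED_PATTERN = re.compile(
    r"^unlearned-(?P<method>[a-z_]+)-ratio-(?P<ratio>[0-9.]+)-trial-(?P<trial>\d+)$"
)

def parse_unlearned_name(model_name):
    """Return (method, ratio, trial), or None if the name does not match."""
    match = UNLEARNED_PATTERN.match(model_name)
    if match is None:
        return None
    return match["method"], float(match["ratio"]), int(match["trial"])

print(parse_unlearned_name("unlearned-first_order-ratio-0.01-trial-2"))
print(parse_unlearned_name("clean-epoch=06-val_auc=0.8807"))
```

Unlike the split-based parser, a strict pattern returns `None` for any name that deviates from the expected layout, which can help surface unexpected checkpoint names early.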


## 3. Define Metric Groups

In [4]:
# Group metrics by category
metric_groups = {
    'Performance Metrics': ['AUC', 'MRR', 'NDCG@5', 'NDCG@10', 'Recall@5', 'Recall@10'],
    'Manipulation Detection Metrics': ['MC@5', 'MC@10', 'MC@20'],
    'Average Fake Items': ['avg_fake_in_top_10', 'avg_fake_in_top_20'],
    'User Coverage Metrics': ['users_with_fake_pct_top_10', 'users_with_fake_pct_top_20'],
    'Fake Ratio Metrics': ['avg_fake_ratio_top_10', 'avg_fake_ratio_top_20']
}

print("Metric Groups:")
for group, metrics in metric_groups.items():
    print(f"\n{group}:")
    for metric in metrics:
        print(f"  - {metric}")

Metric Groups:

Performance Metrics:
  - AUC
  - MRR
  - NDCG@5
  - NDCG@10
  - Recall@5
  - Recall@10

Manipulation Detection Metrics:
  - MC@5
  - MC@10
  - MC@20

Average Fake Items:
  - avg_fake_in_top_10
  - avg_fake_in_top_20

User Coverage Metrics:
  - users_with_fake_pct_top_10
  - users_with_fake_pct_top_20

Fake Ratio Metrics:
  - avg_fake_ratio_top_10
  - avg_fake_ratio_top_20
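
Some of these groups may carry overlapping information. In the rows displayed in Section 1, `MC@10` equals `avg_fake_ratio_top_10`, and both equal `avg_fake_in_top_10 / 10`. The check below verifies this on those displayed rows only; whether it holds for the whole dataset is an assumption that should be confirmed on `df`. The helper name `looks_redundant` is hypothetical.

```python
import math

# Values copied from rows 0, 2, and 3 of the table shown in Section 1.
rows = [
    {"MC@10": 0.623919, "avg_fake_in_top_10": 6.239193, "avg_fake_ratio_top_10": 0.623919},
    {"MC@10": 0.486319, "avg_fake_in_top_10": 4.863192, "avg_fake_ratio_top_10": 0.486319},
    {"MC@10": 0.408646, "avg_fake_in_top_10": 4.086455, "avg_fake_ratio_top_10": 0.408646},
]

def looks_redundant(row, k=10, tol=1e-5):
    """True if MC@k, the fake ratio, and fake count / k agree within tol."""
    ratio = row[f"avg_fake_ratio_top_{k}"]
    return (
        math.isclose(row[f"MC@{k}"], ratio, abs_tol=tol)
        and math.isclose(row[f"avg_fake_in_top_{k}"] / k, ratio, abs_tol=tol)
    )

print([looks_redundant(row) for row in rows])
```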


## 4. Aggregate Trials for Unlearned Models

In [5]:
def aggregate_trials(df):
    """Average results across different trials for unlearned models"""
    
    # Separate unlearned models from others
    unlearned_df = df[df['model_type'] == 'unlearned'].copy()
    other_df = df[df['model_type'] != 'unlearned'].copy()
    
    if len(unlearned_df) > 0:
        # Group by everything except trial and average
        group_cols = ['model_type', 'unlearning_method', 'unlearning_ratio', 'benchmark', 'model_variant']
        
        # Get numeric columns to average (exclude columns that are already in group_cols)
        numeric_cols = [col for col in df.select_dtypes(include=[np.number]).columns 
                       if col not in group_cols]
        
        # Group and aggregate
        unlearned_agg = unlearned_df.groupby(group_cols)[numeric_cols].mean().reset_index()
        
        # Add a model name for aggregated data
        unlearned_agg['model'] = unlearned_agg.apply(
            lambda row: f"unlearned-{row['unlearning_method']}-ratio-{row['unlearning_ratio']}-avg", 
            axis=1
        )
        
        # Add trial column as NaN to match other_df structure
        unlearned_agg['trial'] = np.nan
        
        # Combine back
        result_df = pd.concat([other_df, unlearned_agg], ignore_index=True)
    else:
        result_df = other_df
    
    return result_df

df_agg = aggregate_trials(df)

print(f"\nOriginal data shape: {df.shape}")
print(f"Aggregated data shape: {df_agg.shape}")
print(f"\nModel variants distribution:")
print(df_agg['model_variant'].value_counts())
print(f"\nAggregated models:")
df_agg[df_agg['model_type'] == 'unlearned'][['model', 'model_variant', 'unlearning_method', 'unlearning_ratio', 'benchmark']].drop_duplicates()


Original data shape: (192, 22)
Aggregated data shape: (72, 22)

Model variants distribution:
model_variant
finetune    36
frozen      36
Name: count, dtype: int64

Aggregated models:


index,model,model_variant,unlearning_method,unlearning_ratio,benchmark
12,unlearned-first_order-ratio-0.01-avg,finetune,first_order,0.01,benchmark_honeypot
13,unlearned-first_order-ratio-0.01-avg,frozen,first_order,0.01,benchmark_honeypot
14,unlearned-first_order-ratio-0.01-avg,finetune,first_order,0.01,benchmark_mixed
15,unlearned-first_order-ratio-0.01-avg,frozen,first_order,0.01,benchmark_mixed
16,unlearned-first_order-ratio-0.01-avg,finetune,first_order,0.01,benchmark_real_only
17,unlearned-first_order-ratio-0.01-avg,frozen,first_order,0.01,benchmark_real_only
18,unlearned-first_order-ratio-0.05-avg,finetune,first_order,0.05,benchmark_honeypot
19,unlearned-first_order-ratio-0.05-avg,frozen,first_order,0.05,benchmark_honeypot
20,unlearned-first_order-ratio-0.05-avg,finetune,first_order,0.05,benchmark_mixed
21,unlearned-first_order-ratio-0.05-avg,frozen,first_order,0.05,benchmark_mixed
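
Averaging hides how much the trials disagree with each other. The following is a minimal sketch of keeping the standard deviation next to the mean when collapsing trials. The toy frame and all of its values are hypothetical; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data: two ratios, three trials each (values are made up).
toy = pd.DataFrame({
    "unlearning_method": ["first_order"] * 6,
    "unlearning_ratio": [0.01, 0.01, 0.01, 0.05, 0.05, 0.05],
    "trial": [1, 2, 3, 1, 2, 3],
    "AUC": [0.50, 0.52, 0.54, 0.60, 0.60, 0.60],
})

# Collapse the trial dimension, reporting spread as well as the average.
trial_stats = (
    toy.groupby(["unlearning_method", "unlearning_ratio"])["AUC"]
    .agg(["mean", "std", "count"])
    .reset_index()
)
print(trial_stats)
```

A large `std` relative to the differences between ratios would suggest that the curves plotted later should be read with some caution.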


## 5. Create Summary Tables by Metric Group

In [6]:
# Create summary tables for each metric group
summary_tables = {}

for group_name, metrics in metric_groups.items():
    print(f"\n{'='*60}")
    print(f"Creating summary table for: {group_name}")
    print(f"{'='*60}")
    
    # Select relevant columns
    cols_to_select = ['model', 'benchmark', 'model_type', 'model_variant', 'unlearning_method', 'unlearning_ratio'] + metrics
    
    # Filter for non-trial data (aggregated + clean + poisoned)
    mask = (df_agg['model_type'].isin(['clean', 'poisoned'])) | \
           ((df_agg['model_type'] == 'unlearned') & (df_agg['trial'].isna()))
    
    summary_df = df_agg[mask][cols_to_select].copy()
    
    # Sort for better readability
    summary_df = summary_df.sort_values(['benchmark', 'model_variant', 'model_type', 'unlearning_method', 'unlearning_ratio'])
    
    # Save to CSV
    filename = f"tables/{group_name.lower().replace(' ', '_')}.csv"
    summary_df.to_csv(filename, index=False)
    
    summary_tables[group_name] = summary_df
    
    print(f"Saved to: {filename}")
    print(f"Shape: {summary_df.shape}")


Creating summary table for: Performance Metrics
Saved to: tables/performance_metrics.csv
Shape: (72, 12)

Creating summary table for: Manipulation Detection Metrics
Saved to: tables/manipulation_detection_metrics.csv
Shape: (72, 9)

Creating summary table for: Average Fake Items
Saved to: tables/average_fake_items.csv
Shape: (72, 8)

Creating summary table for: User Coverage Metrics
Saved to: tables/user_coverage_metrics.csv
Shape: (72, 8)

Creating summary table for: Fake Ratio Metrics
Saved to: tables/fake_ratio_metrics.csv
Shape: (72, 8)
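
The saved tables are in long format, with one row per model and benchmark. For side-by-side reading, a table can be pivoted so that each benchmark becomes a column. This is a sketch on hypothetical values; only the column names are taken from the summary tables.

```python
import pandas as pd

# Hypothetical long-format excerpt (metric values are made up).
long_df = pd.DataFrame({
    "model_type": ["clean", "clean", "poisoned", "poisoned"],
    "benchmark": ["benchmark_honeypot", "benchmark_mixed"] * 2,
    "MC@10": [0.10, 0.20, 0.30, 0.40],
})

# One row per model type, one column per benchmark.
wide_df = long_df.pivot(index="model_type", columns="benchmark", values="MC@10")
print(wide_df)
```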


## 6. Plotting Functions

In [7]:
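def create_comparison_plot(df_agg, metric, benchmark_type, output_dir='plots'):
    """
    Plot one metric on one benchmark, comparing finetune and frozen variants.

    Clean and poisoned models are drawn as horizontal reference lines; unlearned
    models are drawn as one curve per method over the unlearning ratio.
    The function itself does not filter benchmarks: callers skip real_only for
    the fake-item metrics, which are always 0 on that benchmark.
    """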
    # Filter data
    plot_data = df_agg[df_agg['benchmark'] == benchmark_type].copy()
    
    # Skip if no data
    if len(plot_data) == 0:
        print(f"No data for {metric} on {benchmark_type}")
        return
    
    # Create figure
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Define line styles for model variants
    line_styles = {'finetune': '-', 'frozen': '--'}
    variant_labels = {'finetune': 'Finetune', 'frozen': 'Frozen'}
    
    # Process each model variant separately
    for variant in ['finetune', 'frozen']:
        variant_data = plot_data[plot_data['model_variant'] == variant]
        
        if len(variant_data) == 0:
            continue
        
        # Prepare data for plotting
        clean_data = variant_data[variant_data['model_type'] == 'clean']
        poisoned_data = variant_data[variant_data['model_type'] == 'poisoned']
        unlearned_data = variant_data[variant_data['model_type'] == 'unlearned']
        
        # Plot clean and poisoned as horizontal lines
        if len(clean_data) > 0:
            clean_value = clean_data[metric].values[0]
            ax.axhline(y=clean_value, color='green', linestyle=line_styles[variant], 
                      linewidth=2.5, label=f'Clean - {variant_labels[variant]}', alpha=0.7)
        
        if len(poisoned_data) > 0:
            poisoned_value = poisoned_data[metric].values[0]
            ax.axhline(y=poisoned_value, color='red', linestyle=line_styles[variant], 
                      linewidth=2.5, label=f'Poisoned - {variant_labels[variant]}', alpha=0.7)
        
        # Plot unlearned models by method
        if len(unlearned_data) > 0:
            methods = unlearned_data['unlearning_method'].unique()
            colors = {'first_order': 'blue', 'gradient_ascent': 'orange'}
            markers = {'first_order': 'o', 'gradient_ascent': 's'}
            
            for method in methods:
                method_data = unlearned_data[unlearned_data['unlearning_method'] == method]
                method_data = method_data.sort_values('unlearning_ratio')
                
                ax.plot(method_data['unlearning_ratio'], method_data[metric], 
                       marker=markers.get(method, 'o'), 
                       color=colors.get(method, 'purple'),
                       linestyle=line_styles[variant],
                       linewidth=2.5, markersize=8,
                       label=f'{method.replace("_", " ").title()} - {variant_labels[variant]}',
                       alpha=0.8)
    
    # Formatting
    ax.set_xlabel('Unlearning Ratio', fontsize=12, fontweight='bold')
    ax.set_ylabel(metric, fontsize=12, fontweight='bold')
    ax.set_title(f'{metric} Comparison (Finetune vs Frozen) - {benchmark_type.replace("_", " ").title()}', 
                 fontsize=14, fontweight='bold', pad=20)
    ax.legend(loc='best', fontsize=9, framealpha=0.9, ncol=2)
    ax.grid(True, alpha=0.3)
    
    # Save plot
    filename = f"{output_dir}/{metric}_{benchmark_type}.png"
    plt.tight_layout()
    plt.savefig(filename, dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"Saved: {filename}")

## 7. Generate All Plots

In [8]:
# Metrics to plot (excluding real_only benchmark)
fake_detection_metrics = [
    'MC@5', 'MC@10', 'MC@20',
    'avg_fake_in_top_10', 'avg_fake_in_top_20',
    'users_with_fake_pct_top_10', 'users_with_fake_pct_top_20',
    'avg_fake_ratio_top_10', 'avg_fake_ratio_top_20'
]

# Benchmarks to plot (excluding real_only)
benchmarks_to_plot = ['benchmark_honeypot', 'benchmark_mixed']

print("\nGenerating comparison plots...")
print("="*60)

for metric in fake_detection_metrics:
    print(f"\nProcessing metric: {metric}")
    for benchmark in benchmarks_to_plot:
        create_comparison_plot(df_agg, metric, benchmark)

print("\n" + "="*60)
print("All plots generated successfully!")
print("="*60)


Generating comparison plots...

Processing metric: MC@5
Saved: plots/MC@5_benchmark_honeypot.png
Saved: plots/MC@5_benchmark_mixed.png

Processing metric: MC@10
Saved: plots/MC@10_benchmark_honeypot.png
Saved: plots/MC@10_benchmark_mixed.png

Processing metric: MC@20
Saved: plots/MC@20_benchmark_honeypot.png
Saved: plots/MC@20_benchmark_mixed.png

Processing metric: avg_fake_in_top_10
Saved: plots/avg_fake_in_top_10_benchmark_honeypot.png
Saved: plots/avg_fake_in_top_10_benchmark_mixed.png

Processing metric: avg_fake_in_top_20
Saved: plots/avg_fake_in_top_20_benchmark_honeypot.png
Saved: plots/avg_fake_in_top_20_benchmark_mixed.png

Processing metric: users_with_fake_pct_top_10
Saved: plots/users_with_fake_pct_top_10_benchmark_honeypot.png
Saved: plots/users_with_fake_pct_top_10_benchmark_mixed.png

Processing metric: users_with_fake_pct_top_20
Saved: plots/users_with_fake_pct_top_20_benchmark_honeypot.png
Saved: plots/users_with_fake_pct_top_20_benchmark_mixed.png

Processing metric

## 7b. Generate Performance Metric Plots

In [9]:
# Performance metrics to plot (include all benchmarks including real_only for these)
performance_metrics = ['AUC', 'MRR', 'NDCG@5', 'NDCG@10', 'Recall@5', 'Recall@10']

# All benchmarks for performance metrics
all_benchmarks = ['benchmark_honeypot', 'benchmark_mixed', 'benchmark_real_only']

print("\nGenerating performance metric plots...")
print("="*60)

for metric in performance_metrics:
    print(f"\nProcessing metric: {metric}")
    for benchmark in all_benchmarks:
        create_comparison_plot(df_agg, metric, benchmark)

print("\n" + "="*60)
print("All performance metric plots generated!")
print("="*60)


Generating performance metric plots...

Processing metric: AUC
Saved: plots/AUC_benchmark_honeypot.png
Saved: plots/AUC_benchmark_mixed.png
Saved: plots/AUC_benchmark_real_only.png

Processing metric: MRR
Saved: plots/MRR_benchmark_honeypot.png
Saved: plots/MRR_benchmark_mixed.png
Saved: plots/MRR_benchmark_real_only.png

Processing metric: NDCG@5
Saved: plots/NDCG@5_benchmark_honeypot.png
Saved: plots/NDCG@5_benchmark_mixed.png
Saved: plots/NDCG@5_benchmark_real_only.png

Processing metric: NDCG@10
Saved: plots/NDCG@10_benchmark_honeypot.png
Saved: plots/NDCG@10_benchmark_mixed.png
Saved: plots/NDCG@10_benchmark_real_only.png

Processing metric: Recall@5
Saved: plots/Recall@5_benchmark_honeypot.png
Saved: plots/Recall@5_benchmark_mixed.png
Saved: plots/Recall@5_benchmark_real_only.png

Processing metric: Recall@10
Saved: plots/Recall@10_benchmark_honeypot.png
Saved: plots/Recall@10_benchmark_mixed.png
Saved: plots/Recall@10_benchmark_real_only.png

All performance metric plots genera

## 8. Create Combined Visualization

In [11]:
def create_combined_plot(df_agg, metrics_list, benchmark_type, title, output_dir='plots'):
    """Create a multi-panel plot for related metrics comparing finetune vs frozen"""
    
    n_metrics = len(metrics_list)
    n_cols = 3
    n_rows = (n_metrics + n_cols - 1) // n_cols
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(18, 5*n_rows))
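    # With n_cols > 1, plt.subplots always returns an array, even for a single metric,
    # so flatten unconditionally instead of wrapping the array in a list.
    axes = np.atleast_1d(axes).flatten()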
    
    # Define line styles for model variants
    line_styles = {'finetune': '-', 'frozen': '--'}
    variant_labels = {'finetune': 'Finetune', 'frozen': 'Frozen'}
    
    for idx, metric in enumerate(metrics_list):
        ax = axes[idx]
        
        # Filter data
        plot_data = df_agg[df_agg['benchmark'] == benchmark_type].copy()
        
        if len(plot_data) == 0:
            continue
        
        # Process each model variant
        for variant in ['finetune', 'frozen']:
            variant_data = plot_data[plot_data['model_variant'] == variant]
            
            if len(variant_data) == 0:
                continue
            
            # Prepare data
            clean_data = variant_data[variant_data['model_type'] == 'clean']
            poisoned_data = variant_data[variant_data['model_type'] == 'poisoned']
            unlearned_data = variant_data[variant_data['model_type'] == 'unlearned']
            
            # Plot reference lines
            if len(clean_data) > 0:
                ax.axhline(y=clean_data[metric].values[0], color='green', 
                          linestyle=line_styles[variant], linewidth=1.5, 
                          label=f'Clean-{variant_labels[variant]}', alpha=0.7)
            
            if len(poisoned_data) > 0:
                ax.axhline(y=poisoned_data[metric].values[0], color='red', 
                          linestyle=line_styles[variant], linewidth=1.5, 
                          label=f'Poisoned-{variant_labels[variant]}', alpha=0.7)
            
            # Plot unlearned models
            if len(unlearned_data) > 0:
                methods = unlearned_data['unlearning_method'].unique()
                colors = {'first_order': 'blue', 'gradient_ascent': 'orange'}
                markers = {'first_order': 'o', 'gradient_ascent': 's'}
                
                for method in methods:
                    method_data = unlearned_data[unlearned_data['unlearning_method'] == method]
                    method_data = method_data.sort_values('unlearning_ratio')
                    
                    ax.plot(method_data['unlearning_ratio'], method_data[metric], 
                           marker=markers.get(method, 'o'), 
                           color=colors.get(method, 'purple'),
                           linestyle=line_styles[variant],
                           linewidth=2, markersize=6,
                           label=f'{method.replace("_", " ").title()}-{variant_labels[variant]}',
                           alpha=0.8)
        
        ax.set_xlabel('Unlearning Ratio', fontsize=10)
        ax.set_ylabel(metric, fontsize=10)
        ax.set_title(metric, fontsize=11, fontweight='bold')
        ax.legend(loc='best', fontsize=7, ncol=2)
        ax.grid(True, alpha=0.3)
    
    # Hide extra subplots
    for idx in range(n_metrics, len(axes)):
        axes[idx].axis('off')
    
    fig.suptitle(f'{title} (Finetune vs Frozen) - {benchmark_type.replace("_", " ").title()}', 
                 fontsize=16, fontweight='bold', y=1.00)
    
    plt.tight_layout()
    
    # Save
    filename = f"{output_dir}/combined_{title.lower().replace(' ', '_')}_{benchmark_type}.png"
    plt.savefig(filename, dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"Saved combined plot: {filename}")

In [12]:
# Generate combined plots for each metric group
print("\nGenerating combined plots...")
print("="*60)

metric_groups_to_plot = {
    'Manipulation Detection': ['MC@5', 'MC@10', 'MC@20'],
    'Average Fake Items': ['avg_fake_in_top_10', 'avg_fake_in_top_20'],
    'User Coverage': ['users_with_fake_pct_top_10', 'users_with_fake_pct_top_20'],
    'Fake Ratios': ['avg_fake_ratio_top_10', 'avg_fake_ratio_top_20']
}

for title, metrics in metric_groups_to_plot.items():
    for benchmark in benchmarks_to_plot:
        create_combined_plot(df_agg, metrics, benchmark, title)

print("\nAll combined plots generated!")


Generating combined plots...
Saved combined plot: plots/combined_manipulation_detection_benchmark_honeypot.png
Saved combined plot: plots/combined_manipulation_detection_benchmark_mixed.png
Saved combined plot: plots/combined_average_fake_items_benchmark_honeypot.png
Saved combined plot: plots/combined_average_fake_items_benchmark_mixed.png
Saved combined plot: plots/combined_user_coverage_benchmark_honeypot.png
Saved combined plot: plots/combined_user_coverage_benchmark_mixed.png
Saved combined plot: plots/combined_fake_ratios_benchmark_honeypot.png
Saved combined plot: plots/combined_fake_ratios_benchmark_mixed.png

All combined plots generated!
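
The grid size in `create_combined_plot` comes from a ceiling division: `n_rows` is the smallest number of rows that fits `n_metrics` panels into `n_cols` columns. A small standalone illustration follows; the helper name `grid_shape` is hypothetical.

```python
def grid_shape(n_metrics, n_cols=3):
    """Rows and columns needed to place n_metrics panels in an n_cols-wide grid."""
    n_rows = (n_metrics + n_cols - 1) // n_cols  # integer ceiling of n_metrics / n_cols
    return n_rows, n_cols

for n in (1, 2, 3, 4, 7):
    rows, cols = grid_shape(n)
    unused = rows * cols - n  # panels that end up hidden with axis('off')
    print(f"{n} metrics -> {rows}x{cols} grid, {unused} unused panel(s)")
```

With two metrics per group, as in most groups plotted here, one of the three panels in the row stays empty.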


## 9. Find Best Unlearning Ratio for Each Method

In [13]:
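def find_best_ratio_for_method(df_agg, method, benchmark_type, model_variant):
    """
    Select an unlearning ratio for one method and model variant.

    The ratio is chosen on benchmark_real_only, which measures recommendation
    utility without attack contamination: it is the ratio whose AUC is closest
    to the clean model's AUC (or the highest AUC if no clean model is available).
    Returns (best_ratio, row), where row holds the metrics at that ratio on
    benchmark_type, or (None, None) if no matching data exists.
    """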
    # Filter for this method, benchmark_real_only, and variant
    method_data = df_agg[
        (df_agg['unlearning_method'] == method) & 
        (df_agg['benchmark'] == 'benchmark_real_only') &  # Use real_only for selection
        (df_agg['model_variant'] == model_variant) &
        (df_agg['trial'].isna())
    ].copy()
    
    if len(method_data) == 0:
        return None, None
    
    # Get clean model's AUC as reference
    clean_data = df_agg[
        (df_agg['model_type'] == 'clean') & 
        (df_agg['benchmark'] == 'benchmark_real_only') &
        (df_agg['model_variant'] == model_variant)
    ]
    
    if len(clean_data) == 0:
        # Fallback: just pick the ratio with highest AUC
        best_ratio = method_data.loc[method_data['AUC'].idxmax(), 'unlearning_ratio']
    else:
        # Find ratio with AUC closest to clean model
        clean_auc = clean_data['AUC'].values[0]
        method_data['auc_distance'] = abs(method_data['AUC'] - clean_auc)
        best_ratio = method_data.loc[method_data['auc_distance'].idxmin(), 'unlearning_ratio']
    
    # Now get the full row for this ratio from the original benchmark_type
    best_row = df_agg[
        (df_agg['unlearning_method'] == method) & 
        (df_agg['benchmark'] == benchmark_type) &  # Get data for the actual benchmark being analyzed
        (df_agg['model_variant'] == model_variant) &
        (df_agg['unlearning_ratio'] == best_ratio) &
        (df_agg['trial'].isna())
    ]
    
    if len(best_row) == 0:
        return None, None
    
    return best_ratio, best_row.iloc[0]
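
The selection rule in `find_best_ratio_for_method` reduces to an arg-min over the distance to the clean model's AUC on `benchmark_real_only`. The sketch below shows that step on hypothetical numbers; the AUC values and `clean_auc` are made up.

```python
import pandas as pd

# Hypothetical AUCs on benchmark_real_only for one method and variant.
candidates = pd.DataFrame({
    "unlearning_ratio": [0.01, 0.05, 0.1, 0.2],
    "AUC": [0.50, 0.62, 0.71, 0.80],
})
clean_auc = 0.75  # made-up reference value

# Pick the ratio whose AUC is closest to the clean model's AUC.
distance = (candidates["AUC"] - clean_auc).abs()
best_ratio = candidates.loc[distance.idxmin(), "unlearning_ratio"]
print(best_ratio)
```

Note that the closest value wins even when another ratio yields a higher AUC (ratio 0.2 in this toy example), so the rule favors similarity to the clean model over raw utility.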

In [14]:
print("\n" + "="*80)
print("FINDING BEST UNLEARNING RATIOS")
print("="*80)

# ALL metrics to analyze (both performance and manipulation)
all_analysis_metrics = [
    # Performance metrics
    'AUC', 'MRR', 'NDCG@5', 'NDCG@10', 'Recall@5', 'Recall@10',
    # Manipulation detection metrics
    'MC@5', 'MC@10', 'MC@20',
    'avg_fake_in_top_10', 'avg_fake_in_top_20',
    'users_with_fake_pct_top_10', 'users_with_fake_pct_top_20',
    'avg_fake_ratio_top_10', 'avg_fake_ratio_top_20'
]

# Metrics for plotting (excluding real_only benchmark)
fake_detection_metrics = [
    'MC@5', 'MC@10', 'MC@20',
    'avg_fake_in_top_10', 'avg_fake_in_top_20',
    'users_with_fake_pct_top_10', 'users_with_fake_pct_top_20',
    'avg_fake_ratio_top_10', 'avg_fake_ratio_top_20'
]

best_ratios = {}

for benchmark in benchmarks_to_plot:
    print(f"\n{'='*60}")
    print(f"Benchmark: {benchmark.replace('_', ' ').title()}")
    print(f"{'='*60}")
    
    best_ratios[benchmark] = {}
    
    # Iterate over model variants
    for variant in ['finetune', 'frozen']:
        print(f"\n  Model Variant: {variant.upper()}")
        print(f"  {'-'*56}")
        
        best_ratios[benchmark][variant] = {}
        
        # Get unlearning methods for this variant
        unlearned_data = df_agg[
            (df_agg['model_type'] == 'unlearned') & 
            (df_agg['benchmark'] == benchmark) &
            (df_agg['model_variant'] == variant) &
            (df_agg['trial'].isna())
        ]
        
        if len(unlearned_data) == 0:
            continue
        
        methods = unlearned_data['unlearning_method'].unique()
        
        for method in methods:
            print(f"\n  {method.replace('_', ' ').title()}:")
            
            best_ratio, best_row = find_best_ratio_for_method(
                df_agg, method, benchmark, variant
            )
            
            if best_ratio is not None:
                print(f"    Best Ratio: {best_ratio}")
                print(f"    Sample metrics at best ratio:")
                for metric in ['MC@5', 'MC@10', 'avg_fake_in_top_10']:
                    print(f"      {metric}: {best_row[metric]:.4f}")
                
                best_ratios[benchmark][variant][method] = {
                    'ratio': best_ratio,
                    'data': best_row
                }


FINDING BEST UNLEARNING RATIOS

Benchmark: Benchmark Honeypot

  Model Variant: FINETUNE
  --------------------------------------------------------

  First Order:
    Best Ratio: 0.1
    Sample metrics at best ratio:
      MC@5: 0.6446
      MC@10: 0.6208
      avg_fake_in_top_10: 6.2085

  Gradient Ascent:
    Best Ratio: 0.05
    Sample metrics at best ratio:
      MC@5: 0.6957
      MC@10: 0.6921
      avg_fake_in_top_10: 6.9212

  Model Variant: FROZEN
  --------------------------------------------------------

  First Order:
    Best Ratio: 0.2
    Sample metrics at best ratio:
      MC@5: 0.6780
      MC@10: 0.6049
      avg_fake_in_top_10: 6.0884

  Gradient Ascent:
    Best Ratio: 0.01
    Sample metrics at best ratio:
      MC@5: 0.6110
      MC@10: 0.6172
      avg_fake_in_top_10: 6.1720

Benchmark: Benchmark Mixed

  Model Variant: FINETUNE
  --------------------------------------------------------

  First Order:
    Best Ratio: 0.1
    Sample metrics at best ratio:
     

## 10. Calculate Unlearning Impact (Change from Poisoned Model)
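
Recovery measures how much of the gap between the poisoned and the clean model is closed by unlearning. A value of 100 means the unlearned model matches the clean model, a negative value means it moved away from it, and a value above 100 means it overshot:

```
Recovery % = (Poisoned Value - Unlearned Value) / (Poisoned Value - Clean Value) × 100
```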
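
As a worked check of the formula, the snippet below recomputes two rows of the impact table shown at the end of this section (honeypot, finetune, first order, ratio 0.1). The helper name `recovery_pct` is only for illustration, and the inputs are the rounded values from that table, so the last digits can differ slightly.

```python
def recovery_pct(clean, poisoned, unlearned):
    """Share of the poisoned-to-clean gap closed by unlearning, in percent."""
    gap = poisoned - clean
    if gap == 0:
        return None  # undefined when poisoned and clean coincide
    return (poisoned - unlearned) / gap * 100

# AUC row: clean 0.682974, poisoned 0.623013, unlearned 0.624149
print(round(recovery_pct(0.682974, 0.623013, 0.624149), 2))
# MC@10 row: clean 0.623919, poisoned 0.692507, unlearned 0.620845
print(round(recovery_pct(0.623919, 0.692507, 0.620845), 2))
```

The results agree with the `recovery_pct` column of the table (about 1.89 and 104.48), which confirms that the same formula applies to both performance and manipulation metrics.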

In [15]:
def calculate_unlearning_impact(df_agg, best_ratios, metrics):
    """
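    Calculate how much unlearning changed each metric relative to the poisoned model.
    Reports the absolute change, the percentage change, and the recovery toward
    the clean model, separately for the finetune and frozen variants.
    
    recovery_pct = (poisoned - unlearned) / (poisoned - clean) * 100.
    The ratio is direction-agnostic: it is the same whether higher or lower is
    better for a metric, so the formula is identical on every benchmark.
    
    Interpretation on the honeypot benchmark:
    - Performance metrics: Lower is better (don't recommend fakes)
    - Manipulation metrics: Lower is better (same as other benchmarks)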
    """
    
    # Define which metrics are "higher is better" (for mixed/real_only benchmarks)
    performance_metrics = ['AUC', 'MRR', 'NDCG@5', 'NDCG@10', 'Recall@5', 'Recall@10']
    
    impact_results = []
    
    # Iterate over the benchmarks present in best_ratios instead of relying on
    # the global benchmarks_to_plot defined in an earlier cell.
    for benchmark in best_ratios:
        
        for variant in ['finetune', 'frozen']:
            if variant not in best_ratios[benchmark]:
                continue
            
            # Get poisoned baseline for this variant
            poisoned_data = df_agg[
                (df_agg['model_type'] == 'poisoned') & 
                (df_agg['benchmark'] == benchmark) &
                (df_agg['model_variant'] == variant)
            ]
            
            # Get clean baseline for reference
            clean_data = df_agg[
                (df_agg['model_type'] == 'clean') & 
                (df_agg['benchmark'] == benchmark) &
                (df_agg['model_variant'] == variant)
            ]
            
            if len(poisoned_data) == 0:
                continue
            
            poisoned_row = poisoned_data.iloc[0]
            clean_row = clean_data.iloc[0] if len(clean_data) > 0 else None
            
            for method, method_info in best_ratios[benchmark][variant].items():
                best_ratio = method_info['ratio']
                unlearned_row = method_info['data']
                
                for metric in metrics:
                    poisoned_val = poisoned_row[metric]
                    unlearned_val = unlearned_row[metric]
                    clean_val = clean_row[metric] if clean_row is not None else None
                    
                    # Calculate changes
                    absolute_change = unlearned_val - poisoned_val
                    
                    if poisoned_val != 0:
                        percent_change = (absolute_change / poisoned_val) * 100
                    else:
                        percent_change = 0 if absolute_change == 0 else float('inf')
                    
                    # Calculate recovery towards clean (if clean exists).
                    # recovery = (poisoned - unlearned) / (poisoned - clean): the share of
                    # the poisoned-to-clean gap that unlearning closed. The ratio is the same
                    # whether higher or lower is better for the metric, so the honeypot
                    # benchmark needs no separate branch.
                    recovery_pct = None
                    if clean_val is not None and poisoned_val != clean_val:
                        total_gap = poisoned_val - clean_val
                        recovered_gap = poisoned_val - unlearned_val
                        recovery_pct = (recovered_gap / total_gap) * 100
                    
                    impact_results.append({
                        'benchmark': benchmark,
                        'model_variant': variant,
                        'method': method,
                        'best_ratio': best_ratio,
                        'metric': metric,
                        'clean_value': clean_val,
                        'poisoned_value': poisoned_val,
                        'unlearned_value': unlearned_val,
                        'absolute_change': absolute_change,
                        'percent_change': percent_change,
                        'recovery_pct': recovery_pct
                    })
    
    return pd.DataFrame(impact_results)
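
The recovery percentage computed above is the fraction of the poisoned-to-clean gap that unlearning closed. A minimal standalone sketch (the helper names `recovery_pct` and `recovery_pct_flipped` are hypothetical, not part of the notebook) makes one property explicit: negating both the numerator and the denominator leaves the ratio unchanged, so the result does not depend on whether higher or lower counts as "better".

```python
def recovery_pct(clean, poisoned, unlearned):
    """Percent of the poisoned-to-clean gap closed by unlearning (hypothetical helper)."""
    total_gap = clean - poisoned
    if total_gap == 0:
        return None  # no gap to recover
    return (unlearned - poisoned) / total_gap * 100


def recovery_pct_flipped(clean, poisoned, unlearned):
    """Same ratio with both gaps negated (a 'lower is better' convention)."""
    total_gap = poisoned - clean
    if total_gap == 0:
        return None
    return (poisoned - unlearned) / total_gap * 100


# Illustrative values only (not taken from the results).
a = recovery_pct(clean=0.75, poisoned=0.5, unlearned=0.625)
b = recovery_pct_flipped(clean=0.75, poisoned=0.5, unlearned=0.625)
print(a, b)  # 50.0 50.0
```

A value of 100 means the unlearned model matches the clean one on that metric, values above 100 mean it overshot, and negative values mean it moved away from clean.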

In [16]:
print("\n" + "="*80)
print("CALCULATING UNLEARNING IMPACT")
print("="*80)

impact_df = calculate_unlearning_impact(df_agg, best_ratios, all_analysis_metrics)

# Save the impact analysis
impact_df.to_csv('tables/unlearning_impact_analysis.csv', index=False)
print("\nSaved impact analysis to: tables/unlearning_impact_analysis.csv")

# Display summary
print(f"\nImpact Analysis Summary:")
print(f"Shape: {impact_df.shape}")
print(f"\nSample results:")
impact_df.head(15)


CALCULATING UNLEARNING IMPACT

Saved impact analysis to: tables/unlearning_impact_analysis.csv

Impact Analysis Summary:
Shape: (120, 11)

Sample results:


| | benchmark | model_variant | method | best_ratio | metric | clean_value | poisoned_value | unlearned_value | absolute_change | percent_change | recovery_pct |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | benchmark_honeypot | finetune | first_order | 0.1 | AUC | 0.682974 | 0.623013 | 0.624149 | 0.001136 | 0.182267 | 1.893809 |
| 1 | benchmark_honeypot | finetune | first_order | 0.1 | MRR | 0.152745 | 0.149383 | 0.166241 | 0.016858 | 11.285408 | 501.340324 |
| 2 | benchmark_honeypot | finetune | first_order | 0.1 | NDCG@5 | 0.146443 | 0.117218 | 0.143585 | 0.026366 | 22.493196 | 90.220008 |
| 3 | benchmark_honeypot | finetune | first_order | 0.1 | NDCG@10 | 0.190852 | 0.160588 | 0.184509 | 0.023921 | 14.896039 | 79.042375 |
| 4 | benchmark_honeypot | finetune | first_order | 0.1 | Recall@5 | 0.268012 | 0.175793 | 0.213256 | 0.037464 | 21.311475 | 40.625 |
| 5 | benchmark_honeypot | finetune | first_order | 0.1 | Recall@10 | 0.40634 | 0.308357 | 0.341018 | 0.032661 | 10.5919 | 33.333333 |
| 6 | benchmark_honeypot | finetune | first_order | 0.1 | MC@5 | 0.760807 | 0.689914 | 0.644573 | -0.045341 | -6.571986 | -63.95664 |
| 7 | benchmark_honeypot | finetune | first_order | 0.1 | MC@10 | 0.623919 | 0.692507 | 0.620845 | -0.071662 | -10.348176 | 104.481793 |
| 8 | benchmark_honeypot | finetune | first_order | 0.1 | MC@20 | 0.624063 | 0.628386 | 0.632853 | 0.004467 | 0.710846 | -103.333333 |
| 9 | benchmark_honeypot | finetune | first_order | 0.1 | avg_fake_in_top_10 | 6.239193 | 6.925072 | 6.208453 | -0.716619 | -10.348176 | 104.481793 |
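
Recovery values above 100% or below 0% in the sample are easier to read with the arithmetic spelled out. The sketch below recomputes four rows from the rounded values shown in the table (so the last digits differ slightly from the stored ones) and attaches a label to each. The `classify` helper and its labels are illustrative assumptions, not part of the notebook.

```python
# (metric, clean, poisoned, unlearned) copied from the sample rows above.
rows = [
    ("AUC",   0.682974, 0.623013, 0.624149),
    ("MRR",   0.152745, 0.149383, 0.166241),
    ("MC@5",  0.760807, 0.689914, 0.644573),
    ("MC@10", 0.623919, 0.692507, 0.620845),
]


def classify(recovery):
    """Label a recovery percentage (hypothetical categories)."""
    if recovery < 0:
        return "moved away from clean"
    if recovery > 100:
        return "overshot clean"
    return "partial recovery"


results = {}
for metric, clean, poisoned, unlearned in rows:
    # Share of the poisoned-to-clean gap that unlearning closed, in percent.
    recovery = (unlearned - poisoned) / (clean - poisoned) * 100
    results[metric] = (recovery, classify(recovery))
    print(f"{metric:<6} {recovery:8.2f}%  {classify(recovery)}")
```

The MRR row shows why large values need care: the clean-to-poisoned gap is only about 0.0034, so a change of about 0.017 is reported as roughly 500% recovery. Small gaps make the ratio unstable.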


## 11. Create Detailed Impact Report

In [17]:
print("\n" + "="*80)
print("UNLEARNING IMPACT REPORT")
print("="*80)
print("\nThis report shows how much unlearning (at best ratio) changed each metric")
print("compared to the poisoned model, and what % of the gap to clean was recovered.")
print("Results are shown separately for Finetune and Frozen models.")
print("="*80)

for benchmark in benchmarks_to_plot:
    benchmark_impact = impact_df[impact_df['benchmark'] == benchmark]
    
    if len(benchmark_impact) == 0:
        continue
    
    print(f"\n\n{'#'*80}")
    print(f"# BENCHMARK: {benchmark.replace('_', ' ').title()}")
    print(f"{'#'*80}")
    
    variants = benchmark_impact['model_variant'].unique()
    
    for variant in variants:
        variant_impact = benchmark_impact[benchmark_impact['model_variant'] == variant]
        
        if len(variant_impact) == 0:
            continue
        
        print(f"\n{'='*80}")
        print(f"Model Variant: {variant.upper()}")
        print(f"{'='*80}")
        
        methods = variant_impact['method'].unique()
        
        for method in methods:
            method_impact = variant_impact[variant_impact['method'] == method]
            
            if len(method_impact) == 0:
                continue
            
            best_ratio = method_impact['best_ratio'].iloc[0]
            
            print(f"\n{'-'*80}")
            print(f"Method: {method.replace('_', ' ').title()}")
            print(f"Best Ratio: {best_ratio}")
            print(f"{'-'*80}")
            
            print(f"\n{'Metric':<30} {'Clean':<12} {'Poisoned':<12} {'Unlearned':<12} {'Change':<12} {'% Change':<12} {'Recovery %':<12}")
            print(f"{'-'*130}")
            
            for _, row in method_impact.iterrows():
                metric = row['metric']
                clean_val = row['clean_value']
                poisoned_val = row['poisoned_value']
                unlearned_val = row['unlearned_value']
                abs_change = row['absolute_change']
                pct_change = row['percent_change']
                recovery = row['recovery_pct']
                
                # Format values. Missing values arrive from the DataFrame as NaN rather
                # than None, so test with pd.notna / np.isfinite instead of `is not None`.
                clean_str = f"{clean_val:.4f}" if pd.notna(clean_val) else "N/A"
                poisoned_str = f"{poisoned_val:.4f}"
                unlearned_str = f"{unlearned_val:.4f}"
                change_str = f"{abs_change:+.4f}"
                pct_str = f"{pct_change:+.2f}%" if np.isfinite(pct_change) else "N/A"
                recovery_str = f"{recovery:.2f}%" if pd.notna(recovery) else "N/A"
                
                print(f"{metric:<30} {clean_str:<12} {poisoned_str:<12} {unlearned_str:<12} {change_str:<12} {pct_str:<12} {recovery_str:<12}")



UNLEARNING IMPACT REPORT

This report shows how much unlearning (at best ratio) changed each metric
compared to the poisoned model, and what % of the gap to clean was recovered.
Results are shown separately for Finetune and Frozen models.


################################################################################
# BENCHMARK: Benchmark Honeypot
################################################################################

Model Variant: FINETUNE

--------------------------------------------------------------------------------
Method: First Order
Best Ratio: 0.1
--------------------------------------------------------------------------------

Metric                         Clean        Poisoned     Unlearned    Change       % Change     Recovery %  
----------------------------------------------------------------------------------------------------------------------------------
AUC                            0.6830       0.6230       0.6241       +0.0011      +0.18%       1.8
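
Missing values in a report like this can show up as `None`, as `NaN` (the usual form once values pass through a DataFrame), or as an infinite percent change when the poisoned value is zero. One way to keep the "N/A" fallback consistent is a single formatting helper. The sketch below is a suggestion under that assumption, and `fmt_value` is a hypothetical name.

```python
import math


def fmt_value(value, spec=".4f", suffix=""):
    """Format a number, returning 'N/A' for None, NaN, or +/-inf (hypothetical helper)."""
    if value is None or not math.isfinite(value):
        return "N/A"
    return f"{value:{spec}}{suffix}"


print(fmt_value(0.682974))                   # 0.6830
print(fmt_value(float("nan")))               # N/A
print(fmt_value(0.182267, "+.2f", "%"))      # +0.18%
print(fmt_value(float("inf"), "+.2f", "%"))  # N/A
```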

## 12. Visualize Impact Analysis

In [18]:
def create_impact_visualization(impact_df, benchmark_type, output_dir='plots'):
    """Create visualization showing the impact of unlearning, comparing finetune vs frozen"""
    
    bench_data = impact_df[impact_df['benchmark'] == benchmark_type]
    
    if len(bench_data) == 0:
        return
    
    variants = bench_data['model_variant'].unique()
    methods = bench_data['method'].unique()
    
    # Create subplots: one row per variant, one column per method.
    # squeeze=False always returns a 2D array of axes, even for a single row or column.
    fig, axes = plt.subplots(len(variants), len(methods),
                             figsize=(8*len(methods), 6*len(variants)),
                             squeeze=False)
    
    for v_idx, variant in enumerate(variants):
        for m_idx, method in enumerate(methods):
            ax = axes[v_idx, m_idx]
            
            # Get data for this variant and method
            data = bench_data[
                (bench_data['model_variant'] == variant) & 
                (bench_data['method'] == method)
            ]
            
            if len(data) == 0:
                ax.axis('off')
                continue
            
            # Prepare data for plotting
            x = range(len(data))
            metrics_list = data['metric'].tolist()
            recovery_pcts = data['recovery_pct'].tolist()
            
            # Color based on recovery percentage
            colors = ['green' if r and r > 50 else 'orange' if r and r > 0 else 'red' 
                      for r in recovery_pcts]
            
            bars = ax.barh(x, recovery_pcts, color=colors, alpha=0.7)
            ax.set_yticks(x)
            ax.set_yticklabels(metrics_list, fontsize=9)
            ax.set_xlabel('Recovery Percentage (%)', fontsize=11, fontweight='bold')
            
            # Title shows both variant and method
            title = f'{variant.upper()} - {method.replace("_", " ").title()}'
            best_ratio = data['best_ratio'].iloc[0]
            title += f'\n(Best Ratio: {best_ratio})'
            ax.set_title(title, fontsize=12, fontweight='bold')
            
            ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
            ax.axvline(x=100, color='green', linestyle='--', linewidth=1, alpha=0.5, label='Full Recovery')
            ax.grid(True, alpha=0.3, axis='x')
            ax.legend(fontsize=8)
            
            # Add value labels on bars (skip missing values, which are NaN, not None)
            for bar, val in zip(bars, recovery_pcts):
                if pd.notna(val):
                    ax.text(val, bar.get_y() + bar.get_height()/2, 
                           f'{val:.1f}%', 
                           va='center', ha='left' if val >= 0 else 'right',
                           fontsize=8, fontweight='bold')
    
    plt.suptitle(f'Unlearning Recovery Analysis (Finetune vs Frozen) - {benchmark_type.replace("_", " ").title()}',
                 fontsize=16, fontweight='bold', y=1.00)
    plt.tight_layout()
    
    filename = f"{output_dir}/impact_recovery_{benchmark_type}.png"
    plt.savefig(filename, dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"Saved impact visualization: {filename}")

In [19]:
print("\nGenerating impact visualizations...")
print("="*60)

for benchmark in benchmarks_to_plot:
    create_impact_visualization(impact_df, benchmark)

print("\nAll impact visualizations generated!")


Generating impact visualizations...
Saved impact visualization: plots/impact_recovery_benchmark_honeypot.png
Saved impact visualization: plots/impact_recovery_benchmark_mixed.png

All impact visualizations generated!


## 13. Create Summary Tables

In [20]:
def create_summary_comparison_table(impact_df):
    """Create a summary table comparing methods across all metrics, including finetune vs frozen"""
    
    summary_data = []
    
    for benchmark in impact_df['benchmark'].unique():
        bench_data = impact_df[impact_df['benchmark'] == benchmark]
        
        for variant in bench_data['model_variant'].unique():
            variant_data = bench_data[bench_data['model_variant'] == variant]
            
            for method in variant_data['method'].unique():
                method_data = variant_data[variant_data['method'] == method]
                
                # Calculate average recovery
                avg_recovery = method_data['recovery_pct'].mean()
                
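                # Count metrics whose value decreased after unlearning. A decrease is an
                # improvement only for lower-is-better metrics (e.g. manipulation metrics),
                # so read this count together with the recovery percentage.
                improved_metrics = (method_data['absolute_change'] < 0).sum()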
                total_metrics = len(method_data)
                
                # Average absolute change
                avg_abs_change = method_data['absolute_change'].mean()
                avg_pct_change = method_data['percent_change'].mean()
                
                summary_data.append({
                    'Benchmark': benchmark.replace('_', ' ').title(),
                    'Model Variant': variant.upper(),
                    'Method': method.replace('_', ' ').title(),
                    'Best Ratio': method_data['best_ratio'].iloc[0],
                    'Avg Recovery %': avg_recovery,
                    'Improved Metrics': f"{improved_metrics}/{total_metrics}",
                    'Avg Absolute Change': avg_abs_change,
                    'Avg % Change': avg_pct_change
                })
    
    summary_df = pd.DataFrame(summary_data)
    
    # Save
    summary_df.to_csv('tables/unlearning_summary.csv', index=False)
    print("\nSaved summary table to: tables/unlearning_summary.csv")
    
    return summary_df
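
`Improved Metrics` counts rows whose value decreased, which matches "improved" only where lower is better. One direction-agnostic alternative, sketched here as an assumption rather than the notebook's method, is to count rows that moved toward the clean value, i.e. rows with a positive `recovery_pct`. The figures are the ten sample rows printed earlier for Honeypot / Finetune / First Order.

```python
# (metric, absolute_change, recovery_pct) from the ten sample rows shown earlier.
sample = [
    ("AUC", 0.001136, 1.893809),
    ("MRR", 0.016858, 501.340324),
    ("NDCG@5", 0.026366, 90.220008),
    ("NDCG@10", 0.023921, 79.042375),
    ("Recall@5", 0.037464, 40.625),
    ("Recall@10", 0.032661, 33.333333),
    ("MC@5", -0.045341, -63.95664),
    ("MC@10", -0.071662, 104.481793),
    ("MC@20", 0.004467, -103.333333),
    ("avg_fake_in_top_10", -0.716619, 104.481793),
]

# Current rule: the raw value went down.
decreased = sum(1 for _, change, _ in sample if change < 0)
# Alternative rule: the value moved toward the clean model's value.
toward_clean = sum(1 for _, _, rec in sample if rec > 0)

print(f"decreased:    {decreased}/{len(sample)}")     # decreased:    3/10
print(f"toward clean: {toward_clean}/{len(sample)}")  # toward clean: 8/10
```

The two rules disagree on most of these rows, so the choice of rule changes how the summary reads.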

In [21]:
print("\n" + "="*80)
print("CREATING SUMMARY COMPARISON TABLE")
print("="*80)

summary_comparison = create_summary_comparison_table(impact_df)
print("\nSummary Comparison:")
print(summary_comparison.to_string(index=False))


CREATING SUMMARY COMPARISON TABLE

Saved summary table to: tables/unlearning_summary.csv

Summary Comparison:
         Benchmark Model Variant          Method  Best Ratio  Avg Recovery % Improved Metrics  Avg Absolute Change  Avg % Change
Benchmark Honeypot      FINETUNE     First Order        0.10       52.764891             4/15            -0.044574      3.018421
Benchmark Honeypot      FINETUNE Gradient Ascent        0.05      -26.317381             4/15             0.009805      0.850884
Benchmark Honeypot        FROZEN     First Order        0.20      191.770218             3/15             0.112991     95.600925
Benchmark Honeypot        FROZEN Gradient Ascent        0.01       -0.482228            11/15            -0.002316      0.096687
   Benchmark Mixed      FINETUNE     First Order        0.10     -109.877799            10/15            -0.133312     -1.919125
   Benchmark Mixed      FINETUNE Gradient Ascent        0.05       58.334317             4/15             0.012673 
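
`Avg Recovery %` is a plain mean, so a single metric with a small clean-to-poisoned gap can dominate it. As a rough illustration, the sketch below compares the mean and the median over the ten recovery values printed in the sample rows earlier. That is only ten of the fifteen metrics, so it will not reproduce the 52.76 in the table. Reporting the median is a suggestion, not something the notebook does.

```python
from statistics import mean, median

# recovery_pct for Honeypot / Finetune / First Order, from the ten sample rows.
recovery = [1.893809, 501.340324, 90.220008, 79.042375, 40.625,
            33.333333, -63.95664, 104.481793, -103.333333, 104.481793]

avg = mean(recovery)
med = median(recovery)
print(f"mean:   {avg:.2f}%")
print(f"median: {med:.2f}%")
```

On these rows the mean (about 78.8%) sits well above the median (about 59.8%), pulled up by the 501% MRR value.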

## 14. Create Metric-Specific Impact Tables

In [22]:
def create_metric_impact_tables(impact_df):
    """Create separate tables for each metric showing impact across methods, benchmarks, and variants"""
    
    metrics = impact_df['metric'].unique()
    
    for metric in metrics:
        metric_data = impact_df[impact_df['metric'] == metric].copy()
        
        # Pivot table for easy comparison
        pivot_table = metric_data.pivot_table(
            index=['benchmark', 'model_variant', 'method'],
            values=['clean_value', 'poisoned_value', 'unlearned_value', 
                   'absolute_change', 'percent_change', 'recovery_pct'],
            aggfunc='first'
        ).reset_index()
        
        # Format names
        pivot_table['benchmark'] = pivot_table['benchmark'].str.replace('_', ' ').str.title()
        pivot_table['model_variant'] = pivot_table['model_variant'].str.upper()
        pivot_table['method'] = pivot_table['method'].str.replace('_', ' ').str.title()
        
        # Save
        filename = f"tables/impact_{metric.replace('@', '_at_').replace('/', '_')}.csv"
        pivot_table.to_csv(filename, index=False)
        print(f"Saved: {filename}")

In [23]:
print("\n" + "="*80)
print("CREATING METRIC-SPECIFIC IMPACT TABLES")
print("="*80)

create_metric_impact_tables(impact_df)

print("\nAll metric-specific tables created!")



CREATING METRIC-SPECIFIC IMPACT TABLES
Saved: tables/impact_AUC.csv
Saved: tables/impact_MRR.csv
Saved: tables/impact_NDCG_at_5.csv
Saved: tables/impact_NDCG_at_10.csv
Saved: tables/impact_Recall_at_5.csv
Saved: tables/impact_Recall_at_10.csv
Saved: tables/impact_MC_at_5.csv
Saved: tables/impact_MC_at_10.csv
Saved: tables/impact_MC_at_20.csv
Saved: tables/impact_avg_fake_in_top_10.csv
Saved: tables/impact_avg_fake_in_top_20.csv
Saved: tables/impact_users_with_fake_pct_top_10.csv
Saved: tables/impact_users_with_fake_pct_top_20.csv
Saved: tables/impact_avg_fake_ratio_top_10.csv
Saved: tables/impact_avg_fake_ratio_top_20.csv

All metric-specific tables created!
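
The file names above come from replacing characters that are awkward in paths (`@` and `/`) in each metric name. A standalone sketch of that mapping follows. The function name `impact_table_path` is hypothetical.

```python
def impact_table_path(metric, out_dir="tables"):
    """Build the CSV path for a metric, replacing '@' and '/' (hypothetical helper)."""
    safe = metric.replace("@", "_at_").replace("/", "_")
    return f"{out_dir}/impact_{safe}.csv"


print(impact_table_path("NDCG@10"))             # tables/impact_NDCG_at_10.csv
print(impact_table_path("avg_fake_in_top_10"))  # tables/impact_avg_fake_in_top_10.csv
```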


## 15. Direct Finetune vs Frozen Comparison

In [24]:
def create_finetune_vs_frozen_comparison(impact_df, output_dir='plots'):
    """Create comparison visualizations showing finetune vs frozen side-by-side"""
    
    benchmarks = impact_df['benchmark'].unique()
    methods = impact_df['method'].unique()
    
    for benchmark in benchmarks:
        bench_data = impact_df[impact_df['benchmark'] == benchmark]
        
        if len(bench_data) == 0:
            continue
        
        for method in methods:
            method_data = bench_data[bench_data['method'] == method]
            
            if len(method_data) == 0:
                continue
            
            # Get finetune and frozen data
            finetune_data = method_data[method_data['model_variant'] == 'finetune'].sort_values('metric')
            frozen_data = method_data[method_data['model_variant'] == 'frozen'].sort_values('metric')
            
            if len(finetune_data) == 0 or len(frozen_data) == 0:
                continue
            
            # Make sure both have the same metrics in the same order
            common_metrics = list(set(finetune_data['metric'].tolist()) & set(frozen_data['metric'].tolist()))
            if len(common_metrics) == 0:
                continue
            
            finetune_data = finetune_data[finetune_data['metric'].isin(common_metrics)].sort_values('metric')
            frozen_data = frozen_data[frozen_data['metric'].isin(common_metrics)].sort_values('metric')
            
            # Create comparison plot
            fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
            
            # Plot 1: Recovery percentage comparison
            metrics = finetune_data['metric'].tolist()
            x = np.arange(len(metrics))
            width = 0.35
            
            finetune_recovery = finetune_data['recovery_pct'].values
            frozen_recovery = frozen_data['recovery_pct'].values
            
            bars1 = ax1.bar(x - width/2, finetune_recovery, width, 
                           label='Finetune', alpha=0.8, color='steelblue')
            bars2 = ax1.bar(x + width/2, frozen_recovery, width, 
                           label='Frozen', alpha=0.8, color='darkorange')
            
            ax1.set_xlabel('Metrics', fontsize=11, fontweight='bold')
            ax1.set_ylabel('Recovery Percentage (%)', fontsize=11, fontweight='bold')
            ax1.set_title(f'Recovery % Comparison: {method.replace("_", " ").title()}', 
                         fontsize=12, fontweight='bold')
            ax1.set_xticks(x)
            ax1.set_xticklabels(metrics, rotation=45, ha='right', fontsize=9)
            ax1.axhline(y=100, color='green', linestyle='--', linewidth=1, alpha=0.5, label='100% Recovery')
            ax1.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
            ax1.legend()
            ax1.grid(True, alpha=0.3, axis='y')
            
            # Add value labels on bars (both variants share the same formatting)
            for bar in [*bars1, *bars2]:
                height = bar.get_height()
                if not np.isnan(height):
                    ax1.text(bar.get_x() + bar.get_width()/2., height,
                           f'{height:.1f}%',
                           ha='center', va='bottom' if height >= 0 else 'top', fontsize=8)
            
            # Plot 2: Absolute change comparison
            finetune_change = finetune_data['absolute_change'].values
            frozen_change = frozen_data['absolute_change'].values
            
            bars3 = ax2.bar(x - width/2, finetune_change, width, 
                           label='Finetune', alpha=0.8, color='steelblue')
            bars4 = ax2.bar(x + width/2, frozen_change, width, 
                           label='Frozen', alpha=0.8, color='darkorange')
            
            ax2.set_xlabel('Metrics', fontsize=11, fontweight='bold')
            ax2.set_ylabel('Absolute Change from Poisoned', fontsize=11, fontweight='bold')
            ax2.set_title(f'Absolute Change Comparison: {method.replace("_", " ").title()}', 
                         fontsize=12, fontweight='bold')
            ax2.set_xticks(x)
            ax2.set_xticklabels(metrics, rotation=45, ha='right', fontsize=9)
            ax2.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
            ax2.legend()
            ax2.grid(True, alpha=0.3, axis='y')
            
            # Add value labels on bars (both variants share the same formatting)
            for bar in [*bars3, *bars4]:
                height = bar.get_height()
                if not np.isnan(height):
                    ax2.text(bar.get_x() + bar.get_width()/2., height,
                           f'{height:.3f}',
                           ha='center', va='bottom' if height >= 0 else 'top', fontsize=8)
            
            plt.suptitle(f'{benchmark.replace("_", " ").title()} - Finetune vs Frozen Comparison',
                        fontsize=14, fontweight='bold', y=1.02)
            plt.tight_layout()
            
            filename = f"{output_dir}/finetune_vs_frozen_{method}_{benchmark}.png"
            plt.savefig(filename, dpi=300, bbox_inches='tight')
            plt.close()
            
            print(f"Saved comparison: {filename}")

In [25]:
print("\n" + "="*80)
print("CREATING FINETUNE VS FROZEN COMPARISON PLOTS")
print("="*80)

create_finetune_vs_frozen_comparison(impact_df)

print("\nAll finetune vs frozen comparison plots created!")


CREATING FINETUNE VS FROZEN COMPARISON PLOTS
Saved comparison: plots/finetune_vs_frozen_first_order_benchmark_honeypot.png
Saved comparison: plots/finetune_vs_frozen_gradient_ascent_benchmark_honeypot.png
Saved comparison: plots/finetune_vs_frozen_first_order_benchmark_mixed.png
Saved comparison: plots/finetune_vs_frozen_gradient_ascent_benchmark_mixed.png

All finetune vs frozen comparison plots created!
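
The side-by-side bars rely on the finetune and frozen rows lining up metric by metric. Sorting both frames by `metric` achieves this as long as each metric appears exactly once per variant. An explicit join on the metric name is a more defensive option. The sketch below uses plain dictionaries and placeholder numbers rather than the notebook's DataFrames, and `align_by_metric` is a hypothetical helper.

```python
def align_by_metric(finetune, frozen):
    """Return (metrics, finetune_values, frozen_values) for metrics present in both."""
    common = [m for m in finetune if m in frozen]  # keeps finetune's order
    return common, [finetune[m] for m in common], [frozen[m] for m in common]


# Placeholder recovery values, for illustration only (not from the results).
finetune = {"AUC": 10.0, "MRR": 20.0, "MC@10": 30.0}
frozen = {"MC@10": 5.0, "AUC": 15.0, "NDCG@5": 25.0}

metrics, ft_vals, fz_vals = align_by_metric(finetune, frozen)
print(metrics)  # ['AUC', 'MC@10']
print(ft_vals)  # [10.0, 30.0]
print(fz_vals)  # [15.0, 5.0]
```

Each position in the two value lists then refers to the same metric, regardless of the original row order.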


## 16. Analysis Complete - Summary

In [26]:
print("\n" + "="*80)
print("ANALYSIS COMPLETE!")
print("="*80)

print("\n" + "="*80)
print("GENERATED FILES SUMMARY")
print("="*80)

print("\n📊 TABLES (in 'tables/' directory):")
print("  Core Metric Tables:")
print("    - manipulation_detection_metrics.csv")
print("    - average_fake_items.csv")
print("    - user_coverage_metrics.csv")
print("    - fake_ratio_metrics.csv")
print("    - performance_metrics.csv")
print("\n  Impact Analysis Tables:")
print("    - unlearning_impact_analysis.csv (detailed impact for all metrics)")
print("    - unlearning_summary.csv (high-level comparison)")
print("    - impact_[metric_name].csv (one table per metric)")

print("\n📈 PLOTS (in 'plots/' directory):")
print("  Individual Metric Plots:")
print("    - [metric]_[benchmark].png (line plots comparing finetune vs frozen)")
print("\n  Combined Visualizations:")
print("    - combined_manipulation_detection_[benchmark].png")
print("    - combined_average_fake_items_[benchmark].png")
print("    - combined_user_coverage_[benchmark].png")
print("    - combined_fake_ratios_[benchmark].png")
print("\n  Impact Visualizations:")
print("    - impact_recovery_[benchmark].png (recovery bars for finetune & frozen)")
print("    - finetune_vs_frozen_[method]_[benchmark].png (direct comparison plots)")

print("\n" + "="*80)
print("KEY FINDINGS")
print("="*80)

# Calculate some key statistics
if len(impact_df) > 0:
    print("\n🔍 Overall Unlearning Performance:")
    
    for benchmark in benchmarks_to_plot:
        bench_data = impact_df[impact_df['benchmark'] == benchmark]
        if len(bench_data) == 0:
            continue
        
        print(f"\n  {benchmark.replace('_', ' ').title()}:")
        
        for variant in ['finetune', 'frozen']:
            variant_data = bench_data[bench_data['model_variant'] == variant]
            if len(variant_data) == 0:
                continue
            
            print(f"\n    {variant.upper()}:")
            
            for method in variant_data['method'].unique():
                method_data = variant_data[variant_data['method'] == method]
                avg_recovery = method_data['recovery_pct'].mean()
                best_ratio = method_data['best_ratio'].iloc[0]
                
                # Counts metrics whose value decreased; this is an improvement only
                # for lower-is-better metrics.
                improved_count = (method_data['absolute_change'] < 0).sum()
                total_count = len(method_data)
                
                print(f"      {method.replace('_', ' ').title()} (ratio={best_ratio}):")
                print(f"        - Average Recovery: {avg_recovery:.2f}%")
                print(f"        - Metrics Improved: {improved_count}/{total_count}")
                print(f"        - Average Change: {method_data['percent_change'].mean():.2f}%")

print("\n" + "="*80)
print("\n✅ To view results:")
print("  1. Check 'tables/' directory for detailed CSV files")
print("  2. Check 'plots/' directory for visualizations")
print("  3. Review 'unlearning_impact_analysis.csv' for complete impact analysis")
print("  4. Review 'unlearning_summary.csv' for high-level comparison")
print("\n" + "="*80)


ANALYSIS COMPLETE!

GENERATED FILES SUMMARY

📊 TABLES (in 'tables/' directory):
  Core Metric Tables:
    - manipulation_detection_metrics.csv
    - average_fake_items.csv
    - user_coverage_metrics.csv
    - fake_ratio_metrics.csv
    - performance_metrics.csv

  Impact Analysis Tables:
    - unlearning_impact_analysis.csv (detailed impact for all metrics)
    - unlearning_summary.csv (high-level comparison)
    - impact_[metric_name].csv (one table per metric)

📈 PLOTS (in 'plots/' directory):
  Individual Metric Plots:
    - [metric]_[benchmark].png (line plots comparing finetune vs frozen)

  Combined Visualizations:
    - combined_manipulation_detection_[benchmark].png
    - combined_average_fake_items_[benchmark].png
    - combined_user_coverage_[benchmark].png
    - combined_fake_ratios_[benchmark].png

  Impact Visualizations:
    - impact_recovery_[benchmark].png (recovery bars for finetune & frozen)
    - finetune_vs_frozen_[method]_[benchmark].png (direct comparison plots)