# Baseline Twin-B Experiment Analysis

This notebook analyzes the results from Experiment 2: Baseline Twin-B simulation with profiling.

## Analysis Goals
1. Identify CPU-GPU synchronization bottlenecks
2. Measure NCCL communication overhead
3. Analyze memory transfer patterns
4. Compare against DREAM'26 paper metrics
5. Establish baseline for future optimizations

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from pathlib import Path

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Update this path to your experiment results directory
RESULTS_DIR = Path('../../experiment2_results/baseline_XXXXX')  # Replace XXXXX with job ID

print(f"Analyzing results from: {RESULTS_DIR}")

## 1. GPU Metrics Analysis

In [None]:
# Load GPU metrics
gpu_baseline = pd.read_csv(RESULTS_DIR / 'gpu_metrics_baseline.csv')
gpu_profiled = pd.read_csv(RESULTS_DIR / 'gpu_metrics_profiled.csv')

# Clean column names
gpu_baseline.columns = gpu_baseline.columns.str.strip()
gpu_profiled.columns = gpu_profiled.columns.str.strip()

print("GPU Metrics - Baseline Run")
print(f"Samples collected: {len(gpu_baseline)}")
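# Rows from all GPUs are interleaved, so divide by the GPU count
# (assumes one sample per GPU per second)
n_gpus = gpu_baseline['gpu_id'].nunique()
print(f"Duration: ~{len(gpu_baseline) // n_gpus} seconds")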
print(f"\nAverage metrics:")
print(f"  Power draw: {gpu_baseline['power_draw_w'].mean():.2f}W (max: {gpu_baseline['power_draw_w'].max():.2f}W)")
print(f"  GPU utilization: {gpu_baseline['gpu_utilization_pct'].mean():.2f}%")
print(f"  Memory used: {gpu_baseline['memory_used_mb'].mean():.2f}MB")
print(f"  Temperature: {gpu_baseline['temperature_c'].mean():.2f}°C")
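
The averages above pool samples from every GPU, which can hide imbalance between devices. A minimal sketch of a per-GPU breakdown follows. It assumes the same column names used in the cell above; the helper name `summarize_per_gpu` and the demo values are hypothetical.

```python
import pandas as pd

def summarize_per_gpu(gpu_df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean and max of each metric, one row per GPU."""
    metrics = ['power_draw_w', 'gpu_utilization_pct', 'memory_used_mb', 'temperature_c']
    summary = gpu_df.groupby('gpu_id')[metrics].agg(['mean', 'max'])
    # Flatten the (metric, statistic) column MultiIndex into names like 'power_draw_w_mean'
    summary.columns = [f'{metric}_{stat}' for metric, stat in summary.columns]
    return summary.round(2)

# Synthetic data for illustration only (two GPUs, two samples each)
demo_gpu = pd.DataFrame({
    'gpu_id': [0, 1, 0, 1],
    'power_draw_w': [100.0, 200.0, 200.0, 300.0],
    'gpu_utilization_pct': [50.0, 60.0, 70.0, 80.0],
    'memory_used_mb': [1000.0, 2000.0, 1000.0, 2000.0],
    'temperature_c': [40.0, 50.0, 42.0, 52.0],
})
per_gpu = summarize_per_gpu(demo_gpu)
print(per_gpu)
```

On real data, `summarize_per_gpu(gpu_baseline)` would give the same table for the baseline run.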

In [None]:
# Plot GPU metrics over time
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Separate by GPU
for gpu_id in gpu_baseline['gpu_id'].unique():
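    # Reset the index so the x-axis is the per-GPU sample number, not the interleaved row number
    gpu_data = gpu_baseline[gpu_baseline['gpu_id'] == gpu_id].reset_index(drop=True)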
    
    # Power
    axes[0, 0].plot(gpu_data['power_draw_w'], label=f'GPU {gpu_id}', alpha=0.7)
    # Utilization
    axes[0, 1].plot(gpu_data['gpu_utilization_pct'], label=f'GPU {gpu_id}', alpha=0.7)
    # Memory
    axes[1, 0].plot(gpu_data['memory_used_mb'], label=f'GPU {gpu_id}', alpha=0.7)
    # Temperature
    axes[1, 1].plot(gpu_data['temperature_c'], label=f'GPU {gpu_id}', alpha=0.7)

axes[0, 0].set_title('Power Draw Over Time')
axes[0, 0].set_ylabel('Power (W)')
axes[0, 0].legend()

axes[0, 1].set_title('GPU Utilization Over Time')
axes[0, 1].set_ylabel('Utilization (%)')
axes[0, 1].legend()

axes[1, 0].set_title('Memory Usage Over Time')
axes[1, 0].set_ylabel('Memory (MB)')
axes[1, 0].set_xlabel('Sample')
axes[1, 0].legend()

axes[1, 1].set_title('Temperature Over Time')
axes[1, 1].set_ylabel('Temperature (°C)')
axes[1, 1].set_xlabel('Sample')
axes[1, 1].legend()

plt.tight_layout()
plt.savefig(RESULTS_DIR / 'gpu_metrics_timeline.png', dpi=150)
plt.show()

## 2. Agent Simulation Results

In [None]:
# Load agent results
agents_df = pd.read_csv(RESULTS_DIR / 'mesa_agent_results_baseline.csv')

print("Agent Simulation Summary")
print(f"Total records: {len(agents_df):,}")
print(f"Unique agents: {agents_df['agent_id'].nunique()}")
print(f"Unique zones: {agents_df['room'].nunique()}")
print(f"Simulation steps: {agents_df['step'].max() + 1}")
print(f"\nAgent type distribution:")
print(agents_df['agent_type'].value_counts())
print(f"\nComfort metrics:")
print(f"  Average comfort level: {agents_df['comfort_level'].mean():.2f}")
print(f"  AC usage rate: {agents_df['using_ac'].mean()*100:.2f}%")
print(f"  Average current temp: {agents_df['current_temp'].mean():.2f}°C")
print(f"  Average preferred temp: {agents_df['preferred_temp'].mean():.2f}°C")

In [None]:
# Plot AC usage over time
ac_usage_by_step = agents_df.groupby('step')['using_ac'].mean()

plt.figure(figsize=(14, 5))
plt.plot(ac_usage_by_step.index, ac_usage_by_step.values * 100)
plt.title('AC Usage Rate Over Simulation Time')
plt.xlabel('Step')
plt.ylabel('AC Usage Rate (%)')
plt.grid(True, alpha=0.3)
plt.savefig(RESULTS_DIR / 'ac_usage_over_time.png', dpi=150)
plt.show()

In [None]:
# Temperature distribution by zone
zone_temps = agents_df.groupby('room').agg({
    'current_temp': ['mean', 'std'],
    'using_ac': 'mean'
}).round(2)

zone_temps.columns = ['avg_temp', 'std_temp', 'ac_usage_rate']
zone_temps = zone_temps.sort_values('ac_usage_rate', ascending=False)

print("\nTop 10 zones by AC usage:")
print(zone_temps.head(10))

## 3. Profiling Analysis

### 3.1 CUDA API Summary

In [None]:
# Load CUDA API summary (if available)
cuda_api_file = RESULTS_DIR / 'cuda_api_summary.csv'
if cuda_api_file.exists():
    cuda_api = pd.read_csv(cuda_api_file)
    print("CUDA API Summary")
    print(cuda_api.head(20))
    
    # Look for synchronization calls. Note that 'Synchronize' also matches
    # cudaDeviceSynchronize and cudaEventSynchronize, not only cudaStreamSynchronize.
    if 'Function' in cuda_api.columns or 'Name' in cuda_api.columns:
        name_col = 'Function' if 'Function' in cuda_api.columns else 'Name'
        sync_calls = cuda_api[cuda_api[name_col].str.contains('Synchronize', na=False)]
        if not sync_calls.empty:
            print("\n=== CUDA Synchronization Bottleneck Analysis ===")
            print(sync_calls)
            print("\nPer DREAM'26 paper: cudaStreamSynchronize expected at ~66.3% of CUDA API time")
else:
    print("CUDA API summary not found. Check nsys stats output.")
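
To compare against the paper's figure, the share of CUDA API time spent in one call can be computed from the summary table. The sketch below assumes the CSV has a name column and a total-time column; the defaults `'Name'` and `'Total Time (ns)'` are assumptions about the `nsys stats` export and may differ between versions. The helper name `api_time_share` and the demo values are hypothetical.

```python
import pandas as pd

def api_time_share(api_df, call_name, name_col='Name', time_col='Total Time (ns)'):
    """Percentage of the summed API time spent in calls whose name contains `call_name`."""
    total = api_df[time_col].sum()
    if total == 0:
        return 0.0
    # Plain substring match (no regex), treating missing names as non-matches
    mask = api_df[name_col].str.contains(call_name, regex=False, na=False)
    return float(100.0 * api_df.loc[mask, time_col].sum() / total)

# Synthetic data for illustration only
demo_api = pd.DataFrame({
    'Name': ['cudaStreamSynchronize', 'cudaMemcpyAsync', 'cudaLaunchKernel'],
    'Total Time (ns)': [600, 300, 100],
})
sync_share = api_time_share(demo_api, 'cudaStreamSynchronize')
print(f"cudaStreamSynchronize: {sync_share:.1f}% of CUDA API time")
```

With the real table, pass `name_col=name_col` from the cell above and check the time column name in `cuda_api.columns` first.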

### 3.2 GPU Kernel Summary

In [None]:
# Load GPU kernel summary
kernel_file = RESULTS_DIR / 'gpu_kernel_summary.csv'
if kernel_file.exists():
    kernels = pd.read_csv(kernel_file)
    print("GPU Kernel Summary (Top 20)")
    print(kernels.head(20))
    
    # Look for NCCL kernels
    if 'Name' in kernels.columns or 'Kernel' in kernels.columns:
        name_col = 'Name' if 'Name' in kernels.columns else 'Kernel'
        nccl_kernels = kernels[kernels[name_col].str.contains('nccl', case=False, na=False)]
        if not nccl_kernels.empty:
            print("\n=== NCCL Communication Analysis ===")
            print(nccl_kernels)
            print("\nPer DREAM'26 paper:")
            print("  Expected AllGather: ~32.7% of GPU kernel time")
            print("  Expected AllReduce: ~31.7% of GPU kernel time")
else:
    print("GPU kernel summary not found. Check nsys stats output.")
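
The paper's AllGather and AllReduce figures are shares of total GPU kernel time, so the NCCL rows need to be grouped by collective and divided by the sum over all kernels. A minimal sketch follows. The column names and the kernel-name pattern are assumptions about the `nsys stats` export; `nccl_share_by_collective` and the demo values are hypothetical.

```python
import pandas as pd

def nccl_share_by_collective(kernel_df, name_col='Name', time_col='Total Time (ns)'):
    """Percentage of total GPU kernel time per NCCL collective type."""
    total = kernel_df[time_col].sum()
    is_nccl = kernel_df[name_col].str.contains('nccl', case=False, na=False)
    # Extract the collective name; NCCL kernels that match none of these are dropped
    collective = kernel_df.loc[is_nccl, name_col].str.extract(
        r'(AllGather|AllReduce|ReduceScatter|Broadcast)', expand=False
    )
    shares = kernel_df.loc[is_nccl, time_col].groupby(collective).sum() / total * 100.0
    return shares.sort_values(ascending=False)

# Synthetic data for illustration only
demo_kernels = pd.DataFrame({
    'Name': [
        'ncclDevKernel_AllGather_RING_LL',
        'ncclDevKernel_AllReduce_Sum_f32_RING_LL',
        'agent_update_kernel',
    ],
    'Total Time (ns)': [400, 300, 300],
})
nccl_shares = nccl_share_by_collective(demo_kernels)
print(nccl_shares)
```

The sum of the returned series is the total NCCL share, which corresponds to the "Total NCCL communication" row in the comparison table.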

### 3.3 Memory Operations

In [None]:
# Load memory operation summary
mem_file = RESULTS_DIR / 'memory_operation_summary.csv'
if mem_file.exists():
    mem_ops = pd.read_csv(mem_file)
    print("Memory Operation Summary")
    print(mem_ops.head(20))
    print("\nPer DREAM'26 paper:")
    print("  Expected: Many small transfers (~37.5 bytes avg for H2D)")
    print("  Expected total: ~2.95 GB (2.32 GB D2H, 0.32 GB D2D)")
else:
    print("Memory operation summary not found. Check nsys stats output.")
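
The average transfer size per direction is the total volume divided by the number of operations. The sketch below assumes the summary has an operation label, a total in decimal megabytes, and a count; the default column names are assumptions about the `nsys stats` export, and `avg_transfer_bytes` and the demo values are hypothetical.

```python
import pandas as pd

def avg_transfer_bytes(mem_df, op_col='Operation', total_mb_col='Total (MB)', count_col='Count'):
    """Average bytes per transfer for each memory operation type."""
    total_bytes = mem_df[total_mb_col] * 1e6  # assumes decimal megabytes (1 MB = 1e6 bytes)
    avg = total_bytes / mem_df[count_col]
    avg.index = mem_df[op_col]
    return avg.rename('avg_bytes')

# Synthetic data for illustration only
demo_mem = pd.DataFrame({
    'Operation': ['[CUDA memcpy HtoD]', '[CUDA memcpy DtoH]'],
    'Total (MB)': [0.5, 200.0],
    'Count': [10000, 1000],
})
avg_sizes = avg_transfer_bytes(demo_mem)
print(avg_sizes)
```

A very small average for host-to-device copies, next to a large operation count, is the pattern the paper describes as many small transfers.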

## 4. Energy Consumption Analysis

In [None]:
# Calculate energy consumption
# Energy (kWh) = Power (W) * Time (hours) / 1000

# Assuming 1 sample per GPU per second
SAMPLE_INTERVAL_S = 1.0

total_energy = 0.0
for gpu_id in gpu_baseline['gpu_id'].unique():
    gpu_data = gpu_baseline[gpu_baseline['gpu_id'] == gpu_id]
    avg_power = gpu_data['power_draw_w'].mean()
    # Use this GPU's own sample count; len(gpu_baseline) counts rows from every GPU
    runtime_hours = len(gpu_data) * SAMPLE_INTERVAL_S / 3600
    energy_kwh = (avg_power * runtime_hours) / 1000
    total_energy += energy_kwh
    
    print(f"\nGPU {gpu_id}:")
    print(f"  Average power: {avg_power:.2f}W")
    print(f"  Runtime: {runtime_hours:.2f} hours")
    print(f"  Energy consumed: {energy_kwh:.4f} kWh")

# Total energy is the sum of the per-GPU energies
print(f"\nTotal GPU energy consumption: {total_energy:.4f} kWh")

# Load runtime info
runtime_file = RESULTS_DIR / 'baseline_runtime.txt'
if runtime_file.exists():
    with open(runtime_file) as f:
        runtime_sec = int(f.read().strip())
        print(f"\nTotal simulation runtime: {runtime_sec} seconds ({runtime_sec/60:.2f} minutes)")
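
The energy formula above relies on a fixed sampling interval. If the metrics file also records when each sample was taken, power can be integrated over the actual timestamps instead. A minimal sketch using the trapezoidal rule follows. It assumes elapsed time in seconds is available for one GPU's samples; no such column is confirmed for the CSV, and `integrate_energy_kwh` is a hypothetical helper.

```python
import numpy as np

def integrate_energy_kwh(power_w, time_s):
    """Integrate power (W) over time (s) with the trapezoidal rule and return kWh."""
    power_w = np.asarray(power_w, dtype=float)
    time_s = np.asarray(time_s, dtype=float)
    # Each interval contributes (mean of its two endpoint powers) * (interval length) joules
    joules = np.sum((power_w[1:] + power_w[:-1]) / 2.0 * np.diff(time_s))
    return joules / 3.6e6  # 1 kWh = 3.6e6 J

# Synthetic check: a constant 100 W draw sampled once per second for one hour
demo_time = np.arange(0, 3601)
demo_power = np.full(demo_time.shape, 100.0)
demo_energy = integrate_energy_kwh(demo_power, demo_time)
print(f"{demo_energy:.4f} kWh")
```

For a multi-GPU run, apply the function to each GPU's samples separately and add the results.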

## 5. Bottleneck Identification Summary

In [None]:
# Load automated analysis if available
summary_file = RESULTS_DIR / 'baseline_analysis_summary.json'
if summary_file.exists():
    with open(summary_file) as f:
        summary = json.load(f)
    
    print("=" * 60)
    print("BASELINE ANALYSIS SUMMARY")
    print("=" * 60)
    print(json.dumps(summary, indent=2))
else:
    print("Automated summary not found.")

## 6. Comparison with DREAM'26 Paper Metrics

In [None]:
# Create comparison table
comparison_data = {
    'Metric': [
        'cudaStreamSynchronize overhead',
        'NCCL AllGather (% of GPU kernel time)',
        'NCCL AllReduce (% of GPU kernel time)',
        'Total NCCL communication',
        'Average H2D transfer size',
        'Primary CPU process usage',
        'Secondary CPU process usage'
    ],
    'DREAM\'26 Paper': [
        '66.3% of CUDA API time',
        '32.7%',
        '31.7%',
        '64.4%',
        '37.5 bytes',
        '96.99%',
        '2.02%'
    ],
    'This Experiment': [
        'TBD - Check nsys report',
        'TBD - Check nsys report',
        'TBD - Check nsys report',
        'TBD - Check nsys report',
        'TBD - Check nsys report',
        'TBD - Check logs',
        'TBD - Check logs'
    ]
}

comparison_df = pd.DataFrame(comparison_data)
print("\n" + "=" * 80)
print("COMPARISON WITH DREAM'26 PAPER")
print("=" * 80)
print(comparison_df.to_string(index=False))
print("\nNote: Open the .nsys-rep file in Nsight Systems GUI for detailed analysis")
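
Once values have been read from the nsys reports, the "This Experiment" column can be filled programmatically rather than edited by hand. The sketch below is one way to do that; `fill_measured` is a hypothetical helper and the measured value shown is a synthetic placeholder, not a result.

```python
import pandas as pd

def fill_measured(table, measured, metric_col='Metric', target_col='This Experiment'):
    """Return a copy of `table` with `target_col` overwritten for the metrics in `measured`."""
    filled = table.copy()
    for metric, value in measured.items():
        # Rows whose metric name is not in `measured` keep their existing entry
        filled.loc[filled[metric_col] == metric, target_col] = value
    return filled

# Two rows of the comparison table, with a synthetic placeholder value
demo_table = pd.DataFrame({
    'Metric': ['cudaStreamSynchronize overhead', 'Average H2D transfer size'],
    'This Experiment': ['TBD - Check nsys report', 'TBD - Check nsys report'],
})
demo_filled = fill_measured(demo_table, {'Average H2D transfer size': '50.0 bytes (synthetic)'})
print(demo_filled.to_string(index=False))
```

The function returns a copy, so the original table keeps its TBD entries until the result is assigned back.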

## 7. Recommendations

Based on the DREAM'26 paper; each item should be confirmed against this baseline run once the TBD entries in Section 6 are filled in:

### Key Bottlenecks Identified:
1. **cudaStreamSynchronize overhead** - Blocks GPU-CPU concurrency
2. **NCCL communication dominates GPU time** - More time syncing than computing
3. **Small, fragmented memory transfers** - Poor PCIe bandwidth utilization
4. **CPU workload imbalance** - Secondary process underutilized
5. **Host-side blocking** - CPU threads waiting instead of working

### Optimization Opportunities:
1. **Batch data transfers** - Reduce number of small transfers
2. **Overlap communication with computation** - Use asynchronous operations
3. **Reduce synchronization frequency** - Balance accuracy vs. performance
4. **Load balancing** - Better distribute work across CPU processes
5. **Optimize NCCL collective operations** - Tune AllGather/AllReduce patterns
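
As an illustration of the first item, many small host arrays can be packed into one contiguous buffer so that a single large transfer replaces many small ones. The sketch below shows only the host-side packing and unpacking with NumPy. The actual device copy depends on the GPU framework the simulation uses and is not shown, and `pack_arrays` and `unpack_arrays` are hypothetical helpers.

```python
import numpy as np

def pack_arrays(arrays):
    """Concatenate 1-D arrays into one buffer and record where each one starts and ends."""
    lengths = np.array([len(a) for a in arrays], dtype=int)
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    packed = np.concatenate(arrays)
    return packed, offsets

def unpack_arrays(packed, offsets):
    """Split a packed buffer back into views of the original arrays."""
    return [packed[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

# Three small per-agent arrays become one buffer (one transfer instead of three)
small = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]
packed, offsets = pack_arrays(small)
restored = unpack_arrays(packed, offsets)
print(packed, offsets)
```

The offsets array would need to travel with the buffer so the receiving side can recover the individual arrays.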

### Next Experiments:
1. Test different data exchange frequencies
2. Implement batched memory transfers
3. Evaluate async communication patterns
4. Profile with different GPU counts
5. Test energy-aware scheduling strategies