# Windows vs Ubuntu DVFS: Smart-Watt Comparison Analysis

This notebook implements the **Smart-Watt DVFS approach** from the vindhya repository and compares CPU behavior between:
- **Windows** laptop data (from vindhya/DVFS_F)
- **Ubuntu** laptop data

## Features Implemented:
1. ✅ Temporal windowing (last 5 samples)
2. ✅ Horizon-based prediction (5 samples ahead; 1 second at the 200 ms Windows sampling interval)
3. ✅ Probability-aware DVFS
4. ✅ Hysteresis (frequency hold)
5. ✅ Multi-level frequencies (LOW/MID/HIGH)
6. ✅ Physics-based energy model
7. ✅ Cross-OS comparison
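How far ahead a fixed sample horizon looks depends on the logging interval, and the two datasets are logged at different rates (200 ms for Windows, 11 s for Ubuntu, as the data summary cell reports). The sketch below converts between samples and seconds. The helper names `horizon_in_seconds` and `horizon_in_samples` are hypothetical and are not part of the vindhya code.

```python
def horizon_in_seconds(horizon_samples, sample_interval_seconds):
    """Look-ahead time covered by a horizon expressed in samples."""
    return horizon_samples * sample_interval_seconds


def horizon_in_samples(horizon_seconds, sample_interval_seconds):
    """Nearest whole number of samples for a target look-ahead (at least 1)."""
    return max(1, round(horizon_seconds / sample_interval_seconds))


# Hypothetical helper names; intervals taken from the data summary cell.
print(horizon_in_seconds(5, 0.2))   # Windows: 5 samples at 200 ms
print(horizon_in_seconds(5, 11))    # Ubuntu: 5 samples at 11 s
print(horizon_in_samples(1.0, 11))  # samples needed for ~1 s on Ubuntu
```

With `horizon=5` on both datasets, the Windows model predicts about one second ahead while the Ubuntu model predicts close to a minute ahead, so the two accuracy figures answer somewhat different questions.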

## 📦 Setup & Installation

In [None]:
# Install required packages
!pip install pandas numpy scikit-learn matplotlib seaborn joblib -q

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import joblib
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ All packages installed successfully!")

## 📁 Upload Data Files

Upload these files:
1. **Windows data**: `cpu_log_prepared.csv` from vindhya/DVFS_F/data/
2. **Ubuntu data**: `ubuntu_laptop_data.csv`

Or upload the entire `vindhya` folder as a zip file.

In [None]:
from google.colab import files
import os

print("📤 Upload your data files:")
print("  1. cpu_log_prepared.csv (Windows)")
print("  2. ubuntu_laptop_data.csv (Ubuntu)")
print("\nOr upload vindhya.zip and we'll extract it")

uploaded = files.upload()

# Check if zip file uploaded
if any(f.endswith('.zip') for f in uploaded.keys()):
    zip_file = [f for f in uploaded.keys() if f.endswith('.zip')][0]
    !unzip -q {zip_file}
    print(f"✅ Extracted {zip_file}")

# List uploaded files
print("\n📂 Available files:")
!ls -lh

## 📊 Part 1: Data Loading & Exploration

In [None]:
# Load Windows data
try:
    df_windows = pd.read_csv('cpu_log_prepared.csv')
    print("✅ Windows data loaded")
except FileNotFoundError:
    print("⚠️  Could not find cpu_log_prepared.csv. Trying alternative paths...")
    try:
        df_windows = pd.read_csv('vindhya/DVFS_F/data/cpu_log_prepared.csv')
        print("✅ Windows data loaded from vindhya folder")
    except FileNotFoundError:
        print("❌ Windows data not found. Please upload cpu_log_prepared.csv")
        df_windows = None

# Load Ubuntu data
try:
    df_ubuntu = pd.read_csv('ubuntu_laptop_data.csv')
    print("✅ Ubuntu data loaded")
except FileNotFoundError:
    print("❌ Ubuntu data not found. Please upload ubuntu_laptop_data.csv")
    df_ubuntu = None

print("\n" + "="*60)
print("DATA SUMMARY")
print("="*60)

if df_windows is not None:
    print(f"\n🪟 WINDOWS DATA:")
    print(f"  Rows: {len(df_windows):,}")
    print(f"  Columns: {df_windows.columns.tolist()}")
    print(f"  Duration: ~{len(df_windows)*0.2/60:.1f} minutes (200ms intervals)")
    print(f"  CPU util mean: {df_windows['cpu_util'].mean():.2%}")

if df_ubuntu is not None:
    print(f"\n🐧 UBUNTU DATA:")
    print(f"  Rows: {len(df_ubuntu):,}")
    print(f"  Columns: {df_ubuntu.columns.tolist()}")
    print(f"  Duration: ~{len(df_ubuntu)*11/60:.1f} minutes (11s intervals)")
    print(f"  CPU usage mean: {df_ubuntu['cpu_usage'].mean():.2f}%")

In [None]:
# Visualize raw data
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle('Windows vs Ubuntu: Raw Data Comparison', fontsize=16, fontweight='bold')

if df_windows is not None:
    # Windows CPU utilization
    axes[0, 0].plot(df_windows['cpu_util'][:1000], linewidth=0.8, alpha=0.7)
    axes[0, 0].set_title('Windows: CPU Utilization (First 1000 samples)', fontweight='bold')
    axes[0, 0].set_xlabel('Sample')
    axes[0, 0].set_ylabel('CPU Utilization')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Windows CPU utilization distribution (histogram of sample counts)
    axes[0, 1].hist(df_windows['cpu_util'], bins=50, edgecolor='black', alpha=0.7)
    axes[0, 1].set_title('Windows: CPU Utilization Distribution', fontweight='bold')
    axes[0, 1].set_xlabel('CPU Utilization')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].axvline(df_windows['cpu_util'].mean(), color='red', 
                       linestyle='--', label=f'Mean: {df_windows["cpu_util"].mean():.2%}')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)

if df_ubuntu is not None:
    # Ubuntu CPU usage
    axes[1, 0].plot(df_ubuntu['cpu_usage'][:1000], linewidth=0.8, alpha=0.7, color='green')
    axes[1, 0].set_title('Ubuntu: CPU Usage (First 1000 samples)', fontweight='bold')
    axes[1, 0].set_xlabel('Sample')
    axes[1, 0].set_ylabel('CPU Usage (%)')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Ubuntu CPU usage distribution (histogram of sample counts)
    axes[1, 1].hist(df_ubuntu['cpu_usage'], bins=50, edgecolor='black', alpha=0.7, color='green')
    axes[1, 1].set_title('Ubuntu: CPU Usage Distribution', fontweight='bold')
    axes[1, 1].set_xlabel('CPU Usage (%)')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].axvline(df_ubuntu['cpu_usage'].mean(), color='red', 
                       linestyle='--', label=f'Mean: {df_ubuntu["cpu_usage"].mean():.2f}%')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('raw_data_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n💾 Saved: raw_data_comparison.png")

## 🔧 Part 2: Feature Engineering (Smart-Watt Approach)

We'll build features using **temporal windowing**:
- Last 5 CPU values
- Deltas (rate of change)
- Statistics (mean, std)
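
As a rough illustration of the three feature groups, here is one window worked through by hand. It is a standard-library sketch, not the notebook's NumPy implementation: the function name `window_features` and the sample values are made up for the example, and `statistics.pstdev` is used because it matches the population standard deviation that `np.std` returns by default.

```python
import statistics


def window_features(window_values):
    """Raw values, successive deltas, mean and population std for one window."""
    deltas = [b - a for a, b in zip(window_values, window_values[1:])]
    mean = statistics.fmean(window_values)
    std = statistics.pstdev(window_values)  # population std, like np.std's default
    return list(window_values) + deltas + [mean, std]


# Hypothetical 5-sample window of CPU utilization on a 0-1 scale
example = window_features([0.10, 0.20, 0.40, 0.40, 0.90])
print(len(example))  # 5 raw + 4 deltas + 2 stats = 11 features
print(example)
```

A window of 5 samples always yields 4 deltas, which is where the 11-feature total comes from.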

In [None]:
def build_features_smartwatt(cpu_values, window=5):
    """
    Build Smart-Watt style features from CPU utilization.
    
    Args:
        cpu_values: Array of CPU utilization values
        window: Number of past samples to use (default: 5)
    
    Returns:
        X: Feature matrix (n_samples, n_features)
        Feature count: 5 raw + 4 deltas + 2 stats = 11 features
    """
    X = []
    
    for i in range(window, len(cpu_values)):
        window_data = cpu_values[i - window:i]
        
        features = []
        
        # 1. Raw window values (5 features)
        features.extend(window_data)
        
        # 2. Deltas - rate of change (4 features)
        features.extend(np.diff(window_data))
        
        # 3. Statistics (2 features)
        features.append(np.mean(window_data))  # Mean
        features.append(np.std(window_data))   # Std dev
        
        X.append(features)
    
    return np.array(X)


def build_labels_horizon(cpu_values, window=5, horizon=5, threshold=0.30):
    """
    Build horizon-based binary labels for classification.
    
    Args:
        cpu_values: Array of CPU utilization (0-1 scale)
        window: Feature window size
        horizon: How many samples ahead to predict
        threshold: CPU threshold for HIGH frequency (default: 30%)
    
    Returns:
        y: Binary labels (1 = HIGH freq needed, 0 = LOW freq)
    """
    y = []
    
    # Skip first 'window' samples (used for features)
    for i in range(window, len(cpu_values) - horizon):
        # Look ahead 'horizon' samples and take average
        future_avg = np.mean(cpu_values[i:i + horizon])
        
        # Binary classification: HIGH (1) or LOW (0)
        y.append(1 if future_avg > threshold else 0)
    
    return np.array(y)


print("✅ Feature engineering functions defined")
print("\n📊 Feature Engineering Strategy:")
print("  • Window size: 5 samples")
print("  • Horizon: 5 samples (~1 second ahead at 200 ms sampling)")
print("  • Threshold: 30% CPU utilization")
print("  • Total features: 11 (5 raw + 4 deltas + 2 stats)")

In [None]:
# Build features for Windows data
if df_windows is not None:
    print("🔧 Building features for Windows data...")
    
    cpu_vals_win = df_windows['cpu_util'].values
    
    X_windows = build_features_smartwatt(cpu_vals_win, window=5)
    y_windows = build_labels_horizon(cpu_vals_win, window=5, horizon=5, threshold=0.30)
    
    # Align X and y (y is shorter due to horizon)
    min_len = min(len(X_windows), len(y_windows))
    X_windows = X_windows[:min_len]
    y_windows = y_windows[:min_len]
    
    print(f"  ✅ X_windows shape: {X_windows.shape}")
    print(f"  ✅ y_windows shape: {y_windows.shape}")
    print(f"  ✅ Class distribution: HIGH={y_windows.sum()} ({y_windows.mean():.1%}), LOW={len(y_windows)-y_windows.sum()} ({1-y_windows.mean():.1%})")
else:
    print("⚠️  Skipping Windows feature engineering (no data)")
    X_windows, y_windows = None, None

In [None]:
# Build features for Ubuntu data
if df_ubuntu is not None:
    print("🔧 Building features for Ubuntu data...")
    
    # Normalize Ubuntu CPU usage to 0-1 scale (it's in 0-100)
    cpu_vals_ubuntu = df_ubuntu['cpu_usage'].values / 100.0
    
    X_ubuntu = build_features_smartwatt(cpu_vals_ubuntu, window=5)
    y_ubuntu = build_labels_horizon(cpu_vals_ubuntu, window=5, horizon=5, threshold=0.30)
    
    # Align X and y
    min_len = min(len(X_ubuntu), len(y_ubuntu))
    X_ubuntu = X_ubuntu[:min_len]
    y_ubuntu = y_ubuntu[:min_len]
    
    print(f"  ✅ X_ubuntu shape: {X_ubuntu.shape}")
    print(f"  ✅ y_ubuntu shape: {y_ubuntu.shape}")
    print(f"  ✅ Class distribution: HIGH={y_ubuntu.sum()} ({y_ubuntu.mean():.1%}), LOW={len(y_ubuntu)-y_ubuntu.sum()} ({1-y_ubuntu.mean():.1%})")
else:
    print("⚠️  Skipping Ubuntu feature engineering (no data)")
    X_ubuntu, y_ubuntu = None, None

## 🤖 Part 3: Model Training (Smart-Watt Classifier)
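
The training function in this part splits the data in time order rather than shuffling. Consecutive feature windows share four of their five samples, so a shuffled split would likely put near-duplicates of training rows into the test set. A minimal sketch of the idea, using a hypothetical helper `chronological_split` on a toy sequence:

```python
def chronological_split(samples, train_fraction=0.7):
    """Split a time-ordered sequence without shuffling: earliest part trains."""
    split_idx = int(train_fraction * len(samples))
    return samples[:split_idx], samples[split_idx:]


# Toy data: indices stand in for time-ordered feature rows
train, test = chronological_split(list(range(10)))
print(train)  # earliest 70% of the rows
print(test)   # latest 30% of the rows
```

Every test row comes strictly after every training row, so the reported accuracy reflects prediction on data the model has not yet seen.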

In [None]:
def train_smartwatt_model(X, y, model_name="Smart-Watt"):
    """
    Train Random Forest classifier using Smart-Watt parameters.
    """
    print(f"\n🤖 Training {model_name} model...")
    
    # Time-aware split (NO SHUFFLE - preserve temporal order)
    split_idx = int(0.7 * len(X))
    X_train, X_test = X[:split_idx], X[split_idx:]
    y_train, y_test = y[:split_idx], y[split_idx:]
    
    print(f"  Train samples: {len(X_train):,}")
    print(f"  Test samples: {len(X_test):,}")
    
    # Smart-Watt model configuration
    model = RandomForestClassifier(
        n_estimators=400,
        max_depth=14,
        class_weight="balanced",  # Handle class imbalance
        random_state=42,
        n_jobs=-1,
        verbose=0
    )
    
    # Train
    model.fit(X_train, y_train)
    
    # Evaluate
    y_pred = model.predict(X_test)
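    # HIGH-class probabilities for the whole series (train + test), used later by
    # the DVFS simulation; values on the training portion are in-sample.
    y_prob = model.predict_proba(X)[:, 1]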
    
    acc = accuracy_score(y_test, y_pred)
    
    print(f"\n{'='*50}")
    print(f"{model_name} RESULTS")
    print(f"{'='*50}")
    print(f"Accuracy: {acc*100:.2f}%")
    print(f"\nConfusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
    print(f"\nClassification Report:")
    print(classification_report(y_test, y_pred, digits=3, target_names=['LOW', 'HIGH']))
    
    # Feature importance
    feature_names = [
        'CPU_t-5', 'CPU_t-4', 'CPU_t-3', 'CPU_t-2', 'CPU_t-1',
        'Delta_1', 'Delta_2', 'Delta_3', 'Delta_4',
        'Mean', 'Std'
    ]
    importances = pd.DataFrame({
        'Feature': feature_names,
        'Importance': model.feature_importances_
    }).sort_values('Importance', ascending=False)
    
    print(f"\n🔝 Top 5 Most Important Features:")
    for idx, row in importances.head().iterrows():
        print(f"  {row['Feature']:12s}: {row['Importance']:.4f}")
    
    return model, y_prob, acc, importances


print("✅ Training function defined")

In [None]:
# Train Windows model
if X_windows is not None and y_windows is not None:
    model_windows, y_prob_windows, acc_windows, importances_windows = train_smartwatt_model(
        X_windows, y_windows, model_name="Windows Smart-Watt"
    )
    
    # Save model
    joblib.dump(model_windows, 'smartwatt_windows.pkl')
    print("\n💾 Model saved: smartwatt_windows.pkl")
else:
    print("⚠️  Skipping Windows model training")
    model_windows, y_prob_windows, acc_windows = None, None, None

In [None]:
# Train Ubuntu model
if X_ubuntu is not None and y_ubuntu is not None:
    model_ubuntu, y_prob_ubuntu, acc_ubuntu, importances_ubuntu = train_smartwatt_model(
        X_ubuntu, y_ubuntu, model_name="Ubuntu Smart-Watt"
    )
    
    # Save model
    joblib.dump(model_ubuntu, 'smartwatt_ubuntu.pkl')
    print("\n💾 Model saved: smartwatt_ubuntu.pkl")
else:
    print("⚠️  Skipping Ubuntu model training")
    model_ubuntu, y_prob_ubuntu, acc_ubuntu = None, None, None

## ⚡ Part 4: DVFS Simulation with Hysteresis
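
Before the full simulator, the hold-counter idea can be seen in isolation. The sketch below mirrors the hold logic used in this part on a short made-up target sequence. The function name `apply_hysteresis` and the small hold values are illustrative assumptions; the notebook itself uses `hold_high=5` and `hold_low=3`.

```python
def apply_hysteresis(targets, hold_high=5, hold_low=3, high_freq=2400):
    """Hold each applied frequency for a number of steps before allowing a change."""
    current = None
    hold = 0
    applied = []
    for target in targets:
        if current is None:
            # First sample: adopt the target and start a hold period
            current = target
            hold = hold_high if target == high_freq else hold_low
        elif hold > 0:
            # Still holding: ignore the new target
            hold -= 1
        elif target != current:
            # Hold expired and the target differs: switch and hold again
            current = target
            hold = hold_high if target == high_freq else hold_low
        applied.append(current)
    return applied


# Made-up, noisy target sequence (MHz) with short holds for readability
targets = [1520, 2400, 1520, 2400, 2400, 2400, 1520, 1520]
applied = apply_hysteresis(targets, hold_high=2, hold_low=1)
print(applied)
```

In this toy run the raw targets change four times, while the applied frequency changes only twice: the one-sample spike to 2400 MHz is ignored because the initial hold has not yet expired.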

In [None]:
def simulate_smartwatt_dvfs(df, cpu_col, y_prob, num_processes_col=None, 
                           low_freq=1520, mid_freq=2000, high_freq=2400,
                           hold_high=5, hold_low=3, window_cpu=5):
    """
    Simulate Smart-Watt DVFS with:
    - Probability-aware decisions
    - Hysteresis (frequency hold)
    - Multi-level frequencies
    - Windowed decision making
    """
    print(f"\n⚡ Simulating Smart-Watt DVFS...")
    print(f"  LOW: {low_freq} MHz, MID: {mid_freq} MHz, HIGH: {high_freq} MHz")
    print(f"  Hysteresis: HOLD_HIGH={hold_high}, HOLD_LOW={hold_low}")
    
    # Start from window position
    window = 5
    df_sim = df.iloc[window:].copy().reset_index(drop=True)
    
    # Align probability array
    min_len = min(len(df_sim), len(y_prob))
    df_sim = df_sim.iloc[:min_len].copy()
    y_prob = y_prob[:min_len]
    
    # Initialize
    current_freq = None
    hold_counter = 0
    cpu_window = []
    smart_freqs = []
    
    for idx in range(len(df_sim)):
        cpu_util = df_sim.iloc[idx][cpu_col]
        prob = y_prob[idx]
        
        # Windowed CPU averaging
        cpu_window.append(cpu_util)
        if len(cpu_window) > window_cpu:
            cpu_window.pop(0)
        
        recent_cpu_mean = sum(cpu_window) / len(cpu_window)
        
        # Decision logic with probability awareness
        if prob > 0.85 and recent_cpu_mean > 0.7:
            target_freq = high_freq
        elif prob > 0.55:
            target_freq = mid_freq
        else:
            target_freq = low_freq
        
        # First iteration
        if current_freq is None:
            current_freq = target_freq
            hold_counter = hold_high if target_freq == high_freq else hold_low
        
        # Hysteresis: hold current frequency
        elif hold_counter > 0:
            hold_counter -= 1
        
        # Allow transition after hold period
        else:
            if target_freq != current_freq:
                current_freq = target_freq
                hold_counter = hold_high if target_freq == high_freq else hold_low
        
        smart_freqs.append(current_freq)
    
    df_sim['smart_freq'] = smart_freqs
    df_sim['prediction_prob'] = y_prob
    
    # Frequency transition penalty (Stack A)
    df_sim['freq_delta'] = df_sim['smart_freq'].diff().abs().fillna(0)
    
    # Calculate energy with physics model
    ALPHA = 0.5  # Transition penalty coefficient
    LOGICAL_CORES = 8
    
    # Core-idle awareness (Stack B)
    if num_processes_col and num_processes_col in df_sim.columns:
        active_ratio = np.minimum(1.0, df_sim[num_processes_col] / LOGICAL_CORES)
    else:
        active_ratio = 1.0  # Assume all cores active
    
    # Physics-based energy model
    df_sim['smart_energy'] = (
        df_sim['smart_freq'] ** 2
        + ALPHA * df_sim['freq_delta'] * df_sim['smart_freq']
    ) * active_ratio
    
    total_energy = df_sim['smart_energy'].sum()
    
    # Frequency distribution
    freq_counts = df_sim['smart_freq'].value_counts()
    freq_transitions = (df_sim['freq_delta'] > 0).sum()
    
    print(f"\n✅ Simulation complete!")
    print(f"  Total samples: {len(df_sim):,}")
    print(f"  Total energy (proxy): {total_energy:,.0f}")
    print(f"  Frequency transitions: {freq_transitions}")
    print(f"\n  Frequency usage:")
    for freq, count in freq_counts.items():
        print(f"    {freq} MHz: {count:,} samples ({count/len(df_sim)*100:.1f}%)")
    
    return df_sim, total_energy


print("✅ DVFS simulation function defined")

In [None]:
# Simulate Windows DVFS
if df_windows is not None and y_prob_windows is not None:
    df_windows_sim, energy_windows = simulate_smartwatt_dvfs(
        df_windows, 
        cpu_col='cpu_util',
        y_prob=y_prob_windows,
        num_processes_col='num_processes'
    )
else:
    print("⚠️  Skipping Windows DVFS simulation")
    df_windows_sim, energy_windows = None, None

In [None]:
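# Simulate Ubuntu DVFS
if df_ubuntu is not None and y_prob_ubuntu is not None:
    # The simulator's 0.7 utilization threshold assumes a 0-1 scale, so add a
    # normalized column instead of passing the 0-100 'cpu_usage' values directly.
    df_ubuntu_sim, energy_ubuntu = simulate_smartwatt_dvfs(
        df_ubuntu.assign(cpu_util=df_ubuntu['cpu_usage'] / 100.0),
        cpu_col='cpu_util',
        y_prob=y_prob_ubuntu,
        num_processes_col=None  # Ubuntu data doesn't have process count
    )
else:
    print("⚠️  Skipping Ubuntu DVFS simulation")
    df_ubuntu_sim, energy_ubuntu = None, None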

## 📊 Part 5: Windows vs Ubuntu Comparison

In [None]:
# Comparison table
comparison_data = []

if df_windows is not None:
    comparison_data.append({
        'OS': 'Windows',
        'Samples': len(df_windows),
        'Duration (min)': len(df_windows)*0.2/60,
        'Avg CPU (%)': df_windows['cpu_util'].mean() * 100,
        'Model Accuracy (%)': acc_windows * 100 if acc_windows else None,
        'Total Energy': energy_windows if energy_windows else None,
        'Freq Transitions': (df_windows_sim['freq_delta'] > 0).sum() if df_windows_sim is not None else None
    })

if df_ubuntu is not None:
    comparison_data.append({
        'OS': 'Ubuntu',
        'Samples': len(df_ubuntu),
        'Duration (min)': len(df_ubuntu)*11/60,
        'Avg CPU (%)': df_ubuntu['cpu_usage'].mean(),
        'Model Accuracy (%)': acc_ubuntu * 100 if acc_ubuntu else None,
        'Total Energy': energy_ubuntu if energy_ubuntu else None,
        'Freq Transitions': (df_ubuntu_sim['freq_delta'] > 0).sum() if df_ubuntu_sim is not None else None
    })

comparison_df = pd.DataFrame(comparison_data)

print("\n" + "="*70)
print("WINDOWS vs UBUNTU COMPARISON")
print("="*70)
print(comparison_df.to_string(index=False))
print("="*70)

# Save comparison
comparison_df.to_csv('os_comparison.csv', index=False)
print("\n💾 Saved: os_comparison.csv")

In [None]:
# Visualization: DVFS Behavior Comparison
fig, axes = plt.subplots(2, 2, figsize=(18, 12))
fig.suptitle('Windows vs Ubuntu: Smart-Watt DVFS Comparison', fontsize=16, fontweight='bold')

sample_range = 500  # Show first 500 samples for clarity

if df_windows_sim is not None:
    # Windows frequency decisions
    axes[0, 0].plot(df_windows_sim['smart_freq'][:sample_range], 
                    linewidth=1.5, alpha=0.8, label='Smart-Watt DVFS')
    axes[0, 0].set_title('Windows: Frequency Decisions (First 500 samples)', fontweight='bold')
    axes[0, 0].set_xlabel('Sample')
    axes[0, 0].set_ylabel('Frequency (MHz)')
    axes[0, 0].set_ylim([1400, 2500])
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # Windows CPU vs Frequency
    axes[0, 1].scatter(df_windows_sim['cpu_util'][:sample_range]*100, 
                       df_windows_sim['smart_freq'][:sample_range],
                       alpha=0.5, s=10, c=df_windows_sim['prediction_prob'][:sample_range],
                       cmap='RdYlGn')
    axes[0, 1].set_title('Windows: CPU Utilization vs Frequency', fontweight='bold')
    axes[0, 1].set_xlabel('CPU Utilization (%)')
    axes[0, 1].set_ylabel('Frequency (MHz)')
    axes[0, 1].grid(True, alpha=0.3)
    cbar = plt.colorbar(axes[0, 1].collections[0], ax=axes[0, 1])
    cbar.set_label('Prediction Probability')

if df_ubuntu_sim is not None:
    # Ubuntu frequency decisions
    axes[1, 0].plot(df_ubuntu_sim['smart_freq'][:sample_range], 
                    linewidth=1.5, alpha=0.8, color='green', label='Smart-Watt DVFS')
    axes[1, 0].set_title('Ubuntu: Frequency Decisions (First 500 samples)', fontweight='bold')
    axes[1, 0].set_xlabel('Sample')
    axes[1, 0].set_ylabel('Frequency (MHz)')
    axes[1, 0].set_ylim([1400, 2500])
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # Ubuntu CPU vs Frequency
    axes[1, 1].scatter(df_ubuntu_sim['cpu_usage'][:sample_range], 
                       df_ubuntu_sim['smart_freq'][:sample_range],
                       alpha=0.5, s=10, c=df_ubuntu_sim['prediction_prob'][:sample_range],
                       cmap='RdYlGn')
    axes[1, 1].set_title('Ubuntu: CPU Usage vs Frequency', fontweight='bold')
    axes[1, 1].set_xlabel('CPU Usage (%)')
    axes[1, 1].set_ylabel('Frequency (MHz)')
    axes[1, 1].grid(True, alpha=0.3)
    cbar = plt.colorbar(axes[1, 1].collections[0], ax=axes[1, 1])
    cbar.set_label('Prediction Probability')

plt.tight_layout()
plt.savefig('dvfs_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n💾 Saved: dvfs_comparison.png")

In [None]:
# Feature Importance Comparison
if 'importances_windows' in locals() and 'importances_ubuntu' in locals():
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    fig.suptitle('Feature Importance Comparison', fontsize=16, fontweight='bold')
    
    # Windows
    importances_windows.plot(kind='barh', x='Feature', y='Importance', ax=ax1, legend=False, color='steelblue')
    ax1.set_title('Windows Model', fontweight='bold')
    ax1.set_xlabel('Importance')
    ax1.invert_yaxis()
    
    # Ubuntu
    importances_ubuntu.plot(kind='barh', x='Feature', y='Importance', ax=ax2, legend=False, color='green')
    ax2.set_title('Ubuntu Model', fontweight='bold')
    ax2.set_xlabel('Importance')
    ax2.invert_yaxis()
    
    plt.tight_layout()
    plt.savefig('feature_importance_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n💾 Saved: feature_importance_comparison.png")

## 📈 Part 6: Key Insights & Findings

In [None]:
print("\n" + "="*70)
print("KEY INSIGHTS: WINDOWS vs UBUNTU")
print("="*70)

if df_windows is not None and df_ubuntu is not None:
    print(f"\n1️⃣  CPU BEHAVIOR:")
    print(f"   • Windows avg CPU: {df_windows['cpu_util'].mean()*100:.2f}%")
    print(f"   • Ubuntu avg CPU: {df_ubuntu['cpu_usage'].mean():.2f}%")
    print(f"   • Difference: {abs(df_windows['cpu_util'].mean()*100 - df_ubuntu['cpu_usage'].mean()):.2f}%")
    
if acc_windows and acc_ubuntu:
    print(f"\n2️⃣  MODEL ACCURACY:")
    print(f"   • Windows model: {acc_windows*100:.2f}%")
    print(f"   • Ubuntu model: {acc_ubuntu*100:.2f}%")
    print(f"   • Better: {'Windows' if acc_windows > acc_ubuntu else 'Ubuntu'}")

if df_windows_sim is not None and df_ubuntu_sim is not None:
    win_transitions = (df_windows_sim['freq_delta'] > 0).sum()
    ubuntu_transitions = (df_ubuntu_sim['freq_delta'] > 0).sum()
    
    print(f"\n3️⃣  DVFS BEHAVIOR:")
    print(f"   • Windows transitions: {win_transitions}")
    print(f"   • Ubuntu transitions: {ubuntu_transitions}")
    print(f"   • More stable: {'Windows' if win_transitions < ubuntu_transitions else 'Ubuntu'}")
    
    win_high_pct = (df_windows_sim['smart_freq'] == 2400).sum() / len(df_windows_sim) * 100
    ubuntu_high_pct = (df_ubuntu_sim['smart_freq'] == 2400).sum() / len(df_ubuntu_sim) * 100
    
    print(f"\n4️⃣  HIGH FREQUENCY USAGE:")
    print(f"   • Windows @ 2400 MHz: {win_high_pct:.1f}% of time")
    print(f"   • Ubuntu @ 2400 MHz: {ubuntu_high_pct:.1f}% of time")
    print(f"   • More aggressive: {'Windows' if win_high_pct > ubuntu_high_pct else 'Ubuntu'}")

if energy_windows and energy_ubuntu:
    # Normalize by sample count for fair comparison
    energy_per_sample_win = energy_windows / len(df_windows_sim)
    energy_per_sample_ubuntu = energy_ubuntu / len(df_ubuntu_sim)
    
    print(f"\n5️⃣  ENERGY EFFICIENCY:")
    print(f"   • Windows energy/sample: {energy_per_sample_win:,.2f}")
    print(f"   • Ubuntu energy/sample: {energy_per_sample_ubuntu:,.2f}")
    print(f"   • More efficient: {'Windows' if energy_per_sample_win < energy_per_sample_ubuntu else 'Ubuntu'}")

print("\n" + "="*70)
print("\n✅ Analysis complete! Check the generated plots and CSV files.")

## 💾 Part 7: Download Results

In [None]:
# Save simulation results
if df_windows_sim is not None:
    df_windows_sim.to_csv('windows_dvfs_results.csv', index=False)
    print("💾 Saved: windows_dvfs_results.csv")

if df_ubuntu_sim is not None:
    df_ubuntu_sim.to_csv('ubuntu_dvfs_results.csv', index=False)
    print("💾 Saved: ubuntu_dvfs_results.csv")

# Download all results
print("\n📥 Download generated files:")
from google.colab import files

download_files = [
    'os_comparison.csv',
    'windows_dvfs_results.csv',
    'ubuntu_dvfs_results.csv',
    'smartwatt_windows.pkl',
    'smartwatt_ubuntu.pkl',
    'raw_data_comparison.png',
    'dvfs_comparison.png',
    'feature_importance_comparison.png'
]

for file in download_files:
    try:
        files.download(file)
        print(f"  ✅ Downloaded: {file}")
    except Exception:  # e.g. the file was never generated because a step was skipped
        print(f"  ⚠️  Could not download: {file}")

## 🎯 Summary

This notebook implemented the **Smart-Watt DVFS approach** with:

### ✅ Implemented Features:
1. **Temporal Windowing** - 5-sample windows with deltas and statistics
2. **Horizon Prediction** - Predicts 5 samples ahead (1 second at 200 ms sampling)
3. **Random Forest Classifier** - 400 trees, depth 14, balanced classes
4. **Probability-Aware DVFS** - Uses prediction confidence
5. **Hysteresis** - HOLD_HIGH=5, HOLD_LOW=3 to prevent oscillation
6. **Multi-Level Frequencies** - LOW (1520), MID (2000), HIGH (2400)
7. **Physics-Based Energy** - E = (f² + α·|Δf|·f) · active_ratio, a relative proxy rather than a measured energy
8. **Cross-OS Comparison** - Windows vs Ubuntu analysis
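
To make the energy proxy in item 7 concrete, here is a small worked example in plain Python. It follows the same formula as the simulation cell (α = 0.5, all cores assumed active), but the function name `energy_proxy` and the three-sample frequency trace are made up for illustration. The result is a unitless proxy in MHz², not joules.

```python
def energy_proxy(freqs_mhz, alpha=0.5, active_ratio=1.0):
    """Per-sample proxy: (f^2 + alpha * |delta f| * f) * active_ratio."""
    energies = []
    prev = None
    for f in freqs_mhz:
        delta = 0 if prev is None else abs(f - prev)  # no penalty on first sample
        energies.append((f ** 2 + alpha * delta * f) * active_ratio)
        prev = f
    return energies


# Made-up trace: two samples at LOW, then a jump to HIGH
per_sample = energy_proxy([1520, 1520, 2400])
print(per_sample)       # [2310400.0, 2310400.0, 6816000.0]
print(sum(per_sample))  # 11436800.0
```

The third sample costs 2400² = 5,760,000 for running at HIGH plus 0.5 · 880 · 2400 = 1,056,000 for the transition, which is why fewer transitions lower the total even when the time spent at each frequency is unchanged.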

### 📊 Expected Results:
- Model accuracy: **~94-97%** (based on vindhya's results)
- Energy savings: **~5% vs baseline DVFS**
- Frequency transitions: **Reduced by ~40%** due to hysteresis

### 🔬 Key Findings:
- Compare CPU behavior patterns between Windows and Ubuntu
- Identify which OS has more predictable CPU patterns
- Analyze frequency transition stability
- Measure energy efficiency differences

---

**Created using Smart-Watt DVFS methodology from vindhya/DVFS_F repository**