# Base Load Analysis

This notebook analyzes the base load of the smart home system, i.e. the non-controllable part of its electrical consumption.

## Analysis Goals
- Identify and characterize base load consumption patterns
- Separate controllable from non-controllable loads
- Analyze consumption by time of day and day of week
- Detect anomalies and unusual consumption patterns
- Optimize energy scheduling around base load patterns

## 1. Setup and Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from influxdb_client import InfluxDBClient
from datetime import datetime, timedelta
import pytz
from scipy import signal, stats
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

## 2. Database Connection

In [None]:
# InfluxDB connection parameters
INFLUX_URL = "http://localhost:8086"
INFLUX_TOKEN = "your-token-here"
INFLUX_ORG = "loxone"
INFLUX_BUCKET = "loxone"

# Initialize InfluxDB client
client = InfluxDBClient(url=INFLUX_URL, token=INFLUX_TOKEN, org=INFLUX_ORG)
query_api = client.query_api()
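
The token above is a placeholder. One way to keep a real token out of the notebook is to read the settings from environment variables. This is a minimal sketch: the helper `read_influx_settings` and the variable names `INFLUX_URL`, `INFLUX_TOKEN`, `INFLUX_ORG` and `INFLUX_BUCKET` are assumptions for illustration, not part of the original setup.

```python
import os

def read_influx_settings(env=os.environ):
    """Read connection settings from a mapping, falling back to local defaults.

    The environment variable names are hypothetical; adapt them to your setup.
    """
    return {
        "url": env.get("INFLUX_URL", "http://localhost:8086"),
        "token": env.get("INFLUX_TOKEN", ""),
        "org": env.get("INFLUX_ORG", "loxone"),
        "bucket": env.get("INFLUX_BUCKET", "loxone"),
    }

settings = read_influx_settings()
if not settings["token"]:
    # Fail loudly and early instead of sending an empty token to the server
    print("INFLUX_TOKEN is not set; queries will fail until a token is provided")
```

The resulting dictionary can then feed `InfluxDBClient(url=..., token=..., org=...)` in place of the hard-coded constants.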

## 3. Data Loading Functions

In [None]:
def load_total_consumption(start_date, end_date):
    """Load total electrical consumption data"""
    query = f'''
    from(bucket: "{INFLUX_BUCKET}")
        |> range(start: {start_date}, stop: {end_date})
        |> filter(fn: (r) => r["_measurement"] == "power")
        |> filter(fn: (r) => r["_field"] =~ /total_consumption|grid_import/)
        |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
        |> yield(name: "total_consumption")
    '''
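    result = query_api.query_data_frame(query)
    # query_data_frame returns a list of DataFrames when the result tables differ in schema
    if isinstance(result, list):
        result = pd.concat(result, ignore_index=True) if result else pd.DataFrame()
    return result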

def load_controllable_loads(start_date, end_date):
    """Load controllable load data (heating, cooling, water heater, etc.)"""
    query = f'''
    from(bucket: "{INFLUX_BUCKET}")
        |> range(start: {start_date}, stop: {end_date})
        |> filter(fn: (r) => r["_measurement"] == "power")
        |> filter(fn: (r) => r["_field"] =~ /heating|cooling|water_heater|heat_pump/)
        |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
        |> yield(name: "controllable_loads")
    '''
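    result = query_api.query_data_frame(query)
    # query_data_frame returns a list of DataFrames when the result tables differ in schema
    if isinstance(result, list):
        result = pd.concat(result, ignore_index=True) if result else pd.DataFrame()
    return result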

def load_appliance_data(start_date, end_date):
    """Load individual appliance consumption data"""
    query = f'''
    from(bucket: "{INFLUX_BUCKET}")
        |> range(start: {start_date}, stop: {end_date})
        |> filter(fn: (r) => r["_measurement"] == "power")
        |> filter(fn: (r) => r["_field"] =~ /lighting|appliances|fridge|washing_machine|dishwasher/)
        |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
        |> yield(name: "appliances")
    '''
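    result = query_api.query_data_frame(query)
    # query_data_frame returns a list of DataFrames when the result tables differ in schema
    if isinstance(result, list):
        result = pd.concat(result, ignore_index=True) if result else pd.DataFrame()
    return result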
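
The loaders interpolate `start_date` and `end_date` directly into the Flux `range()` call, so both must be RFC 3339 timestamps. A small helper can normalize any timezone-aware datetime to UTC with a `Z` suffix and second precision. This is a sketch; `to_flux_time` is a hypothetical name, not part of `influxdb_client`.

```python
from datetime import datetime, timedelta, timezone

def to_flux_time(dt):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# A local time at UTC+1 with microseconds is normalized to whole seconds in UTC
example = datetime(2024, 3, 1, 12, 30, 45, 123456, tzinfo=timezone(timedelta(hours=1)))
print(to_flux_time(example))  # 2024-03-01T11:30:45Z
```

Passing `to_flux_time(start_date)` instead of `start_date.isoformat()` keeps the query text independent of the offset and sub-second formatting of the input.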

## 4. Load and Prepare Data

In [None]:
# Define analysis period
end_date = datetime.now(pytz.UTC)
start_date = end_date - timedelta(days=30)  # One month for pattern analysis

# Load data
print(f"Loading data from {start_date} to {end_date}")
total_consumption = load_total_consumption(start_date.isoformat(), end_date.isoformat())
controllable_loads = load_controllable_loads(start_date.isoformat(), end_date.isoformat())
appliance_data = load_appliance_data(start_date.isoformat(), end_date.isoformat())

print(f"Total consumption data shape: {total_consumption.shape}")
print(f"Controllable loads data shape: {controllable_loads.shape}")
print(f"Appliance data shape: {appliance_data.shape}")

# Get available fields
if not total_consumption.empty:
    total_fields = total_consumption['_field'].unique()
    print(f"\nTotal consumption fields: {list(total_fields)}")

if not controllable_loads.empty:
    controllable_fields = controllable_loads['_field'].unique()
    print(f"Controllable load fields: {list(controllable_fields)}")

if not appliance_data.empty:
    appliance_fields = appliance_data['_field'].unique()
    print(f"Appliance fields: {list(appliance_fields)}")

## 5. Base Load Calculation

In [None]:
# Calculate base load by subtracting controllable loads from total consumption
if not total_consumption.empty and not controllable_loads.empty:
    # Prepare data for merging
    total_consumption['_time'] = pd.to_datetime(total_consumption['_time'])
    controllable_loads['_time'] = pd.to_datetime(controllable_loads['_time'])
    
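    # Aggregate total consumption
    # Note: this sums every field matched by the query per timestamp, so if both
    # total_consumption and grid_import are present they are added together.
    total_power = total_consumption.groupby('_time')['_value'].sum().reset_index()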
    total_power = total_power.rename(columns={'_value': 'total_power'})
    
    # Aggregate controllable loads
    controllable_power = controllable_loads.groupby('_time')['_value'].sum().reset_index()
    controllable_power = controllable_power.rename(columns={'_value': 'controllable_power'})
    
    # Merge and calculate base load
    merged_data = pd.merge_asof(
        total_power.sort_values('_time'),
        controllable_power.sort_values('_time'),
        on='_time',
        direction='nearest',
        tolerance=pd.Timedelta('5min')
    )
    
    # Fill NaN with 0 for controllable power
    merged_data['controllable_power'] = merged_data['controllable_power'].fillna(0)
    
    # Calculate base load
    merged_data['base_load'] = merged_data['total_power'] - merged_data['controllable_power']
    merged_data['base_load'] = merged_data['base_load'].clip(lower=0)  # Ensure non-negative
    
    # Add time features
    merged_data['hour'] = merged_data['_time'].dt.hour
    merged_data['weekday'] = merged_data['_time'].dt.day_name()
    merged_data['is_weekend'] = merged_data['_time'].dt.weekday >= 5
    merged_data['date'] = merged_data['_time'].dt.date
    
    print(f"\nBase Load Statistics:")
    print(f"Average base load: {merged_data['base_load'].mean():.0f} W")
    print(f"Minimum base load: {merged_data['base_load'].min():.0f} W")
    print(f"Maximum base load: {merged_data['base_load'].max():.0f} W")
    print(f"Standard deviation: {merged_data['base_load'].std():.0f} W")
    
    # Plot load breakdown
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Sample data for visualization (every 10th point)
    sample_data = merged_data.iloc[::10]
    
    ax.fill_between(sample_data['_time'], 0, sample_data['base_load'], 
                   alpha=0.7, label='Base Load', color='lightblue')
    ax.fill_between(sample_data['_time'], sample_data['base_load'], 
                   sample_data['base_load'] + sample_data['controllable_power'],
                   alpha=0.7, label='Controllable Loads', color='orange')
    
    ax.set_xlabel('Time')
    ax.set_ylabel('Power Consumption (W)')
    ax.set_title('Power Consumption Breakdown: Base Load vs Controllable Loads')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
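
To make the calculation above concrete, here is a small worked example on synthetic data (all values are made up for illustration). It follows the same steps: match each total reading to the nearest controllable reading within 5 minutes, treat unmatched readings as 0 W, subtract, and clip negative results to 0.

```python
import pandas as pd

times = pd.date_range("2024-01-01 00:00", periods=4, freq="10min", tz="UTC")
total = pd.DataFrame({"_time": times, "total_power": [500.0, 2500.0, 2600.0, 450.0]})

# Controllable readings arrive one minute late and the last interval is missing
controllable = pd.DataFrame({
    "_time": times[:3] + pd.Timedelta("1min"),
    "controllable_power": [0.0, 2000.0, 2700.0],
})

merged = pd.merge_asof(
    total, controllable, on="_time",
    direction="nearest", tolerance=pd.Timedelta("5min"),
)

# The 00:30 row has no controllable reading within 5 minutes, so it becomes 0 W
merged["controllable_power"] = merged["controllable_power"].fillna(0)

# The 00:20 row would be -100 W (sensor mismatch) and is clipped to 0 W
merged["base_load"] = (merged["total_power"] - merged["controllable_power"]).clip(lower=0)
print(merged[["total_power", "controllable_power", "base_load"]])
```

The third row shows why the clip matters: when the controllable sensors report more than the total meter, the difference is a measurement artifact rather than a real negative load.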

## 6. Daily Base Load Patterns

In [None]:
# Analyze daily base load patterns
if 'base_load' in merged_data.columns:
    # Calculate hourly patterns
    hourly_base_load = merged_data.groupby(['hour', 'is_weekend'])['base_load'].agg(['mean', 'std']).reset_index()
    
    # Separate weekday and weekend patterns
    weekday_pattern = hourly_base_load[~hourly_base_load['is_weekend']]
    weekend_pattern = hourly_base_load[hourly_base_load['is_weekend']]
    
    # Plot daily patterns
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
    
    # Hourly average pattern
    ax1.plot(weekday_pattern['hour'], weekday_pattern['mean'], 'b-', 
            linewidth=2, label='Weekday', marker='o')
    ax1.fill_between(weekday_pattern['hour'], 
                    weekday_pattern['mean'] - weekday_pattern['std'],
                    weekday_pattern['mean'] + weekday_pattern['std'],
                    alpha=0.3, color='blue')
    
    ax1.plot(weekend_pattern['hour'], weekend_pattern['mean'], 'r-', 
            linewidth=2, label='Weekend', marker='s')
    ax1.fill_between(weekend_pattern['hour'], 
                    weekend_pattern['mean'] - weekend_pattern['std'],
                    weekend_pattern['mean'] + weekend_pattern['std'],
                    alpha=0.3, color='red')
    
    ax1.set_xlabel('Hour of Day')
    ax1.set_ylabel('Base Load (W)')
    ax1.set_title('Average Daily Base Load Pattern')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_xlim(0, 23)
    
    # Box plot by hour (positions follow the hours actually present in the data)
    hourly_data = merged_data.groupby('hour')['base_load'].apply(list).reset_index()
    box_data = hourly_data['base_load'].tolist()
    
    ax2.boxplot(box_data, positions=hourly_data['hour'].tolist(), patch_artist=True,
               boxprops=dict(facecolor='lightgreen', alpha=0.7),
               medianprops=dict(color='red', linewidth=2))
    ax2.set_xlabel('Hour of Day')
    ax2.set_ylabel('Base Load (W)')
    ax2.set_title('Base Load Distribution by Hour')
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(-0.5, 23.5)
    
    plt.tight_layout()
    plt.show()
    
    # Identify peak and minimum base load times
    avg_hourly = merged_data.groupby('hour')['base_load'].mean()
    peak_hour = avg_hourly.idxmax()
    min_hour = avg_hourly.idxmin()
    
    print(f"\nDaily Pattern Analysis:")
    print(f"Peak base load hour: {peak_hour}:00 ({avg_hourly[peak_hour]:.0f} W)")
    print(f"Minimum base load hour: {min_hour}:00 ({avg_hourly[min_hour]:.0f} W)")
    print(f"Peak to minimum ratio: {avg_hourly[peak_hour] / avg_hourly[min_hour]:.2f}")
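
The `hour` and `weekday` features above are derived from `_time` as loaded, which is in UTC when the timestamps come from InfluxDB. If the patterns should reflect local time, the timestamps can be converted first. This is a sketch; the timezone `Europe/Vienna` is an assumption and should be replaced with the home's actual timezone.

```python
import pandas as pd

LOCAL_TZ = "Europe/Vienna"  # assumption: replace with the home's actual timezone

# Two late-evening UTC timestamps, one in winter and one in summer
utc_times = pd.Series(pd.to_datetime(["2024-01-15 23:30", "2024-07-15 23:30"], utc=True))
local_times = utc_times.dt.tz_convert(LOCAL_TZ)

# The local hour depends on daylight saving time
print(utc_times.dt.hour.tolist())    # [23, 23]
print(local_times.dt.hour.tolist())  # [0, 1]
```

Both readings also move to the next calendar day in local time, so the weekday and weekend flags shift as well as the hour.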

## 7. Weekly Base Load Patterns

In [None]:
# Analyze weekly patterns
if 'base_load' in merged_data.columns:
    # Daily averages by weekday
    daily_avg = merged_data.groupby('weekday')['base_load'].mean()
    
    # Order by weekday
    weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 
                    'Friday', 'Saturday', 'Sunday']
    daily_avg = daily_avg.reindex(weekday_order)
    
    # Plot weekly pattern
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
    
    # Bar chart of daily averages
    colors = ['blue' if day in ['Saturday', 'Sunday'] else 'orange' 
             for day in daily_avg.index]
    bars = ax1.bar(range(len(daily_avg)), daily_avg.values, color=colors, alpha=0.7)
    ax1.set_xlabel('Day of Week')
    ax1.set_ylabel('Average Base Load (W)')
    ax1.set_title('Average Base Load by Day of Week')
    ax1.set_xticks(range(len(daily_avg)))
    ax1.set_xticklabels([day[:3] for day in daily_avg.index], rotation=45)
    ax1.grid(True, alpha=0.3)
    
    # Add value labels on bars
    for bar, value in zip(bars, daily_avg.values):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 10,
                f'{value:.0f}', ha='center', va='bottom')
    
    # Heatmap of hourly patterns by weekday
    pivot_data = merged_data.groupby(['weekday', 'hour'])['base_load'].mean().reset_index()
    pivot_matrix = pivot_data.pivot(index='weekday', columns='hour', values='base_load')
    pivot_matrix = pivot_matrix.reindex(weekday_order)
    
    im = ax2.imshow(pivot_matrix.values, cmap='viridis', aspect='auto')
    ax2.set_xlabel('Hour of Day')
    ax2.set_ylabel('Day of Week')
    ax2.set_title('Base Load Heatmap (W)')
    ax2.set_yticks(range(len(weekday_order)))
    ax2.set_yticklabels([day[:3] for day in weekday_order])
    ax2.set_xticks(range(0, 24, 4))
    ax2.set_xticklabels(range(0, 24, 4))
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=ax2)
    cbar.set_label('Base Load (W)')
    
    plt.tight_layout()
    plt.show()
    
    # Print weekly statistics
    print(f"\nWeekly Pattern Analysis:")
    print(f"Highest consumption day: {daily_avg.idxmax()} ({daily_avg.max():.0f} W)")
    print(f"Lowest consumption day: {daily_avg.idxmin()} ({daily_avg.min():.0f} W)")
    
    weekend_avg = merged_data[merged_data['is_weekend']]['base_load'].mean()
    weekday_avg = merged_data[~merged_data['is_weekend']]['base_load'].mean()
    print(f"\nWeekend vs Weekday:")
    print(f"Weekend average: {weekend_avg:.0f} W")
    print(f"Weekday average: {weekday_avg:.0f} W")
    print(f"Weekend/Weekday ratio: {weekend_avg / weekday_avg:.2f}")

## 8. Base Load Clustering Analysis

In [None]:
# Cluster daily base load patterns to identify different usage modes
if 'base_load' in merged_data.columns:
    # Create daily load profiles
    daily_profiles = merged_data.groupby(['date', 'hour'])['base_load'].mean().reset_index()
    daily_matrix = daily_profiles.pivot(index='date', columns='hour', values='base_load')
    
    # Remove days with missing data
    daily_matrix = daily_matrix.dropna()
    
    if len(daily_matrix) > 5:  # Need more than 5 complete days for clustering
        # Standardize the data
        scaler = StandardScaler()
        scaled_profiles = scaler.fit_transform(daily_matrix)
        
        # Perform K-means clustering
        n_clusters = min(5, len(daily_matrix) // 3)  # Reasonable number of clusters
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(scaled_profiles)
        
        # Add cluster labels to data
        daily_matrix['cluster'] = clusters
        
        # Plot cluster centers
        fig, axes = plt.subplots(n_clusters, 1, figsize=(12, 3*n_clusters))
        if n_clusters == 1:
            axes = [axes]
        
        colors = plt.cm.Set1(np.linspace(0, 1, n_clusters))
        
        for i in range(n_clusters):
            cluster_data = daily_matrix[daily_matrix['cluster'] == i]
            cluster_profiles = cluster_data.drop('cluster', axis=1)
            
            # Plot all profiles in cluster
            for _, profile in cluster_profiles.iterrows():
                axes[i].plot(range(24), profile.values, color=colors[i], alpha=0.3)
            
            # Plot cluster center
            center = scaler.inverse_transform(kmeans.cluster_centers_[i].reshape(1, -1))[0]
            axes[i].plot(range(24), center, color='black', linewidth=3, 
                        label=f'Cluster {i+1} Center')
            
            axes[i].set_xlabel('Hour of Day')
            axes[i].set_ylabel('Base Load (W)')
            axes[i].set_title(f'Cluster {i+1}: {len(cluster_data)} days')
            axes[i].legend()
            axes[i].grid(True, alpha=0.3)
            axes[i].set_xlim(0, 23)
        
        plt.tight_layout()
        plt.show()
        
        # Analyze cluster characteristics
        print(f"\nBase Load Clustering Analysis ({n_clusters} clusters):")
        print("=" * 50)
        
        cluster_stats = []
        for i in range(n_clusters):
            cluster_data = daily_matrix[daily_matrix['cluster'] == i]
            cluster_profiles = cluster_data.drop('cluster', axis=1)
            
            avg_consumption = cluster_profiles.mean().mean()
            peak_hour = cluster_profiles.mean().idxmax()
            min_hour = cluster_profiles.mean().idxmin()
            variability = cluster_profiles.mean().std()
            
            cluster_stats.append({
                'cluster': i+1,
                'days': len(cluster_data),
                'avg_consumption': avg_consumption,
                'peak_hour': peak_hour,
                'min_hour': min_hour,
                'variability': variability
            })
            
            print(f"Cluster {i+1}:")
            print(f"  Days: {len(cluster_data)}")
            print(f"  Average consumption: {avg_consumption:.0f} W")
            print(f"  Peak hour: {peak_hour}:00")
            print(f"  Minimum hour: {min_hour}:00")
            print(f"  Daily variability: {variability:.0f} W")
            print()
        
        # Convert to DataFrame for easier analysis
        cluster_df = pd.DataFrame(cluster_stats)
        
        # Identify cluster types
        high_consumption_cluster = cluster_df.loc[cluster_df['avg_consumption'].idxmax()]
        low_consumption_cluster = cluster_df.loc[cluster_df['avg_consumption'].idxmin()]
        
        print(f"Pattern Insights:")
        # Selecting a row upcasts the integer columns to float, so cast the cluster id back to int
        print(f"High consumption pattern: Cluster {int(high_consumption_cluster['cluster'])} "
              f"({high_consumption_cluster['avg_consumption']:.0f} W avg)")
        print(f"Low consumption pattern: Cluster {int(low_consumption_cluster['cluster'])} "
              f"({low_consumption_cluster['avg_consumption']:.0f} W avg)")
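
The number of clusters above comes from a simple rule of thumb. One common alternative is to compare silhouette scores across several candidate values and keep the best one. The sketch below uses synthetic profiles as a stand-in for the scaled daily profiles; `pick_n_clusters` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_n_clusters(profiles, candidates=range(2, 6), random_state=42):
    """Return (best_k, scores), where scores maps each tried k to its silhouette score."""
    scores = {}
    for k in candidates:
        if k >= len(profiles):  # silhouette needs fewer clusters than samples
            break
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(profiles)
        scores[k] = silhouette_score(profiles, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in: 10 low-consumption days and 10 high-consumption days
rng = np.random.default_rng(0)
low_days = rng.normal(loc=0.0, scale=0.1, size=(10, 24))
high_days = rng.normal(loc=5.0, scale=0.1, size=(10, 24))

best_k, scores = pick_n_clusters(np.vstack([low_days, high_days]))
print(best_k, {k: round(float(v), 3) for k, v in scores.items()})
```

With real data the scores are usually much closer together, so it is worth inspecting the whole `scores` dictionary rather than only `best_k`.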

## 9. Anomaly Detection

In [None]:
# Detect anomalous consumption patterns
if 'base_load' in merged_data.columns:
    # Calculate rolling statistics
    merged_data = merged_data.sort_values('_time')
    merged_data['rolling_mean'] = merged_data['base_load'].rolling(window=288).mean()  # 24-hour window (5min intervals)
    merged_data['rolling_std'] = merged_data['base_load'].rolling(window=288).std()
    
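    # Define anomalies as values beyond 3 standard deviations
    # A zero rolling std (flat signal) would give inf/NaN z-scores, so treat it as missing
    safe_std = merged_data['rolling_std'].replace(0, np.nan)
    merged_data['z_score'] = np.abs((merged_data['base_load'] - merged_data['rolling_mean']) / safe_std)
    merged_data['is_anomaly'] = merged_data['z_score'] > 3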
    
    # Remove NaN values from rolling calculations
    merged_data = merged_data.dropna(subset=['z_score'])
    
    anomalies = merged_data[merged_data['is_anomaly']]
    
    print(f"\nAnomaly Detection Results:")
    print(f"Total data points: {len(merged_data)}")
    print(f"Anomalies detected: {len(anomalies)} ({len(anomalies)/len(merged_data)*100:.2f}%)")
    
    if len(anomalies) > 0:
        print(f"\nAnomaly Statistics:")
        print(f"Average anomaly value: {anomalies['base_load'].mean():.0f} W")
        print(f"Maximum anomaly value: {anomalies['base_load'].max():.0f} W")
        print(f"Minimum anomaly value: {anomalies['base_load'].min():.0f} W")
        
        # Analyze anomaly timing
        anomaly_hours = anomalies.groupby('hour').size()
        if len(anomaly_hours) > 0:
            print(f"\nMost common anomaly hours: {anomaly_hours.nlargest(3).to_dict()}")
    
    # Plot anomalies
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
    
    # Time series with anomalies highlighted
    sample_data = merged_data.iloc[::50]  # Sample for better visualization
    ax1.plot(sample_data['_time'], sample_data['base_load'], 'b-', alpha=0.7, label='Base Load')
    ax1.plot(sample_data['_time'], sample_data['rolling_mean'], 'g-', linewidth=2, label='Rolling Mean')
    
    if len(anomalies) > 0:
        anomaly_sample = anomalies.iloc[::10]  # Sample anomalies
        ax1.scatter(anomaly_sample['_time'], anomaly_sample['base_load'], 
                   color='red', s=30, label='Anomalies', zorder=5)
    
    ax1.set_xlabel('Time')
    ax1.set_ylabel('Base Load (W)')
    ax1.set_title('Base Load with Anomaly Detection')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Distribution of Z-scores
    ax2.hist(merged_data['z_score'], bins=50, alpha=0.7, color='skyblue', edgecolor='black')
    ax2.axvline(x=3, color='red', linestyle='--', linewidth=2, label='Anomaly Threshold')
    ax2.set_xlabel('Z-Score')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Distribution of Z-Scores for Base Load')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
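
The rolling mean and standard deviation used above are themselves pulled up by large spikes, which can mask nearby anomalies. A common, more robust variant replaces them with the rolling median and the median absolute deviation (MAD). The sketch below is an alternative technique, not the method used in the cell above; `robust_z_scores` is a hypothetical helper, and the factor 1.4826 makes the MAD comparable to a standard deviation for normally distributed data.

```python
import numpy as np
import pandas as pd

def robust_z_scores(series, window=288):
    """Rolling robust z-score: |x - median| / (1.4826 * MAD) over a trailing window."""
    median = series.rolling(window).median()
    mad = series.rolling(window).apply(
        lambda w: np.median(np.abs(w - np.median(w))), raw=True
    )
    # A zero MAD (flat window) would divide by zero, so treat it as missing
    scale = (1.4826 * mad).replace(0, np.nan)
    return (series - median).abs() / scale

# Synthetic signal alternating between 100 W and 102 W with one 1000 W spike
values = np.where(np.arange(50) % 2 == 0, 100.0, 102.0)
values[40] = 1000.0
series = pd.Series(values)

z = robust_z_scores(series, window=20)
flagged = z[z > 3].index.tolist()
print(flagged)  # [40]
```

Only the spike is flagged, and the readings right after it stay below the threshold because a single outlier barely moves the window's median or MAD.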

## 10. Load Forecasting Model

In [None]:
# Simple forecasting model for base load
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

if 'base_load' in merged_data.columns and len(merged_data) > 1000:
    # Prepare features for forecasting
    forecast_data = merged_data.copy()
    
    # Create lag features
    for lag in [1, 12, 24, 288]:  # 5min, 1hr, 2hr, 24hr lags
        forecast_data[f'base_load_lag_{lag}'] = forecast_data['base_load'].shift(lag)
    
    # Add time-based features
    forecast_data['hour_sin'] = np.sin(2 * np.pi * forecast_data['hour'] / 24)
    forecast_data['hour_cos'] = np.cos(2 * np.pi * forecast_data['hour'] / 24)
    forecast_data['weekday_num'] = forecast_data['_time'].dt.weekday
    forecast_data['is_weekend_num'] = forecast_data['is_weekend'].astype(int)
    
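    # Rolling averages over past values only (shift by one step so the
    # current target value does not leak into its own features)
    past_load = forecast_data['base_load'].shift(1)
    forecast_data['base_load_ma_24'] = past_load.rolling(window=288).mean()  # 24 hours
    forecast_data['base_load_ma_168'] = past_load.rolling(window=2016).mean()  # 7 days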
    
    # Select features
    feature_cols = ['hour_sin', 'hour_cos', 'weekday_num', 'is_weekend_num',
                   'base_load_lag_1', 'base_load_lag_12', 'base_load_lag_24', 'base_load_lag_288',
                   'base_load_ma_24', 'base_load_ma_168']
    
    # Remove rows with NaN values
    forecast_data = forecast_data.dropna(subset=feature_cols + ['base_load'])
    
    if len(forecast_data) > 500:
        # Split data (80% train, 20% test)
        split_idx = int(len(forecast_data) * 0.8)
        train_data = forecast_data.iloc[:split_idx]
        test_data = forecast_data.iloc[split_idx:]
        
        X_train = train_data[feature_cols]
        y_train = train_data['base_load']
        X_test = test_data[feature_cols]
        y_test = test_data['base_load']
        
        # Train Random Forest model
        rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
        rf_model.fit(X_train, y_train)
        
        # Make predictions
        y_pred = rf_model.predict(X_test)
        
        # Calculate metrics
        mae = mean_absolute_error(y_test, y_pred)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
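        # MAPE is undefined where the actual value is 0 (base load is clipped at 0), so skip those points
        nonzero = (y_test != 0).to_numpy()
        mape = np.mean(np.abs((y_test[nonzero] - y_pred[nonzero]) / y_test[nonzero])) * 100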
        
        print(f"\nBase Load Forecasting Model Performance:")
        print(f"Mean Absolute Error: {mae:.2f} W")
        print(f"Root Mean Square Error: {rmse:.2f} W")
        print(f"Mean Absolute Percentage Error: {mape:.2f}%")
        
        # Feature importance
        feature_importance = pd.DataFrame({
            'feature': feature_cols,
            'importance': rf_model.feature_importances_
        }).sort_values('importance', ascending=False)
        
        print(f"\nTop 5 Most Important Features:")
        for _, row in feature_importance.head().iterrows():
            print(f"  {row['feature']}: {row['importance']:.3f}")
        
        # Plot predictions vs actual
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
        
        # Time series comparison (last 1000 points)
        plot_data = test_data.tail(1000)
        plot_pred = y_pred[-1000:]
        
        ax1.plot(plot_data['_time'], plot_data['base_load'], 'b-', 
                label='Actual', alpha=0.7, linewidth=1)
        ax1.plot(plot_data['_time'], plot_pred, 'r-', 
                label='Predicted', alpha=0.7, linewidth=1)
        ax1.set_xlabel('Time')
        ax1.set_ylabel('Base Load (W)')
        ax1.set_title('Base Load Forecast: Actual vs Predicted')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Scatter plot
        ax2.scatter(y_test, y_pred, alpha=0.5, s=10)
        ax2.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
                'r--', linewidth=2, label='Perfect Prediction')
        ax2.set_xlabel('Actual Base Load (W)')
        ax2.set_ylabel('Predicted Base Load (W)')
        ax2.set_title('Prediction Accuracy Scatter Plot')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
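
The error metrics above are easier to interpret next to a trivial baseline. A seasonal naive forecast predicts each value with the reading from exactly 24 hours earlier (288 steps at 5-minute resolution). The sketch below computes its MAE on the last 20% of a series; `seasonal_naive_mae` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd

def seasonal_naive_mae(series, season=288, train_fraction=0.8):
    """MAE on the test split when each value is predicted by the value one season earlier."""
    forecast = series.shift(season)
    split_idx = int(len(series) * train_fraction)
    actual = series.iloc[split_idx:]
    predicted = forecast.iloc[split_idx:]
    valid = predicted.notna()  # the first season has no earlier value to copy
    return float((actual[valid] - predicted[valid]).abs().mean())

# Five identical synthetic days: the baseline is exact
day = 300 + 100 * np.sin(2 * np.pi * np.arange(288) / 288)
periodic = pd.Series(np.tile(day, 5))
print(seasonal_naive_mae(periodic))  # 0.0

# Raise the last day by 10 W: the baseline is now off by 10 W on the test split
shifted = periodic.copy()
shifted.iloc[-288:] += 10
print(round(seasonal_naive_mae(shifted), 6))  # 10.0
```

If the Random Forest does not clearly beat this baseline on the same test split, the extra features are adding little beyond the daily cycle.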

## 11. Energy Scheduling Optimization

In [None]:
# Optimize energy scheduling based on base load patterns
if 'base_load' in merged_data.columns:
    # Calculate optimal times for controllable loads
    hourly_base_stats = merged_data.groupby('hour')['base_load'].agg(['mean', 'std']).reset_index()
    
    # Identify low base load hours (good for scheduling additional loads)
    hourly_base_stats['load_score'] = hourly_base_stats['mean'] + hourly_base_stats['std']
    hourly_base_stats = hourly_base_stats.sort_values('load_score')
    
    print(f"\nOptimal Scheduling Analysis:")
    print(f"============================")
    
    # Best hours for scheduling controllable loads
    best_hours = hourly_base_stats.head(6)  # 6 hours with the lowest load score
    worst_hours = hourly_base_stats.tail(6)  # 6 hours with the highest load score
    
    print(f"\nBest hours for additional loads (lowest base + variability):")
    for _, row in best_hours.iterrows():
        # iterrows upcasts the hour to float, so cast it back before using the 'd' format
        print(f"  {int(row['hour']):02d}:00 - Avg: {row['mean']:.0f}W, Std: {row['std']:.0f}W")
    
    print(f"\nAvoid scheduling during these hours (highest base + variability):")
    for _, row in worst_hours.iterrows():
        print(f"  {int(row['hour']):02d}:00 - Avg: {row['mean']:.0f}W, Std: {row['std']:.0f}W")
    
    # Calculate potential load shifting benefits
    avg_base_load = merged_data['base_load'].mean()
    low_load_hours = hourly_base_stats.head(3)['mean'].mean()
    high_load_hours = hourly_base_stats.tail(3)['mean'].mean()
    
    potential_reduction = high_load_hours - low_load_hours
    
    print(f"\nLoad Shifting Potential:")
    print(f"Average base load: {avg_base_load:.0f} W")
    print(f"Low-load periods average: {low_load_hours:.0f} W")
    print(f"High-load periods average: {high_load_hours:.0f} W")
    print(f"Potential peak reduction: {potential_reduction:.0f} W ({potential_reduction/avg_base_load*100:.1f}%)")
    
    # Visualize optimization opportunities
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
    
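    # Base load with optimal scheduling windows
    # Restore chronological order for plotting (the frame was sorted by load_score above)
    hourly_base_stats = hourly_base_stats.sort_values('hour')
    hours = hourly_base_stats['hour']
    means = hourly_base_stats['mean']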
    
    ax1.plot(hours, means, 'b-', linewidth=2, marker='o', label='Average Base Load')
    ax1.fill_between(hours, means - hourly_base_stats['std'], 
                    means + hourly_base_stats['std'], alpha=0.3, color='blue')
    
    # Highlight optimal scheduling windows
    optimal_hours = best_hours['hour'].values
    for hour in optimal_hours:
        ax1.axvspan(hour-0.5, hour+0.5, alpha=0.2, color='green')
    
    # Highlight avoid hours
    avoid_hours = worst_hours['hour'].values
    for hour in avoid_hours:
        ax1.axvspan(hour-0.5, hour+0.5, alpha=0.2, color='red')
    
    ax1.set_xlabel('Hour of Day')
    ax1.set_ylabel('Base Load (W)')
    ax1.set_title('Base Load Pattern with Optimal Scheduling Windows')
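    # Build the legend from explicit handles so the shaded windows get the right labels
    from matplotlib.patches import Patch
    ax1.legend(handles=[
        ax1.get_lines()[0],
        Patch(facecolor='green', alpha=0.2, label='Optimal for Additional Loads'),
        Patch(facecolor='red', alpha=0.2, label='Avoid Additional Loads'),
    ])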
    ax1.grid(True, alpha=0.3)
    ax1.set_xlim(0, 23)
    
    # Load variability analysis
    ax2.bar(hours, hourly_base_stats['std'], color='orange', alpha=0.7)
    ax2.set_xlabel('Hour of Day')
    ax2.set_ylabel('Base Load Std Dev (W)')
    ax2.set_title('Base Load Variability by Hour')
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(-0.5, 23.5)
    
    plt.tight_layout()
    plt.show()
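
The ranking above scores each hour on its own, but most controllable loads need several consecutive hours. A possible extension is to search for the contiguous window with the lowest average base load, allowing the window to wrap past midnight. This is a sketch on made-up numbers; `best_contiguous_window` is a hypothetical helper.

```python
import numpy as np

def best_contiguous_window(hourly_mean, duration_hours):
    """Return (start_hour, avg_load) of the lowest-average window, wrapping past midnight."""
    loads = np.asarray(hourly_mean, dtype=float)
    if len(loads) != 24:
        raise ValueError("expected one value per hour of the day")
    # Append the first hours so that windows starting late in the day can wrap around
    extended = np.concatenate([loads, loads[:duration_hours - 1]])
    kernel = np.ones(duration_hours) / duration_hours
    window_means = np.convolve(extended, kernel, mode="valid")  # one mean per start hour
    start = int(np.argmin(window_means))
    return start, float(window_means[start])

# Synthetic profile: 400 W all day except a 200 W dip from 23:00 to 02:00
profile = np.full(24, 400.0)
profile[[23, 0, 1]] = 200.0

start, avg = best_contiguous_window(profile, duration_hours=3)
print(start, round(avg, 1))  # 23 200.0
```

In the notebook, the input would be the mean base load per hour ordered from hour 0 to hour 23.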

## 12. Key Findings and Recommendations

### Base Load Characteristics Summary

Based on the base load analysis:

1. **Consumption Patterns**
   - Average base load and daily variation
   - Peak and minimum consumption hours
   - Weekend vs weekday differences
   - Seasonal variations (if applicable)

2. **Load Clusters**
   - Different consumption patterns identified
   - High vs low consumption days
   - Pattern consistency and variability

3. **Anomaly Detection**
   - Unusual consumption events
   - Potential equipment malfunctions
   - Times of highest variability

### Optimization Opportunities

1. **Load Scheduling**
   - Optimal hours for controllable loads
   - Peak shaving opportunities
   - Energy cost reduction potential

2. **Demand Management**
   - Identify high-consumption appliances
   - Implement smart scheduling
   - Battery storage optimization

3. **Efficiency Improvements**
   - Replace high base load appliances
   - Implement standby power reduction
   - Smart power strips and controls

### Energy Savings Potential

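The ranges below are rough, indicative targets rather than results computed in this notebook; validate them against the measured data before relying on them.

- Smart scheduling: 5-15% peak reduction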
- Standby power reduction: 10-20% base load reduction
- Time-of-use optimization: 15-25% cost savings
- Total potential: 20-40% improvement in energy efficiency

### Implementation Recommendations

1. **Immediate Actions**
   - Schedule water heating during low base load hours
   - Implement smart appliance controls
   - Set up load monitoring alerts

2. **Medium-term Improvements**
   - Install smart meters for individual circuits
   - Implement predictive load scheduling
   - Optimize battery charging/discharging

3. **Long-term Strategy**
   - Replace inefficient appliances
   - Implement demand response programs
   - Integrate with dynamic electricity pricing