# Energy Consumption Optimizer

This notebook implements an end-to-end system for:
1. **Predicting** household energy consumption using machine learning
2. **Optimizing** appliance schedules to minimize electricity costs

## Dataset
The REFIT Smart Home dataset: appliance-level and whole-house (aggregate) electricity consumption readings from UK households.

## Setup and Imports

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Import custom modules
from src import config
from src.data_loader import load_and_prepare_data
from src.preprocessor import preprocess_pipeline
from src.feature_engineer import prepare_features_and_target, split_train_val_test
from src.predictive_model import train_xgboost_model, train_lstm_model, evaluate_model
from src.optimizer import optimize_schedule, generate_schedule_dataframe, create_summary_table
from src.visualizer import (plot_predictions, plot_schedule_heatmap, plot_cost_comparison,
                            plot_feature_importance, create_results_summary)

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print("✓ All imports successful!")

## 1. Data Loading

Load the REFIT Smart Home data for House 1.

In [None]:
# Load data
HOME_ID = 1
df = load_and_prepare_data(config.RAW_DATA_DIR, home_id=HOME_ID)

print(f"\nDataset shape: {df.shape}")
print(f"Date range: {df.index.min()} to {df.index.max()}")
print(f"\nColumns: {list(df.columns)}")

In [None]:
# Display first few rows
df.head(10)

In [None]:
# Basic statistics
df.describe()

## 2. Data Preprocessing

- Resample to 15-minute intervals
- Handle missing values
- Add time-based features
- Normalize data
- Remove outliers
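
The steps listed above can be sketched with plain pandas. This is only an illustration of the idea, not the code in `src.preprocessor`: the helper name `sketch_preprocess`, the forward-fill strategy, the z-score clipping rule (which caps outliers rather than dropping rows), the min-max scaling, and the order of the steps are all assumptions made for the example.

```python
import pandas as pd

def sketch_preprocess(raw: pd.DataFrame, interval: str = "15min",
                      z_threshold: float = 3.0) -> pd.DataFrame:
    """Hypothetical stand-in for preprocess_pipeline (assumed behaviour)."""
    # 1. Resample to a fixed interval by averaging the readings in each bin
    out = raw.resample(interval).mean()
    # 2. Fill gaps: forward-fill, then back-fill any gap at the very start
    out = out.ffill().bfill()
    # 3. Cap outliers at mean +/- z_threshold standard deviations per column
    mean, std = out.mean(), out.std()
    out = out.clip(lower=mean - z_threshold * std,
                   upper=mean + z_threshold * std, axis=1)
    # 4. Min-max normalise each column to [0, 1]; constant columns keep span 1
    span = (out.max() - out.min()).replace(0, 1.0)
    out = (out - out.min()) / span
    # 5. Add time-based features after scaling so they stay in natural units
    out["hour"] = out.index.hour
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)
    return out
```

Capping instead of deleting outliers keeps the 15-minute grid intact, which matters later when lag features assume evenly spaced rows.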

In [None]:
df_processed, scaler = preprocess_pipeline(
    df,
    resample_interval=config.RESAMPLE_INTERVAL,
    missing_method=config.MISSING_VALUE_METHOD,
    remove_outliers_flag=True,
    outlier_threshold=config.OUTLIER_THRESHOLD,
    normalize=True
)

print(f"\nProcessed dataset shape: {df_processed.shape}")
print(f"\nNew columns added: {[col for col in df_processed.columns if col not in df.columns]}")

In [None]:
# View processed data
df_processed.head()

## 3. Feature Engineering

- Create lag features (1-4 intervals)
- Create rolling statistics (mean, sum, std)
- Prepare target variable (1 hour ahead prediction)
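
As a rough illustration of the three bullets above, the sketch below builds lag features, rolling statistics, and a shifted target for a single series. The function name `sketch_features` and the column naming scheme are hypothetical; the actual `prepare_features_and_target` may differ in naming, windowing, and how it drops incomplete rows.

```python
import pandas as pd

def sketch_features(series: pd.Series, lags=(1, 2, 3, 4), window: int = 4,
                    horizon: int = 4):
    """Hypothetical illustration of lag/rolling features with a shifted target."""
    feats = pd.DataFrame(index=series.index)
    # Lag features: the value observed `lag` intervals earlier
    for lag in lags:
        feats[f"lag_{lag}"] = series.shift(lag)
    # Rolling statistics over the last `window` intervals (current row included)
    roll = series.rolling(window)
    feats[f"rolling_mean_{window}"] = roll.mean()
    feats[f"rolling_sum_{window}"] = roll.sum()
    feats[f"rolling_std_{window}"] = roll.std()
    # Target: the value `horizon` intervals ahead (4 x 15 min = 1 hour)
    target = series.shift(-horizon).rename("target")
    # Keep only rows where every feature and the target are defined
    valid = feats.notna().all(axis=1) & target.notna()
    return feats[valid], target[valid]
```

Shifting the target backwards by `horizon` rows means each feature row only contains information available at prediction time, which is the property that prevents look-ahead leakage.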

In [None]:
# Identify appliance columns
appliance_cols = [col for col in df_processed.columns 
                 if ('Appliance' in col or 'Aggregate' in col) 
                 and not any(x in col for x in ['lag', 'rolling', 'hour', 'day', 'sin', 'cos', 'weekend'])]

print(f"Appliance columns: {appliance_cols}")

# Use Aggregate as target
target_column = 'Aggregate' if 'Aggregate' in appliance_cols else appliance_cols[0]
print(f"\nTarget column: {target_column}")

# Use at most the first five appliance columns as features
feature_appliances = appliance_cols[:5]
print(f"Feature appliances: {feature_appliances}")

In [None]:
# Create features and target
features, target = prepare_features_and_target(
    df_processed,
    target_column=target_column,
    appliance_columns=feature_appliances,
    lag_intervals=config.LAG_INTERVALS,
    rolling_windows=config.ROLLING_WINDOW_SIZES,
    forecast_horizon=4  # 1 hour ahead (4 * 15min)
)

print(f"\nFeatures shape: {features.shape}")
print(f"Target shape: {target.shape}")

In [None]:
# Split into train/val/test
X_train, X_val, X_test, y_train, y_val, y_test = split_train_val_test(
    features, target,
    config.TRAIN_RATIO, config.VAL_RATIO, config.TEST_RATIO
)
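
For time-series forecasting, the split is usually chronological so that validation and test rows come strictly after the training rows. Whether `split_train_val_test` works this way is an assumption here; the hypothetical helper below only shows how such boundaries could be derived from the ratios.

```python
def sketch_chronological_split(n_rows: int, train_ratio: float = 0.7,
                               val_ratio: float = 0.15):
    """Return (train, val, test) slices in time order; no shuffling (assumed)."""
    train_end = int(n_rows * train_ratio)
    val_end = train_end + int(n_rows * val_ratio)
    # Whatever remains after train and validation becomes the test set
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)
```

The slices can be applied with `features.iloc[train_slice]` and `target.iloc[train_slice]`, which keeps features and target aligned.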

## 4. Predictive Modeling - XGBoost

Train XGBoost model to predict energy consumption 1 hour ahead.

In [None]:
# Train XGBoost model
xgb_model = train_xgboost_model(
    X_train, y_train, 
    X_val, y_val,
    params=config.XGBOOST_PARAMS
)

In [None]:
# Evaluate on test set
y_pred_xgb, metrics_xgb = evaluate_model(xgb_model, X_test, y_test, 'xgboost')

print("\n" + "="*60)
print("XGBoost Test Results")
print("="*60)
print(f"RMSE: {metrics_xgb['RMSE']:.4f}")
print(f"MAE: {metrics_xgb['MAE']:.4f}")
print(f"MAPE: {metrics_xgb['MAPE']:.2f}%")
print("="*60)
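
For reference, the three metrics printed above follow their standard definitions, sketched below with NumPy. The helper `sketch_metrics` and its `eps` guard are illustrative assumptions, not the code inside `evaluate_model`. If the target was normalized during preprocessing, RMSE and MAE are in normalized units, and MAPE can become very large wherever the true value is close to zero.

```python
import numpy as np

def sketch_metrics(y_true, y_pred, eps: float = 1e-8) -> dict:
    """Standard RMSE / MAE / MAPE definitions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # eps avoids division by zero; near-zero targets still dominate the mean
    mape = float(np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps)) * 100)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}
```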

In [None]:
# Plot predictions
plot_predictions(
    y_test.values, y_pred_xgb,
    timestamps=y_test.index,
    title="XGBoost: Energy Consumption Prediction"
)

In [None]:
# Feature importance
plot_feature_importance(xgb_model, X_train.columns.tolist(), top_n=20)

## 5. Predictive Modeling - LSTM (Optional)

Train LSTM model for time-series prediction.
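
An LSTM expects 3-D input of shape `(samples, timesteps, features)`, so the tabular features have to be cut into overlapping windows first. The sketch below shows one common way to do this; the name `sketch_make_sequences` and the windowing convention are assumptions, and `train_lstm_model` may handle this differently. Windowing also explains why an LSTM typically yields fewer predictions than there are test rows.

```python
import numpy as np

def sketch_make_sequences(X, y, seq_len: int):
    """Turn 2-D features into overlapping windows of length seq_len (assumed)."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_windows = len(X) - seq_len + 1
    # Each window covers rows i .. i+seq_len-1
    X_seq = np.stack([X[i:i + seq_len] for i in range(n_windows)])
    # Each window is paired with the target of its last row
    y_seq = y[seq_len - 1:]
    return X_seq, y_seq
```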

In [None]:
# Uncomment to train LSTM model
# lstm_model = train_lstm_model(
#     X_train, y_train,
#     X_val, y_val,
#     params=config.LSTM_PARAMS
# )

# if lstm_model is not None:
#     y_pred_lstm, metrics_lstm = evaluate_model(lstm_model, X_test, y_test, 'lstm')
#     plot_predictions(
#         y_test.values[-len(y_pred_lstm):], y_pred_lstm,
#         timestamps=y_test.index[-len(y_pred_lstm):],
#         title="LSTM: Energy Consumption Prediction"
#     )

## 6. Appliance Schedule Optimization

Optimize appliance schedules to minimize electricity cost while respecting constraints.
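
To make the objective concrete, the sketch below finds the cheapest contiguous run for a single appliance by exhaustive search over start hours. This is a simplified stand-in, not the solver in `src.optimizer`: it ignores the limit on simultaneous appliances, rounds the runtime up to whole hours, and does not handle windows that wrap past midnight. The function name and argument names are hypothetical.

```python
import math

def sketch_cheapest_start(prices, runtime_hours, earliest_start,
                          latest_finish, power_rating_w):
    """Brute-force search for the cheapest start hour (illustrative only)."""
    duration = math.ceil(runtime_hours)
    best_start, best_cost = None, float("inf")
    # The run must start at or after earliest_start and end by latest_finish
    for start in range(earliest_start, latest_finish - duration + 1):
        # W / 1000 = kW; one hour per slot, so cost = kW * sum of hourly prices
        cost = sum(prices[start:start + duration]) * power_rating_w / 1000.0
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost
```

With only 24 hourly slots per appliance, exhaustive search is cheap; a proper solver becomes useful once appliances interact through shared constraints.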

In [None]:
# View appliance configurations
print("Flexible Appliances Configuration:")
print("="*60)
for name, cfg in config.FLEXIBLE_APPLIANCES.items():
    print(f"\n{name}:")
    print(f"  Runtime: {cfg['runtime_hours']} hours")
    print(f"  Time window: {cfg['earliest_start']:02d}:00 - {cfg['latest_finish']:02d}:00")
    print(f"  Power rating: {cfg['power_rating']} W")

In [None]:
# View electricity prices
plt.figure(figsize=(12, 4))
plt.bar(range(24), config.HOURLY_PRICES, color='steelblue', alpha=0.7)
plt.xlabel('Hour of Day')
plt.ylabel('Price (£/kWh)')
plt.title('Hourly Electricity Prices')
plt.xticks(range(24), [f"{h:02d}" for h in range(24)])
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

In [None]:
# Calculate original schedule (baseline)
original_schedule = {}
for name, config_app in config.FLEXIBLE_APPLIANCES.items():
    runtime_hours = int(np.ceil(config_app['runtime_hours']))
    earliest = config_app['earliest_start']
    schedule = np.zeros(24)
    schedule[earliest:earliest+runtime_hours] = 1
    original_schedule[name] = schedule

print("Original Schedule (run at earliest time):")
for name, sched in original_schedule.items():
    hours = np.where(sched == 1)[0]
    print(f"  {name}: {hours[0]:02d}:00 - {hours[-1]+1:02d}:00")
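
The cost of a binary hourly schedule like the ones built above can be computed directly from the price vector. The helper below is a hypothetical illustration that assumes the appliance draws its full rated power during every scheduled hour; `optimize_schedule` may compute the baseline cost differently.

```python
import numpy as np

def sketch_schedule_cost(schedule, prices, power_rating_w: float) -> float:
    """Cost of a 0/1 hourly schedule: kW x price summed over 'on' hours."""
    schedule = np.asarray(schedule, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # power_rating_w / 1000 converts watts to kW; each slot lasts one hour
    return float(np.sum(schedule * prices) * power_rating_w / 1000.0)
```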

In [None]:
# Optimize schedule
optimized_schedule, original_cost, optimized_cost = optimize_schedule(
    config.FLEXIBLE_APPLIANCES,
    config.HOURLY_PRICES,
    config.ALLOW_SIMULTANEOUS_APPLIANCES,
    config.MAX_SIMULTANEOUS_APPLIANCES
)

In [None]:
# View optimized schedule
schedule_df = generate_schedule_dataframe(
    optimized_schedule,
    config.FLEXIBLE_APPLIANCES,
    config.HOURLY_PRICES
)

print("\nOptimized Schedule:")
schedule_df

In [None]:
# Summary table
summary_table = create_summary_table(
    config.FLEXIBLE_APPLIANCES,
    original_schedule,
    optimized_schedule,
    config.HOURLY_PRICES
)

print("\nSchedule Comparison:")
summary_table

## 7. Visualization

Visualize optimization results.

In [None]:
# Schedule heatmap
plot_schedule_heatmap(
    optimized_schedule,
    config.FLEXIBLE_APPLIANCES,
    config.HOURLY_PRICES,
    title="Optimized Appliance Schedule"
)

In [None]:
# Cost comparison
plot_cost_comparison(
    original_cost, optimized_cost,
    config.FLEXIBLE_APPLIANCES,
    original_schedule, optimized_schedule,
    config.HOURLY_PRICES
)

## 8. Results Summary

In [None]:
# Create comprehensive results summary
savings_metrics = {
    'original_cost': original_cost,
    'optimized_cost': optimized_cost,
    'absolute_savings': original_cost - optimized_cost,
    # Guard against division by zero when the baseline cost is 0
    'percent_savings': ((original_cost - optimized_cost) / original_cost * 100
                        if original_cost else 0.0)
}

results_summary = create_results_summary(metrics_xgb, savings_metrics)

print("\n" + "="*80)
print(" "*30 + "FINAL RESULTS")
print("="*80)
print(results_summary.to_string(index=False))
print("="*80)

## Conclusion

This notebook demonstrated:

1. **Data Processing**: Loaded and preprocessed the REFIT Smart Home dataset
2. **Feature Engineering**: Created lag and rolling features for time-series prediction
3. **Predictive Modeling**: Trained XGBoost model to forecast energy consumption
4. **Schedule Optimization**: Used convex optimization to minimize electricity costs
5. **Visualization**: Created comprehensive plots to analyze results

### Key Findings:
- Forecast 1-hour-ahead energy consumption with XGBoost; test-set accuracy is reported as RMSE, MAE, and MAPE in Section 4
- Optimized appliance schedules to reduce costs by shifting usage to off-peak hours
- Respected all appliance constraints (runtime, time windows)

### Next Steps:
- Experiment with different appliance configurations
- Try LSTM model for comparison
- Analyze multiple homes
- Incorporate real-time pricing data
- Add battery storage optimization