# Smart Building Energy Optimization Project

## Project Overview

Machine learning supports sustainable building management by modeling how building design, occupancy, and weather drive energy use. A smart building energy optimization project combines:
- IoT sensors
- Historical data
- Advanced algorithms

These elements work together to predict and reduce energy consumption while maintaining occupant comfort.

## Dataset Selection and Preparation

### Recommended Datasets

The following public datasets, available from the UCI Machine Learning Repository and/or Kaggle, suit this project:

#### Energy Efficiency Dataset (UCI)
- **Size**: 768 building samples
- **Features**: 8 input attributes:
  - Relative compactness
  - Surface area
  - Wall area
  - Roof area
  - Overall height
  - Orientation
  - Glazing area
  - Glazing area distribution
- **Target Variables**: Heating load (`Y1`) and cooling load (`Y2`)
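In the UCI spreadsheet, the eight inputs are coded `X1` to `X8` in the order listed above. The two targets are `Y1` (heating) and `Y2` (cooling). The sketch below maps those codes to readable names. The snake_case names are our own choice, not part of the dataset. If you rename the columns, the later `df['Y1']` lookups must use the new names.

```python
import pandas as pd

# UCI column codes -> descriptive names (names chosen here for readability)
ENB_COLUMNS = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "Y1": "heating_load",
    "Y2": "cooling_load",
}


def rename_enb_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the X1..X8 / Y1..Y2 codes replaced by readable names."""
    missing = set(ENB_COLUMNS) - set(df.columns)
    if missing:
        raise KeyError(f"Expected columns not found: {sorted(missing)}")
    return df.rename(columns=ENB_COLUMNS)


# One-row stand-in frame with placeholder zeros (not real data)
sample = pd.DataFrame([[0.0] * len(ENB_COLUMNS)], columns=list(ENB_COLUMNS))
print(rename_enb_columns(sample).columns.tolist())
```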

#### Appliances Energy Prediction Dataset
- **Size**: 19,735 instances
- **Features**: 28 attributes including temperature, humidity, and weather data
- **Source**: Low-energy building
- **Target Variable**: Appliance energy use (Wh)
- **Time Resolution**: 10-minute intervals
- **Duration**: 4.5 months of data
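The 10-minute resolution gives 144 readings per day. That is consistent with 19,735 instances spanning roughly 4.5 months.

For hourly or daily analysis, the readings are usually aggregated. The sketch below does this under two assumptions:

- The CSV has a timestamp column named `date` and an energy column named `Appliances`, holding Wh per interval. Check these names against your copy.
- `hourly_energy` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# 10-minute resolution -> 6 readings per hour, 144 per day
READINGS_PER_DAY = 24 * 60 // 10
print(19_735 / READINGS_PER_DAY)  # ≈ 137.05 days, i.e. roughly 4.5 months


def hourly_energy(df: pd.DataFrame, time_col: str = "date",
                  value_col: str = "Appliances") -> pd.Series:
    """Sum 10-minute energy readings (Wh) into hourly totals (Wh per hour).

    The default column names are assumptions; adjust them to your copy of the data.
    """
    series = df.set_index(pd.to_datetime(df[time_col]))[value_col]
    return series.resample("60min").sum()


# Synthetic two-hour example: 12 readings of 50 Wh each (not real data)
idx = pd.date_range("2024-01-01 00:00", periods=12, freq="10min")
demo = pd.DataFrame({"date": idx.astype(str), "Appliances": [50] * 12})
print(hourly_energy(demo))  # two hourly totals of 300 Wh
```

Summing is appropriate here because each reading is an energy amount for its interval. Averaging would be the choice for instantaneous quantities such as temperature.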

#### Smart Building System Dataset
- **Coverage**: 255 sensor time series
- **Scope**: 51 rooms across 4 floors
- **Type**: Comprehensive IoT sensor data for building energy analysis
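255 series across 51 rooms works out to five sensor streams per room. The sketch below collects such streams into one table, assuming a layout that you should verify against your download:

- one folder per room;
- one headerless CSV per sensor;
- two columns per CSV: a Unix timestamp in seconds and the reading.

`load_building_sensors` and the room and sensor names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_building_sensors(root) -> pd.DataFrame:
    """Load <root>/<room>/<sensor>.csv files into one wide DataFrame.

    Assumed layout: one folder per room, one headerless CSV per sensor with
    columns (Unix timestamp in seconds, reading). Result columns form a
    (room, sensor) MultiIndex; timestamps are outer-joined, so gaps become NaN.
    """
    series = {}
    for room_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for csv_path in sorted(room_dir.glob("*.csv")):
            raw = pd.read_csv(csv_path, header=None, names=["timestamp", "value"])
            index = pd.to_datetime(raw["timestamp"].to_numpy(), unit="s")
            s = pd.Series(raw["value"].to_numpy(), index=index)
            # Keep the first reading if a timestamp repeats, so the join is well defined
            series[(room_dir.name, csv_path.stem)] = s[~s.index.duplicated()]
    return pd.concat(series, axis=1)


# Tiny fake layout: two rooms with two sensors each (hypothetical names and values)
with tempfile.TemporaryDirectory() as tmp:
    for room in ["room_a", "room_b"]:
        (Path(tmp) / room).mkdir()
        for sensor in ["co2", "temperature"]:
            (Path(tmp) / room / f"{sensor}.csv").write_text(
                "1377299107,495\n1377299137,500\n")
    wide = load_building_sensors(tmp)

print(wide.shape)  # (2, 4)
```

If the real download follows this layout, `wide.shape[1]` should come out at 51 × 5 = 255.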

## 2. Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
```

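## 3. Load Dataset

Download the UCI Energy Efficiency dataset from its URL, or load a local copy.

```python
# Reading .xlsx files requires the openpyxl package.
# To work offline, replace the URL with the path to a downloaded copy.
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx'
df = pd.read_excel(url)

# Drop fully empty rows/columns in case the spreadsheet contains blank padding
df = df.dropna(how='all').dropna(axis=1, how='all')
df.head()
```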
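Before modeling, it can help to confirm that the loaded frame matches the description above. That means 768 rows, columns `X1` to `X8` plus `Y1` and `Y2`, and no missing values. This check is most useful if the spreadsheet or its download location changes. The sketch below uses `check_energy_frame`, a hypothetical helper, not part of pandas.

```python
import pandas as pd

EXPECTED_COLUMNS = [f"X{i}" for i in range(1, 9)] + ["Y1", "Y2"]


def check_energy_frame(df: pd.DataFrame, expected_rows: int = 768) -> None:
    """Raise ValueError if df does not match the documented shape of the dataset."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if len(df) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(df)}")
    if df[EXPECTED_COLUMNS].isna().to_numpy().any():
        raise ValueError("unexpected missing values in the expected columns")


# Stand-in frame with the documented shape (all zeros, for illustration only)
check_energy_frame(pd.DataFrame(0.0, index=range(768), columns=EXPECTED_COLUMNS))
```

In the notebook, call `check_energy_frame(df)` right after loading.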

## 4. Exploratory Data Analysis (EDA)

- Check for missing values
- Describe dataset statistics
- Visualize feature distributions and correlations

```python
df.info()               # dtypes and non-null counts (info() prints directly)
print(df.isna().sum())  # missing values per column
print(df.describe())

sns.pairplot(df.drop(columns=['Y1', 'Y2']))  # pairwise plots of the eight input features
plt.show()

corr = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
```
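The heatmap is easier to act on when each feature's correlation with the target is listed in order. The sketch below does that with `rank_by_correlation`, a hypothetical helper. In the notebook it would be called as `rank_by_correlation(df, "Y1", drop=["Y2"])`. Pearson correlation only captures linear relationships, so a low value does not rule out a nonlinear effect that the random forest can still use.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str, drop=()) -> pd.Series:
    """Pearson correlation of every other column with `target`, strongest (absolute) first."""
    corr = df.drop(columns=list(drop)).corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy frame: f1 rises with t, f3 falls with t, f2 is uncorrelated with t
toy = pd.DataFrame({
    "t": [1, 2, 3, 4, 5],
    "f1": [2, 4, 6, 8, 10],
    "f2": [1, 0, 1, 0, 1],
    "f3": [5, 4, 3, 2, 1],
    "other_target": [9, 9, 8, 8, 7],
})
print(rank_by_correlation(toy, "t", drop=["other_target"]))
```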

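## 5. Data Preprocessing

- Define the features (`X`) and the target (`y`): heating load (`Y1`) or cooling load (`Y2`).
- Split the dataset into training and test subsets.
- Standardize the features. Random forests are insensitive to feature scaling, so this step is optional for the model below. It matters if you swap in a scale-sensitive model such as a regularized linear model, a support vector machine, or k-nearest neighbors.

```python
# Use the eight input features X1..X8; Y1 and Y2 are the targets
X = df.drop(columns=['Y1', 'Y2'])
y = df['Y1']  # Heating load (use 'Y2' for cooling load)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```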
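`StandardScaler` rescales each feature as `z = (x - mean) / std`. It uses the population standard deviation (`ddof=0`), and both statistics come from the training set only. The test set is transformed with those same training statistics, which is why the code above calls `fit_transform` on the training data and `transform` on the test data. The sketch below checks this by hand on random stand-in data. `demo_scaler`, `X_tr`, and `X_te` are illustrative names, so they do not overwrite the notebook's objects.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=10.0, scale=3.0, size=(100, 2))  # stand-in training features
X_te = rng.normal(loc=10.0, scale=3.0, size=(20, 2))   # stand-in test features

demo_scaler = StandardScaler().fit(X_tr)

# Manual standardization with the *training* mean and population std
mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
manual = (X_te - mu) / sigma
print(np.allclose(demo_scaler.transform(X_te), manual))  # True
```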

## 6. Model Training

Initialize and train a random forest regressor on the scaled training features.

```python
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train_scaled, y_train)
```
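A single 80/20 split gives one noisy estimate of performance. A common alternative is k-fold cross-validation with the scaler and the forest wrapped in a scikit-learn `Pipeline`, so the scaler is refit inside each fold and never sees that fold's test data. The sketch below assumes 5-fold cross-validation suits your use case. It uses a synthetic stand-in from `make_regression` with the same 768 × 8 shape. To use the real data, pass the notebook's `X` and `y` instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the Energy Efficiency inputs (768 x 8)
X_demo, y_demo = make_regression(n_samples=768, n_features=8, noise=5.0, random_state=0)

# Scaling happens inside each fold, so no test-fold statistics leak into training
pipe = make_pipeline(StandardScaler(),
                     RandomForestRegressor(n_estimators=100, random_state=42))
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="r2")
print(f"R² per fold: {scores.round(3)}; mean {scores.mean():.3f}")
```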

## 7. Model Evaluation

- Predict on the test set
- Calculate and display the RMSE and R² score
- Plot predicted against actual values

```python
y_pred = rf.predict(X_test_scaled)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f}")
print(f"Test R² Score: {r2:.2f}")

plt.scatter(y_test, y_pred)
# Dashed y = x reference line: points on it are perfect predictions
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, 'k--')
plt.xlabel('Actual Heating Load')
plt.ylabel('Predicted Heating Load')
plt.title('Predicted vs Actual Heating Load')
plt.show()
```
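Both metrics have simple closed forms:

- `RMSE = sqrt(mean((y - ŷ)²))`, which is in the same units as the target.
- `R² = 1 - SS_res / SS_tot`, where `SS_res` is the sum of squared residuals and `SS_tot` is the sum of squared deviations from the mean of `y`.

The sketch below computes both by hand on three made-up values. The results agree with the scikit-learn functions used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0])  # made-up actual values
y_hat = np.array([2.5, 5.0, 8.0])   # made-up predictions

rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
ss_res = np.sum((y_true - y_hat) ** 2)            # 0.25 + 0 + 1 = 1.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # 4 + 0 + 4 = 8
r2_manual = 1 - ss_res / ss_tot

print(round(rmse_manual, 4), round(r2_manual, 5))  # 0.6455 0.84375
print(np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat))),
      np.isclose(r2_manual, r2_score(y_true, y_hat)))  # True True
```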

## 8. Feature Importance

Visualize which features contribute most to the model's predictions, using the random forest's impurity-based importances.

```python
feat_importances = pd.Series(rf.feature_importances_, index=X.columns)
# Only 8 features exist, so plot all of them, largest at the top
feat_importances.sort_values().plot(kind='barh')
plt.xlabel('Feature Importance')
plt.title('Top Features')
plt.show()
```
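Impurity-based importances are computed from the training data and can favor features with many distinct values. Permutation importance is a different technique that scikit-learn also provides. It measures how much the test score drops when one feature's values are shuffled. The sketch below applies it to synthetic data where only the first column drives the target. With the notebook's objects, the call would be `permutation_importance(rf, X_test_scaled, y_test, n_repeats=10, random_state=0)`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(400, 3))
# Only column 0 drives the target; columns 1 and 2 are pure noise
y_demo = 10 * X_demo[:, 0] + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each column of the held-out set 10 times and record the drop in R²
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)  # column 0 should dominate
```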

## 9. Save Model (Optional)

```python
import joblib

# Persist both the model and the fitted scaler; new inputs must be scaled the same way
joblib.dump(rf, 'rf_energy_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
```
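A saved model is only useful if it can be reloaded and applied in the same order as during training: scale first, then predict. The sketch below trains a tiny stand-in model on random data so it runs on its own. It saves both objects to a temporary directory, reloads them, and predicts for three hypothetical new buildings.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Tiny stand-in model and scaler (random data, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(50, 8))
y_demo = X_demo.sum(axis=1)
demo_scaler = StandardScaler().fit(X_demo)
demo_model = RandomForestRegressor(n_estimators=20, random_state=0).fit(
    demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = str(Path(tmp) / "rf_energy_model.pkl")
    scaler_path = str(Path(tmp) / "scaler.pkl")
    joblib.dump(demo_model, model_path)
    joblib.dump(demo_scaler, scaler_path)

    # Later, e.g. in a separate process: reload both, then scale before predicting
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)
    new_buildings = rng.uniform(size=(3, 8))  # hypothetical new inputs
    preds = loaded_model.predict(loaded_scaler.transform(new_buildings))

print(preds)
```

Pickled scikit-learn objects are generally only reliable when loaded with the same library versions used to save them. Never load a pickle from an untrusted source, because unpickling can execute arbitrary code.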