# Efficiency Analysis

This notebook analyzes energy efficiency across buildings using the
pre-aggregated energy trend tables created in Notebook 02.

We focus on:
- Structural efficiency (energy per square meter)
- Operational efficiency (time-of-day and day-level behavior)
- Context-aware inefficiency using temporal patterns

The goal is to identify where and why energy inefficiencies occur.

In [0]:
# ---------------------------------------
# LOAD TREND TABLES FROM NOTEBOOK 02
# ---------------------------------------

monthly_trend = spark.table("osu_energy.monthly_trend")
daily_trend   = spark.table("osu_energy.daily_trend")
hourly_trend  = spark.table("osu_energy.hourly_trend")
spikes        = spark.table("osu_energy.energy_spikes")

display(monthly_trend.limit(10))



We load the precomputed Delta tables generated in Notebook 02.
Reading them here avoids recomputing the aggregations and mirrors how
production data pipelines separate aggregation from analysis.

In [0]:
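from pyspark.sql.functions import col, avg

# Energy intensity (kWh per square meter) for each monthly row,
# averaged per building and sorted from most to least intensive.
efficiency_by_building = (
    monthly_trend
    # Skip rows with a missing or zero floor area to avoid invalid divisions
    .filter(col("square_meters") > 0)
    .withColumn("energy_per_sqm", col("total_energy_kwh") / col("square_meters"))
    .groupBy("building")
    .agg(avg("energy_per_sqm").alias("avg_energy_per_sqm"))
    .orderBy(col("avg_energy_per_sqm").desc())
)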

display(efficiency_by_building)


## Structural Efficiency (Energy per Square Meter)

Energy per square meter (kWh/m²) normalizes usage by building size, so
large and small buildings can be compared directly.
Buildings with consistently high values are likely structurally inefficient,
potentially due to outdated systems, poor insulation, or building design.
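
To make the normalization concrete, here is a minimal plain-Python sketch. The building names and numbers are hypothetical and are not taken from the OSU tables; the helper `energy_per_sqm` is introduced only for illustration. It shows how a larger building can consume more energy in total yet less per square meter.

```python
from typing import Optional


def energy_per_sqm(total_energy_kwh: float, square_meters: float) -> Optional[float]:
    """Return energy intensity in kWh per square meter, or None if the area is unusable."""
    if square_meters is None or square_meters <= 0:
        return None  # avoid dividing by a missing or zero floor area
    return total_energy_kwh / square_meters


# Hypothetical monthly totals, for illustration only
buildings = {
    "Large Hall": {"total_energy_kwh": 120_000.0, "square_meters": 10_000.0},
    "Small Lab": {"total_energy_kwh": 45_000.0, "square_meters": 1_500.0},
}

# Intensity per building: total energy divided by floor area
intensity = {
    name: energy_per_sqm(b["total_energy_kwh"], b["square_meters"])
    for name, b in buildings.items()
}

for name, value in intensity.items():
    print(f"{name}: {value:.1f} kWh/m^2")
```

In this made-up case the larger building uses more energy overall, but the smaller one is the more energy-intensive of the two once floor area is taken into account.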


In [0]:
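from pyspark.sql.functions import col, mean, stddev

# Mean and sample standard deviation of the per-building averages
stats = efficiency_by_building.select(
    mean("avg_energy_per_sqm").alias("mean"),
    stddev("avg_energy_per_sqm").alias("std")
).collect()[0]

# Flag buildings more than one standard deviation above the mean.
# stddev can be null when there is only one building, so fall back to 0.0.
threshold = stats["mean"] + (stats["std"] or 0.0)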

inefficient_structural = efficiency_by_building.filter(
    col("avg_energy_per_sqm") > threshold
)

display(inefficient_structural)
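
The thresholding rule used in the cell above (mean plus one standard deviation) can be sketched without Spark. This is a plain-Python illustration on made-up values; the building labels and numbers are hypothetical. It uses `statistics.stdev`, the sample standard deviation, which is also what Spark's `stddev` computes.

```python
from statistics import mean, stdev

# Hypothetical average kWh/m^2 per building, for illustration only
avg_energy_per_sqm = {
    "A": 10.0,
    "B": 12.0,
    "C": 11.0,
    "D": 13.0,
    "E": 30.0,
}

values = list(avg_energy_per_sqm.values())

# Cutoff: one sample standard deviation above the mean
threshold = mean(values) + stdev(values)

# Buildings whose average intensity exceeds the cutoff
flagged = sorted(name for name, v in avg_energy_per_sqm.items() if v > threshold)
print(f"threshold = {threshold:.2f} kWh/m^2, flagged = {flagged}")
```

One caveat with this rule: a single extreme building raises both the mean and the standard deviation, so with only a handful of buildings the cutoff can be conservative.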