# Efficiency Analysis

This notebook identifies energy inefficiencies across campus buildings by normalizing energy usage and analyzing temporal patterns.

### We focus on:
- Efficiency per square meter
- Efficiency by day and hour  
- Pareto (80/20) contributors to excess energy usage

---

**Note:** Inputs are the silver joined table from Notebook 01 and the aggregated Delta tables produced in Notebook 02 (energy trends).

In [0]:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Silver joined table (Notebook 1)
silver = spark.table("workspace.default.silver_energy_joined_2025")

# Aggregated trends (Notebook 2)
daily_trend   = spark.table("osu_energy.daily_trend")
monthly_trend = spark.table("osu_energy.monthly_trend")
hourly_trend  = spark.table("osu_energy.hourly_trend")

display(silver.limit(5))

We extract a building reference table with one row per building,
containing gross area (square meters) and campus metadata.

Every efficiency metric below divides by this area, so each building
must appear exactly once with a usable area value.

In [0]:
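buildings = (
    silver
    .select(
        "sitename",
        F.col("grossarea").alias("square_meters"),
        "campusname"
    )
    # Drop unusable areas before deduplicating so a null-area row cannot be
    # kept in place of a valid one; zero or negative areas would break the
    # per-square-meter division used later.
    .filter(F.col("square_meters").isNotNull() & (F.col("square_meters") > 0))
    .dropDuplicates(["sitename"])
)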

display(buildings.limit(5))


Total energy usage alone is misleading because buildings vary in size.
Energy per square meter (EUI-style metric) enables fair comparison
across buildings.
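
As a worked illustration of why normalization matters, the plain-Python sketch below compares two hypothetical buildings. The building names, usage figures, and areas are invented for illustration and are not taken from the campus data.

```python
# Hypothetical monthly usage (kWh) and gross area (square meters); not real campus data.
buildings_example = {
    "Large Lab": {"total_usage": 120_000.0, "square_meters": 30_000.0},
    "Small Annex": {"total_usage": 20_000.0, "square_meters": 2_000.0},
}


def energy_per_sqm(total_usage, square_meters):
    """Return usage divided by floor area, or None if the area is missing or not positive."""
    if square_meters is None or square_meters <= 0:
        return None
    return total_usage / square_meters


intensity = {
    name: energy_per_sqm(b["total_usage"], b["square_meters"])
    for name, b in buildings_example.items()
}

# Ranking by raw usage and ranking by intensity can disagree.
highest_total = max(buildings_example, key=lambda n: buildings_example[n]["total_usage"])
highest_intensity = max(intensity, key=lambda n: intensity[n])
print(highest_total, highest_intensity)
```

In this made-up case the larger building uses more energy in total, while the smaller one uses more per square meter, which is the comparison the normalized metric is meant to expose.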

In [0]:
efficiency_monthly = (
    monthly_trend
    .join(buildings, on="sitename", how="left")
    .withColumn(
        "energy_per_sqm",
        F.col("total_usage") / F.col("square_meters")
    )
    .filter(F.col("square_meters").isNotNull())
)

display(efficiency_monthly.limit(10))


We define the campus-wide average energy per square meter
as the efficiency baseline.

Each row's ratio to this baseline is computed below;
ratios well above 1 indicate inefficiency.
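
The arithmetic can be sketched in plain Python with a few hypothetical values. The `THRESHOLD` cutoff is an assumption added for illustration only; the notebook itself does not define what counts as "significantly above".

```python
from statistics import mean

# Hypothetical energy-per-square-meter values for four rows; not real campus data.
energy_per_sqm_example = {"A": 4.0, "B": 6.0, "C": 10.0, "D": 20.0}

# Baseline: unweighted mean across all rows, analogous to F.avg over the table.
baseline_example = mean(energy_per_sqm_example.values())

# A ratio above 1 means the row is more energy-intensive than the baseline.
ratios = {k: v / baseline_example for k, v in energy_per_sqm_example.items()}

# Hypothetical cutoff for "significantly above" the baseline (an assumption).
THRESHOLD = 1.5
flagged = sorted(k for k, r in ratios.items() if r > THRESHOLD)
print(baseline_example, flagged)
```

Because the mean is unweighted, a small building counts as much as a large one; an area-weighted baseline would be a reasonable alternative if that is not the intent.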

In [0]:
baseline = (
    efficiency_monthly
    .select(F.avg("energy_per_sqm").alias("baseline"))
    .collect()[0]["baseline"]
)

efficiency_flagged = (
    efficiency_monthly
    .withColumn(
        "inefficiency_ratio",
        F.col("energy_per_sqm") / F.lit(baseline)
    )
)

display(
    efficiency_flagged
    .orderBy(F.col("inefficiency_ratio").desc())
)


Daily efficiency highlights operational inefficiencies,
such as buildings consuming excess energy on low-occupancy days.
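
One way to look for this pattern is to compare weekend and weekday intensity for a single building. The plain-Python sketch below uses hypothetical values and assumes that weekends are the low-occupancy days, which may not hold for every building.

```python
from datetime import date
from statistics import mean

# Hypothetical daily energy per square meter for one building over one week
# (Monday 2025-03-03 through Sunday 2025-03-09); not real campus data.
daily_example = {
    date(2025, 3, 3): 2.0,
    date(2025, 3, 4): 2.0,
    date(2025, 3, 5): 2.0,
    date(2025, 3, 6): 2.0,
    date(2025, 3, 7): 2.0,
    date(2025, 3, 8): 1.5,
    date(2025, 3, 9): 1.5,
}

# date.weekday() returns 5 for Saturday and 6 for Sunday.
weekend = [v for d, v in daily_example.items() if d.weekday() >= 5]
weekday = [v for d, v in daily_example.items() if d.weekday() < 5]

weekday_avg = mean(weekday)
weekend_avg = mean(weekend)

# A ratio close to 1 means usage barely drops when the building is assumed empty.
weekend_ratio = weekend_avg / weekday_avg
print(weekday_avg, weekend_avg, weekend_ratio)
```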


In [0]:
daily_efficiency = (
    daily_trend
    .join(buildings, on="sitename", how="left")
    .withColumn(
        "energy_per_sqm",
        F.col("total_usage") / F.col("square_meters")
    )
    .filter(F.col("square_meters").isNotNull())
)

display(daily_efficiency.limit(10))


## Hourly Efficiency (Campus-Level)

Hourly trends are aggregated across all buildings.
This analysis focuses on campus-wide energy efficiency by hour of day
to identify systemic scheduling inefficiencies such as nighttime base load.


In [0]:
from pyspark.sql import functions as F

# Total campus area (constant denominator)
total_campus_area = (
    buildings
    .select(F.sum("square_meters").alias("total_area"))
    .collect()[0]["total_area"]
)

print("Total campus area (sqm):", total_campus_area)

# Campus-level hourly efficiency
hourly_efficiency = (
    hourly_trend
    .withColumn(
        "energy_per_sqm",
        F.col("total_energy") / F.lit(total_campus_area)
    )
    .orderBy("hour")
)

display(hourly_efficiency)


### Interpretation (for hourly efficiency)

Elevated energy usage during off-hours indicates
baseline loads from HVAC, lighting, or equipment
operating when buildings are likely unoccupied.

This suggests opportunities for:
- Scheduling optimization
- Automation and controls
- Policy-based energy reduction
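
A simple way to quantify off-hours load is the ratio of average off-hours intensity to average occupied-hours intensity. The sketch below is plain Python with hypothetical values; the off-hours window (22:00 to 05:59) is an assumption and should be adjusted to actual building schedules.

```python
# Hypothetical campus-wide energy per square meter by hour of day (0-23).
hourly_example = {h: (0.5 if h < 6 or h >= 22 else 1.0) for h in range(24)}

# Assumed off-hours window: 22:00 through 05:59.
OFF_HOURS = set(range(22, 24)) | set(range(0, 6))

off_values = [v for h, v in hourly_example.items() if h in OFF_HOURS]
on_values = [v for h, v in hourly_example.items() if h not in OFF_HOURS]

off_avg = sum(off_values) / len(off_values)
on_avg = sum(on_values) / len(on_values)

# Fraction of the occupied-hours level that persists during off-hours.
base_load_ratio = off_avg / on_avg
print(len(off_values), base_load_ratio)
```

A ratio near 1 would mean the campus draws almost as much energy overnight as during the day; lower values indicate that loads are being shed when buildings are empty.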


## Pareto Analysis

Pareto analysis identifies the small subset of buildings
responsible for the majority of excess energy usage.

This enables high-impact interventions with limited resources.
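
The cumulative-share calculation can be sketched in plain Python as follows. The excess values are hypothetical, and the sketch assumes one row per building; if the source table holds one row per building per month, the excess would first need to be summed per building.

```python
from itertools import accumulate

# Hypothetical excess energy (kWh) per building; not real campus data.
excess_kwh = {"A": 500.0, "B": 250.0, "C": 150.0, "D": 60.0, "E": 40.0}

# Sort descending, then build the running total and cumulative percentage.
ranked = sorted(excess_kwh.items(), key=lambda kv: kv[1], reverse=True)
total = sum(v for _, v in ranked)
running = list(accumulate(v for _, v in ranked))
cumulative_pct = [r / total * 100 for r in running]

# Strict cutoff: keep only rows whose cumulative share is at most 80%.
strict = [name for (name, _), pct in zip(ranked, cumulative_pct) if pct <= 80]

# Inclusive cutoff: also keep the building that crosses the 80% mark.
priority = []
for (name, _), pct in zip(ranked, cumulative_pct):
    priority.append(name)
    if pct >= 80:
        break

print(strict, priority)
```

The two cutoffs differ when no row lands exactly on 80%: the strict variant stops short of the target share, and it returns nothing at all if the single largest contributor already exceeds 80%. Which variant is appropriate depends on how the priority list will be used.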


In [0]:
pareto_df = (
    efficiency_monthly
    .withColumn(
        "excess_energy_per_sqm",
        F.col("energy_per_sqm") - F.lit(baseline)
    )
    .filter(F.col("excess_energy_per_sqm") > 0)
    .withColumn(
        "excess_energy_kwh",
        F.col("excess_energy_per_sqm") * F.col("square_meters")
    )
    .orderBy(F.col("excess_energy_kwh").desc())
)

total_excess = pareto_df.select(
    F.sum("excess_energy_kwh")
).collect()[0][0]

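# No partitionBy: Spark moves all rows into a single partition to compute this
# running sum, which is fine for small tables but does not scale to large ones.
window = Window.orderBy(F.col("excess_energy_kwh").desc())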

pareto_df = (
    pareto_df
    .withColumn(
        "cumulative_excess_kwh",
        F.sum("excess_energy_kwh").over(window)
    )
    .withColumn(
        "cumulative_pct",
        F.col("cumulative_excess_kwh") / F.lit(total_excess) * 100
    )
)

display(pareto_df)


In [0]:
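# Keep rows whose cumulative share is at most 80%; the row that crosses
# the 80% mark is excluded by this filter.
priority_buildings = pareto_df.filter(F.col("cumulative_pct") <= 80)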

display(priority_buildings)


## Key Findings

- Energy inefficiency is highly concentrated across campus
- A small number of buildings drive the majority of excess usage
- Inefficiencies vary strongly by time of day
- Several buildings consume high energy during low-occupancy hours

These patterns suggest both structural and operational inefficiencies.


In [0]:
# Save outputs to workspace.default (user-writable schema)

(priority_buildings.write
 .mode("overwrite")
 .format("delta")
 .saveAsTable("workspace.default.priority_buildings"))

(daily_efficiency.write
 .mode("overwrite")
 .format("delta")
 .saveAsTable("workspace.default.daily_efficiency"))

(hourly_efficiency.write
 .mode("overwrite")
 .format("delta")
 .saveAsTable("workspace.default.hourly_efficiency"))

print("Saved efficiency tables to workspace.default")


### Persisting Analysis Outputs

Efficiency and prioritization tables are saved for
dashboarding and downstream analysis.

Tables are written to a user-owned schema to ensure
reliable access and reproducibility.

In [0]:
# Grant read access to collaborators. USE SCHEMA is granted on workspace.default
# because that is the schema the tables above were saved to.
tables = ["daily_efficiency", "hourly_efficiency", "priority_buildings"]
principals = ["ali.1189@osu.edu", "trinh.134@buckeyemail.osu.edu"]

for principal in principals:
    spark.sql(f"GRANT USE CATALOG ON CATALOG workspace TO `{principal}`")
    spark.sql(f"GRANT USE SCHEMA ON SCHEMA workspace.default TO `{principal}`")
    for table in tables:
        spark.sql(f"GRANT SELECT ON TABLE workspace.default.{table} TO `{principal}`")