# UK-DALE - Exploratory Data Analysis

## Overview

This notebook explores the UK Domestic Appliance-Level Electricity (UK-DALE) dataset, which contains ~114M readings from 5 households with appliance-level monitoring.

**Student**: Vatsal Mehta (220408633@aston.ac.uk)

**Supervisor**: Dr. Farzaneh Farhadi

**Project**: Grid Guardian - AZR Energy Forecasting & Anomaly Detection

In [None]:
# Setup and imports
import json
import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import polars as pl
import seaborn as sns

warnings.filterwarnings("ignore")

# Plot styling
plt.style.use("seaborn-v0_8-whitegrid")
sns.set_palette("husl")
plt.rcParams["figure.dpi"] = 150
plt.rcParams["savefig.dpi"] = 300

# Project paths (the notebook is assumed to run one level below the project root)
PROJECT_ROOT = Path("..").resolve()
DATA_ROOT = PROJECT_ROOT / "data"
UKDALE_PATH = DATA_ROOT / "processed" / "dataset=ukdale"
FIGURES_DIR = PROJECT_ROOT / "docs" / "figures"
FIGURES_DIR.mkdir(parents=True, exist_ok=True)

print("Setup complete")

## Load and Validate Data

**Purpose**: Load UK-DALE processed data and verify schema

**Expected**: ~114M records at 30-minute intervals with appliance metadata

In [None]:
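# Load sample data
# Note: head() takes the first 1M rows in scan order, not a random sample.
print("Loading UK-DALE sample (1M records)...")
df_sample = pl.scan_parquet(str(UKDALE_PATH / "**/*.parquet")).head(1_000_000).collect()
print(f"Loaded {len(df_sample):,} records")
print(f"Columns: {df_sample.columns}")

# Convert to pandas and unpack the JSON metadata stored in `extras`
df_pd = df_sample.to_pandas()
df_pd["extras_parsed"] = df_pd["extras"].apply(json.loads)
df_pd["channel"] = df_pd["extras_parsed"].apply(lambda x: x.get("channel", "unknown"))

# The building identifier is the prefix of entity_id before the first underscore
df_pd["building"] = df_pd["entity_id"].str.split("_").str[0]

# Single quotes inside the f-strings keep them valid on Python versions before 3.12
print(f"Buildings: {df_pd['building'].unique()}")
print(f"Unique appliances: {df_pd['channel'].nunique()}")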
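
The cell above prints the column names but does not check them against anything. A minimal sketch of an explicit schema check follows. It assumes the required columns are the three that this notebook reads later (`entity_id`, `extras`, `energy_kwh`); `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and a timestamp column would need to be added under whatever name the processed data actually uses.

```python
# Columns this notebook reads in later cells. The set is an assumption
# drawn from those cells, not from a published schema.
REQUIRED_COLUMNS = ("entity_id", "extras", "energy_kwh")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Toy example with a made-up column list in which `extras` is absent.
example = missing_columns(["entity_id", "energy_kwh", "ts"])
print(example)  # → ['extras']
```

With the sample loaded above, the check could be applied as `missing_columns(df_sample.columns)`, raising an error if the returned list is non-empty.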

## Consumption Analysis**Purpose**: Understand appliance-level consumption patterns

In [None]:
# Consumption statistics
print("=== Energy Consumption Statistics ===")
print(df_pd["energy_kwh"].describe())

print("=== Top 15 Appliances by Total Consumption ===")
appliance_totals = (
    df_pd.groupby("channel")["energy_kwh"]
    .agg(["sum", "mean", "median", "std", "count"])
    .sort_values("sum", ascending=False)
)
display(appliance_totals.head(15))

# Share of total sampled energy attributable to each appliance
total_energy = df_pd["energy_kwh"].sum()
appliance_totals["pct_contribution"] = (appliance_totals["sum"] / total_energy) * 100

# Single quotes inside the f-string keep it valid on Python versions before 3.12
top5_pct = appliance_totals.head(5)["pct_contribution"].sum()
print(f"Top 5 appliances: {top5_pct:.1f}% of consumption")
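
The top-5 share printed above is a single point on a cumulative curve. As a hedged sketch, the helper below ranks categories by total and adds a running percentage, which shows how many appliances are needed to reach any given share. `cumulative_share` is a hypothetical helper, and the toy numbers are made up for illustration; they are not UK-DALE results.

```python
import pandas as pd


def cumulative_share(totals: pd.Series) -> pd.DataFrame:
    """Rank categories by total and add percentage and cumulative percentage.

    If the totals sum to zero, the percentages come out as NaN.
    """
    ranked = totals.sort_values(ascending=False)
    pct = ranked / ranked.sum() * 100
    return pd.DataFrame({"total": ranked, "pct": pct, "cum_pct": pct.cumsum()})


# Toy totals in kWh (made-up values).
toy = pd.Series({"fridge": 50.0, "kettle": 30.0, "tv": 15.0, "lamp": 5.0})
table = cumulative_share(toy)
print(table)
```

In this notebook the same helper could be applied as `cumulative_share(appliance_totals["sum"])`.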

## Key Findings

### Appliance-Level Insights

- **Always-on baseline**: Fridge/freezer provide constant baseline
- **Scheduled appliances**: Washing machine/dishwasher show time-of-day patterns
- **High-power bursts**: Kettle/oven show short duration events

### Next Steps

1. Complete LCL exploration
2. Implement appliance-specific anomaly detection
3. Design hierarchical forecasting (aggregate + disaggregated)
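
The appliance-level insights listed under Key Findings can be checked numerically with two simple summaries: the mean energy per hour of day (time-of-day patterns) and the fraction of readings above zero (always-on versus bursty appliances). The sketch below is one way to compute them. It assumes a timestamp column, here given the hypothetical name `ts` because the notebook does not show the real one; `hourly_profile` and `active_share` are also hypothetical helpers, and the toy readings are made up.

```python
import pandas as pd


def hourly_profile(df, ts_col="ts", group_col="channel", value_col="energy_kwh"):
    """Mean value per group and hour of day (rows: group, columns: hour)."""
    with_hour = df.assign(hour=pd.to_datetime(df[ts_col]).dt.hour)
    return with_hour.pivot_table(
        index=group_col, columns="hour", values=value_col, aggfunc="mean"
    )


def active_share(df, group_col="channel", value_col="energy_kwh", threshold=0.0):
    """Fraction of readings per group whose value exceeds `threshold`."""
    return (df[value_col] > threshold).groupby(df[group_col]).mean()


# Toy readings (made-up values, `ts` is an assumed column name).
toy = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2013-01-01 07:00", "2013-01-01 07:30", "2013-01-01 19:00",
                "2013-01-01 07:00", "2013-01-01 19:00", "2013-01-02 07:00",
            ]
        ),
        "channel": ["kettle", "kettle", "kettle", "fridge", "fridge", "fridge"],
        "energy_kwh": [0.10, 0.0, 0.0, 0.02, 0.02, 0.02],
    }
)

profile = hourly_profile(toy)
share = active_share(toy)
print(profile)
print(share)
```

In the toy data the fridge is active in every reading while the kettle is active in one of three, which is the kind of contrast the always-on and high-power-burst findings describe. Hours with no readings for a group are left as NaN rather than filled with zero, so missing data is not mistaken for zero consumption.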