# Electricity Demand in Victoria, Australia 

In this notebook we will prepare and store the electricity demand dataset found [here](https://github.com/tidyverts/tsibbledata/tree/master/data-raw/vic_elec/VIC2015).

**Citation:**

Godahewa, Rakshitha, Bergmeir, Christoph, Webb, Geoff, Hyndman, Rob, & Montero-Manso, Pablo. (2021). Australian Electricity Demand Dataset (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4659727

**Description of data:**

A description of the data can be found [here](https://rdrr.io/cran/tsibbledata/man/vic_elec.html). The data contains electricity demand in Victoria, Australia, at 30-minute intervals from 2002 to early 2015, a span of roughly 13 years. It also includes the temperature in Melbourne at 30-minute intervals and a list of public holiday dates.

# Download the data from the URLs below with pandas

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Electricity demand.
url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
demand = pd.read_csv(url)

# Temperature of Melbourne (BOM site 086071).
url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/temperature.csv"
temp = pd.read_csv(url)
df = demand.merge(temp, on=["Date", "Period"], how="left")
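
A left merge keeps every demand row and leaves `Temp` as `NaN` wherever no temperature reading shares the same `Date` and `Period`. The sketch below illustrates this on small made-up frames (the values and the names `demand_toy`, `temp_toy` and `merged_toy` are hypothetical). It also uses the `validate` and `indicator` options of `merge` to check that the keys are unique and to count unmatched rows.

```python
import pandas as pd

# Toy stand-ins for the demand and temperature tables (values are made up).
demand_toy = pd.DataFrame(
    {
        "Date": [37257, 37257, 37257],
        "Period": [1, 2, 3],
        "OperationalLessIndustrial": [3500.0, 3400.0, 3300.0],
    }
)
temp_toy = pd.DataFrame(
    {"Date": [37257, 37257], "Period": [1, 2], "Temp": [32.6, 32.6]}
)

# validate raises if (Date, Period) is duplicated on either side;
# indicator adds a `_merge` column that records where each row came from.
merged_toy = demand_toy.merge(
    temp_toy,
    on=["Date", "Period"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Demand rows that found no temperature reading.
unmatched = merged_toy[merged_toy["_merge"] == "left_only"]
print(len(merged_toy), len(unmatched))
```

Rows counted in `unmatched` are the ones that a later `dropna` would remove.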

In [3]:
# Public holidays in Australia
url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/holidays.txt"
holidays = pd.read_csv(url, header=None, parse_dates=[0], dayfirst=True)
holidays.columns = ["date"]

# Process and save the data

We will only use the `OperationalLessIndustrial` demand, so let's drop `Industrial`.

In [4]:
df.drop(columns=["Industrial"], inplace=True)

Let's extract the date and date-time.

In [5]:
# Convert the integer Date (days since 1899-12-30) to a datetime
df["date"] = pd.to_datetime(df["Date"], unit="D", origin="1899-12-30")

# Create a timestamp from the integer Period representing 30 minute intervals
df["date_time"] = df["date"] + pd.to_timedelta((df["Period"] - 1) * 30, unit="m")
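
As a quick sanity check of the arithmetic, the conversion can be worked through for a single day. This is a minimal sketch, assuming that `Date` counts days from 1899-12-30 (as in the cell above) and that `Period` runs from 1 to 48 within a day; the helper `to_timestamp` is hypothetical and not part of the notebook.

```python
import pandas as pd

# Origin used in the notebook for the integer Date column.
EPOCH = pd.Timestamp("1899-12-30")


def to_timestamp(serial_day: int, period: int) -> pd.Timestamp:
    """Combine a serial day and a half-hour period (assumed 1..48)."""
    day = EPOCH + pd.Timedelta(days=serial_day)
    # Period 1 starts at 00:00, period 2 at 00:30, and so on.
    return day + pd.Timedelta(minutes=(period - 1) * 30)


first = to_timestamp(37257, 1)
last = to_timestamp(37257, 48)
print(first)  # 2002-01-01 00:00:00
print(last)   # 2002-01-01 23:30:00
```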

Drop rows that contain null values.

In [6]:
df.dropna(inplace=True)

Create the `is_holiday` column.

In [7]:
holidays["is_holiday"] = 1
df = df.merge(holidays, on=["date"], how="left")
df["is_holiday"] = df["is_holiday"].fillna(0).astype(int)
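
The merge-then-fill pattern can be illustrated on a small made-up frame (the names `obs`, `hol`, `flagged` and `alt` are hypothetical). The sketch also shows an `isin`-based alternative, which gives the same flag and cannot duplicate rows if a holiday date happens to be listed twice.

```python
import pandas as pd

# Toy observations: two half-hours on a holiday, one on a normal day.
obs = pd.DataFrame(
    {
        "date": pd.to_datetime(["2002-01-01", "2002-01-01", "2002-01-02"]),
        "demand": [1.0, 2.0, 3.0],
    }
)
hol = pd.DataFrame({"date": pd.to_datetime(["2002-01-01"])})
hol["is_holiday"] = 1

# Left merge: non-holiday rows get NaN, which is then filled with 0.
flagged = obs.merge(hol, on="date", how="left")
flagged["is_holiday"] = flagged["is_holiday"].fillna(0).astype(int)

# Alternative without a merge: membership test against the holiday dates.
alt = obs["date"].isin(hol["date"]).astype(int)

print(flagged["is_holiday"].tolist())  # [1, 1, 0]
print(alt.tolist())                    # [1, 1, 0]
```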

We now keep only the timestamp, the electricity demand, the temperature and the holiday flag, and resample to hourly frequency.

In [8]:
# Select the columns we need, then rename them
timeseries = df[["date_time", "OperationalLessIndustrial", "Temp", "is_holiday"]]

timeseries.columns = ["date_time", "demand", "temperature", "is_holiday"]

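# Resample to hourly: sum the two half-hourly demand values,
# average the temperature, and take the minimum of the holiday flag.
# Note: the "H" alias is deprecated in pandas >= 2.2; use "h" there.
timeseries = (
    timeseries.set_index("date_time")
    .resample("H")
    .agg(
        {
            "demand": "sum",
            "temperature": "mean",
            "is_holiday": "min",
        }
    )
)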
timeseries.head()

date_time,demand,temperature,is_holiday
2002-01-01 00:00:00,6919.366092,32.6,1
2002-01-01 01:00:00,7165.974188,32.6,1
2002-01-01 02:00:00,6406.542994,32.6,1
2002-01-01 03:00:00,5815.537828,32.6,1
2002-01-01 04:00:00,5497.732922,32.6,1
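
One thing to keep in mind with the resampling step above: if an hour has no rows at all (for example because null rows were dropped earlier), summing it gives 0 rather than a missing value. The sketch below shows this on made-up data and uses the `min_count` argument of `sum` to return `NaN` for incomplete hours instead. The names `toy`, `plain` and `guarded` are hypothetical, and `"60min"` is used as the frequency string only so that the example does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Toy half-hourly demand with the 01:00 hour missing entirely.
idx = pd.to_datetime(
    [
        "2002-01-01 00:00",
        "2002-01-01 00:30",
        "2002-01-01 02:00",
        "2002-01-01 02:30",
    ]
)
toy = pd.DataFrame({"demand": [10.0, 20.0, 30.0, 40.0]}, index=idx)

# A plain sum reports 0 for the empty hour.
plain = toy["demand"].resample("60min").sum()

# Requiring both half-hours turns incomplete hours into NaN.
guarded = toy["demand"].resample("60min").sum(min_count=2)

print(plain.tolist())    # [30.0, 0.0, 70.0]
print(guarded.tolist())  # [30.0, nan, 70.0]
```

Whether gaps should appear as 0 or as `NaN` depends on how the saved dataset is used downstream, so this is a design choice rather than a correction.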


Save the time series to the `Datasets` folder.

In [9]:
timeseries.to_csv("../Datasets/victoria_electricity_demand.csv")
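
When the saved file is read back later, the index comes back as plain strings unless it is parsed explicitly. The following is a minimal sketch of a round trip, using an in-memory buffer and made-up values in place of the real file; the names `toy`, `buffer` and `restored` are hypothetical.

```python
import io

import pandas as pd

# Toy hourly frame with a named datetime index, as produced by the resample.
toy = pd.DataFrame(
    {"demand": [30.0, 70.0]},
    index=pd.DatetimeIndex(
        ["2002-01-01 00:00", "2002-01-01 01:00"], name="date_time"
    ),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Restore the datetime index when reading the CSV back.
restored = pd.read_csv(buffer, index_col="date_time", parse_dates=["date_time"])
print(type(restored.index).__name__)  # DatetimeIndex
```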