<a href="https://colab.research.google.com/github/vshlemon/colabs/blob/main/notebooks/time-series-forecasting/introductory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
install.packages("fpp3")

In [None]:
library(fpp3)

# Intro
- https://otexts.com/fpp3/graphics.html

## Basics & R

In [None]:
# how to create a tsibble object
y <- tsibble(
  Year = 2015:2019,
  Observation = c(123, 39, 78, 52, 110),
  index = Year
)
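A tsibble records which column is its time index and checks that each index value (per key) appears only once. The sketch below inspects the object above with `index_var()`, `key_vars()` and `is_regular()` from the `tsibble` package, which `fpp3` loads. The data with a repeated year is made up for illustration and shows that the constructor rejects a non-unique index.

```r
library(tsibble)

y <- tsibble(
  Year = 2015:2019,
  Observation = c(123, 39, 78, 52, 110),
  index = Year
)

index_var(y)   # name of the time column: "Year"
key_vars(y)    # no key columns, so this is a single series
is_regular(y)  # observations are evenly spaced in time

# Hypothetical data with 2016 repeated: tsibble() refuses it because
# the index no longer uniquely identifies each row
dup_result <- tryCatch(
  tsibble(Year = c(2015L, 2016L, 2016L), Observation = c(1, 2, 3), index = Year),
  error = function(e) "rejected: duplicated index"
)
dup_result
```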

In [None]:
# Here we manipulate the data using functions from the `dplyr` package (loaded by fpp3)
# Some datasets, such as PBS, are available as variables once the fpp3 library is loaded

# Inside these functions we can refer to column names directly
# We can pipe the output of one operation into the next using `|>`
# We can assign the final result to a new variable using `->` (`<-` is the more common assignment operator)

PBS |>
  filter(ATC2 == "A10") |>
  select(Month, Concession, Type, Cost) |>
  summarise(TotalC = sum(Cost)) |>
  mutate(Cost = TotalC/1e6) ->
  a10 # output variable name

# a10 # uncomment to view
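To see what `summarise()` does to a keyed tsibble without loading PBS, here is a minimal sketch on a small made-up dataset. The `sales` and `Store` names and the costs are hypothetical. On a tsibble, `summarise()` keeps the time index, so it collapses the key variables and returns one row per time period. As in the cell above, the result is stored with `->`.

```r
library(dplyr)
library(tsibble)

# Hypothetical monthly costs for two stores (key = Store, index = Month)
sales <- tsibble(
  Month = rep(yearmonth(as.Date(c("2020-01-01", "2020-02-01"))), times = 2),
  Store = rep(c("A", "B"), each = 2),
  Cost  = c(100, 120, 300, 330),
  key = Store,
  index = Month
)

# `x |> f(y)` is the same as `f(x, y)`; `-> totals` assigns the final result
sales |>
  summarise(TotalC = sum(Cost)) |>    # sums over both stores for each month
  mutate(Cost = TotalC / 100) ->      # rescale, like TotalC/1e6 above
  totals

totals
```

The result has one row per month, with the two stores added together, mirroring how the PBS pipeline totals costs across all concession and type combinations.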

In [None]:
# Read the CSV into a tibble (a plain data frame); it is converted to a tsibble below

prison <- readr::read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv")


# We can also assign with `<-` at the start of a chain, though it is less clean for long chained operations
# You can use `-` before a column name to exclude it from the selection

# Use `as_tsibble()` to convert to a tsibble object: `key` lists the columns that together identify
# each separate time series (here one series per State/Gender/Legal/Indigenous combination),
# and `index` is the time column that orders the observations within each series

prison <- prison |>
          mutate(Quarter = yearquarter(Date)) |>
          select(-Date) |>
          as_tsibble(
            key = c(State, Gender, Legal, Indigenous),
            index = Quarter
          )
# prison
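The sketch below shows what `key` and `index` mean on a tiny made-up table with the same shape as the prison data: two states, two quarters, and invented counts. Each distinct `State` value becomes one time series, and `Quarter` orders the rows within it. Leaving out the key makes each quarter appear twice, which `as_tsibble()` rejects.

```r
library(dplyr)
library(tsibble)

# Hypothetical quarterly counts for two states
raw <- tibble(
  Date  = as.Date(c("2020-03-01", "2020-06-01", "2020-03-01", "2020-06-01")),
  State = c("ACT", "ACT", "NSW", "NSW"),
  Count = c(10, 12, 50, 55)
)

toy <- raw |>
  mutate(Quarter = yearquarter(Date)) |>   # 2020-03-01 -> 2020 Q1, 2020-06-01 -> 2020 Q2
  select(-Date) |>
  as_tsibble(key = State, index = Quarter)

n_keys(toy)     # number of separate series (one per State)
key_vars(toy)   # the key column(s)
index_var(toy)  # the time column

# Without the key, each quarter appears twice, so as_tsibble() refuses the data
no_key <- tryCatch(
  raw |>
    mutate(Quarter = yearquarter(Date)) |>
    select(-Date) |>
    as_tsibble(index = Quarter),
  error = function(e) "rejected: duplicated index without a key"
)
no_key
```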

## Time Series Intro

**Trend**: a long-term increase or decrease in the data over a period of time. It is often roughly linear, but it does not have to be.

**Seasonality**: a repeating pattern in the data with a fixed, known period tied to the calendar or clock, e.g. a pattern that repeats every day, every week or every year.

**Cyclic**: rises and falls that do not have a fixed frequency. A cycle seems to be more like an overarching pattern of recurring behaviour over longer periods (often several years, frequently tied to economic conditions such as the business cycle), and a single cycle can contain many repetitions of a seasonal pattern. Because the frequency is not fixed or regular, the length of each cycle can vary (e.g. one cycle lasts about two years and the next about four).
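A small simulation makes the first two definitions concrete. This sketch uses base R only, and the slope, amplitude and noise level are arbitrary choices. It builds ten years of monthly data from a linear trend plus a seasonal pattern with a fixed 12-month period. It then uses `stats::decompose()`, a classical additive decomposition, to recover both parts. A cycle is deliberately left out: its period is not fixed, so a method that assumes a known seasonal period would not treat it as seasonality, and a multi-year cycle would mostly end up in the trend estimate instead.

```r
set.seed(42)

n <- 120                                      # 10 years of monthly observations
time_idx <- 1:n
trend    <- 0.5 * time_idx                    # long-term increase (linear here, but need not be)
seasonal <- 10 * sin(2 * pi * time_idx / 12)  # repeats every 12 months: fixed, known period
noise    <- rnorm(n, sd = 2)

y <- ts(trend + seasonal + noise, start = c(2010, 1), frequency = 12)

# Classical additive decomposition: y = trend + seasonal + remainder
dec <- decompose(y, type = "additive")

# Estimated seasonal effect for each calendar month (12 values, centred to sum to zero)
round(dec$figure, 1)

# Slope of the estimated trend (NA values at both ends are dropped by lm);
# it should be close to the true slope of 0.5
trend_slope <- unname(coef(lm(as.numeric(dec$trend) ~ time_idx))[2])
trend_slope
```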

In [None]:
# Set the default size (in inches) of plots displayed in the notebook
options(repr.plot.width=15, repr.plot.height=8)

In [None]:
# We can make plots that split the data into seasonal periods of different lengths. The `vic_elec` dataset
# contains half-hourly records of electricity demand in Victoria, Australia. The `gg_season()` function (from the
# `feasts` package, loaded by `fpp3`) uses the tsibble's time index to cut the series into periods and overlays
# one line per period, so patterns that repeat within a period line up. The `period` argument sets the length
# of each period ('day', 'week', 'month', 'year').

vic_elec |>
  gg_season(
    y=Demand,
    period='day',
    labels='right'
  ) +
  theme(legend.position="none", aspect.ratio=0.5) +
  labs(y="MWh", title="Electricity demand: Victoria (Daily Patterns)")

vic_elec |>
  gg_season(
    y=Demand,
    period='week',
    labels='right'
  ) +
  theme(legend.position="none", aspect.ratio=0.5) +
  labs(y="MWh", title="Electricity demand: Victoria (Weekly Patterns)")

vic_elec |>
  gg_season(
    y=Demand,
    period='month',
    labels='right'
  ) +
  theme(legend.position="none", aspect.ratio=0.5) +
  labs(y="MWh", title="Electricity demand: Victoria (Monthly Patterns)")

vic_elec |>
  gg_season(
    y=Demand,
    period='year',
    labels='right'
  ) +
  theme(legend.position="none", aspect.ratio=0.5) +
  labs(y="MWh", title="Electricity demand: Victoria (Yearly Patterns)")
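Each line in the `period='day'` panel is one day's 48 half-hourly readings. Averaging across all days at each time of day gives a single typical daily profile, which is a rough numerical summary of what that plot shows. The sketch below uses `dplyr` (loaded by `fpp3`). Converting to a tibble first means the tsibble's time index does not get in the way of a plain group-by. `HalfHour` is a helper column introduced here.

```r
library(fpp3)

daily_profile <- vic_elec |>
  as_tibble() |>                                  # drop tsibble structure for a plain group-by
  mutate(HalfHour = format(Time, "%H:%M")) |>     # local clock time, e.g. "17:30"
  group_by(HalfHour) |>
  summarise(MeanDemand = mean(Demand)) |>
  arrange(HalfHour)

# Times of day with the lowest and highest average demand
daily_profile |> slice_min(MeanDemand, n = 1)
daily_profile |> slice_max(MeanDemand, n = 1)
```

The seasonal plot also shows how much individual days vary around this shape. This summary keeps only the average.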

You can see strong seasonal patterns, especially at the daily granularity: the weekly and monthly plots show the same pattern of a day's activity repeating within each period.

There is a general rise and fall across the course of a day. This shape is more obvious in the weekly or monthly plots, where consecutive days repeat it, than in the daily plot itself. The yearly plot appears random at first, but it is fairly consistent with the daily seasonality. It only looks noisy because roughly 17,500 half-hourly values per year are compressed into one panel.

So the seasonality seems to be at the daily granularity alone, or at least most strongly there. This is probably because people use electricity at similar hours every day and repeat this behaviour largely independently of other parts of the calendar. For example, consumption is unlikely to differ much between a Tuesday and a Wednesday. For the most part, it seems to depend more on the hour of the day than on the day of the week or the month.
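One rough way to check this reading is to measure how much of the variance in demand each calendar grouping explains on its own. To do this, replace every observation by the mean of its group, then compute the share of variance those group means explain. This is the R² of a one-way ANOVA, sometimes called eta squared. It is a simple heuristic chosen here for illustration, not the seasonal-strength measure from `feasts`, and it ignores interactions and temperature effects. The `share_explained()` helper is hypothetical. If the time-of-day grouping explains the largest share, that supports the conclusion above.

```r
library(fpp3)

# Hypothetical helper: share of variance in x explained by the group means of g
share_explained <- function(x, g) {
  group_means <- ave(x, g)   # each value replaced by the mean of its group
  1 - sum((x - group_means)^2) / sum((x - mean(x))^2)
}

elec <- vic_elec |>
  as_tibble() |>
  mutate(
    HalfHour = format(Time, "%H:%M"),   # time of day (daily pattern)
    Weekday  = format(Time, "%u"),      # day of week, 1 = Monday (weekly pattern)
    Month    = format(Time, "%m")       # month of year (yearly pattern)
  )

shares <- c(
  day  = share_explained(elec$Demand, elec$HalfHour),
  week = share_explained(elec$Demand, elec$Weekday),
  year = share_explained(elec$Demand, elec$Month)
)
round(shares, 3)
```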