# Lecture 5: Time Series Analysis

Today we focus on a special type of dataset: time series.

Time series occur frequently in environmental monitoring, but they are also used in business development and market analysis. What you learn today is therefore relevant for a wide range of fields.

- Time series analysis is a classical data science topic: we dig into datasets to find the patterns that occur, not necessarily driven by a prior hypothesis.
- Time series data can be used for forecasting: Predict the future from the past!

The annual global temperature anomaly (difference from the 1960-1975 mean) is a classic time series with one observation point per year:

<div>
<img src="attachment:d714e49c-a8ff-4165-a80b-5e7bda4fbd02.jpg" width="500"/>
</div>

The increase in global CO2 levels (here represented by measurements at the Mauna Loa Observatory, Hawaii) is responsible for the significant warming since the 1960s. It is displayed in the next time series, this time with monthly values, which show a clear seasonality and an increasing trend:

<div>
<img src="attachment:0d4cf9e0-6db0-4282-844d-349d61f50ace.png" width="500"/>
</div>

## 5.1 What makes time series special?

A time series is any type of time-resolved data. It could be the number of coffees you drink every day. It could be the market value of bitcoin on a daily basis. But it can also be the number of subway passengers in Vienna on an hourly basis.

### 5.1.1 Natural order of data

What makes time series unique is that the "independent" variable is structured: there is a direction, i.e. there is a sense of "previous" and "future". The "dependent" variables need not be simple scalars; they can also be vectors, sets of data, etc. Generally, a time series is written as a variable $x_t$, where the subscript $t$ represents time, with $t=1$ being the first available observation of $x$ and $t=T$ being the last. In an ideal world, most time series would have regular observation intervals (each day, each month, each year, etc.), but we will see later that this does not always hold, and that methods are needed to deal with such irregularities.

### 5.1.2 Trend, seasonality, cycles and irregularities

Time series can be viewed as a composition of a trend ($T(t)$, a long-term increase or decrease in the data), a seasonality ($S(t)$, a repeating pattern at fixed intervals, e.g., daily, weekly, yearly), cyclic behavior ($C(t)$, repeated fluctuations which are not tied to a fixed calendar-based interval) and noise ($\epsilon(t)$, random variation or error).

The time series can be decomposed into these components either additively or multiplicatively:

$$
x_t = T(t) + S(t) + C(t) + \epsilon(t)
$$

$$
x_t = T(t) \cdot S(t) \cdot C(t) \cdot \epsilon(t)
$$
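
To make the additive form concrete, here is a minimal sketch that builds a synthetic monthly series from a trend, a seasonality and noise, and then recovers a rough trend estimate. All values are made up for illustration, the cyclic component $C(t)$ is left out for simplicity, and the centered rolling mean is only one simple way to estimate a trend.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series; every number here is an illustrative assumption
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=120, freq="MS")  # 10 years of month starts
t = np.arange(len(index))

trend = 0.05 * t                                 # T(t): slow linear increase
seasonality = 2.0 * np.sin(2 * np.pi * t / 12)   # S(t): repeats every 12 months
noise = rng.normal(scale=0.3, size=len(index))   # epsilon(t): random variation

# Additive composition: x_t = T(t) + S(t) + epsilon(t)
x = pd.Series(trend + seasonality + noise, index=index, name="x")

# A centered 12-month rolling mean averages out one full seasonal period,
# which leaves an approximate trend (the edges of the series become NaN)
trend_estimate = x.rolling(window=12, center=True).mean()

# Removing the estimated trend leaves seasonality plus noise
detrended = x - trend_estimate
print(detrended.dropna().head())
```

For a multiplicative series, the same idea would apply after taking logarithms, since the logarithm turns the product of components into a sum.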

### 5.1.3 Autocorrelation

Autocorrelation is the correlation of a time series with a shifted (lagged) version of itself. Subsequent values in a time series are often closely related to each other, i.e. the autocorrelation at small lags is high. Periodic patterns (seasonality) are also visible in the autocorrelation: they show up as high autocorrelation values at lags which correspond to the interval length of the periodic behavior.

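The autocorrelation at lag $k$ can be estimated via the sample autocorrelation coefficient:

$$
r_k = \frac{\sum_{t=k+1}^{T} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}
$$

where $\bar{x}$ is the mean of the series and $T$ is the number of observations.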
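
As a small illustration, the sketch below computes the autocorrelation of a synthetic monthly series with a 12-month period at a few lags. It uses pandas' `Series.autocorr`, which computes the Pearson correlation between the series and its lagged copy; the series itself is a made-up, noise-free sine wave.

```python
import numpy as np
import pandas as pd

# Synthetic, noise-free seasonal series with a 12-month period (illustrative only)
index = pd.date_range("2000-01-01", periods=120, freq="MS")
t = np.arange(len(index))
x = pd.Series(np.sin(2 * np.pi * t / 12), index=index)

# Correlation of the series with itself shifted by 1, 6 and 12 months
acf = {lag: x.autocorr(lag=lag) for lag in (1, 6, 12)}
for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.3f}")
```

Neighboring months are strongly positively correlated, a shift of half a period gives a strongly negative value, and a shift of one full period brings the autocorrelation back up to about 1. Real data with noise and a trend would show less clean values.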

### 5.1.4 Stationary vs non-stationary

## 5.2 Time series: pandas functionality

`pandas` is very convenient for dealing with time-resolved data.

First, it has a built-in data type, `pd.Timestamp`, which inherently knows how our calendars and clocks work.
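
A minimal sketch of what this calendar awareness looks like in practice (the date is an arbitrary example):

```python
import pandas as pd

# An arbitrary example date, chosen close to the end of February in a leap year
ts = pd.Timestamp("2024-02-28 22:30")

print(ts.day_name())     # name of the weekday
print(ts.is_leap_year)   # whether the year has a February 29

# Adding two days crosses February 29 and lands in March
later = ts + pd.Timedelta(days=2)
print(later)
```

Here `pd.Timedelta` represents a duration, so the arithmetic respects month lengths and leap years without any manual bookkeeping.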

A particular feature of the `Series` and `DataFrame` data structures is that both have a labeled axis called the index. A specific type of index that you will often see with time series data is the `DatetimeIndex`, which you will explore further in this chapter. Generally, the index makes slicing and dicing operations very intuitive. To make a `DataFrame` ready for time series analysis, you will learn how to create one with an index of type `DatetimeIndex`.
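
As a first sketch, the example below turns a column of date strings into a `DatetimeIndex` and then selects one month of data. The column names `date` and `passengers` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for illustration
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-02-01", "2024-02-02"],
    "passengers": [120, 135, 150, 142],
})

# Parse the strings into timestamps and use them as the index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")
print(type(df.index).__name__)

# With a DatetimeIndex, a partial date string selects all rows of that month
january = df.loc["2024-01"]
print(january)
```

Selecting by partial strings such as `"2024-01"` or `"2024"` is one of the conveniences that a `DatetimeIndex` provides over a plain integer index.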