In [46]:
%matplotlib widget
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as st

import matplotlib
import matplotlib.pyplot as plt


# Time Series Analysis

Based on the Penn State course, [STAT 510 Applied Time Series Analysis](https://online.stat.psu.edu/stat510/).

Models in which the value $x_t$ at time $t$ is related to past values and past estimation errors are called ARIMA (Autoregressive Integrated Moving Average) models, or Box-Jenkins models.

**Time Series** At each instant $t$, we have a random variable $x_t$.

**Weak Stationarity**
1. The mean is constant: $E(x_t)$ does not depend on $t$.
2. The variance is constant: $\text{var}(x_t)$ does not depend on $t$.
3. The autocovariances are invariant under time translation: $\text{cov}(x_t,x_{t-h})$ depends only on the lag $h$, not on $t$.
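To see what condition 2 rules out, a small simulation can compare the ensemble variance (across many independent realizations) of Gaussian white noise with that of a random walk $x_t = x_{t-1} + w_t$, a process chosen here purely for illustration. Both have mean 0 at every $t$, but the random walk's variance grows like $t\,\sigma_w^2$, so it is not weakly stationary. The number of paths, series length, and seed are arbitrary assumptions.

```python
import numpy as np

# Illustrative ensemble: many independent realizations of each process
rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 200

w = rng.normal(0.0, 1.0, size=(n_paths, n_steps))  # Gaussian white noise, sigma_w = 1
walk = np.cumsum(w, axis=1)                          # random walk: x_t = x_{t-1} + w_t, x_1 = w_1

# Ensemble (across-path) mean and variance at each time index
wn_mean, wn_var = w.mean(axis=0), w.var(axis=0)
rw_mean, rw_var = walk.mean(axis=0), walk.var(axis=0)

for t in (10, 100, 200):
    print(f"t={t:3d}  white noise var={wn_var[t - 1]:.2f}  random walk var={rw_var[t - 1]:.2f}")
```

The white-noise variance stays near 1 at every $t$, while the random-walk variance tracks $t$.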

**AR(1) model**
\begin{equation}
x_t = \delta + \phi_1 x_{t-1} + w_t
\end{equation}
where
$$
\begin{split}
w_t &\sim N(0,\sigma_w^2)\\
E(x_{t-j}w_t) &= E(x_{t-j})E(w_t) = 0, \quad j \ge 1
\end{split}
$$
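The model can be simulated directly from its defining recursion. The sketch below (the function `simulate_ar1`, the parameter values, and the burn-in length are all illustrative assumptions) also checks which orthogonality holds: $w_t$ is uncorrelated with the earlier value $x_{t-1}$, whereas $x_t$ contains $w_t$, so $\text{corr}(x_t, w_t) = \sigma_w/\sqrt{\text{var}(x_t)} = \sqrt{1-\phi_1^2}$.

```python
import numpy as np

def simulate_ar1(delta, phi1, sigma_w, n, burn_in=500, seed=None):
    """Simulate x_t = delta + phi1 * x_{t-1} + w_t with w_t ~ iid N(0, sigma_w^2).

    Returns (x, w) with the first burn_in values dropped.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma_w, size=n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = delta / (1.0 - phi1) + w[0]   # start near the stationary mean (assumes |phi1| < 1)
    for t in range(1, n + burn_in):
        x[t] = delta + phi1 * x[t - 1] + w[t]
    return x[burn_in:], w[burn_in:]

x, w = simulate_ar1(delta=2.0, phi1=0.6, sigma_w=1.0, n=200_000, seed=1)

# w_t paired with the previous value x_{t-1}: correlation should be near 0
corr_past = np.corrcoef(x[:-1], w[1:])[0, 1]
# w_t paired with x_t itself: theory gives sqrt(1 - 0.6**2) = 0.8
corr_now = np.corrcoef(x, w)[0, 1]
print(round(corr_past, 3), round(corr_now, 3))
```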

**White noise**

A set of uncorrelated random variables $w_t$ with zero mean and finite variance is called white noise. If the variables are also independent of each other, we have iid white noise. If, in addition, they are normally distributed, the series is called Gaussian white noise and may be denoted as
$$
w_t \stackrel{iid}\sim N(0,\sigma_w^2)
$$
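Because white-noise terms are uncorrelated, the theoretical autocorrelation at every nonzero lag is 0, so the sample autocorrelations of a long white-noise series should be small, mostly within about $\pm 1.96/\sqrt{n}$. The sketch below uses only NumPy; the helper `sample_acf` is hypothetical and implements the common estimator that divides each lagged cross-product sum by the lag-0 sum of squares. The series length, $\sigma_w = 2$, and the seed are arbitrary.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags:
    r_h = sum_t (x_t - xbar)(x_{t-h} - xbar) / sum_t (x_t - xbar)^2."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[h:], d[:len(d) - h]) / denom for h in range(nlags + 1)])

rng = np.random.default_rng(42)
n = 10_000
w = rng.normal(0.0, 2.0, size=n)   # Gaussian white noise with sigma_w = 2
r = sample_acf(w, nlags=10)
band = 1.96 / np.sqrt(n)           # approximate 95% band for white noise
print(np.round(r[1:], 3), round(band, 4))
```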


**Properties of AR(1)**
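1. Taking expectations of both sides, $E(x_t) = \delta + \phi_1 E(x_{t-1})$. Under stationarity $E(x_t) = E(x_{t-1})$, so $E(x_t) = \frac{\delta}{1-\phi_1}$.
2. Because $w_t$ is independent of $x_{t-1}$, $\text{var}(x_t) = \phi_1^2 \text{var}(x_{t-1}) + \sigma_w^2$. Under stationarity $\text{var}(x_t) = \text{var}(x_{t-1})$, so $\text{var}(x_t) = \frac{\sigma_w^2}{1-\phi_1^2}$, which is positive and finite only if $|\phi_1| \lt 1$.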

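    We will also look at the autocorrelation between the random variables at two time instants. For this, we start with the autocovariance $\gamma_x(t-h,t)=\text{cov}(x_{t-h},x_t)$. Substituting $x_t = \delta + \phi_1 x_{t-1} + w_t$, and using the fact that $w_t$ is uncorrelated with $x_{t-h}$ for $h \ge 1$, gives $\gamma_x(t-h,t) = \phi_1 \gamma_x(t-h,t-1)$. Applying this recursively, with $\gamma_x(t,t) = \text{var}(x_t)$ constant by stationarity, we have
    $$
    \begin{split}
    \gamma_x(t-1,t) &= \phi_1 \gamma_x(t-1,t-1) = \phi_1\text{var}(x_{t})\\
    \gamma_x(t-2,t) &= \phi_1 \gamma_x(t-2,t-1) = \phi_1^2\text{var}(x_{t})\\
    &\cdots \\
    \gamma_x(t-h,t) &= \phi_1^h \text{var}(x_t)
    \end{split}
    $$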
3. The autocorrelation $\rho_x(t-h,t)$ is given by
    $$
    \rho_x(t-h,t) = \frac{\gamma_x(t-h,t)}{\sqrt{\text{var}(x_{t-h})} \sqrt{\text{var}(x_{t})}} = \frac{\gamma_x(t-h,t)}{\text{var}(x_t)} = \phi_1^h
    $$

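The last property gives the autocorrelation function (**ACF**) of the AR(1) model: $\rho_h = \phi_1^h$, which decays geometrically with the lag $h$ and alternates in sign when $\phi_1 < 0$.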
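A simulation can serve as a sanity check on properties 1 to 3. With $\delta = 1$, $\phi_1 = -0.5$, $\sigma_w = 1$ (values chosen arbitrarily; the negative $\phi_1$ makes the alternating sign of $\phi_1^h$ visible), the theory gives $E(x_t) = 2/3$, $\text{var}(x_t) = 4/3$, and $\rho_h = (-0.5)^h$. This is a sketch under those assumptions, not part of the original course material.

```python
import numpy as np

delta, phi1, sigma_w, n = 1.0, -0.5, 1.0, 200_000
rng = np.random.default_rng(7)
w = rng.normal(0.0, sigma_w, size=n)
x = np.empty(n)
x[0] = delta / (1 - phi1)                 # start at the stationary mean
for t in range(1, n):
    x[t] = delta + phi1 * x[t - 1] + w[t]

theory_mean = delta / (1 - phi1)          # property 1
theory_var = sigma_w**2 / (1 - phi1**2)   # property 2
lags = np.arange(1, 6)
theory_acf = phi1 ** lags                 # property 3: rho_h = phi1^h

# Sample ACF at lags 1..5 (lagged cross-products over the lag-0 sum of squares)
d = x - x.mean()
acf_hat = np.array([np.dot(d[h:], d[:-h]) for h in lags]) / np.dot(d, d)

print(f"mean {x.mean():.3f} vs {theory_mean:.3f}")
print(f"var  {x.var():.3f} vs {theory_var:.3f}")
print("acf ", np.round(acf_hat, 3), "vs", np.round(theory_acf, 3))
```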

In [47]:
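# 99 years of data for worldwide magnitude 7+ earthquakes (one count per year)

# Whitespace-separated values; the raw string avoids an invalid "\s" escape-sequence warning
foo = pd.read_csv("/home/vpoduri/Python-Stats/TSA_510_data/quakes.txt", header=None, sep=r"\s+")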

# Concatenate rows from data frame to construct a series, drop NaN and reset the index

s1 = pd.concat([foo.iloc[r,:] for r in range(foo.shape[0])]).dropna().reset_index(drop=True)
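The reshaping step assumes the file lists the counts row by row, with NaN padding at the end of the last row (an assumption, since the file's layout is not shown here). Under that assumption, concatenating the rows is the same as flattening the values in row-major order, which a hypothetical toy frame can confirm:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the wide text file: 3 rows of 4 values, last row padded with NaN
wide = pd.DataFrame([[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, np.nan, np.nan]], dtype=float)

# Row-concatenation approach used in the cell above
by_rows = pd.concat([wide.iloc[r, :] for r in range(wide.shape[0])]).dropna().reset_index(drop=True)

# Equivalent row-major flattening with NumPy
flat = pd.Series(wide.to_numpy().ravel()).dropna().reset_index(drop=True)

print(by_rows.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

The NumPy version avoids building one Series per row, which only matters for much larger files.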

In [73]:
s1.describe()

count    99.000000
mean     20.020202
std       7.263242
min       6.000000
25%      15.000000
50%      20.000000
75%      24.000000
max      41.000000
dtype: float64

In [80]:

plt.close()
fig = plt.figure()
_ = fig.suptitle("Earthquakes with magnitude 7+")

a1 = fig.add_subplot(211,ylabel="Count of quakes",xlabel="Index")
a2 = fig.add_subplot(212)
plt.subplots_adjust(hspace=0.6)

_ = a1.plot(s1.index,s1.values,'.-r') 
_ = a1.hlines(s1.mean(),s1.index.min(),s1.index.max(),'r')   # add a line at the mean value

acf = st.acf(s1, nlags=10, fft=False)   # sample ACF for lags 0..10; the array could also be plotted with a2.plot(acf)

_ = sm.graphics.tsa.plot_acf(s1,ax=a2,alpha=0.05)   # statsmodels function with 95% bands


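Property 3 suggests one way to connect the ACF panel back to the model. If an AR(1) were entertained for these counts, matching moments gives $\hat\phi_1 = r_1$ (the lag-1 sample autocorrelation), $\hat\delta = \bar{x}(1-\hat\phi_1)$ from property 1, and $\hat\sigma_w^2 = \hat\gamma_0(1-\hat\phi_1^2)$ from property 2; these are the Yule-Walker estimates for an AR(1). The helper `ar1_moment_estimates` is hypothetical, and because the quakes file is not bundled here it is checked on simulated data with $\delta = 5$, $\phi_1 = 0.5$, $\sigma_w = 2$. Applied to the notebook's series it would be called as `ar1_moment_estimates(s1)`.

```python
import numpy as np

def ar1_moment_estimates(x):
    """Method-of-moments (Yule-Walker) estimates (delta, phi1, sigma_w^2) for an AR(1),
    from E(x) = delta/(1-phi1), var(x) = sigma_w^2/(1-phi1^2), rho_1 = phi1."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    gamma0 = np.dot(d, d) / len(x)               # sample variance (divisor n)
    phi1 = np.dot(d[1:], d[:-1]) / np.dot(d, d)  # lag-1 sample autocorrelation r_1
    delta = x.mean() * (1 - phi1)
    sigma_w2 = gamma0 * (1 - phi1**2)
    return delta, phi1, sigma_w2

# Check on simulated data (illustrative parameters: delta=5, phi1=0.5, sigma_w=2)
rng = np.random.default_rng(3)
n = 100_000
w = rng.normal(0.0, 2.0, size=n)
x = np.empty(n)
x[0] = 5 / (1 - 0.5)                 # start at the stationary mean
for t in range(1, n):
    x[t] = 5 + 0.5 * x[t - 1] + w[t]

delta_hat, phi_hat, sigma2_hat = ar1_moment_estimates(x)
print(round(delta_hat, 2), round(phi_hat, 3), round(sigma2_hat, 2))
```

statsmodels also offers likelihood-based fitting (for example `statsmodels.tsa.arima.model.ARIMA`), which will generally give slightly different values from these moment estimates.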