
```python
# import libraries
import itertools
import warnings
from random import gauss, random

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# statsmodels.tsa.ar_model.AR was deprecated in favor of AutoReg and is
# no longer available in recent statsmodels releases
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

import matplotlib.pyplot as plt
import seaborn as sns
```

## Groundwork

**Time Series**: a sequence of data points (observations) recorded at successive points in time, frequently but not always at equally spaced intervals:

$$\{X_t\},\quad t = 1, \dots, T$$

A time series is a (discrete) realization of a (continuous) stochastic process that generates the data. The underlying reason we can infer properties of the process (the latter) from the observed series (the former) is the Kolmogorov extension theorem.
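To make "realization of a stochastic process" concrete, here is a minimal sketch that simulates a Gaussian random walk $X_t = X_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$ (an illustrative choice of process, not one taken from this notebook) using the standard library's `random.gauss`, and stores it as a pandas `Series` on equally spaced daily timestamps. Each seed produces a different realization of the same process. The helper name `random_walk`, the start date, and the length are hypothetical.

```python
import random

import pandas as pd


def random_walk(T: int, seed: int = 0) -> pd.Series:
    """Simulate one realization {X_t}, t = 1..T, of a Gaussian random walk.

    X_t = X_{t-1} + e_t with e_t ~ N(0, 1) and X_0 = 0 (illustrative process).
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    values = []
    x = 0.0
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)
        values.append(x)
    # Equally spaced daily timestamps; real series need not be equally spaced.
    index = pd.date_range("2020-01-01", periods=T, freq="D")
    return pd.Series(values, index=index, name="X")


# Two realizations of the same process differ only through the random draws.
series_a = random_walk(100, seed=1)
series_b = random_walk(100, seed=2)
print(series_a.head())
```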

**Axioms of probability**
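1. Non-negativity: for any event $A$, $P(A) \ge 0$; the probability of any event is a real number in the interval $[0, 1]$ (the upper bound follows from the other two axioms).
2. Normalization: $P(\Omega) = 1$; the probability that some elementary outcome in the whole sample space $\Omega$ occurs is 1, i.e. no elementary outcomes exist outside the sample space.
3. Additivity: any countable sequence of pairwise disjoint events $E_1, E_2, \ldots$ satisfies
 $$P(E_{1}\cup E_{2}\cup \cdots )=\sum_{i} P(E_{i})$$
The probability of a union of disjoint events is the sum of their probabilities; if the events overlap, this relation does not hold.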
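The axioms can be checked mechanically on a small finite sample space. The sketch below assumes a single roll of a fair six-sided die (an illustrative choice) and uses exact `Fraction` arithmetic. On a finite space, countable additivity reduces to finite additivity, so only a finite family of disjoint events is tested. In the code, `|` is Python's set union, not conditioning.

```python
from fractions import Fraction
from itertools import combinations

# Sample space of one roll of a fair six-sided die (illustrative).
omega = frozenset(range(1, 7))
p_outcome = {w: Fraction(1, 6) for w in omega}


def P(event):
    """Probability of an event (a subset of omega): sum of its outcome probabilities."""
    return sum((p_outcome[w] for w in event), Fraction(0))


# Axiom 1: non-negativity, checked for every event (all 2^6 subsets of omega).
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(sorted(omega), r)]
assert all(P(e) >= 0 for e in events)

# Axiom 2: normalization.
assert P(omega) == 1

# Axiom 3: additivity for pairwise disjoint events ("|" is set union).
E1, E2, E3 = frozenset({1}), frozenset({2, 3}), frozenset({6})
assert P(E1 | E2 | E3) == P(E1) + P(E2) + P(E3)

# With overlapping events the plain sum double-counts the shared outcome {2}.
A, B = frozenset({1, 2}), frozenset({2, 3})
print(P(A | B), P(A) + P(B))  # 1/2 2/3
```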

Derived formulas:
1. $$P(A\cup B)=P(A)+P(B)-P(A\cap B)$$
2. $$P(\Omega -E)=1-P(E)$$
3. $$P(A\cap B)=P(A)\cdot P(B\vert A)$$
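4. Independence: $A$ and $B$ are independent if and only if $$P(A\cap B)=P(A)\cdot P(B)$$
5. Bayes' theorem, obtained by writing item 3 in both orders ($P(A)\,P(B\vert A) = P(B)\,P(A\vert B)$): $$P(B\vert A)=\frac{P(A\vert B)\cdot P(B)}{P(A)}$$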
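These identities can be verified exactly on a concrete example. The sketch below assumes two fair dice (36 equally likely ordered pairs) with three illustrative events: $A$ = first die is even, $B$ = second die shows 6, $C$ = total is at least 10. It defines conditional probability as $P(B \vert A) = P(A \cap B)/P(A)$; the helper name `P_given` is hypothetical. In the code, `|` is set union and `&` is set intersection.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, 36 equally likely pairs.
omega = frozenset(product(range(1, 7), repeat=2))


def P(event):
    """Uniform probability: |event| / |omega|."""
    return Fraction(len(event), len(omega))


def P_given(b, a):
    """Conditional probability P(b | a) = P(a & b) / P(a), assuming P(a) > 0."""
    return P(a & b) / P(a)


A = frozenset(w for w in omega if w[0] % 2 == 0)  # first die is even
B = frozenset(w for w in omega if w[1] == 6)      # second die shows 6
C = frozenset(w for w in omega if sum(w) >= 10)   # total is at least 10

# 1. Inclusion-exclusion
assert P(A | B) == P(A) + P(B) - P(A & B)
# 2. Complement rule
assert P(omega - A) == 1 - P(A)
# 3. Product rule
assert P(A & B) == P(A) * P_given(B, A)
# 4. Independence: A and B are independent; B and C are not
assert P(A & B) == P(A) * P(B)
assert P(B & C) != P(B) * P(C)
# 5. Bayes' theorem
assert P_given(B, C) == P_given(C, B) * P(B) / P(C)

print(P(A & B), P(A) * P(B))  # 1/12 1/12
print(P(B & C), P(B) * P(C))  # 1/12 1/36
```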

## Patterns 
Separate the time series into components with easily understandable characteristics. The additive decomposition is:
$$X_t = T_t + S_t + C_t + I_t$$

* $T_t$: the **trend**, the general direction of the series over a long period of time. It represents the long-term progression of the series (secular variation).
* $S_t$: the **seasonal** component, with a fixed and known period. It appears when a distinct pattern repeats at regular intervals because of seasonal factors (annual, monthly, or weekly), e.g. daily electricity consumption or the annual sales of seasonal goods.
* $C_t$: the (optional) **cyclical** component, a repetitive pattern that does not occur at fixed intervals. It is usually observed in an economic context, such as business cycles.
* $I_t$: the **irregular** component (residual), consisting of the fluctuations that remain after removing the trend and the seasonal / cyclical variations.

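**Multiplicative decomposition**:
$$X_t = T_t \cdot S_t \cdot I_t$$
For a strictly positive series, taking logarithms shows this is equivalent to an additive decomposition of $\log X_t$:
$$\log X_t = \log T_t + \log S_t + \log I_t$$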


```python
help(seasonal_decompose)
```

Help on function seasonal_decompose in module statsmodels.tsa.seasonal:

seasonal_decompose(x, model='additive', filt=None, period=None, two_sided=True, extrapolate_trend=0)
    Seasonal decomposition using moving averages.
    
    Parameters
    ----------
    x : array_like
        Time series. If 2d, individual series are in columns. x must contain 2
        complete cycles.
    model : {"additive", "multiplicative"}, optional
        Type of seasonal component. Abbreviations are accepted.
    filt : array_like, optional
        The filter coefficients for filtering out the seasonal component.
        The concrete moving average method used in filtering is determined by
        two_sided.
    period : int, optional
        Period of the series. Must be used if x is not a pandas object or if
        the index of x does not have  a frequency. Overrides default
        periodicity of x if x is a pandas object with a timeseries index.
    two_sided : bool, optional
        The moving a