
Data files used in this notebook:

https://drive.google.com/drive/folders/1jYgwuJ3lNQj2zxdeJFU3pZ6ExZDYucoO?usp=sharing


In [None]:
!pip install statsmodels




In [67]:
import pandas as pd
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.stattools import adfuller

**Import data from source**


In [39]:
df_x = pd.read_csv('BAC.csv', index_col=0, parse_dates=True)  # daily price history; the Date column becomes the index
df_y = pd.read_csv('VNQ.csv', index_col=0, parse_dates=True)  # daily price history; the Date column becomes the index

df_x.head()  # head must be called; without () it only returns the bound method


                 Open       High        Low      Close  Adj Close   Volume
Date                                                                      
2011-11-21  54.580002  54.580002  53.650002  53.990002  37.555283  2054100
2011-11-22  53.770000  54.250000  53.389999  53.740002  37.381382  2021300
2011-11-23  53.290001  53.290001  52.060001  52.119999  36.254509  1728000
2011-11-25  52.000000  53.299999  51.889999  52.349998  36.414501   680200
2011-11-28  53.880001  54.169998  53.080002  53.520000  37.228348  1721900

**Extract the adjusted close column**

In [59]:
df_x_AdjClose = df_x['Adj Close']
df_y_AdjClose = df_y['Adj Close']

df_x_AdjClose.head()

Date
2011-11-21    4.862592
2011-11-22    4.756307
2011-11-23    4.552591
2011-11-25    4.579163
2011-11-28    4.650021
Name: Adj Close, dtype: float64

**Test for a unit root in each time series**

*   A prerequisite for (Engle-Granger) cointegration is that the series under consideration are integrated of the same order, usually one. Each series should be non-stationary in levels (we *fail to reject* the unit-root null) and stationary after first differencing. A series that is already stationary in levels is I(0) and does not qualify.

*   A unit root test checks whether a time series is non-stationary because it possesses a unit root. The null hypothesis is usually the presence of a unit root; the alternative is stationarity, trend stationarity, or an explosive root, depending on the test used.



In [88]:
# ADF test for x (BAC): constant + linear trend, lag length chosen by AIC
adfuller(df_x_AdjClose, regression='ct', autolag='AIC')


(-2.68744423998744,
 0.24133274276375566,
 27,
 2238,
 {'1%': -3.9628208620185714,
  '10%': -3.128206102889325,
  '5%': -3.412453559692873},
 2110.896388790241)


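Note: we use `regression='ct'`, so the ADF test regression includes both a constant and a linear time trend, which suits trending series such as asset prices (see below).

The choice of deterministic terms deserves more research for our use case, because it can change which pairs pass or fail this screen.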


---



Case II: Constant and Time Trend

The test regression is

$$y_t = c + \delta t + \phi y_{t-1} + \varepsilon_t$$

and includes a constant and a deterministic time trend to capture the deterministic trend under the alternative. The hypotheses to be tested are

$H_0: \phi = 1 \Rightarrow y_t \sim I(1)$ with drift

$H_1: |\phi| < 1 \Rightarrow y_t \sim I(0)$ with deterministic time trend

**This formulation is appropriate for trending time series like asset prices or
the levels of macroeconomic aggregates like real GDP.**

Source: 

Page 119: https://faculty.washington.edu/ezivot/econ584/notes/unitroot.pdf

In [89]:
# ADF test for y (VNQ): constant + linear trend, lag length chosen by AIC
adfuller(df_y_AdjClose, regression='ct', autolag='AIC')

(-3.9878656248817688,
 0.009149138880705393,
 27,
 2238,
 {'1%': -3.9628208620185714,
  '10%': -3.128206102889325,
  '5%': -3.412453559692873},
 5541.99200258862)

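For BAC the ADF statistic (-2.69) is above even the 10% critical value (-3.13) and the p-value is 0.24, so we fail to reject the unit root: BAC is consistent with I(1). For VNQ the statistic (-3.99) is below the 1% critical value (-3.96) and the p-value is 0.009, so we reject the unit root: VNQ looks trend-stationary (I(0)) rather than I(1). Because the two series are not integrated of the same order, BAC/VNQ fails the prerequisite and should not be considered as a cointegration pair.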

**We run the cointegration test anyway to confirm that it fails.**

In [61]:
result = ts.coint(df_x_AdjClose, df_y_AdjClose)  # returns (t-statistic, p-value, critical values)
print(result)

(-2.271839063588416, 0.3876857705201648, array([-3.90128181, -3.33882895, -3.04632302]))


**Returns**

**coint_t**: float
The t-statistic of the unit-root test on the residuals.

**pvalue**: float
MacKinnon's approximate, asymptotic p-value based on MacKinnon (1994).

**crit_value**: ndarray
Critical values for the test statistic at the 1%, 5%, and 10% levels (in that order), based on a regression-surface approximation. They depend on the number of observations.



---


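**Since the p-value (0.39) is well above 0.05 and the t-statistic (-2.27) is above the 5% critical value (-3.34), we fail to reject the null of no cointegration: there is no evidence that BAC and VNQ are cointegrated.**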



---




**Notes**

* The null hypothesis is that there is no cointegration; the alternative is that there is a cointegrating relationship. If the p-value is below the chosen significance level (for example 0.05), we reject the null of no cointegration.

* P-values and critical values are obtained through the regression-surface approximations of MacKinnon (1994, 2010).

* If the two series are almost perfectly collinear, computing the test is numerically unstable. However, the two series will be cointegrated under the maintained assumption that they are integrated. In this case the t-statistic is set to -inf and the p-value to zero.

* The test assumes no NaNs and no gaps in the time series.





**References**

1. MacKinnon, J.G. 1994. "Approximate Asymptotic Distribution Functions for Unit-Root and Cointegration Tests." *Journal of Business & Economic Statistics*, 12(2), 167-76.

2. MacKinnon, J.G. 2010. "Critical Values for Cointegration Tests." Queen's University, Dept. of Economics Working Papers 1227. http://ideas.repec.org/p/qed/wpaper/1227.html

**Testing the procedure on a known cointegrated pair (EWA and EWC)**

In [62]:
df_EWA = pd.read_csv('EWA.csv', index_col=0, parse_dates=True)  # daily price history; the Date column becomes the index
df_EWC = pd.read_csv('EWC.csv', index_col=0, parse_dates=True)  # daily price history; the Date column becomes the index

df_EWA_AdjClose = df_EWA['Adj Close']
df_EWC_AdjClose = df_EWC['Adj Close']

df_EWA.head()

                 Open       High        Low      Close  Adj Close   Volume
Date                                                                      
2010-11-22  24.520000  24.629999  24.209999  24.590000  15.281363  2429800
2010-11-23  23.889999  23.990000  23.620001  23.709999  14.734494  8565900
2010-11-24  24.170000  24.400000  24.150000  24.320000  15.113575  3301800
2010-11-26  23.690001  23.830000  23.639999  23.650000  14.697206  3722200
2010-11-29  23.660000  23.940001  23.459999  23.920000  14.864994  7119900

In [90]:
# ADF test for EWA: constant + linear trend, lag length chosen by AIC
adfuller(df_EWA_AdjClose, regression='ct', autolag='AIC')

(-3.360937987151856,
 0.05682981995898565,
 27,
 2490,
 {'1%': -3.962410376909648,
  '10%': -3.128089028078249,
  '5%': -3.4122546731863177},
 192.1867764830531)

In [91]:
# ADF test for EWC: constant + linear trend, lag length chosen by AIC
adfuller(df_EWC_AdjClose, regression='ct', autolag='AIC')

(-3.2458494087048284,
 0.07564664512182458,
 27,
 2490,
 {'1%': -3.962410376909648,
  '10%': -3.128089028078249,
  '5%': -3.4122546731863177},
 849.2965906165118)

At the 5% level we fail to reject the unit root for both series (p = 0.057 for EWA, p = 0.076 for EWC), so both are treated as non-stationary I(1) candidates, which satisfies the prerequisite for cointegration. Both results are borderline: at the 10% level the unit root would be rejected, since both statistics are below the 10% critical value (-3.128).

In [63]:
result2 = ts.coint(df_EWA_AdjClose, df_EWC_AdjClose)  # returns (t-statistic, p-value, critical values)
print(result2)

(-3.3820048162454066, 0.04438886626137574, array([-3.90079646, -3.33855861, -3.04613545]))


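**Results**

The p-value is about 0.044, below 0.05, and the t-statistic (-3.382) is below the 5% critical value (-3.339), so we reject the null of no cointegration at the 5% level: **EWA and EWC appear to be cointegrated.**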
 

