# Cointegration Analysis - Engle-Granger Methodology

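This notebook applies the Engle-Granger two-step procedure to pairs drawn from 7 USD-quoted currency series. The 7 series form 21 unordered pairs, or 42 ordered pairs when both regression directions are counted; the pairwise loop in this notebook runs one direction per unordered pair.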
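
The pair count can be checked directly from the 7 column names. The snippet below is a small illustration using only the standard library; it does not touch the data file.

```python
from itertools import combinations, permutations

# Column names as they appear in the data preview
symbols = ["EURUSD", "GBPUSD", "JPYUSD", "CHFUSD", "CADUSD", "AUDUSD", "NZDUSD"]

unordered_pairs = list(combinations(symbols, 2))  # (x, y) only
ordered_pairs = list(permutations(symbols, 2))    # (x, y) and (y, x)

print(len(unordered_pairs), len(ordered_pairs))   # prints: 21 42
```

The distinction matters because the Engle-Granger regression is not symmetric: regressing y on x and x on y can give different residuals and therefore different test outcomes.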

## Steps
1. Load data and setup
2. Individual ADF tests (stationarity check)
3. Pairwise cointegration tests
4. Results summary and visualization

In [36]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from statsmodels.tsa.stattools import adfuller
from sklearn.linear_model import LinearRegression
from itertools import combinations
import warnings
warnings.filterwarnings('ignore')

# Configuration
import sys
sys.path.append('../')
from config import SIGNIFICANCE_LEVEL

# Display settings
plt.style.use('seaborn-v0_8')

In [37]:
# Parse the Date column into a DatetimeIndex so later time-based slicing works
data = pd.read_csv('../data/forex_data.csv', index_col='Date', parse_dates=True)
data.head()

Date,EURUSD,GBPUSD,JPYUSD,CHFUSD,CADUSD,AUDUSD,NZDUSD
2007-01-01 00:00:00+00:00,1.321895,1.964212,0.008412,0.821963,0.858885,0.791014,0.705617
2007-01-02 00:00:00+00:00,1.327598,1.973399,0.008415,0.824334,0.858516,0.796178,0.706814
2007-01-03 00:00:00+00:00,1.317107,1.95221,0.008379,0.816327,0.852806,0.792205,0.706414
2007-01-04 00:00:00+00:00,1.309295,1.942993,0.008415,0.811688,0.849185,0.783392,0.695797
2007-01-05 00:00:00+00:00,1.298499,1.9308,0.008425,0.808146,0.851934,0.780275,0.687711


In [38]:
#Log transformation
log_data = np.log(data)
log_data.head()

Date,EURUSD,GBPUSD,JPYUSD,CHFUSD,CADUSD,AUDUSD,NZDUSD
2007-01-01 00:00:00+00:00,0.279066,0.675091,-4.778115,-0.19606,-0.15212,-0.23444,-0.348683
2007-01-02 00:00:00+00:00,0.283371,0.679757,-4.777694,-0.193179,-0.152549,-0.227932,-0.346988
2007-01-03 00:00:00+00:00,0.275437,0.668962,-4.781977,-0.202941,-0.159224,-0.232935,-0.347553
2007-01-04 00:00:00+00:00,0.269489,0.664229,-4.777694,-0.208639,-0.163478,-0.244122,-0.362697
2007-01-05 00:00:00+00:00,0.261209,0.657934,-4.776515,-0.213012,-0.160246,-0.248109,-0.374387


In [39]:
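# ADF test on each series. Note: this tests the raw price levels in `data`;
# pass log_data[symbol] instead to test the log series used in the pairwise step.
for symbol in log_data.columns:
    adfuller_result = adfuller(data[symbol])
    print(f"{symbol}: ADF Statistic = {adfuller_result[0]:.4f}, p-value = {adfuller_result[1]:.4f}")
    if adfuller_result[1] < SIGNIFICANCE_LEVEL:
        print("Stationary (reject H0)")
    else:
        print("Non-stationary (fail to reject H0)")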

EURUSD: ADF Statistic = -1.6724, p-value = 0.4454
Non-stationary (fail to reject H0)
GBPUSD: ADF Statistic = -2.0836, p-value = 0.2512
Non-stationary (fail to reject H0)
JPYUSD: ADF Statistic = -0.9171, p-value = 0.7823
Non-stationary (fail to reject H0)
CHFUSD: ADF Statistic = -2.7533, p-value = 0.0653
Non-stationary (fail to reject H0)
CADUSD: ADF Statistic = -1.3020, p-value = 0.6282
Non-stationary (fail to reject H0)
AUDUSD: ADF Statistic = -1.4716, p-value = 0.5475
Non-stationary (fail to reject H0)
NZDUSD: ADF Statistic = -2.3337, p-value = 0.1613
Non-stationary (fail to reject H0)


In [43]:
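def cointegration_test(x, y, significance_level=SIGNIFICANCE_LEVEL):
    """
    Perform the Engle-Granger two-step cointegration test on the log prices.

    Parameters:
    - x: Column name in log_data of the regressor series.
    - y: Column name in log_data of the dependent series.
    - significance_level: Significance level for the test.

    Returns:
    - bool: True if the residual ADF p-value is below significance_level, False otherwise.
    """
    # Step 1: OLS regression of log y on log x (with intercept)
    x_val = log_data[[x]]
    y_val = log_data[y]
    reg = LinearRegression()
    reg.fit(x_val, y_val)
    res = y_val - reg.predict(x_val)
    # Step 2: ADF test on the residuals.
    # Caveat: adfuller's p-value uses standard Dickey-Fuller critical values, which are
    # too lenient for estimated residuals; Engle-Granger critical values are stricter.
    adfuller_result = adfuller(res)
    print(f"x:{x}, y:{y}, p-value:{adfuller_result[1]}")
    return adfuller_result[1] < significance_level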

In [51]:
pairs = combinations(log_data.columns,2)
cointegrated_pairs = []
for pair in pairs:
    x,y = pair
    if cointegration_test(x,y):
        print(f"{pair} is a cointegrated pair")
        cointegrated_pairs.append(pair)
    else:
        print(f"{pair} is NOT a cointegrated pair")

print(f"Total number of cointegrated pairs: {len(cointegrated_pairs)}")

x:EURUSD, y:GBPUSD, p-value:0.014853483578117643
('EURUSD', 'GBPUSD') is a cointegrated pair
x:EURUSD, y:JPYUSD, p-value:0.39157470033996805
('EURUSD', 'JPYUSD') is NOT a cointegrated pair
x:EURUSD, y:CHFUSD, p-value:0.01611444953654824
('EURUSD', 'CHFUSD') is a cointegrated pair
x:EURUSD, y:CADUSD, p-value:0.06841418385749515
('EURUSD', 'CADUSD') is NOT a cointegrated pair
x:EURUSD, y:AUDUSD, p-value:0.2108101193990899
('EURUSD', 'AUDUSD') is NOT a cointegrated pair
x:EURUSD, y:NZDUSD, p-value:0.11943612318238317
('EURUSD', 'NZDUSD') is NOT a cointegrated pair
x:GBPUSD, y:JPYUSD, p-value:0.49498817807795487
('GBPUSD', 'JPYUSD') is NOT a cointegrated pair
x:GBPUSD, y:CHFUSD, p-value:0.012313772946459512
('GBPUSD', 'CHFUSD') is a cointegrated pair
x:GBPUSD, y:CADUSD, p-value:0.04862954890934108
('GBPUSD', 'CADUSD') is a cointegrated pair
x:GBPUSD, y:AUDUSD, p-value:0.13175081488496526
('GBPUSD', 'AUDUSD') is NOT a cointegrated pair
x:GBPUSD, y:NZDUSD, p-value:0.060972203345310325
('GBPU

## Note

This exploratory analysis tests cointegration on the full dataset (2007-2024).

**Important**: The paper's methodology tests cointegration only on rolling training windows (63, 128, and 257 days) as part of the trading strategy implementation, so the full-sample results here should not be used to select pairs for backtesting.

The proper methodology will be implemented in the next notebook following the paper's framework.