# Reversal Trading Using Correlated Triples

We seek triples of coins in which the price of one coin is well modeled, via linear regression, by the prices of the other two. We explore four combinations, formed from two variants of each of two design choices, listed below:

1. Choice of independent and dependent variables: large price movements can add noise, so we compare linear regression models with the following inputs and outputs:
    - Inputs: prices of two coins; output: price of a third coin
    - Inputs: log prices of two coins; output: log price of a third coin
2. Selection of correlated coins: if two coins are both highly correlated with a third coin, using both as predictors of the third may be redundant, since they carry largely the same information. So, besides the most correlated coins, we also try the least correlated coins that still clear a threshold:
    - For a given dependent coin, choose as its predictors the coins with the highest correlation to it, above a threshold.
    - For a given dependent coin, choose as its predictors the coins with the lowest correlation to it that still exceed the threshold.
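Both variants of (1) fit the same model, y = alpha + beta_1 * x_1 + beta_2 * x_2 + residual, with y the dependent coin and x_1, x_2 its two predictors; the log variant takes logarithms of all three series first. As a minimal sketch, assuming plain NumPy least squares (the notebook's own code calls `sm.OLS`, and the helper name `fit_triple` is hypothetical):

```python
import numpy as np


def fit_triple(y, x1, x2, use_log=False):
    """Fit y = alpha + beta_1 * x1 + beta_2 * x2 by ordinary least squares.

    Hypothetical helper. With use_log=True the model is fit on log prices
    (the second variant of (1)). Returns (alpha, beta_1, beta_2, residuals).
    """
    y, x1, x2 = (np.asarray(a, dtype=float) for a in (y, x1, x2))
    if use_log:
        y, x1, x2 = np.log(y), np.log(x1), np.log(x2)
    # design matrix: intercept column plus the two predictors
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return coef[0], coef[1], coef[2], residuals
```

In the log variant, beta_1 and beta_2 are elasticities: a 1% move in x_1 corresponds to roughly a beta_1% move in y.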

Regarding (2), another approach, not explored in this notebook, would be to choose predictors that are weakly correlated with each other rather than with the dependent coin.
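The alternative mentioned above, which this notebook does not implement, could be sketched as: among the coins that clear the correlation threshold with the dependent coin, pick the pair that is least correlated with each other. The helper `least_mutually_correlated_pair` and the nested-dict correlation format are assumptions for illustration:

```python
from itertools import combinations


def least_mutually_correlated_pair(coin, corr, threshold):
    """Hypothetical helper: among coins whose correlation with `coin` exceeds
    `threshold`, return the pair least correlated with each other.

    `corr` is a nested dict where corr[a][b] is the correlation of coins a and b.
    Returns None when fewer than two candidates qualify.
    """
    candidates = [other for other, c in corr[coin].items() if other != coin and c > threshold]
    if len(candidates) < 2:
        return None
    # exhaustive search over candidate pairs: O(k^2) for k candidates
    return min(combinations(candidates, 2), key=lambda pair: corr[pair[0]][pair[1]])
```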

When the dependent coin deviates from the value predicted by its model, we expect the deviation to revert, so we go long the under-priced asset(s) and short the over-priced asset(s).
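The reversal rule can be sketched as follows, assuming the signal is the z-score of the latest residual relative to recent residuals and that positions are weighted by the regression coefficients. Reading the notebook's `z_score_min_threshold` and `z_score_max_threshold` as an entry level and an upper cap is also an assumption, and `reversal_positions` is a hypothetical helper:

```python
from statistics import mean, stdev


def reversal_positions(residual_history, beta_1, beta_2, z_entry, z_cap):
    """Hypothetical helper: map the latest residual to positions in
    (dependent, predictor_1, predictor_2).

    residual_history holds recent values of y - (alpha + beta_1 * x1 + beta_2 * x2),
    oldest first; the last element is the current deviation. Trades only when
    z_entry <= |z| <= z_cap (assumed reading of the notebook's thresholds).
    """
    sigma = stdev(residual_history)
    if sigma == 0:
        return (0.0, 0.0, 0.0)
    z = (residual_history[-1] - mean(residual_history)) / sigma
    if not (z_entry <= abs(z) <= z_cap):
        return (0.0, 0.0, 0.0)
    # z > 0: dependent coin over-priced relative to the model -> short it and
    # buy the regression-weighted predictors; z < 0: the reverse
    side = -1.0 if z > 0 else 1.0
    return (side, -side * beta_1, -side * beta_2)
```

The cap leaves very large deviations untraded, on the reasoning that they may signal a broken relationship rather than a temporary mispricing.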

For the first variant of (1), using prices as the independent and dependent variables, the best case selects the lowest-correlated coins and achieves an in-sample Sharpe Ratio of 0.26 (2020-01-01 00:00:00 to 2022-12-31 20:00:00) and an out-of-sample Sharpe Ratio of 1.53 (2023-01-01 00:00:00 to 2025-02-20 08:00:00).

For the second variant of (1), using log prices as the independent and dependent variables, the best case achieves an in-sample Sharpe Ratio of 0.356 and an out-of-sample Sharpe Ratio of 1.45.
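The quoted Sharpe Ratios depend on how per-bar returns are annualized, which this section does not state. A common convention for 4-hour bars on a market that trades around the clock multiplies the mean-to-standard-deviation ratio of bar returns by sqrt(6 * 365). The sketch below assumes that convention and a zero risk-free rate; both are assumptions, and `annualized_sharpe` is a hypothetical name:

```python
import numpy as np

# 4-hour bars, 24/7 market: 6 bars/day * 365 days (assumed annualization)
PERIODS_PER_YEAR = 6 * 365


def annualized_sharpe(bar_returns, periods_per_year=PERIODS_PER_YEAR):
    """Annualized Sharpe Ratio of per-bar returns, risk-free rate taken as zero."""
    r = np.asarray(bar_returns, dtype=float)
    r = r[~np.isnan(r)]  # ignore bars without a return
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
```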

# Coin Universe

We use a historical snapshot of coins from [December 01, 2019](https://coinmarketcap.com/historical/20191201/), so the universe contains only coins known before the period of analysis, which avoids look-ahead bias.

In [1]:
strat_univ = [
    'BTCUSDT', 'ETHUSDT', 'XRPUSDT', 'BCHUSDT', 'LTCUSDT', 'EOSUSDT', 'BNBUSDT',
    'XLMUSDT', 'TRXUSDT', 'ADAUSDT', 'XTZUSDT', 'LINKUSDT', 'ATOMUSDT',
    'NEOUSDT', 'MKRUSDT', 'DASHUSDT', 'ETCUSDT', 'USDCUSDT', 'ONTUSDT', 'VETUSDT',
    'DOGEUSDT', 'BATUSDT', 'ZECUSDT',
    'SNXUSDT', 'QTUMUSDT', 'TUSDUSDT', 'ZRXUSDT',
    'THXUSDT', 'THRUSDT', 'ALGOUSDT', 'REPUSDT', 'NANOUSDT', 'KBCUSDT', 'BTGUSDT', 'RVNUSDT', 'OMGUSDT',
    'CNXUSDT', 'ABBCUSDT', 'XINUSDT', 'VSYSUSDT', 'SEELEUSDT', 'EONUSDT', 'ZBUSDT', 'EKTUSDT', 'DGBUSDT',
    'BTMUSDT', 'LSKUSDT', 'KMDUSDT', 'SAIUSDT', 'LUNAUSDT', 'KCSUSDT', 'FTTUSDT', 'QNTUSDT', 'SXPUSDT',
    'BDXUSDT', 'GAPUSDT', 'BCDUSDT', 'THETAUSDT', 'ICXUSDT', 'FSTUSDT', 'MATICUSDT', 'SCUSDT', 'EVRUSDT',
    'BTTUSDT', 'MOFUSDT', 'IOSTUSDT', 'MCOUSDT', 'WAVESUSDT', 'XVGUSDT', 'MONAUSDT', 'BTSUSDT', 'BCNUSDT',
    'HCUSDT', 'MAIDUSDT', 'NEXOUSDT', 'ARDRUSDT', 'DXUSDT', 'OKBUSDT', 'FXCUSDT', 'RLCUSDT', 'MBUSDT',
    'BXKUSDT', 'AEUSDT', 'ENJUSDT', 'STEEMUSDT', 'SLVUSDT', 'BRZEUSDT', 'ZILUSDT', 'VESTUSDT', 'ZENUSDT',
    'SOLVEUSDT', 'CHZUSDT', 'NOAHUSDT', 'LAUSDT', 'BTMXUSDT', 'ETNUSDT', 'ENGUSDT', 'ILCUSDT', 'NPXSUSDT',
    'CRPTUSDT', 'GNTUSDT', 'SNTUSDT', 'ELFUSDT', 'JWLUSDT', 'FETUSDT', 'BOTXUSDT', 'NRGUSDT', 'DGDUSDT',
    'EXMRUSDT', 'EURSUSDT', 'AOAUSDT', 'RIFUSDT', 'CIX100USDT', 'BFUSDT', 'XZCUSDT', 'FABUSDT', 'GRINUSDT',
    'NETUSDT', 'VERIUSDT', 'DGTXUSDT', 'KNCUSDT', 'RENUSDT', 'STRATUSDT', 'ETPUSDT', 'NEXUSDT', 'NEWUSDT',
    'BCZEROUSDT', 'GXCUSDT', 'TNTUSDT', 'BTC2USDT', 'PPTUSDT', 'USDKUSDT', 'ELAUSDT', 'IGNISUSDT', 'PLCUSDT',
    'BNKUSDT', 'DTRUSDT', 'RCNUSDT', 'HPTUSDT', 'LAMBUSDT', 'MANAUSDT', 'EDCUSDT', 'BEAMUSDT', 'TTUSDT',
    'AIONUSDT', 'BZUSDT', 'WTCUSDT', 'WICCUSDT', 'LRCUSDT', 'BRDUSDT', 'FCTUSDT', 'NULSUSDT', 'FTMUSDT',
    'IOTXUSDT', 'QBITUSDT', 'XMXUSDT', 'YOUUSDT', 'NASUSDT', 'WAXPUSDT', 'ARKUSDT', 'RDDUSDT', 'GNYUSDT',
    'AGVCUSDT', 'HYNUSDT', 'CVCUSDT', 'WANUSDT', 'WINUSDT', 'LINAUSDT', 'RUSDT', 'PAIUSDT', 'FSNUSDT',
    'FUNUSDT', 'DPTUSDT', 'BHDUSDT', 'LOOMUSDT', 'XACUSDT', 'BUSDUSDT', 'BHPUSDT', 'TRUEUSDT', 'LOKIUSDT',
    'QASHUSDT', 'BNTUSDT', 'DOTUSDT', 'SOLUSDT']

# Load And Structure Data

We download 4-hour closing prices from Binance.US over the interval from 2020-01-01 00:00:00 to 2025-02-20 08:00:00. Symbols whose data cannot be loaded are skipped.

In [2]:
from binance.client import Client as bnb_client
from binance.exceptions import BinanceAPIException
import pandas as pd

# Binance.US endpoint
client = bnb_client(tld='us')

def get_price_data_for_coin(coin, freq, start_datetime, end_datetime):
    """Download klines for one symbol and return them as a DataFrame with UTC timestamps."""
    bn_data = client.get_historical_klines(coin, freq, start_datetime, end_datetime)
    columns = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_volume',
               'num_trades', 'taker_base_volume', 'taker_quote_volume', 'ignore']

    bn_data = pd.DataFrame(bn_data, columns=columns)
    # Binance timestamps are milliseconds since the Unix epoch (UTC)
    bn_data['open_time'] = pd.to_datetime(bn_data['open_time'], unit='ms')
    bn_data['close_time'] = pd.to_datetime(bn_data['close_time'], unit='ms')
    return bn_data


def get_price_data_for_universe(freq, start_datetime, end_datetime):
    px_data = {}

    for coin in strat_univ:
        try:
            px_data[coin] = get_price_data_for_coin(
                coin, freq, start_datetime, end_datetime).set_index('open_time')
            print(f"Downloaded data for {coin}.")
        except BinanceAPIException:
            print(f"Couldn't load data for {coin}.")
    
    return px_data



In [3]:
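data_freq = '4h'                 # 4-hour bars, i.e. 6 bars per day
look_back_window_size = 30 * 6   # rolling window: 30 days = 180 bars
look_back_min_periods = 10 * 6   # bars of history before the first holdings: 10 days = 60 bars
start_datetime = '2020-01-01 00:00:00'
end_datetime = '2025-02-20 08:00:00'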
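With six 4-hour bars per day, the look-back settings presumably correspond to a 30-day rolling window and a 10-day warm-up. The sketch below, assuming pandas and a regular 4-hour grid, spells that out and reproduces the in-sample/out-of-sample split quoted in the introduction (`bar_index`, `in_sample` and `out_of_sample` are illustrative names, not notebook variables):

```python
import pandas as pd

BARS_PER_DAY = 24 // 4                       # 4-hour bars
look_back_window_size = 30 * BARS_PER_DAY    # 180 bars = 30 days
look_back_min_periods = 10 * BARS_PER_DAY    # 60 bars = 10 days

# regular 4-hour grid over the full download range
bar_index = pd.date_range('2020-01-01 00:00:00', '2025-02-20 08:00:00', freq='4h')

# in-sample: 2020 through 2022; out-of-sample: 2023 onward
in_sample = bar_index[bar_index < '2023-01-01']
out_of_sample = bar_index[bar_index >= '2023-01-01']
```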

In [4]:
px_data = get_price_data_for_universe(data_freq, start_datetime, end_datetime)

Downloaded data for BTCUSDT.
Downloaded data for ETHUSDT.
Downloaded data for XRPUSDT.
Downloaded data for BCHUSDT.
Downloaded data for LTCUSDT.
Downloaded data for EOSUSDT.
Downloaded data for BNBUSDT.
Downloaded data for XLMUSDT.
Downloaded data for TRXUSDT.
Downloaded data for ADAUSDT.
Downloaded data for XTZUSDT.
Downloaded data for LINKUSDT.
Downloaded data for ATOMUSDT.
Downloaded data for NEOUSDT.
Downloaded data for MKRUSDT.
Downloaded data for DASHUSDT.
Downloaded data for ETCUSDT.
Downloaded data for USDCUSDT.
Downloaded data for ONTUSDT.
Downloaded data for VETUSDT.
Downloaded data for DOGEUSDT.
Downloaded data for BATUSDT.
Downloaded data for ZECUSDT.
Downloaded data for SNXUSDT.
Downloaded data for QTUMUSDT.
Downloaded data for TUSDUSDT.
Downloaded data for ZRXUSDT.
Couldn't load data for THXUSDT.
Couldn't load data for THRUSDT.
Downloaded data for ALGOUSDT.
Couldn't load data for REPUSDT.
Couldn't load data for NANOUSDT.
Couldn't load data for KBCUSDT.
Couldn't load data 

In [5]:
px_close = {coin: px_data[coin]["close"] for coin in px_data}
px_close = pd.DataFrame(px_close).astype(float)
px_close.index.rename("datetime", inplace=True)

px_close

| datetime | BTCUSDT | ETHUSDT | XRPUSDT | BCHUSDT | LTCUSDT | EOSUSDT | BNBUSDT | XLMUSDT | TRXUSDT | ADAUSDT | ... | MANAUSDT | LRCUSDT | FTMUSDT | IOTXUSDT | WAXPUSDT | LOOMUSDT | BUSDUSDT | BNTUSDT | DOTUSDT | SOLUSDT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-01 00:00:00 | 7230.71 | 130.18 | 0.19406 | 205.64 | 41.58 | NaN | 13.8159 | 0.04515 | NaN | 0.03308 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-01 04:00:00 | 7205.50 | 130.52 | 0.19518 | 206.72 | 41.55 | NaN | 13.7648 | 0.04493 | NaN | 0.03320 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-01 08:00:00 | 7195.80 | 130.84 | 0.19358 | 205.61 | 41.67 | NaN | 13.7162 | 0.04509 | NaN | 0.03321 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-01 12:00:00 | 7233.02 | 131.84 | 0.19428 | 206.63 | 41.89 | NaN | 13.7958 | 0.04542 | NaN | 0.03357 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-01 16:00:00 | 7223.72 | 131.98 | 0.19474 | 206.35 | 41.79 | NaN | 13.7270 | 0.04547 | NaN | 0.03361 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-02-19 16:00:00 | 96235.29 | 2709.46 | 2.66600 | 321.30 | 135.19 | 0.5836 | 651.2500 | 0.33390 | NaN | 0.75890 | ... | 0.3119 | 0.1309 | 0.6902 | 0.02921 | 0.03003 | 0.04663 | NaN | 0.577 | 4.878 | 168.79 |
| 2025-02-19 20:00:00 | 96650.01 | 2715.33 | 2.74310 | 322.90 | 135.10 | 0.5836 | 654.3400 | 0.34200 | NaN | 0.77420 | ... | 0.3118 | 0.1304 | 0.7158 | 0.02936 | 0.03005 | 0.04663 | NaN | 0.577 | 4.851 | 169.00 |
| 2025-02-20 00:00:00 | 97040.40 | 2743.25 | 2.71250 | 325.30 | 133.33 | 0.6427 | 654.7100 | 0.34070 | NaN | 0.78140 | ... | 0.3225 | 0.1320 | 0.7221 | 0.02936 | 0.03005 | 0.04663 | NaN | 0.577 | 4.905 | 172.56 |
| 2025-02-20 04:00:00 | 97016.34 | 2732.56 | 2.69230 | 323.70 | 129.49 | 0.6427 | 647.4300 | 0.33890 | NaN | 0.78390 | ... | 0.3181 | 0.1349 | 0.7188 | 0.02936 | 0.03377 | 0.04663 | NaN | 0.577 | 4.965 | 172.07 |


# Define Functions For Picking Coins And Setting Holdings

In [7]:
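from enum import Enum

import numpy as np
import pandas as pd
import statsmodels.api as sm

class CorrSelectionMethod(Enum):
    MAX_CORR = 1  # predictors: the most-correlated eligible coins
    MIN_CORR = 2  # predictors: the least-correlated coins that still clear the threshold

def get_newly_uncorrelated_keys(current_correlated_triples, price_indicator_window_df,
                                corr_matrix, correlation_rejection_threshold):
    """Return the dependent coins whose triple should be dissolved at this step."""
    newly_uncorrelated_keys = set()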
    for key_coin in current_correlated_triples:
        value_coin_1 = current_correlated_triples[key_coin].columns[0]
        value_coin_2 = current_correlated_triples[key_coin].columns[1]
        triple_list = [key_coin, value_coin_1, value_coin_2]
        
        # if any of the coins do not have data (e.g. if the coin no longer exists)
        if price_indicator_window_df[triple_list].iloc[-1].isnull().any():
            newly_uncorrelated_keys.add(key_coin)
        elif ((corr_matrix.loc[key_coin, value_coin_1] < correlation_rejection_threshold or \
               corr_matrix.loc[key_coin, value_coin_2] < correlation_rejection_threshold)):
            newly_uncorrelated_keys.add(key_coin)
    return newly_uncorrelated_keys


def remove_uncorrelated_keys(newly_uncorrelated_keys, coins_in_a_triple, current_correlated_triples):
    for coin in newly_uncorrelated_keys:
        coins_in_a_triple.remove(coin)
        coins_in_a_triple.remove(current_correlated_triples[coin].columns[0])
        coins_in_a_triple.remove(current_correlated_triples[coin].columns[1])
        
        if coin in current_correlated_triples:  # only dependent coins appear as keys
            del current_correlated_triples[coin]


def has_high_corr_pair_available(coin, corr_matrix, coins_in_a_triple, price_indicator_window_df,
                                 correlation_acceptance_threshold):
    if np.isnan(price_indicator_window_df[coin]).any():
        return False
    num_available_coins = 0
    for other_coin, coin_corr in corr_matrix[coin][corr_matrix[coin] > correlation_acceptance_threshold].items():
        if (other_coin != coin and (other_coin not in coins_in_a_triple) and \
            (not np.isnan(price_indicator_window_df[other_coin]).any())):
            num_available_coins += 1
    return num_available_coins >= 2


def update_correlated_coins_to_max_corr(coin, corr_matrix, coins_in_a_triple, current_correlated_triples,
                                        price_indicator_window_df, correlation_acceptance_threshold):
    largest_corr = -1.0
    largest_corr_coin = None
    second_largest_corr = -1.0
    second_largest_corr_coin = None
    for other_coin, coin_corr in corr_matrix[coin][corr_matrix[coin] > correlation_acceptance_threshold].items():
        if np.isnan(price_indicator_window_df[other_coin]).any():
            continue
        elif other_coin in coins_in_a_triple:
            continue
        elif other_coin != coin and corr_matrix.loc[coin, other_coin] >= largest_corr:
            second_largest_corr, largest_corr = largest_corr, corr_matrix.loc[coin, other_coin]
            second_largest_corr_coin, largest_corr_coin = largest_corr_coin, other_coin
        elif other_coin != coin and corr_matrix.loc[coin, other_coin] >= second_largest_corr:
            second_largest_corr = corr_matrix.loc[coin, other_coin]
            second_largest_corr_coin = other_coin
            
    # assumed that at least two coins meeting the threshold have been found
    coins_in_a_triple.add(coin)
    coins_in_a_triple.add(largest_corr_coin)
    coins_in_a_triple.add(second_largest_corr_coin)
    current_correlated_triples[coin] = pd.DataFrame(
        0.0, columns=[largest_corr_coin, second_largest_corr_coin, 'alpha', 'residual'], index=[])


def update_correlated_coins_to_min_corr(coin, corr_matrix, coins_in_a_triple, current_correlated_triples,
                                        price_indicator_window_df, correlation_acceptance_threshold):
    smallest_corr = 1.0
    smallest_corr_coin = None
    second_smallest_corr = 1.0
    second_smallest_corr_coin = None
    for other_coin, other_coin_value in corr_matrix[coin][corr_matrix[coin] > correlation_acceptance_threshold].items():
        if np.isnan(price_indicator_window_df[other_coin]).any():
            continue
        elif other_coin in coins_in_a_triple:
            continue
        elif other_coin != coin and corr_matrix.loc[coin, other_coin] <= smallest_corr:
            second_smallest_corr, smallest_corr = smallest_corr, corr_matrix.loc[coin, other_coin]
            second_smallest_corr_coin, smallest_corr_coin = smallest_corr_coin, other_coin
        elif other_coin != coin and corr_matrix.loc[coin, other_coin] <= second_smallest_corr:
            second_smallest_corr = corr_matrix.loc[coin, other_coin]
            second_smallest_corr_coin = other_coin
    
    # assumed that at least two coins meeting the threshold have been found
    coins_in_a_triple.add(coin)
    coins_in_a_triple.add(smallest_corr_coin)
    coins_in_a_triple.add(second_smallest_corr_coin)
    current_correlated_triples[coin] = pd.DataFrame(
        0.0, columns=[smallest_corr_coin, second_smallest_corr_coin, 'alpha', 'residual'], index=[])


def update_correlated_triples(current_correlated_triples, coins_in_a_triple, corr_matrix, 
                              price_indicator_window_df, correlation_acceptance_threshold,
                              corr_selection_method):
    for coin in list(price_indicator_window_df.columns):
        if (coin not in coins_in_a_triple) and \
            has_high_corr_pair_available(coin, corr_matrix, coins_in_a_triple, price_indicator_window_df,
                                         correlation_acceptance_threshold):
            
            if (corr_selection_method == CorrSelectionMethod.MAX_CORR):
                update_correlated_coins_to_max_corr(
                    coin,
                    corr_matrix,
                    coins_in_a_triple,
                    current_correlated_triples,
                    price_indicator_window_df,
                    correlation_acceptance_threshold)
            elif (corr_selection_method == CorrSelectionMethod.MIN_CORR):
                update_correlated_coins_to_min_corr(
                    coin,
                    corr_matrix,
                    coins_in_a_triple,
                    current_correlated_triples,
                    price_indicator_window_df,
                    correlation_acceptance_threshold)


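def set_time_holdings(holdings_df, time_index, price_indicator_window_df, current_correlated_triples,
                      min_data_points_to_transact, look_back_window_size,
                      z_score_min_threshold, z_score_max_threshold):
    # NOTE: this function was left unfinished in the original notebook ("FINISH THIS").
    # The body below is an ASSUMED completion, not necessarily the rule behind the quoted
    # Sharpe Ratios: regress the dependent coin on its two predictors over the window,
    # z-score the latest residual, and trade the reversal when
    # z_score_min_threshold <= |z| <= z_score_max_threshold.
    # look_back_window_size is unused: the window passed in already has that length.
    for coin, triple_history in current_correlated_triples.items():
        independent_coin_1 = triple_history.columns[0]
        independent_coin_2 = triple_history.columns[1]

        window = price_indicator_window_df[[coin, independent_coin_1, independent_coin_2]].dropna()
        if len(window) < min_data_points_to_transact:
            continue

        X = sm.add_constant(window[[independent_coin_1, independent_coin_2]], has_constant='add')
        fit = sm.OLS(window[coin], X).fit()
        beta_1 = fit.params[independent_coin_1]
        beta_2 = fit.params[independent_coin_2]
        last_residual = fit.resid.iloc[-1]
        residual_std = fit.resid.std()

        # log this step's fit in the per-triple DataFrame created when the triple was formed
        triple_history.loc[time_index] = [beta_1, beta_2, fit.params['const'], last_residual]

        if not residual_std > 0:  # also skips a NaN standard deviation
            continue
        z_score = last_residual / residual_std  # OLS residuals with an intercept have mean zero
        if not (z_score_min_threshold <= abs(z_score) <= z_score_max_threshold):
            continue

        # z > 0: dependent coin over-priced vs. the model -> short it, long the predictors;
        # z < 0: the reverse. Regression-weighted position sizes are an assumption.
        direction = -np.sign(z_score)
        holdings_df.loc[time_index, coin] += direction
        holdings_df.loc[time_index, independent_coin_1] -= direction * beta_1
        holdings_df.loc[time_index, independent_coin_2] -= direction * beta_2


def get_holdings_df(
    price_indicator_df,
    look_back_min_periods,
    look_back_window_size,
    correlation_acceptance_threshold,
    correlation_rejection_threshold,
    min_data_points_to_transact,
    z_score_min_threshold,
    z_score_max_threshold,
    corr_selection_method):
    """Walk forward through time, maintaining triples and setting holdings at each bar.

    The window excludes the current row, so holdings at each bar use only past data.
    """
    holdings_df = pd.DataFrame(0.0, columns=price_indicator_df.columns, index=price_indicator_df.index)

    current_correlated_triples = dict()
    coins_in_a_triple = set()

    for index in range(look_back_min_periods, len(holdings_df)):
        window_start = max(index - look_back_window_size, 0)
        price_indicator_window_df = price_indicator_df.iloc[window_start:index]

        time_index = price_indicator_df.index[index]
        holdings_df.loc[time_index] = pd.Series(0.0, index=holdings_df.columns)

        corr_matrix = price_indicator_window_df.corr()

        newly_uncorrelated_keys = get_newly_uncorrelated_keys(
            current_correlated_triples,
            price_indicator_window_df,
            corr_matrix,
            correlation_rejection_threshold)

        # update coins_in_a_triple and current_correlated_triples in-place
        remove_uncorrelated_keys(newly_uncorrelated_keys, coins_in_a_triple, current_correlated_triples)

        update_correlated_triples(
            current_correlated_triples,
            coins_in_a_triple,
            corr_matrix,
            price_indicator_window_df,
            correlation_acceptance_threshold,
            corr_selection_method)

        set_time_holdings(
            holdings_df,
            time_index,
            price_indicator_window_df,
            current_correlated_triples,
            min_data_points_to_transact,
            look_back_window_size,
            z_score_min_threshold,
            z_score_max_threshold)

    return holdings_df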
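`get_holdings_df` returns positions only; turning them into the returns behind a Sharpe Ratio needs one more step that is not shown in this section. A minimal sketch, assuming each row of holdings is a set of notional weights applied to the price change from that bar to the next and normalized by gross exposure (`strategy_returns` is a hypothetical helper, and the notebook's actual convention may differ):

```python
import pandas as pd


def strategy_returns(holdings_df, price_df):
    """Hypothetical helper: per-bar strategy return from notional-weight holdings.

    Weights set at bar t earn each coin's return from bar t to bar t+1; the
    result is normalized by gross exposure, and bars with no positions earn zero.
    """
    # each coin's return from bar t to bar t+1 (NaN where a price is missing)
    next_returns = price_df.shift(-1) / price_df - 1
    pnl = (holdings_df * next_returns).fillna(0.0).sum(axis=1)
    gross = holdings_df.abs().sum(axis=1)
    return (pnl / gross.where(gross > 0)).fillna(0.0)
```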

In [16]:
corr_matrix_test = px_close.corr()
for index, item in corr_matrix_test['BTCUSDT'][corr_matrix_test['BTCUSDT'] > 0.5].items():
    print(index, item)

BTCUSDT 1.0
ETHUSDT 0.8294883188952125
XRPUSDT 0.7632332458680621
BCHUSDT 0.5456506034003175
BNBUSDT 0.8823358525853346
XLMUSDT 0.5698700183232913
TRXUSDT 0.5577731685529617
XTZUSDT 0.5543663664003232
LINKUSDT 0.8270432112368103
MKRUSDT 0.5194726763413628
DOGEUSDT 0.7374378878866337
OMGUSDT 0.7365392738480138
THETAUSDT 0.7179050786466018
WAVESUSDT 0.8676245063066951
RLCUSDT 0.5550715040060664
FETUSDT 0.7832287221158973
SOLUSDT 0.8525310074921438
