# Implementation of Partner Selection Procedures

## Abstract

In this paper [1], Stübinger, Mangold and Krauss develop a multivariate statistical arbitrage strategy based on vine copulas, a highly flexible instrument for linear and nonlinear multivariate dependence modeling. Pairs trading is a relative-value arbitrage strategy in which an investor seeks to profit from the mean-reversion properties of the price spread between two co-moving securities. The existing literature has focused on using bivariate copulas to model the dependence structure between two stock return time series and to identify mispricings that can potentially be exploited in a pairs trading application.

This paper proposes a multivariate copula-based statistical arbitrage framework in which, for each stock in the S&P 500 database, the three most suitable partners are selected using different selection criteria. The multivariate copula models are then benchmarked to capture the dependence structure of the selected quadruples. Finally, the paper focuses on the generation of trading signals and backtesting.


## Introduction

This notebook focuses on the partner selection procedures described in the paper and on their implementations. For every stock in the S&P 500, a partner triple is identified based on adequate measures of association. The following four partner selection approaches are implemented:
- Traditional Approach - baseline approach where the high dimensional relation between the four stocks is approximated by their pairwise bivariate correlations via Spearman’s $\rho$;
- Extended Approach - calculating the multivariate version of Spearman’s $\rho$ based on Schmid and Schmidt (2007)[2];
- Geometric Approach - involves calculating the sum of Euclidean distances from the 4-dimensional hyper-diagonal;
- Extremal Approach - involves calculating a non-parametric $\chi^2$ test statistic based on Mangold (2015)[3] to measure the degree of deviation from independence.

Firstly, all measures of association are calculated using the ranks of the daily discrete returns of our samples. The rank transformation provides robustness against outliers. Secondly, to reduce the computational burden, only the 50 stocks most highly correlated with each target stock are taken into consideration.
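The two preprocessing steps above can be sketched with plain pandas. This is a minimal illustration and not the code inside `PartnerSelection`: the helper names `rank_returns` and `top_k_partners` and the toy price data are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def rank_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Daily discrete returns, replaced column-wise by their ranks (hypothetical helper)."""
    returns = (prices / prices.shift(1) - 1).dropna()
    return returns.rank()


def top_k_partners(ranked: pd.DataFrame, k: int = 50) -> dict:
    """For every stock, list the k stocks with the highest rank correlation (hypothetical helper)."""
    # Pearson correlation computed on ranks equals Spearman's rho.
    corr = ranked.corr(method="pearson")
    partners = {}
    for target in corr.columns:
        others = corr[target].drop(target)  # exclude the target itself
        partners[target] = others.nlargest(k).index.tolist()
    return partners


# Toy data: six synthetic stocks driven by one common factor (illustrative only).
rng = np.random.default_rng(0)
common = rng.normal(size=(250, 1))
noise = rng.normal(size=(250, 6))
loadings = np.linspace(0.2, 1.0, 6)
log_returns = 0.01 * (common * loadings + 0.5 * noise)
toy_prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_returns, axis=0)), columns=list("ABCDEF")
)

toy_ranked = rank_returns(toy_prices)
toy_partners = top_k_partners(toy_ranked, k=3)
print(toy_partners["A"])
```

With real data, `k=50` would reproduce the candidate-set size mentioned above.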

---

# Preprocessing

In [None]:
%load_ext autoreload
%autoreload 2
%reload_ext autoreload

In [None]:
from ps.partner_selection import PartnerSelection
from ps.ps_utils import get_sector_data
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

# Loading the data set

The dataset contains daily pricing data for all stocks in the S&P 500. Data from the year 2017 is used in this notebook, matching the date filter applied when the data is loaded.

When a PartnerSelection object is created, daily returns of the stocks and their corresponding ranked returns are calculated and stored as attributes. 

In [None]:
df = pd.read_csv('./data/data/data.csv', parse_dates=True, index_col='Date').dropna()
df = df['2017-01-01':'2017-12-31']  # Use 12 months of data, as in the paper
ps = PartnerSelection(df)

constituents = pd.read_csv('./data/data/constituents-detailed.csv', index_col='Symbol')

In [None]:
print(ps.top_50_correlations)

# Step 1 : Traditional Approach

- Calculate the sum of all pairwise correlations for all possible quadruples that contain a fixed target stock.
- The quadruple with the largest sum of pairwise correlations is taken as the final quadruple and saved to the output matrix.
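The two steps above amount to an exhaustive search over partner triples. The following is a small sketch under stated assumptions: `traditional_quadruple` is a hypothetical helper, not the body of `traditional_multiprocess`, and the correlation matrix is hand-made toy data.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def traditional_quadruple(corr: pd.DataFrame, target: str, candidates: list) -> tuple:
    """Return the quadruple (target + 3 partners) with the largest sum of pairwise correlations."""
    best_sum, best_quad = -np.inf, None
    for triple in combinations(candidates, 3):
        quad = (target, *triple)
        sub = corr.loc[list(quad), list(quad)].to_numpy()
        # Sum of the 6 off-diagonal pairs: total minus the diagonal, halved for symmetry.
        pair_sum = (sub.sum() - np.trace(sub)) / 2.0
        if pair_sum > best_sum:
            best_sum, best_quad = pair_sum, quad
    return best_quad, best_sum


# Hand-made Spearman correlation matrix for five hypothetical tickers.
names = ["T", "A", "B", "C", "D"]
values = {
    ("T", "A"): 0.90, ("T", "B"): 0.80, ("T", "C"): 0.70, ("T", "D"): 0.10,
    ("A", "B"): 0.85, ("A", "C"): 0.60, ("A", "D"): 0.20,
    ("B", "C"): 0.50, ("B", "D"): 0.30, ("C", "D"): 0.40,
}
toy_corr = pd.DataFrame(np.eye(5), index=names, columns=names)
for (a, b), rho in values.items():
    toy_corr.loc[a, b] = rho
    toy_corr.loc[b, a] = rho

best_quad, best_sum = traditional_quadruple(toy_corr, "T", ["A", "B", "C", "D"])
print(best_quad, round(best_sum, 2))
```

With 50 candidates per target there are 19,600 triples to score, which is why the notebook runs this step in parallel.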

In [None]:
Q = ps.traditional_multiprocess(20, num_threads=16)
print(Q)
ps.plot_selected_pairs(Q)

In [None]:
for quadruple in Q:
    display(get_sector_data(quadruple,constituents))

In [None]:
#Plotting measures of all quadruples for a given target
ps.plot_all_target_measures(target='A', procedure='traditional')

# Step 2 : Extended Approach

- Calculate the multivariate version of Spearman’s $\rho$ for all possible quadruples that contain a fixed target stock.
- The quadruple with the largest value is taken as the final quadruple and saved to the output matrix.
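As an illustration of what a multivariate Spearman’s $\rho$ looks like, the sketch below computes two rank-based estimators of the kind given in Schmid and Schmidt (2007)[2]. It is an assumption-laden example: the function name `multivariate_spearman` is hypothetical, the pseudo-observations are taken as rank divided by sample size, and which estimator (or combination of estimators) `extended_multiprocess` actually uses is not shown in this notebook.

```python
import numpy as np


def multivariate_spearman(ranked: np.ndarray) -> dict:
    """Two estimators of multivariate Spearman's rho from an (n, d) array of column-wise ranks."""
    n, d = ranked.shape
    u = ranked / n  # pseudo-observations in (0, 1]; an assumption of this sketch
    h = (d + 1) / (2**d - (d + 1))  # normalising constant depending on the dimension
    rho_1 = h * (2**d * np.mean(np.prod(1 - u, axis=1)) - 1)
    rho_2 = h * (2**d * np.mean(np.prod(u, axis=1)) - 1)
    return {"rho_1": float(rho_1), "rho_2": float(rho_2)}


n_obs = 2000
rng = np.random.default_rng(1)

# Perfectly comonotone quadruple: all four columns share the same ranks.
comonotone = np.tile(np.arange(1, n_obs + 1)[:, None], (1, 4)).astype(float)

# Independent quadruple: each column is an independent random permutation of the ranks.
independent = np.column_stack(
    [rng.permutation(n_obs) + 1 for _ in range(4)]
).astype(float)

rho_comonotone = multivariate_spearman(comonotone)
rho_independent = multivariate_spearman(independent)
print(rho_comonotone, rho_independent)
```

Both estimators are close to 1 for the comonotone sample and close to 0 for the independent one, which is the behaviour the selection rule relies on when it picks the quadruple with the largest value.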

In [None]:
Q = ps.extended_multiprocess(20, num_threads=16)
print(Q)
ps.plot_selected_pairs(Q)

In [None]:
for quadruple in Q:
    display(get_sector_data(quadruple,constituents))

In [None]:
#Plotting measures of all quadruples for a given target
ps.plot_all_target_measures(target='A', procedure='extended')

# Step 3 : Geometric Approach

- Calculate the four-dimensional diagonal measure for all possible quadruples that contain a fixed target stock.
- The quadruple with the smallest diagonal measure is taken as the final quadruple and saved to the output matrix.
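The diagonal measure can be sketched as follows, assuming it is the plain sum of Euclidean distances between each observation's rank vector and the line $x_1 = x_2 = x_3 = x_4$. The function name `diagonal_measure` is hypothetical, and whether `geometric_multiprocess` normalises the ranks or uses squared distances is not shown in this notebook.

```python
import numpy as np


def diagonal_measure(ranked: np.ndarray) -> float:
    """Sum over observations of the Euclidean distance to the hyper-diagonal."""
    # The orthogonal projection of a point onto the diagonal is its row mean
    # repeated in every coordinate, so the residual is the row-centred vector.
    centered = ranked - ranked.mean(axis=1, keepdims=True)
    return float(np.linalg.norm(centered, axis=1).sum())


n_obs = 500
rng = np.random.default_rng(2)

# All four stocks share identical ranks: every point lies on the diagonal.
on_diagonal = np.tile(np.arange(1, n_obs + 1)[:, None], (1, 4)).astype(float)

# Independent ranks: points scatter away from the diagonal.
scattered = np.column_stack(
    [rng.permutation(n_obs) + 1 for _ in range(4)]
).astype(float)

measure_on_diagonal = diagonal_measure(on_diagonal)
measure_scattered = diagonal_measure(scattered)
print(measure_on_diagonal, measure_scattered)
```

A smaller value means the four rank series move more closely together, which matches the rule of keeping the quadruple with the smallest measure.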

In [None]:
Q = ps.geometric_multiprocess(20)
print(Q)
ps.plot_selected_pairs(Q)

In [None]:
for quadruple in Q:
    display(get_sector_data(quadruple,constituents))

In [None]:
#Plotting measures of all quadruples for a given target
ps.plot_all_target_measures(target='A', procedure='geometric')

# Step 4 : Extremal Approach

- Calculate the $\chi^2$ test statistic for all possible quadruples that contain a fixed target stock.
- The quadruple with the largest test statistic is taken as the final quadruple and saved to the output matrix.

In [None]:
Q = ps.extremal_multiprocess(20, 16)
print(Q)
ps.plot_selected_pairs(Q)

In [None]:
for quadruple in Q:
    display(get_sector_data(quadruple,constituents))

In [None]:
#Plotting measures of all quadruples for a given target
ps.plot_all_target_measures(target='A', procedure='extremal')
#Plotting the correlation matrix heatmap of all stocks on S&P 500
ps.plot_correlation()

In [None]:
# Export the quadruples to a CSV file for further use
cols = ['col'+str(x) for x in range(len(Q[0]))]
export_quadruples = pd.DataFrame(Q, columns=cols)
export_quadruples.to_csv('./quadruples.csv', index=False)


# Conclusion

This notebook describes the proposed partner selection framework and demonstrates example usage of its implementation.

The first three procedures seem to generate the same final set of quadruples for every target stock in the universe. Another important takeaway is that the industries/sub-sectors of the stocks in most of the final quadruples are closely related, even though no clustering methods were used in this framework.

Some interesting observations:
- ABC (AmerisourceBergen Corp), a Health Care Distributor, seems to have highly correlated partners in Financial Services. This observation holds true for all four approaches.
- For ABT (Abbott Laboratories), a Health Care Equipment Manufacturer, the Extremal Approach returned three partners that are in Financial Services. This contrasts with the results obtained from the other three approaches.
- According to all four approaches, ADM (Archer-Daniels-Midland Co), an Agricultural Products business, seems to have highly correlated partners in Asset Management.


# References
[1]. Stübinger, Johannes; Mangold, Benedikt; Krauss, Christopher. Statistical Arbitrage with Vine Copulas. Available at: https://www.econstor.eu/bitstream/10419/147450/1/870932616.pdf

[2]. Schmid, F., Schmidt, R., 2007. Multivariate extensions of Spearman’s rho and related statistics. Statistics & Probability Letters 77 (4), 407–416.

[3]. Mangold, B., 2015. A multivariate linear rank test of independence based on a multiparametric copula with cubic sections. IWQW Discussion Paper Series, University of Erlangen-Nürnberg. Available at: https://www.statistik.rw.fau.de/files/2016/03/IWQW-10-2015.pdf
