- By: Alex Kwon
- Email: alex.kwon [at] hudsonthames [dot] org

# Statistical Arbitrage

## Abstract

Statistical Arbitrage exploits the pricing inefficiency between two groups of assets. First developed and used in the mid-1980s by Nunzio Tartaglia’s quantitative group at Morgan Stanley, the classical strategy utilizes systematic trading signals and a market-neutral approach to generate positive returns.

The strategy can be explained in a two-step process. First, two baskets of assets that have historically moved similarly are identified. Then, the spread between the two is carefully measured to look for signals of divergence. If the spread becomes wider than the value suggested by historical data, the trader longs the losing basket and shorts the winning one. As the spread reverts back to the mean, the positions will gain in value.

## Introduction

Most strategies involving statistical arbitrage can be expressed with the following equation:

$\frac{dP_t}{P_t} = \alpha dt + \beta \frac{dQ_t}{Q_t} + dX_t$

- $P_t$: Price of the first group of assets.

- $Q_t$: Price of the second group of assets.

- $\alpha$: Drift term. For the most part, we will assume that this value is 0.

- $\beta$: Regression coefficient between the returns of the two groups.

- $X_t$: Cointegration residual.

This can be interpreted as going long 1 unit of $P_t$ and short $\beta$ units of $Q_t$ if $X_t$ is significantly negative, and taking the opposite position if $X_t$ is significantly positive. Here we assume that $X_t$ is a stationary, mean-reverting process. $X_t$ is described in more detail in the section on the Ornstein-Uhlenbeck process.

We can, therefore, interpret statistical arbitrage as a contrarian strategy to harness the mean-reverting behavior of the pair ratio to exploit the mispricing of the assets.
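The regression in the equation above can be estimated in discrete time. The following is a minimal sketch, assuming simple one-step returns as a stand-in for $dP_t/P_t$ and $dQ_t/Q_t$ and ordinary least squares with NumPy. The function name `fit_pair_regression` is hypothetical and is not taken from any particular library.

```python
import numpy as np

def fit_pair_regression(p_prices, q_prices):
    """Regress returns of P on returns of Q; return (alpha, beta, residual X)."""
    p = np.asarray(p_prices, dtype=float)
    q = np.asarray(q_prices, dtype=float)
    # Simple one-step returns approximate dP/P and dQ/Q (an assumption).
    rp = np.diff(p) / p[:-1]
    rq = np.diff(q) / q[:-1]
    # OLS with an intercept: rp = alpha + beta * rq + eps
    design = np.column_stack([np.ones_like(rq), rq])
    (alpha, beta), *_ = np.linalg.lstsq(design, rp, rcond=None)
    eps = rp - (alpha + beta * rq)
    # Cumulating the regression residuals gives a discrete version of X_t.
    x = np.cumsum(eps)
    return alpha, beta, x
```

Here `alpha` is the drift per time step, and `x` has one entry per return observation.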

## Pairs Trading

Pairs trading is a specific statistical arbitrage strategy that focuses on two assets. Instead of trading baskets of assets, it trades a single pair to exploit the pricing inefficiency signaled by a widening spread. Pairs trading strategies can be implemented in three steps.

1. Filter the universe to select a number of pairs. These pairs are two related securities, which are oftentimes in the same sector/industry and have similar fundamental values.

2. Calculate the spread between the two assets in each pair and test for stationarity and cointegration.

3. If all the tests are satisfied, generate trading signals to long the asset that is underpriced and short the other.

## Filtering

There are multiple ways to filter the initial data. For pairs trading, the number of candidate pairs grows quadratically with the number of assets $n$. The total number of pairs is:

$\frac{n(n-1)}{2}$

If we only have $10$ assets to test, the total number of pairs is $\frac{10 \times 9}{2}=45$. However, once we start scanning a universe of over $5000$ stocks, the count exceeds $12$ million pairs. Therefore, it is important to have an effective filtering method before running the full testing process. The most commonly used filtering method is the cointegration test, which shows which pairs of assets pass the threshold to reject the null hypothesis. More information on cointegration is available in the Cointegration section below. Other filtering approaches include:

1. Principal Component Analysis
2. Clustering
    - Fundamental values
    - Sector/Industry
    - K-means
3. Heuristics
4. Distance/Correlation Matrix
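To make the growth in the number of pairs concrete, here is a small sketch that enumerates candidate pairs using only the Python standard library. The ticker names and the helper `candidate_pairs` are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

def candidate_pairs(tickers):
    """Return every unordered pair of distinct tickers."""
    return list(combinations(tickers, 2))

tickers = [f"ASSET_{i}" for i in range(10)]  # hypothetical asset names
pairs = candidate_pairs(tickers)

print(len(pairs))     # 45 pairs for 10 assets
print(comb(5000, 2))  # 12497500 pairs for 5000 assets
```

Any of the filters listed above would be applied to reduce this list before the more expensive statistical tests are run.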

## Stationarity

A time series is defined to be stationary if its joint probability distribution is invariant under shifts in time. In particular, the mean and variance of the time series do not change over time.

It is important to test the spread for stationarity, as statistical arbitrage typically produces its strongest and most robust results when the spread of the tested pair is stationary and the pair is cointegrated.

### Augmented Dickey-Fuller Test

The Augmented Dickey-Fuller (ADF) test tests the null hypothesis that a unit root is present
in a time series sample. If the time series is mean-reverting, then the next price change will be
proportional to the current price level, with a negative coefficient. The original Dickey-Fuller
test only accounts for lag 1, whereas the augmented version can include lags up to $p$.

$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots + \delta_{p-1} \Delta y_{t-p+1} + \epsilon_t$

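- $\alpha$: constant term
- $\beta$: coefficient of the time trend
- $\gamma$: coefficient of the lagged level $y_{t-1}$; the null hypothesis of a unit root corresponds to $\gamma = 0$
- $\delta_i$: coefficients of the lagged changes $\Delta y_{t-i}$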

For the purpose of this module, we will empirically set $p$ to be $1$.
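The test regression can be sketched directly with NumPy. This is an illustrative implementation, not the module's own code: it omits the trend term $\beta t$, and the hypothetical parameter `n_lags` counts the lagged difference terms. Whether $p = 1$ corresponds to zero or one such term depends on the indexing convention, so it is left configurable. Libraries such as statsmodels provide a full test (`adfuller`) that also returns p-values.

```python
import numpy as np

def adf_statistic(y, n_lags=1):
    """t-statistic of gamma in
    dy_t = alpha + gamma * y_{t-1} + sum_i delta_i * dy_{t-i} + eps_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[n_lags:]                 # dy_t
    level = y[n_lags:-1]                 # y_{t-1}
    cols = [np.ones_like(target), level]
    for i in range(1, n_lags + 1):
        cols.append(dy[n_lags - i:len(dy) - i])  # dy_{t-i}
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = len(target) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1] / np.sqrt(cov[1, 1])

# Synthetic data: a stationary AR(1) series and a random walk.
rng = np.random.default_rng(0)
shocks = rng.standard_normal(500)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + shocks[t]
random_walk = np.cumsum(shocks)

stat_stationary = adf_statistic(stationary)
stat_random_walk = adf_statistic(random_walk)
```

The statistic does not follow a standard t-distribution under the null hypothesis. It has to be compared against Dickey-Fuller critical values, roughly $-2.86$ at the 5% level for the constant-only case in large samples. A strongly negative value, as expected for the stationary series here, rejects the unit root.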

Another important consideration is the presence of a trend within the spread. The ideal
scenario for a statistical arbitrage strategy is a spread without a trend. This, however,
does not always hold true: a spread can be trend stationary, meaning that it is stationary
around a deterministic trend.

A trend-stationary spread can be detrended, and the user can easily do so by setting the regression to
be $ct$ (constant and linear trend) instead of just $c$ (constant only).
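As an illustration of what including a trend term accounts for, the sketch below removes a fitted constant and linear trend from a series by ordinary least squares. The helper `detrend_linear` and the synthetic spread are hypothetical; this is not the module's implementation.

```python
import numpy as np

def detrend_linear(spread):
    """Subtract a fitted constant-plus-linear-trend from a series."""
    spread = np.asarray(spread, dtype=float)
    t = np.arange(len(spread))
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(t, spread, deg=1)
    return spread - (intercept + slope * t)

# Synthetic trend-stationary spread: linear trend plus a bounded oscillation.
t = np.arange(200)
trending_spread = 2.0 + 0.1 * t + np.sin(t)
detrended = detrend_linear(trending_spread)
```

After detrending, the series fluctuates around zero with no remaining linear trend, so it can be tested with the constant-only specification.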

### Kwiatkowski-Phillips-Schmidt-Shin

### Phillips-Perron Test

### Phillips-Ouliaris Test

## Cointegration

### Engle-Granger Test

### Johansen Test

## Regression

### Pairs Trading

There are currently two tools available for a pairs trading strategy. One calculates a rolling
z-score and regression over the given data, and the other calculates the z-score and regression
over the entire data set. The first method avoids look-ahead bias because each value uses only
the information available on that day, which makes it suitable for backtesting and trading;
the second uses the full sample and is suited to analyzing the entire time frame at once.
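The difference between the two approaches can be sketched for the z-score alone; the regression step is omitted here. The function names are hypothetical, and the choice that the rolling window includes the current observation is an assumption.

```python
import numpy as np

def full_sample_zscore(spread):
    """z-score using the mean and standard deviation of the entire series."""
    spread = np.asarray(spread, dtype=float)
    return (spread - spread.mean()) / spread.std(ddof=1)

def rolling_zscore(spread, window):
    """z-score at t using only spread[t - window + 1 : t + 1].

    Entries before the first full window are NaN.
    """
    spread = np.asarray(spread, dtype=float)
    z = np.full(len(spread), np.nan)
    for t in range(window - 1, len(spread)):
        chunk = spread[t - window + 1:t + 1]
        z[t] = (spread[t] - chunk.mean()) / chunk.std(ddof=1)
    return z

spread = [1.0, 2.0, 3.0, 4.0, 5.0]
z_roll = rolling_zscore(spread, window=3)
z_full = full_sample_zscore(spread)
```

The rolling version never looks at observations after time $t$, whereas the full-sample version uses statistics that are only known once the whole series has been observed.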

### Rolling Regression

### All Regression

## Trading Rules

### Kalman Filtering

### Ornstein-Uhlenbeck Process

The Ornstein-Uhlenbeck process is a stochastic mean-reverting process with the following equation:

$dX_t = \kappa(\mu - X_t)dt + \sigma dW_t$

- $X_t$: Residual from the spread.
- $\kappa$: Rate of mean reversion.
- $\mu$: Mean of the process.
- $\sigma$: Volatility of the process ($\sigma^2$ is its instantaneous variance).
- $W_t$: Wiener process or Brownian motion.

This can be discretized into an $AR(1)$ model with the following properties:

$X_{n+1} = a + b X_n + \zeta_{n+1}$

- $b = e^{-\kappa \Delta t}$
- $a = \mu(1 - b)$
- $var(\zeta) = \sigma^2 \frac{1 - b^2}{2 \kappa}$
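The mapping between the OU parameters and the $AR(1)$ coefficients can be checked numerically. The sketch below simulates a path with the discretization above and then recovers $(\kappa, \mu, \sigma)$ from an OLS fit of $X_{n+1}$ on $X_n$. The function names and parameter values are hypothetical, and the fit assumes the estimated $b$ lies strictly between 0 and 1.

```python
import numpy as np

def simulate_ou(kappa, mu, sigma, dt, n_steps, x0, seed=0):
    """Simulate X_{n+1} = a + b X_n + zeta_{n+1} with the OU-implied a, b, var(zeta)."""
    rng = np.random.default_rng(seed)
    b = np.exp(-kappa * dt)
    a = mu * (1.0 - b)
    noise_std = sigma * np.sqrt((1.0 - b**2) / (2.0 * kappa))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x[n + 1] = a + b * x[n] + noise_std * rng.standard_normal()
    return x

def fit_ou(x, dt):
    """Fit the AR(1) model by OLS and map back to (kappa, mu, sigma)."""
    x = np.asarray(x, dtype=float)
    b, a = np.polyfit(x[:-1], x[1:], deg=1)  # slope b, intercept a
    resid = x[1:] - (a + b * x[:-1])
    var_zeta = resid.var(ddof=2)
    kappa = -np.log(b) / dt                  # requires 0 < b < 1
    mu = a / (1.0 - b)
    sigma = np.sqrt(var_zeta * 2.0 * kappa / (1.0 - b**2))
    return kappa, mu, sigma

path = simulate_ou(kappa=2.0, mu=1.0, sigma=0.5, dt=0.1, n_steps=5000, x0=1.0)
kappa_hat, mu_hat, sigma_hat = fit_ou(path, dt=0.1)
```

The estimates should be close to the simulation inputs, with $\kappa$ usually the noisiest of the three.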

We will primarily use the OU-process to generate trading signals for statistical arbitrage.
The trading signals will be defined as:

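$s = \frac{X_t - E(X_t)}{\sqrt{var(X_t)}} = \frac{(X_t - \mu)\sqrt{2\kappa}}{\sigma}$

Here $E(X_t) = \mu$ and $var(X_t) = \frac{\sigma^2}{2\kappa}$ are the equilibrium mean and variance of the process.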
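A minimal sketch of the signal computation follows, assuming the signal is the deviation of $X_t$ from its equilibrium mean $\mu$, measured in units of the equilibrium standard deviation $\sigma / \sqrt{2\kappa}$. The function name `s_score` is hypothetical.

```python
import math

def s_score(x_t, kappa, mu, sigma):
    """Distance of x_t from the equilibrium mean, in equilibrium standard deviations."""
    equilibrium_std = sigma / math.sqrt(2.0 * kappa)
    return (x_t - mu) / equilibrium_std

# With kappa = 2 and sigma = 1 the equilibrium standard deviation is 0.5,
# so a residual 0.5 above the mean has a score of 1.
example = s_score(1.0, kappa=2.0, mu=0.5, sigma=1.0)
print(example)  # 1.0
```

A large positive score indicates that the residual is well above its equilibrium level, and a large negative score that it is well below. The entry and exit thresholds are a separate design choice and are not specified here.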

### Hurst Exponent

### Optimal Trading Rules

### Optimal Portfolio Allocation