Tobias Kuhlmann, Karlsruhe Institute of Technology (KIT), tobias.kuhlmann@student.kit.edu

In [51]:
%matplotlib inline

import numpy as np
import pylab as pl
import pandas as pd
import matplotlib.pyplot as plt

import datetime as dt


## Option implied betas following Buss and Vilkov (2012)
Implements calculation of option implied correlations following Buss, A. and Vilkov, G., 2012. Measuring equity risk with option-implied correlations. The Review of Financial Studies, 25(10), pp.3113-3140.


## Idea

Buss and Vilkov use forward-looking information from option prices to estimate option-implied correlations and to construct an option-implied predictor of factor betas. All quantities are estimated under the risk-neutral probability measure.

##### General comments: 
- Relying only on historical information assumes that the future is sufficiently similar to the past, which is questionable in financial markets. Options are forward-looking and traded, so their prices reflect current market expectations. Using option-implied information may therefore improve predictive quality.
- Variance risk premium: Investors over-insure against volatility risk -> Implied volatility is typically higher than realized volatility
 - Intuition: Investors are willing to pay a risk premium to hedge against changes in variance (in particular rising variance)
- Correlation risk premium: Premium for uncertainty about future correlation. Investors want to insure against rising correlation, because a rise in correlation reduces diversification benefits. Underlying uncertainty: Correlations rise sharply in crises -> Little diversification is left in large market downturns and systemic crises, except through short positions. As a result, option-implied correlations are higher than realized correlations.
 - Intuition: Investors are also willing to pay a risk premium to hedge against changes in correlations
 
##### Risk neutral vs realized: recap
- Option-implied measures are taken under the risk-neutral probability measure, which differs from the realized / objective probability measure because of the variance and correlation risk premia. These risk premia make risk-neutral moments biased predictors of realized moments.
- Option-implied correlations are not directly observable and require a modeling choice -> parametric form with assumptions

##### Technical conditions of correlation matrix
1. The parametric form must produce implied correlations that do not exceed one
2. The correlation risk premium is negative (consistent with the literature)
3. The correlation risk premium is larger in magnitude for stocks with low or negative correlation, which are exposed to a higher risk of losing diversification benefits (consistent with the literature)
4. The correlation matrix should be consistent with empirical observations
    - Implied correlation is higher than realized correlation
    - The correlation risk premium is larger in magnitude for pairs of stocks that provide higher diversification benefits (negative and low correlation)

## Theoretical implementation steps

##### Goal
- Implied correlation matrix $\Gamma^{Q}_{t}$ with elements $\rho^{Q}_{ij,t}$
- Implied betas $\beta^{Q}_{iM,t}$ calculated from implied correlations

##### Implied variance of market index
Observed implied variance of market index 
$$(\sigma^{Q}_{M,t})^2 = \sum_{i=1}^{N}\sum_{j=1}^{N}{w_iw_j\sigma^{Q}_{i,t}\sigma^{Q}_{j,t}\rho^{Q}_{ij,t}}$$
 
where $\sigma^{Q}_{i,t}$ is implied volatility of stock i in the index and $w_i$ are index weights
 
##### Parametric form of implied correlations $\rho^{Q}_{ij,t}$
$$\rho^{Q}_{ij,t} = \rho^{P}_{ij,t} - \alpha_t(1-\rho^{P}_{ij,t})$$
 
where $\rho^{P}_{ij,t}$ is expected correlation under objective measure (realized), $\alpha_t$ needs to be identified
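
Rearranging the parametric form makes the link to the technical conditions explicit. The two lines below only restate the equation above and add no further assumptions:

$$\begin{aligned}
\rho^{Q}_{ij,t} - \rho^{P}_{ij,t} &= -\alpha_t\,(1-\rho^{P}_{ij,t}) \\
1 - \rho^{Q}_{ij,t} &= (1+\alpha_t)\,(1-\rho^{P}_{ij,t})
\end{aligned}$$

For $\alpha_t \leq 0$ the first line is non-negative and grows as $\rho^{P}_{ij,t}$ falls, so implied correlation is at least as high as realized correlation and the gap is largest for low or negatively correlated pairs. For $\alpha_t \geq -1$ the second line is non-negative, so $\rho^{Q}_{ij,t}$ cannot exceed one.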

##### Identify $\alpha_t$ in closed form
$$\alpha_t = -\frac{(\sigma^{Q}_{M,t})^2-\sum_{i=1}^{N}\sum_{j=1}^{N}{w_iw_j\sigma^{Q}_{i,t}\sigma^{Q}_{j,t}\rho^{P}_{ij,t}}}{\sum_{i=1}^{N}\sum_{j=1}^{N}{w_iw_j\sigma^{Q}_{i,t}\sigma^{Q}_{j,t}(1-\rho^{P}_{ij,t})}}$$

To satisfy above conditions: $-1\leq\alpha_t\leq0$
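
Substituting the parametric form into the index-variance identity and solving for $\alpha_t$ can be sketched numerically as follows. This is a minimal sketch, not the authors' code: the function name `implied_correlation` and the toy inputs (three stocks, made-up weights, volatilities and correlations) are assumptions for illustration only.

```python
import numpy as np

def implied_correlation(weights, implied_vols, corr_p, index_implied_var):
    """Return (alpha_t, implied correlation matrix) for one date.

    alpha_t is chosen so that the index variance built from the implied
    correlations equals the observed index implied variance.
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(implied_vols, dtype=float)
    rho_p = np.asarray(corr_p, dtype=float)

    ws = w * s                       # w_i * sigma^Q_i
    a = ws @ rho_p @ ws              # sum_ij w_i w_j s_i s_j rho^P_ij
    b = ws @ (1.0 - rho_p) @ ws      # sum_ij w_i w_j s_i s_j (1 - rho^P_ij)

    # index_implied_var = a - alpha * b, solved for alpha
    alpha = -(index_implied_var - a) / b
    rho_q = rho_p - alpha * (1.0 - rho_p)
    return alpha, rho_q

# Toy inputs (hypothetical values, not taken from the Vilkov data)
w = np.array([0.5, 0.3, 0.2])
s = np.array([0.30, 0.25, 0.40])
rho_p = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 1.0]])
var_m = 0.05  # index implied variance

alpha, rho_q = implied_correlation(w, s, rho_p, var_m)
print(alpha)
print(rho_q)
```

If the resulting `alpha` falls outside $[-1, 0]$ for a given date, the conditions above are violated and that date needs separate treatment; the sketch does not handle this case.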

##### Calculate market betas $\beta^{Q}_{iM,t}$ with option implied correlation
$$\beta^{Q}_{iM,t} = \frac{\sigma^{Q}_{i,t}\sum_{j=1}^{N}{w_j\sigma^{Q}_{j,t}\rho^{Q}_{ij,t}}}{(\sigma^{Q}_{M,t})^2}$$

For a multifactor model with multiple uncorrelated factors, the beta $\beta_{ik,t}$, where $i$ is the stock and $k$ the factor, can be calculated as the ratio of the stock-to-factor covariance $\sigma_{ik,t}$ to the factor variance $\sigma^{2}_{k,t}$. Here $\rho_{ij,t}$ is the pairwise stock correlation, $\sigma_{j,t}$ are the stock volatilities, and $w_j$ are the factor-mimicking portfolio weights.
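
The beta formula for the market factor can be sketched in a few lines of NumPy. The function name `implied_betas` and the toy inputs are assumptions for illustration; the correlation matrix passed in is meant to be an implied correlation matrix for one date.

```python
import numpy as np

def implied_betas(weights, implied_vols, corr_q, index_implied_var=None):
    """Market betas from implied volatilities and an implied correlation matrix.

    If index_implied_var is None, the index variance is computed from the
    covariance matrix itself (w' Sigma w).
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(implied_vols, dtype=float)
    rho_q = np.asarray(corr_q, dtype=float)

    cov_q = np.outer(s, s) * rho_q    # sigma_i * sigma_j * rho_ij
    cov_with_index = cov_q @ w        # sigma_i * sum_j w_j sigma_j rho_ij
    if index_implied_var is None:
        index_implied_var = w @ cov_q @ w
    return cov_with_index / index_implied_var

# Toy inputs (hypothetical values)
w = np.array([0.5, 0.3, 0.2])
s = np.array([0.30, 0.25, 0.40])
rho_q = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.0],
                  [0.2, 0.0, 1.0]])

betas = implied_betas(w, s, rho_q)
print(betas)
```

A useful sanity check is that the weighted average of the betas equals one whenever the index variance is consistent with the correlation matrix.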
 
 
 

## Data

##### Data needed
- Return time series to calculate stock-stock correlations $\rho^{P}_{ij,t}$
- Stock weights in index over time (time-varying or time-invariant?)
- Implied volatilities of every single stock i $\sigma^{Q}_{i,t}$
- Implied volatilities of Index $\sigma^{Q}_{M,t}$
- Use stock-stock correlations, implied volatilities of single stocks and index implied volatility to calculate $\alpha_t$


##### Description of Vilkov: http://www.vilkov.net/codedata.html
- market_betas_1996_2009.mat contains the betas themselves (6 different beta methodologies, time vector dt, and IDs vector permno) in a structure betas. All betas are aligned to the same timeline in time vector dt. In the paper we used the betas in fields impl_daily_251d_mfiv and impl_monthly_60m_mfiv for implied, and hist_daily_251d and hist_monthly_60m for historical.
- id_dt.mat contains the time vector dt, and the vectors of IDs (PERMNO from CRSP), the first PERMNO = 999999 is market itself (SP500).
- weights.mat contains the synthetic weights w of stocks in the SP500 index (first column is NaN, because it is SP500 itself).
- dailyret.mat contains daily returns (ret and retx for ex div returns) for SP500 and its components.
- mnthly_ret.mat contains monthly returns retm for SP500 and its components, and the time vector for these returns in dtm.

The same data as above is also saved as CSV. There are many files for the betas; the tables are arranged as time x cross section, where dates (time points) are located in files with '_dt...', and identifiers are in files with '_permno'.

In [36]:
# Validate relative file path and list files
import os
os.listdir("../Vilkov_data/RFS_Data_implied_betasCSV_1996_2009/")

['mnthly_ret_hsicmg.csv',
 'dailyret_retx.csv',
 'mnthly_ret_dtmadd.csv',
 'mnthly_ret_retm.csv',
 'id_dt_dtadd.csv',
 'market_betas_1996_2009_dt.csv',
 'market_betas_1996_2009_impl_monthly_60m_mfiv.csv',
 'mnthly_ret_dtmMatlabFormat.csv',
 'market_betas_1996_2009_hist_monthly_60m.csv',
 'id_dt_dtMatlabFormat.csv',
 'weights_w.csv',
 'id_dt_permno.csv',
 'market_betas_1996_2009_hist_daily_251d.csv',
 'market_betas_1996_2009_impl_daily_251d_mfiv.csv',
 'dailyret_cfacshr.csv',
 'dailyret_ret.csv',
 'market_betas_1996_2009_permno.csv',
 'market_betas_1996_2009_impl_daily_251d_midiv.csv',
 'market_betas_1996_2009_impl_monthly_60m_midiv.csv']

##### Transform Vilkov Matlab datetime number to python datetime

In [181]:
# time
filename = "../Vilkov_data/RFS_Data_implied_betasCSV_1996_2009/id_dt_dtMatlabFormat.csv"
dates = pd.read_csv(filename, header=None).astype(int)
dates.rename(columns={0:'date'}, inplace=True)

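def matlab_datenum_to_python(matlab_datenum):
    """
    Translates a MATLAB datenum to a Python datetime.
    MATLAB counts days from year 0, Python ordinals from year 1 (offset 366).
    """
    # integer part gives the calendar day, fractional part the time of day
    days = float(matlab_datenum)
    return dt.datetime.fromordinal(int(days) - 366) + dt.timedelta(days=days % 1)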

# test function
#matlab_datenum = 728660
#matlab_datenum_to_python(matlab_datenum=matlab_datenum)

# apply matlab datetime conversion to dates dataframe column
dates['date'] = dates.date.apply(matlab_datenum_to_python)
dates.shape

(3777, 1)
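
As an alternative to applying the conversion row by row, the whole column can be converted in one call. The sketch below assumes the usual convention that MATLAB datenum 719529 corresponds to 1970-01-01; the function name `matlab_datenum_to_timestamp` is hypothetical.

```python
import datetime as dt
import pandas as pd

# MATLAB datenum of 1970-01-01 (the Unix epoch)
MATLAB_EPOCH_OFFSET = 719529

def matlab_datenum_to_timestamp(datenums):
    """Vectorized conversion of MATLAB datenums to pandas timestamps."""
    days_since_epoch = pd.Series(datenums, dtype="float64") - MATLAB_EPOCH_OFFSET
    return pd.to_datetime(days_since_epoch, unit="D")

# 730486 is the MATLAB datenum of 2000-01-01; .5 adds half a day
example = matlab_datenum_to_timestamp([730486, 730486.5])
print(example)
```

Both approaches should agree on the calendar day, which is easy to check against `dt.datetime.fromordinal(int(x) - 366)` for a handful of values.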

##### Read in Stock Ids
Id: the vectors of IDs (PERMNO from CRSP), the first PERMNO = 999999 is market itself (SP500)

In [182]:
# stock ids
filename = "../Vilkov_data/RFS_Data_implied_betasCSV_1996_2009/id_dt_permno.csv"
ids = pd.read_csv(filename, header=None).astype(int)
ids.shape

(950, 1)

##### Import stock weights in SP500 index

In [106]:
# Import the index weights. The CSV has no header row, so pass header=None;
# otherwise the first day of weights is consumed as column names.
filename = "../Vilkov_data/RFS_Data_implied_betasCSV_1996_2009/weights_w.csv"
weights = pd.read_csv(filename, header=None)
print(weights.shape)
weights.head()

(3777, 950)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,940,941,942,943,944,945,946,947,948,949
0,,0.001031,0.003736,0.010638,,,,0.002927,,0.002298,...,,,,,,,,0.00226,,
1,,0.000995,0.003757,0.01068,,,,0.002907,,0.002328,...,,,,,,,,0.002227,,
2,,0.000988,0.003684,0.010513,,,,0.002942,,0.002323,...,,,,,,,,0.002235,,
3,,0.001009,0.003723,0.010678,,,,0.002949,,0.002274,...,,,,,,,,0.002233,,
4,,0.001022,0.003831,0.01061,,,,0.002927,,0.002259,...,,,,,,,,0.002343,,
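
Because the tables are stored as time x cross section with dates and PERMNOs in separate files, it helps to attach those labels before doing any calculations. The following is a minimal sketch on small synthetic frames standing in for the CSVs; the function name `label_panel` and the PERMNO values 10001 and 10002 are placeholders, and it assumes the table has one row per date and one column per PERMNO.

```python
import numpy as np
import pandas as pd

def label_panel(values, dates, permnos, market_id=999999):
    """Label a time x cross-section table and drop the market column."""
    panel = values.copy()
    panel.index = pd.DatetimeIndex(dates, name="date")
    panel.columns = pd.Index(permnos, name="permno")
    # the first PERMNO (999999) is the index itself, not a constituent
    return panel.drop(columns=market_id, errors="ignore")

# Synthetic stand-ins for weights_w.csv, the date vector and id_dt_permno.csv
values = pd.DataFrame([[np.nan, 0.60, 0.40],
                       [np.nan, 0.55, 0.45]])
dates = pd.to_datetime(["1996-01-04", "1996-01-05"])
permnos = [999999, 10001, 10002]

panel = label_panel(values, dates, permnos)
print(panel)
```

With labels in place, weights, returns and betas can be aligned on date and PERMNO instead of on positional indices.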


In [154]:
filename = "../Vilkov_data/RFS_Data_implied_betasCSV_1996_2009/market_betas_1996_2009_dt.csv"
data = pd.read_csv(filename, header=None)
print(data.shape)
data.head()

(3777, 1)


Unnamed: 0,0
0,728660.0
1,728660.0
2,728660.0
3,728670.0
4,728670.0


## Implementation

##### Steps
- Calculate stock-stock correlations $\rho^{P}_{ij,t}$ from the return time series
- Read the stock weights $w_i$ in the index over time (time-varying or time-invariant?)
- Obtain the implied volatilities $\sigma^{Q}_{i,t}$ of every single stock i
- Obtain the implied volatility $\sigma^{Q}_{M,t}$ of the index
- Use the stock-stock correlations, the implied volatilities of the single stocks and the index implied volatility to calculate $\alpha_t$
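
The first step can be sketched as a trailing-window correlation estimate. This is a minimal sketch on synthetic returns, not the estimator used in the paper: the window of 251 days is an assumption suggested by the field name hist_daily_251d, and dropping stocks with missing returns in the window is one possible choice among several.

```python
import numpy as np
import pandas as pd

def trailing_correlation(returns, as_of, window=251):
    """Correlation matrix from the last `window` rows up to and including `as_of`.

    Stocks with any missing return inside the window are dropped.
    """
    history = returns.loc[:as_of].tail(window)
    complete = history.dropna(axis="columns", how="any")
    return complete.corr()

# Synthetic daily returns for four hypothetical stocks
rng = np.random.default_rng(0)
dates = pd.bdate_range("1996-01-02", periods=300)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(300, 4)),
                       index=dates, columns=["A", "B", "C", "D"])
returns.loc[dates[-10:], "D"] = np.nan  # stock D leaves the sample

corr_p = trailing_correlation(returns, as_of=dates[-1])
print(corr_p)
```

The resulting matrix covers only the stocks with a complete history in the window, so weights and implied volatilities have to be restricted to the same set of stocks before $\alpha_t$ is calculated.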