# Risk Estimators with MlFinLab

From asset allocation to investment strategies, risk has always played a very large role in the world of finance. The performance of many investment strategies depends on efficient estimation of the underlying portfolio risk, and the most common way of representing risk is through a covariance matrix. Because this representation plays such a vital role in investment management, an accurate estimate of the covariance matrix is essential for an accurate representation of risk.

The RiskEstimators class from MlFinLab provides implementations of several ways to calculate and adjust covariance matrices. Throughout this blog post, we give a short description of each algorithm and show how to use it through the MlFinLab library.

More specifically, the RiskEstimators class covers seven different algorithms relating to covariance matrices. These algorithms include: 
- Minimum Covariance Determinant
- Empirical Covariance
- Covariance Estimator with Shrinkage
- Semi-Covariance Matrix
- Exponentially-Weighted Covariance matrix
- De-Noising Covariance Matrix
- Covariance and Correlation Matrix Transformations

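Please note that the descriptions of the Minimum Covariance Determinant, Empirical Covariance, and shrinkage estimators below are based on the descriptions from the [scikit-learn User Guide on Covariance Estimation](https://scikit-learn.org/stable/modules/covariance.html#robust-covariance). The other methods are described with references to their own sources.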


## Minimum Covariance Determinant
The Minimum Covariance Determinant (MCD) is a robust estimator of covariance introduced by P.J. Rousseeuw. According to the scikit-learn User Guide on Covariance Estimation, "the basic idea of the algorithm is to find a set of observations that are not outliers and compute their empirical covariance matrix, which is then rescaled to compensate for the performed selection of observations". The MlFinLab implementation is a wrapper around sklearn's MinCovDet class, which uses the FastMCD algorithm developed by Rousseeuw and Van Driessen.

A detailed description of the algorithm is available in the paper by _Mia Hubert_ and _Michiel Debruyne_, __Minimum covariance determinant__, [available here](https://wis.kuleuven.be/stat/robust/papers/2010/wire-mcd.pdf).
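To make the idea of "finding a set of observations that are not outliers" concrete, here is a minimal NumPy sketch of the concentration step (C-step) at the heart of FastMCD, run from a single random starting subset. It is an illustration under simplifying assumptions, not sklearn's or MlFinLab's code: `c_step_mcd` is a hypothetical helper, and it omits FastMCD's many random starts, the consistency rescaling, and the reweighting step. Each C-step fits a mean and covariance to the current subset and keeps the h observations with the smallest Mahalanobis distances; the determinant of the fitted covariance can never increase from one step to the next.

```python
import numpy as np


def c_step_mcd(X, h, n_iter=50, seed=0):
    """Hypothetical helper: iterate FastMCD-style C-steps from one random h-subset.

    Returns the location and (unscaled) covariance of the final subset,
    the sorted subset indices, and the covariance determinant at each step.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subset = rng.choice(n, size=h, replace=False)
    dets = []
    for _ in range(n_iter):
        # Mean and maximum likelihood covariance of the current h-subset
        location = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False, bias=True)
        dets.append(np.linalg.det(cov))
        # Squared Mahalanobis distance of every observation to the subset fit
        diff = X - location
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        # Keep the h observations closest to the current fit
        new_subset = np.argsort(d2)[:h]
        if set(new_subset.tolist()) == set(subset.tolist()):
            break  # converged: the subset no longer changes
        subset = new_subset
    return location, cov, np.sort(subset), dets


# Toy data: 200 Gaussian points plus 20 gross outliers near (10, 10)
rng = np.random.default_rng(42)
inliers = rng.normal(size=(200, 2))
outliers = rng.normal(loc=10.0, scale=0.3, size=(20, 2))
X = np.vstack([inliers, outliers])

h = (X.shape[0] + X.shape[1] + 1) // 2  # a common default subset size
location, robust_cov, subset, dets = c_step_mcd(X, h)
print("outliers kept in subset:", int(np.sum(subset >= 200)))
print("determinants per step:", np.round(dets, 4))
```

On this toy data the outliers (indices 200 and above) drop out of the subset, so the resulting covariance describes only the bulk of the data.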

```python
# importing our required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mlfinlab as ml
```

```python
# Reading in our data and dropping every asset (column) that has missing values
stock_prices = pd.read_csv('stock_prices.csv', parse_dates=True, index_col='Date')
stock_prices = stock_prices.dropna(axis=1)
stock_prices.head()
```

| Date | EEM | EWG | TIP | EWJ | EFA | IEF | EWQ | EWU | XLB | XLE | ... | XLU | EPP | FXI | VGK | VPL | SPY | TLT | BND | CSJ | DIA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-01-02 | 49.273335 | 35.389999 | 106.639999 | 52.919998 | 78.220001 | 87.629997 | 37.939999 | 47.759998 | 41.299999 | 79.5 | ... | 42.09 | 51.173328 | 55.98333 | 74.529999 | 67.309998 | 144.929993 | 94.379997 | 77.360001 | 101.400002 | 130.630005 |
| 2008-01-03 | 49.716667 | 35.290001 | 107.0 | 53.119999 | 78.349998 | 87.809998 | 37.919998 | 48.060001 | 42.049999 | 80.440002 | ... | 42.029999 | 51.293331 | 55.599998 | 74.800003 | 67.5 | 144.860001 | 94.25 | 77.459999 | 101.519997 | 130.740005 |
| 2008-01-04 | 48.223331 | 34.599998 | 106.970001 | 51.759998 | 76.57 | 88.040001 | 36.990002 | 46.919998 | 40.779999 | 77.5 | ... | 42.349998 | 49.849998 | 54.536671 | 72.980003 | 65.769997 | 141.309998 | 94.269997 | 77.550003 | 101.650002 | 128.169998 |
| 2008-01-07 | 48.576668 | 34.630001 | 106.949997 | 51.439999 | 76.650002 | 88.199997 | 37.259998 | 47.060001 | 40.220001 | 77.199997 | ... | 43.23 | 50.416672 | 56.116669 | 72.949997 | 65.650002 | 141.190002 | 94.68 | 77.57 | 101.720001 | 128.059998 |
| 2008-01-08 | 48.200001 | 34.389999 | 107.029999 | 51.32 | 76.220001 | 88.389999 | 36.970001 | 46.400002 | 39.599998 | 75.849998 | ... | 43.240002 | 49.566669 | 55.326672 | 72.400002 | 65.360001 | 138.910004 | 94.57 | 77.650002 | 101.739998 | 125.849998 |


In our examples, we keep only the first 5 assets in our dataset, so that the differences between the calculated covariance matrices are easy to see.

```python
stock_prices = stock_prices.iloc[:, :5]
stock_prices.head()
```

| Date | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| 2008-01-02 | 49.273335 | 35.389999 | 106.639999 | 52.919998 | 78.220001 |
| 2008-01-03 | 49.716667 | 35.290001 | 107.0 | 53.119999 | 78.349998 |
| 2008-01-04 | 48.223331 | 34.599998 | 106.970001 | 51.759998 | 76.57 |
| 2008-01-07 | 48.576668 | 34.630001 | 106.949997 | 51.439999 | 76.650002 |
| 2008-01-08 | 48.200001 | 34.389999 | 107.029999 | 51.32 | 76.220001 |


Now that our data and libraries are loaded, we can compute the Minimum Covariance Determinant estimate, which MlFinLab provides as the minimum_covariance_determinant() method.

First, we can construct a simple covariance matrix for comparison.

```python
# A class with a method to calculate returns from prices
returns_estimation = ml.portfolio_optimization.ReturnsEstimators()

# Calculating the data set of returns
stock_returns = returns_estimation.calculate_returns(stock_prices)

# Finding the simple covariance matrix from the series of returns
cov_matrix = stock_returns.cov()
```

```python
print("The Simple Covariance is:")
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


We can now construct our Minimum Covariance Determinant estimation.

```python
# A class that contains the Minimum Covariance Determinant estimator
risk_estimators = ml.portfolio_optimization.RiskEstimators()

# Computing the Minimum Covariance Determinant estimate from price data, with the random seed fixed to 0
min_cov_det = risk_estimators.minimum_covariance_determinant(stock_prices, price_data=True, random_state=0)

# Transforming our estimate from a np.array to a pd.DataFrame
min_cov_det = pd.DataFrame(min_cov_det, index=cov_matrix.index, columns=cov_matrix.columns)
```

```python
print('The Minimum Covariance Determinant estimator is:')
min_cov_det
```

The Minimum Covariance Determinant estimator is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000146 | 0.000112 | -5e-06 | 7.6e-05 | 0.000102 |
| EWG | 0.000112 | 0.000154 | -7e-06 | 7.6e-05 | 0.000114 |
| TIP | -5e-06 | -7e-06 | 1.1e-05 | -3e-06 | -5e-06 |
| EWJ | 7.6e-05 | 7.6e-05 | -3e-06 | 9.8e-05 | 7.7e-05 |
| EFA | 0.000102 | 0.000114 | -5e-06 | 7.7e-05 | 0.0001 |


From the results, the absolute values in the Minimum Covariance Determinant estimate are lower than in the simple covariance matrix. This is consistent with the algorithm having excluded outlying observations, which tend to inflate the simple covariance, and suggests that the resulting covariance estimate is more robust.

## Maximum Likelihood Covariance Estimator (Empirical Covariance)

The Maximum Likelihood Estimator of a sample is an asymptotically unbiased estimator of the corresponding population’s covariance matrix. This estimate works well when the number of observations is large relative to the number of features.

We can compute this estimate with the empirical_covariance() method in the MlFinLab library.

```python
# Finding the Empirical Covariance on price data
empirical_cov = risk_estimators.empirical_covariance(stock_prices, price_data=True)

# Transforming Empirical Covariance from np.array to pd.DataFrame
empirical_cov = pd.DataFrame(empirical_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Empirical Covariance is:')
empirical_cov
```

The Empirical Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


```python
print('The Simple Covariance is:')
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


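By comparing the two results, we can see that the Empirical Covariance matches the covariance from the pandas .cov() method to the displayed precision. Strictly speaking, the Maximum Likelihood Estimator divides by the number of observations $T$, while pandas divides by $T-1$, but with this many daily observations the difference is negligible.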
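The agreement holds only up to the displayed precision. A short NumPy check on simulated returns (not the ETF data) makes the difference between the two normalizations explicit; `mle_covariance` is a hypothetical helper written for this illustration.

```python
import numpy as np


def mle_covariance(returns):
    """Maximum likelihood covariance: demean, then divide by T (not T - 1)."""
    centered = returns - returns.mean(axis=0)
    return centered.T @ centered / returns.shape[0]


rng = np.random.default_rng(0)
T = 250
returns = rng.normal(scale=0.01, size=(T, 3))  # simulated daily returns

mle = mle_covariance(returns)
unbiased = np.cov(returns, rowvar=False)  # divides by T - 1, like pandas .cov()

# The two estimates differ only by the factor (T - 1) / T
print(np.allclose(mle, unbiased * (T - 1) / T))
```

For a long daily sample, $(T-1)/T$ is very close to 1, which is why the two tables above look identical.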

## Covariance Estimator with Shrinkage
According to the __scikit-learn User Guide on Covariance estimation__:

_"The Maximum Likelihood Estimator is not a good estimator of the eigenvalues of the covariance matrix and the inverted matrix is not accurate. Sometimes, ... it cannot be inverted for numerical reasons"._

_"To avoid problems with inversion, a transformation of the empirical covariance matrix has been introduced: the shrinkage"._

_"Mathematically, this shrinkage consists in reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix"._

There are three different types of shrinkage which we will be looking at in this article. These methods include: Basic Shrinkage, Ledoit-Wolf Shrinkage, and Oracle Approximating Shrinkage. 

### Basic Shrinkage
Essentially, the Basic Shrinkage method makes use of the following convex transformation:
$$\Sigma_{shrunk} = (1 - \alpha)\,\Sigma_{unshrunk} + \alpha\,\frac{\operatorname{Tr}(\Sigma_{unshrunk})}{p}\,\mathrm{Id}$$

Here, $\alpha$ controls the trade-off between bias and variance, $p$ is the number of features (assets), and $\mathrm{Id}$ is the identity matrix. The definition given by the __scikit-learn User Guide on Covariance estimation__ gives a clear understanding of what this method aims to accomplish:

_"This shrinkage is done by shifting every eigenvalue according to a given offset, which is equivalent to finding the l2-penalized Maximum Likelihood Estimator of the covariance matrix"._

In the MlFinLab implementation, $\alpha$ is passed to the shrinked_covariance() method as the basic_shrinkage parameter.
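The convex transformation above is short enough to write out directly. The sketch below applies it to a small hand-made covariance matrix with NumPy; `basic_shrinkage_sketch` is a hypothetical name, and we assume (without having checked the MlFinLab source) that its basic_shrinkage option computes the same formula, as sklearn's ShrunkCovariance does.

```python
import numpy as np


def basic_shrinkage_sketch(cov, alpha):
    """Shrink `cov` toward a scaled identity: (1 - alpha) * cov + alpha * mu * I."""
    p = cov.shape[0]
    mu = np.trace(cov) / p  # average variance, Tr(Sigma) / p
    return (1 - alpha) * cov + alpha * mu * np.eye(p)


cov = np.array([[4.0, 1.2, 0.4],
                [1.2, 1.0, 0.1],
                [0.4, 0.1, 0.25]])
shrunk = basic_shrinkage_sketch(cov, alpha=0.1)

print(np.trace(shrunk), np.trace(cov))                # the trace is preserved
print(np.linalg.cond(shrunk) < np.linalg.cond(cov))   # the result is better conditioned
```

Off-diagonal entries are simply scaled by $(1-\alpha)$, while each variance is pulled toward the average variance $\mu$. This means small variances can increase after shrinkage, and the ratio between the largest and smallest eigenvalues always decreases.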

### Ledoit-Wolf Shrinkage
The Ledoit-Wolf Shrinkage method differs from the Basic Shrinkage method as it aims to compute the optimal $\alpha$ value to minimize the Mean Squared Error between the estimated and the real covariance matrix.

The algorithm is described in more detail in the paper by _Olivier Ledoit_ and _Michael Wolf_, __A well-conditioned estimator for large-dimensional covariance matrices__, [available here](http://perso.ens-lyon.fr/patrick.flandrin/LedoitWolf_JMA2004.pdf).
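As a rough illustration of how an optimal $\alpha$ can be computed from the data alone, the sketch below follows the estimator from the Ledoit and Wolf (2004) paper linked above, as we read it. It measures how far the sample covariance $S$ is from the target $\mu\,\mathrm{Id}$ ($\delta^2$) and how noisy $S$ itself is ($\bar{b}^2$), then sets $\alpha = \min(\bar{b}^2, \delta^2)/\delta^2$. This is a NumPy sketch with a hypothetical function name, not MlFinLab's or sklearn's code, and those implementations may differ in normalization details.

```python
import numpy as np


def ledoit_wolf_sketch(X):
    """Return (alpha, shrunk covariance) following the Ledoit-Wolf (2004) formulas.

    X is a (T, p) array of observations. Norms use the paper's scaling
    ||A||^2 = trace(A A^T) / p, which cancels in the ratio for alpha.
    """
    T, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / T                  # sample (MLE) covariance
    mu = np.trace(S) / p
    target = mu * np.eye(p)            # scaled identity target

    # Squared distance between the sample covariance and the target
    delta2 = np.sum((S - target) ** 2) / p
    # Average squared distance between each rank-one term x_t x_t^T and S
    b_bar2 = sum(np.sum((np.outer(x, x) - S) ** 2) / p for x in Xc) / T**2
    b2 = min(b_bar2, delta2)

    alpha = b2 / delta2                # assumes S is not already a multiple of Id
    return alpha, (1 - alpha) * S + alpha * target


rng = np.random.default_rng(1)
scales = np.sqrt(np.arange(1, 11))                       # true variances 1, 2, ..., 10
short_sample = rng.normal(size=(30, 10)) * scales        # few observations per asset
long_sample = rng.normal(size=(3000, 10)) * scales       # many observations per asset
alpha_short, shrunk_short = ledoit_wolf_sketch(short_sample)
alpha_long, _ = ledoit_wolf_sketch(long_sample)
print(alpha_short, alpha_long)
```

The data decide the amount of shrinkage: with only 30 observations the sample covariance is noisy and the estimated $\alpha$ is large, while with 3000 observations it is close to zero.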

### Oracle Approximating Shrinkage
The Oracle Approximating Shrinkage method assumes that the data follow a Gaussian distribution. Under this assumption, _Chen et al._ (2010) derived a formula for the shrinkage coefficient $\alpha$ that yields a smaller Mean Squared Error than the one given by the Ledoit-Wolf formula.

The algorithm is described in more detail in the paper by _Y. Chen_, _A. Wiesel_, _Y.C. Eldar_ and _A.O. Hero_ __Shrinkage Algorithms for MMSE Covariance Estimation__ [available here](https://webee.technion.ac.il/people/YoninaEldar/104.pdf)
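For comparison, here is a sketch of the OAS shrinkage coefficient as we read it from Chen et al. (2010), for a sample covariance $\hat{S}$ of $n$ observations of $p$ variables:

$$\rho = \min\left(\frac{(1 - 2/p)\operatorname{tr}(\hat{S}^2) + \operatorname{tr}^2(\hat{S})}{(n + 1 - 2/p)\left(\operatorname{tr}(\hat{S}^2) - \operatorname{tr}^2(\hat{S})/p\right)},\ 1\right)$$

The function name is hypothetical and this is not MlFinLab's code; sklearn's OAS, for example, documents that it drops the $2/p$ terms.

```python
import numpy as np


def oas_sketch(X):
    """Return (rho, shrunk covariance) using the OAS coefficient of Chen et al. (2010)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample (MLE) covariance
    tr_S = np.trace(S)
    tr_S2 = np.trace(S @ S)
    num = (1 - 2 / p) * tr_S2 + tr_S ** 2
    den = (n + 1 - 2 / p) * (tr_S2 - tr_S ** 2 / p)
    rho = 1.0 if den == 0 else min(num / den, 1.0)
    target = tr_S / p * np.eye(p)            # scaled identity target, mu * Id
    return rho, (1 - rho) * S + rho * target


rng = np.random.default_rng(2)
scales = np.sqrt(np.arange(1, 11))           # true variances 1, 2, ..., 10
rho_short, shrunk_short = oas_sketch(rng.normal(size=(30, 10)) * scales)
rho_long, _ = oas_sketch(rng.normal(size=(3000, 10)) * scales)
print(rho_short, rho_long)
```

As with Ledoit-Wolf, the coefficient is close to zero when observations are plentiful and grows as they become scarce.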

### Implementation

The shrinked_covariance() method from MlFinLab makes it easy to calculate the shrunk covariance with each method and compare the results.

```python
# Finding the shrunk covariances on price data with every method
shrinked_cov = risk_estimators.shrinked_covariance(stock_prices, price_data=True,
                                                   shrinkage_type='all', basic_shrinkage=0.1)

# Separating the shrunk covariances for every method
shrinked_cov_basic, shrinked_cov_lw, shrinked_cov_oas = shrinked_cov

# Transforming each shrunk covariance from np.array to pd.DataFrame
shrinked_cov_basic = pd.DataFrame(shrinked_cov_basic, index=cov_matrix.index, columns=cov_matrix.columns)
shrinked_cov_lw = pd.DataFrame(shrinked_cov_lw, index=cov_matrix.index, columns=cov_matrix.columns)
shrinked_cov_oas = pd.DataFrame(shrinked_cov_oas, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Basic Shrinked covariance with an alpha of 0.1 is:')
shrinked_cov_basic
```

The Basic Shrinked covariance with an alpha of 0.1 is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000446 | 0.000315 | -1.5e-05 | 0.00023 | 0.000292 |
| EWG | 0.000315 | 0.000362 | -1.3e-05 | 0.000199 | 0.000273 |
| TIP | -1.5e-05 | -1.3e-05 | 4.5e-05 | -8e-06 | -1.1e-05 |
| EWJ | 0.00023 | 0.000199 | -8e-06 | 0.000236 | 0.000197 |
| EFA | 0.000292 | 0.000273 | -1.1e-05 | 0.000197 | 0.000278 |


```python
print('The Ledoit-Wolf Shrinked covariance is:')
shrinked_cov_lw
```

The Ledoit-Wolf Shrinked covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000464 | 0.000346 | -1.6e-05 | 0.000252 | 0.000321 |
| EWG | 0.000346 | 0.000371 | -1.5e-05 | 0.000219 | 0.0003 |
| TIP | -1.6e-05 | -1.5e-05 | 2.2e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000252 | 0.000219 | -9e-06 | 0.000233 | 0.000216 |
| EFA | 0.000321 | 0.0003 | -1.2e-05 | 0.000216 | 0.000278 |


```python
print('The Oracle Approximating Shrinked covariance is:')
shrinked_cov_oas
```

The Oracle Approximating Shrinked covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000465 | 0.000349 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.000349 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 2e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


```python
print('The Simple Covariance is:')
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


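The shrunk covariance matrices from the Ledoit-Wolf and Oracle Approximating algorithms are similar, with the Oracle Approximating values slightly closer to the Simple Covariance, which indicates that it chose a smaller shrinkage coefficient on this data. The Basic Shrinkage matrix with $\alpha = 0.1$ is shrunk the most. In all three cases, the off-diagonal covariances are pulled toward zero, so the Simple Covariance matrix has the largest absolute off-diagonal values. The variances on the diagonal, however, are pulled toward their average: large variances such as that of EEM decrease, while the smallest one, that of TIP, increases (from 1.9e-05 to 4.5e-05 with Basic Shrinkage).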

## Semi-Covariance Matrix
The semi-covariance matrix measures the volatility of negative returns, or of returns below a certain threshold.

This measure is useful when the goal is to reduce downside volatility, as it targets that goal more precisely than the covariance matrix, which measures both positive and negative deviations.

According to the __Minimum Downside Volatility Indices__ paper:

_"Each element in the Semi-Covariance matrix is calculated as:"_

$$SemiCov_{ij} = \frac{1}{T}\sum_{t=1}^{T}[Min(R_{i,t}-B,0)*Min(R_{j,t}-B,0)]$$

_where $T$ is the number of observations,_ $R_{i,t}$ _is the return of an asset $i$ at time $t$, and $B$ is the threshold return._

_If $B$ is set to zero, the volatility of negative returns is measured._

A deeper analysis of use cases of Semi-Covariance matrix is available in the paper by _Solactive AG - German Index Engineering_ __Minimum Downside Volatility Indices__ [available here](https://www.solactive.com/wp-content/uploads/2018/04/Solactive_Minimum-Downside-Volatility-Indices.pdf)
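The formula translates directly into a few lines of NumPy. The sketch below (a hypothetical helper, not MlFinLab's code) keeps only the shortfall of each return below the threshold $B$ and averages the cross-products over all $T$ observations, as in the formula above. The small example can be checked by hand.

```python
import numpy as np


def semi_covariance_sketch(returns, threshold=0.0):
    """SemiCov_ij = (1/T) * sum_t min(R_it - B, 0) * min(R_jt - B, 0)."""
    downside = np.minimum(returns - threshold, 0.0)  # shortfall below B, else 0
    return downside.T @ downside / returns.shape[0]


# Three observations (rows) of two assets (columns)
returns = np.array([[0.01, -0.02],
                    [-0.01, -0.01],
                    [0.02, 0.03]])
semi_cov = semi_covariance_sketch(returns, threshold=0.0)
print(semi_cov * 3)  # approximately [[0.0001, 0.0001], [0.0001, 0.0005]]
```

Only the second observation, where both assets fall, contributes to the off-diagonal term; positive returns are clipped to zero and never enter the sum.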

We can calculate the Semi-Covariance and compare it to the simple covariance.

```python
# Finding the Semi-Covariance on price data
semi_cov = risk_estimators.semi_covariance(stock_prices, price_data=True, threshold_return=0)

# Transforming Semi-Covariance from np.array to pd.DataFrame
semi_cov = pd.DataFrame(semi_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Semi-Covariance matrix is:')
semi_cov
```

The Semi-Covariance matrix is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 4.4e-05 | 3.5e-05 | 2e-06 | 2.5e-05 | 3.2e-05 |
| EWG | 3.5e-05 | 3.8e-05 | 2e-06 | 2.3e-05 | 3.1e-05 |
| TIP | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 |
| EWJ | 2.5e-05 | 2.3e-05 | 2e-06 | 2.3e-05 | 2.2e-05 |
| EFA | 3.2e-05 | 3.1e-05 | 2e-06 | 2.2e-05 | 2.9e-05 |


```python
print('The Simple Covariance is:')
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


Because the Semi-Covariance matrix is computed differently from the usual covariance matrix, using only the returns below the threshold, its absolute values are significantly lower. To compare the structure of the two matrices more easily, let's multiply the Semi-Covariance matrix by 10.

```python
print('The Semi-Covariance matrix multiplied by 10 is:')
semi_cov * 10
```

The Semi-Covariance matrix multiplied by 10 is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000438 | 0.000351 | 1.8e-05 | 0.000251 | 0.000322 |
| EWG | 0.000351 | 0.000377 | 1.7e-05 | 0.000231 | 0.000312 |
| TIP | 1.8e-05 | 1.7e-05 | 1.9e-05 | 1.6e-05 | 1.5e-05 |
| EWJ | 0.000251 | 0.000231 | 1.6e-05 | 0.00023 | 0.000222 |
| EFA | 0.000322 | 0.000312 | 1.5e-05 | 0.000222 | 0.000285 |


After rescaling, the values in the two matrices are similar, although some differences remain.

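For example, the simple covariance between EEM and TIP is negative, while their semi-covariance is positive. This is expected: with $B = 0$, each term in the sum is the product of two non-positive numbers, so every element of the Semi-Covariance matrix is non-negative.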

## Exponentially-Weighted Covariance Matrix
Each element in the Exponentially-weighted Covariance matrix is calculated as follows.

First, we calculate the series of covariances at every observation time $t$ for each pair of assets $i$ and $j$:

$$CovarSeries_{i,j}^{t} = (R_{i}^{t} - Mean(R_{i})) * (R_{j}^{t} - Mean(R_{j}))$$

Then we apply an exponentially weighted moving average to the obtained series, with the decay specified in terms of the span, $\alpha=\frac{2}{span+1}$, for $span \ge 1$:

$$ExponentialCovariance_{i,j} = ExponentialWeightedMovingAverage(CovarSeries_{i,j})[T]$$

In other words, the result is the last element of an exponentially weighted moving average of the series of covariances between the returns of the corresponding assets. This gives greater weight to the most recent observations when computing the covariance.
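A pandas sketch of these two steps follows: build the series of cross-products of demeaned returns, smooth it with Series.ewm(span=...), and keep the last value. The helper name is hypothetical, and we assume pandas' default adjust=True weighting and demeaning by the full-sample mean; MlFinLab may differ in such details.

```python
import numpy as np
import pandas as pd


def exponential_covariance_sketch(returns: pd.DataFrame, span: float = 60) -> pd.DataFrame:
    """Last value of an EWMA over the cross-products of demeaned returns."""
    demeaned = returns - returns.mean()
    cols = returns.columns
    n = len(cols)
    out = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            # CovarSeries_ij: one cross-product per observation time t
            covar_series = demeaned.iloc[:, i] * demeaned.iloc[:, j]
            # EWMA with alpha = 2 / (span + 1); keep the last element [T]
            out[i, j] = out[j, i] = covar_series.ewm(span=span).mean().iloc[-1]
    return pd.DataFrame(out, index=cols, columns=cols)


rng = np.random.default_rng(3)
returns = pd.DataFrame(rng.normal(scale=0.01, size=(500, 3)), columns=["A", "B", "C"])
recent = exponential_covariance_sketch(returns, span=60)
print(recent)
```

With a very large span, the weights become nearly equal and the result approaches the equally weighted (maximum likelihood) covariance; a small span concentrates the weight on the latest observations.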

We can calculate the Exponential Covariance and compare it to the simple covariance. 

```python
# Finding the Exponential Covariance on price data with a span of 60
exponential_cov = risk_estimators.exponential_covariance(stock_prices, price_data=True, window_span=60)

# Transforming Exponential Covariance from np.array to pd.DataFrame
exponential_cov = pd.DataFrame(exponential_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Exponential Covariance matrix is:')
exponential_cov
```

The Exponential Covariance matrix is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000282 | 0.000322 | -4e-06 | 0.00019 | 0.000303 |
| EWG | 0.000322 | 0.000459 | -1.9e-05 | 0.000237 | 0.00041 |
| TIP | -4e-06 | -1.9e-05 | 9e-06 | -1.1e-05 | -1.6e-05 |
| EWJ | 0.00019 | 0.000237 | -1.1e-05 | 0.000199 | 0.000229 |
| EFA | 0.000303 | 0.00041 | -1.6e-05 | 0.000229 | 0.00038 |


```python
print('The Simple Covariance is:')
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


From the results, we can see that the variance of EWG is higher when the most recent observations receive more weight, whereas the variance of EEM is lower. The covariance between EEM and EWJ is also lower. This suggests that, toward the end of the sample, EWG was more volatile and EEM less volatile than over the sample as a whole.

In this way, the covariance with higher weights on the most recent observations can be compared with the equally weighted (simple) covariance, and conclusions can be drawn about how the covariance has changed over time.

## De-Noising Covariance Matrix
The main idea behind de-noising the covariance matrix is to eliminate the eigenvalues of the correlation matrix that represent noise rather than useful information.

This is done by determining the maximum theoretical eigenvalue of a purely random (noise) matrix and using it as a threshold: the eigenvalues below this threshold are attributed to noise and are all set to the same value, while the eigenvalues above it are kept as signal.

The function provided below for de-noising the covariance works as follows:
- The given covariance matrix is transformed to the correlation matrix.
- The eigenvalues and eigenvectors of the correlation matrix are calculated.
- A Kernel Density Estimate (KDE) of the distribution of the eigenvalues is computed.
- The Marcenko-Pastur pdf is fitted to the KDE estimate, using the variance $\sigma^2$ as the parameter of the optimization.
- From the fitted Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated as $\lambda_{+} = \sigma^2\left(1 + \sqrt{N/T}\right)^2$, where $T/N$ is the ratio of the number of observations to the number of variables.
- The eigenvalues that are below the maximum theoretical value are all set to their average value. For example, if we have a set of 5 eigenvalues sorted in descending order ($\lambda_1$...$\lambda_5$), 2 of which are below the maximum theoretical value, then we set $\lambda_4^{NEW} = \lambda_5^{NEW} = \frac{\lambda_4^{OLD} + \lambda_5^{OLD}}{2}$.
- The new set of eigenvalues with the set of eigenvectors is used to obtain the new de-noised correlation matrix.
- The new correlation matrix is then transformed back to the new de-noised covariance matrix.

The process of de-noising the covariance matrix is described in a paper by _M. Potters_, _J.P. Bouchaud_, _L. Laloux_, __“Financial applications of random matrix theory: Old laces and new pieces.”__ [available here](https://arxiv.org/abs/physics/0507111).
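The core of the constant residual eigenvalue step can be sketched in NumPy once the Marcenko-Pastur variance $\sigma^2$ is known. To keep the example short, the sketch skips the KDE fitting step and takes `sigma2` as a given input (a hypothetical simplification; the default of 1.0 is only a placeholder), and the helper name is also hypothetical; this is not MlFinLab's implementation. Eigenvalues below $\lambda_{+}$ are replaced by their average, and the rebuilt matrix is rescaled to a unit diagonal so that it is again a correlation matrix.

```python
import numpy as np


def denoise_constant_residual_sketch(corr, tn_relation, sigma2=1.0):
    """Replace eigenvalues below the Marcenko-Pastur edge with their average."""
    lambda_max = sigma2 * (1 + np.sqrt(1 / tn_relation)) ** 2
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    noise = eigvals < lambda_max
    new_eigvals = eigvals.copy()
    if noise.any():
        # Averaging keeps the sum of the eigenvalues (the trace) unchanged
        new_eigvals[noise] = eigvals[noise].mean()
    rebuilt = eigvecs @ np.diag(new_eigvals) @ eigvecs.T
    # Rescale so the result is again a correlation matrix (unit diagonal)
    d = np.sqrt(np.diag(rebuilt))
    return rebuilt / np.outer(d, d)


rng = np.random.default_rng(4)
T, N = 500, 10
common = rng.normal(size=(T, 1))                     # one shared "market" factor
data = common + rng.normal(size=(T, N))
corr = np.corrcoef(data, rowvar=False)
denoised = denoise_constant_residual_sketch(corr, tn_relation=T / N)
print(np.round(np.linalg.eigvalsh(corr), 3))
print(np.round(np.linalg.eigvalsh(denoised), 3))
```

In this toy example, one large eigenvalue (the shared factor) survives, while the nine small ones collapse to an almost constant value.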

```python
# Setting the required parameters for de-noising

# Relation of number of observations T to the number of variables N (T/N)
tn_relation = stock_prices.shape[0] / stock_prices.shape[1]

# The bandwidth of the KDE kernel
kde_bwidth = 0.25

# Finding the De-noised Covariance matrix (kde_bwidth is passed by keyword,
# so it cannot be bound to a different positional parameter)
cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, kde_bwidth=kde_bwidth)

# Transforming De-noised Covariance from np.array to pd.DataFrame
cov_matrix_denoised = pd.DataFrame(cov_matrix_denoised, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('The De-noised Covariance matrix is:')
cov_matrix_denoised
```

The De-noised Covariance matrix is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.000288 | -2.8e-05 | 0.000224 | 0.000252 |
| EWG | 0.000288 | 0.000372 | -2.5e-05 | 0.0002 | 0.000226 |
| TIP | -2.8e-05 | -2.5e-05 | 1.9e-05 | -1.9e-05 | -2.2e-05 |
| EWJ | 0.000224 | 0.0002 | -1.9e-05 | 0.000232 | 0.000175 |
| EFA | 0.000252 | 0.000226 | -2.2e-05 | 0.000175 | 0.000278 |


```python
print('The Simple Covariance is:')
cov_matrix
```

The Simple Covariance is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


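As we can see, the main diagonal has not changed, while the other covariances have. This is expected: the algorithm changes the eigenvalues of the correlation matrix, whose diagonal stays equal to 1, and the original variances are then used to transform it back into a covariance matrix.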

In the example above, the default de-noising method (the Constant Residual Eigenvalue Method) was used, and the detone parameter was left at its default value of False. MlFinLab also provides the Targeted Shrinkage de-noising method. The main idea behind this method is to shrink only the noise-related eigenvectors/eigenvalues: the correlation matrix built from the noise-related eigenvectors/eigenvalues is shrunk and then added to the correlation matrix built from the signal-related eigenvectors/eigenvalues.
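Assuming the number of signal eigenvalues is already known (in practice it comes from the Marcenko-Pastur threshold), the Targeted Shrinkage step described above can be sketched as follows. The helper and its `n_signal` and `alpha` parameters are hypothetical names. As a design choice in this sketch, $(1-\alpha)$ times the diagonal of the noise part is added back, so that the result keeps a unit diagonal.

```python
import numpy as np


def target_shrink_sketch(corr, n_signal, alpha=0.5):
    """Shrink only the noise part of a correlation matrix (hypothetical helper).

    n_signal: number of largest eigenvalues treated as signal.
    alpha: weight kept on the noise part's off-diagonal structure (0 = drop it).
    """
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    vs, ls = eigvecs[:, :n_signal], eigvals[:n_signal]
    vn, ln = eigvecs[:, n_signal:], eigvals[n_signal:]
    signal = vs @ np.diag(ls) @ vs.T             # built from signal eigenpairs
    noise = vn @ np.diag(ln) @ vn.T              # built from noise eigenpairs
    # Shrink the noise part but keep its diagonal, so the diagonal stays at 1
    return signal + alpha * noise + (1 - alpha) * np.diag(np.diag(noise))


corr = np.array([[1.0, 0.6, 0.5],
                 [0.6, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])
shrunk = target_shrink_sketch(corr, n_signal=1, alpha=0.3)
print(np.round(shrunk, 3))
```

With alpha=1 nothing is shrunk and the original matrix is returned; smaller values remove more of the noise-driven correlation structure.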

Additionally, we can de-tone our correlation matrix. This is done by excluding the first eigenvectors (those with the largest eigenvalues), which represent the market component.

We can use these methods by setting the denoise_method parameter to 'target_shrink' and the detone parameter to True.
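De-toning can be sketched the same way: subtract the contribution of the largest eigenvalue(s) and rescale to a unit diagonal. The helper and its `market_component` argument are hypothetical names for this sketch. On an equicorrelated toy matrix, where the first eigenvector is exactly the market mode, removing it turns every off-diagonal correlation negative, which may help explain the negative off-diagonal covariances in the de-toned output below.

```python
import numpy as np


def detone_sketch(corr, market_component=1):
    """Remove the `market_component` largest eigen-directions, then rescale."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    vm = eigvecs[:, order[:market_component]]
    lm = eigvals[order[:market_component]]
    detoned = corr - vm @ np.diag(lm) @ vm.T     # drop the market contribution
    d = np.sqrt(np.diag(detoned))
    return detoned / np.outer(d, d)              # back to a unit diagonal


# Equicorrelated toy matrix: every pair has correlation 0.5
n = 4
corr = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))
print(np.round(detone_sketch(corr), 4))
```

Here every off-diagonal entry becomes $-1/(n-1) = -1/3$. Note that the de-toned matrix is singular, since one eigen-direction has been removed, so it should not be inverted directly.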

```python
# Setting the required parameters for de-noising

# Relation of number of observations T to the number of variables N (T/N)
tn_relation = stock_prices.shape[0] / stock_prices.shape[1]

# The bandwidth of the KDE kernel
kde_bwidth = 0.25

# Finding the De-noised Covariance matrix with Targeted Shrinkage and de-toning
cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation,
                                                         denoise_method='target_shrink',
                                                         detone=True, kde_bwidth=kde_bwidth)

# Transforming De-noised Covariance from np.array to pd.DataFrame
cov_matrix_denoised = pd.DataFrame(cov_matrix_denoised, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('The De-noised Covariance matrix is:')
cov_matrix_denoised
```

The De-noised Covariance matrix is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | -0.000416 | -2e-05 | -0.000323 | -0.00036 |
| EWG | -0.000416 | 0.000372 | -1.8e-05 | -0.000294 | -0.000322 |
| TIP | -2e-05 | -1.8e-05 | 1.9e-05 | -1.2e-05 | -2.7e-05 |
| EWJ | -0.000323 | -0.000294 | -1.2e-05 | 0.000232 | -0.000254 |
| EFA | -0.00036 | -0.000322 | -2.7e-05 | -0.000254 | 0.000278 |


## Covariance and Correlation Matrix Transformations
The MlFinLab library also provides simple functions to transform our covariance matrix into a correlation matrix and back.
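Both transformations are elementwise rescalings by the standard deviations: $\rho_{ij} = \Sigma_{ij}/(\sigma_i \sigma_j)$, and $\Sigma_{ij} = \rho_{ij}\,\sigma_i \sigma_j$ in the other direction. A NumPy sketch with hypothetical helper names (MlFinLab's own functions are cov_to_corr and corr_to_cov):

```python
import numpy as np


def cov_to_corr_sketch(cov):
    """Divide each element by the product of the two standard deviations."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return np.clip(corr, -1.0, 1.0)  # guard against tiny numerical overshoot


def corr_to_cov_sketch(corr, std):
    """Multiply each element by the product of the two standard deviations."""
    return corr * np.outer(std, std)


cov = np.array([[4.0e-4, 1.5e-4],
                [1.5e-4, 2.5e-4]])
corr = cov_to_corr_sketch(cov)
std = np.sqrt(np.diag(cov))
print(corr)
print(np.allclose(corr_to_cov_sketch(corr, std), cov))  # the round trip recovers cov
```

The correlation matrix alone discards the scale of each asset, which is why the standard deviations must be supplied separately when converting back.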

```python
# Transforming our covariance matrix to a correlation matrix
corr_matrix = risk_estimators.cov_to_corr(cov_matrix)

# Outputting the result
print('The correlation matrix is:')
corr_matrix
```

The correlation matrix is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 1.0 | 0.840079 | -0.175654 | 0.775864 | 0.90119 |
| EWG | 0.840079 | 1.0 | -0.176822 | 0.75206 | 0.943192 |
| TIP | -0.175654 | -0.176822 | 1.0 | -0.132683 | -0.168585 |
| EWJ | 0.775864 | 0.75206 | -0.132683 | 1.0 | 0.859232 |
| EFA | 0.90119 | 0.943192 | -0.168585 | 0.859232 | 1.0 |


```python
# The standard deviations used to calculate the covariance matrix back
std = np.sqrt(np.diag(cov_matrix))

# And back to the covariance matrix
cov_matrix_again = risk_estimators.corr_to_cov(corr_matrix, std)

# Outputting the result
print('The covariance matrix calculated back is:')
cov_matrix_again
```

The covariance matrix calculated back is:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


```python
print('Exactly the same as the original one:')
cov_matrix
```

Exactly the same as the original one:

| | EEM | EWG | TIP | EWJ | EFA |
|---|---|---|---|---|---|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


## Conclusion
This post describes the functions implemented in the RiskEstimators class from MlFinLab, related to different ways of calculating and adjusting the Covariance matrix. Also, it shows how the corresponding functions from the MlFinLab library can be used and how the outputs can be analyzed.

Key takeaways from the post:
- A robust covariance estimator (such as the Minimum Covariance Determinant) is needed in order to discard/downweight the outliers in the data. These outliers seriously affect the Empirical covariance estimator and the Covariance estimators with shrinkage.
- The Maximum Likelihood Estimator (Empirical Covariance) of a sample is an asymptotically unbiased estimator of the corresponding population’s covariance matrix.
- Shrinkage consists of reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix. It is used to avoid problems with inverting the covariance matrix.
- Ledoit-Wolf and Oracle Approximating are methods to calculate the optimal shrinkage coefficient $\alpha$ used in the Basic Shrinkage.
- The semi-covariance matrix measures the volatility of negative returns, or of returns below a certain threshold.
- Exponential Covariance gives greater weight to the most recent observations when computing the covariance.
- The De-noising algorithm calculates the eigenvalues of the correlation matrix and replaces those below the theoretical maximum for a random matrix, which are attributed to noise, with their average value.

## Sources
- [scikit-learn User Guide on Covariance estimation](https://scikit-learn.org/stable/modules/covariance.html#robust-covariance)
- [RiskEstimators - MlFinLab Documentation](https://mlfinlab.readthedocs.io/en/latest/portfolio_optimisation/risk_estimators.html)