<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br><br><br>

# Listed Volatility and Variance Derivatives

**Wiley Finance (2017)**

Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [@dyjh](http://twitter.com/dyjh) | http://books.tpq.io

<img src="http://hilpisch.com/../images/lvvd_cover.png" alt="Listed Volatility and Variance Derivatives" width="30%" align="left" border="0">

# Data Analysis and Strategies

## Introduction

This chapter is about the analysis of data and investment strategies related to the EURO STOXX 50 and VSTOXX indexes. It uses public data sources ("open data") and draws heavily on the capabilities of the Python library pandas for data analytics.

The chapter has two major goals. First, it reproduces the stylized fact that stock indexes and volatility indexes in general are *negatively correlated*. This suggests that (products based on) volatility indexes are a means to hedge market risk resulting from stock indexes. The question, however, is how to best exploit the negative correlation in asset allocation terms. Therefore, the second goal is to illustrate the benefits for equity investors resulting from *constant proportion investment strategies* involving a volatility index like the VSTOXX. For simplicity, the respective analysis assumes that a direct investment in the VSTOXX is possible. This replicates results found, for example, in the study by Guobuzaite and Martellini (2012).

## Retrieving Base Data

This section shows how to retrieve and store historical daily closing data for the EURO STOXX 50 index and the VSTOXX volatility index. We mainly work with pandas in the following.

In [None]:
import numpy as np
import pandas as pd

### EURO STOXX 50 Data 

First, we retrieve the data set that contains the EURO STOXX 50 index.

In [None]:
url = 'https://hilpisch.com/lvvd_eikon_eod_data.csv'

In [None]:
data = pd.read_csv(url, index_col=0, parse_dates=True)

Let us inspect the final few rows of the ``DataFrame`` object. The index has been parsed as dates and the columns hold numerical values, so the retrieval seems to have worked as intended.

In [None]:
data.tail()

The individual time series start in 2010 and run through November 2020.

In [None]:
data.info()
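
The text mentions storing the data, which the cells above do not show. The following is a minimal sketch, assuming that a local CSV file is an acceptable storage format. The file name `lvvd_sample.csv` and the synthetic `sample` object are hypothetical stand-ins for the retrieved `data` object.

```python
import os
import tempfile

import pandas as pd

# synthetic stand-in for the retrieved data (values are made up)
index = pd.date_range('2020-01-01', periods=5, freq='B', name='Date')
sample = pd.DataFrame({'.STOXX50E': [3750.0, 3770.5, 3745.2, 3760.1, 3790.8],
                       '.V2TX': [14.0, 13.5, 15.2, 14.8, 13.9]}, index=index)

# hypothetical file name in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'lvvd_sample.csv')
sample.to_csv(path)  # store the data locally

# read it back with the same arguments as used for the remote file
restored = pd.read_csv(path, index_col=0, parse_dates=True)
print(restored.tail())
```

Reading the local copy with `index_col=0` and `parse_dates=True` should give back a `DataFrame` with a `DatetimeIndex`, so later cells would not need to change.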

Before proceeding to the corresponding procedure for the VSTOXX data, the following visualizes the historical closing values for the EURO STOXX 50 index, i.e. for symbol `.STOXX50E`.

In [None]:
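from pylab import mpl, plt
# the 'seaborn' style was renamed to 'seaborn-v0_8' in Matplotlib 3.6
style = 'seaborn' if 'seaborn' in plt.style.available else 'seaborn-v0_8'
plt.style.use(style)
mpl.rcParams['font.family'] = 'serif'  # set serif font
data['.STOXX50E'].plot(grid=True, figsize=(10, 6));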

<p style="font-family: monospace;">Historical EURO STOXX 50 index levels.

### VSTOXX Data

Second, we visualize the data for the VSTOXX index, which comes from the same source as before. The following figure shows the time series data for the main volatility index with symbol `.V2TX`. Inspection of the figure reveals that the volatility index increases and even spikes when there are crises in the markets, which is best seen at the beginning of 2020.

In [None]:
data['.V2TX'].plot(grid=True, figsize=(10, 6));

<p style="font-family: monospace;">Historical VSTOXX index levels.

## Basic Data Analysis

As illustrated in the appendix _Introduction to Python_, pandas provides a wealth of options to analyze data stored in ``DataFrame`` objects. A sometimes helpful method is ``describe()``, which provides selected summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for each ``DataFrame`` column.

In [None]:
data.describe()

Sub-plots of multiple data sub-sets are easily generated with pandas. The result of the following plotting code is found in the following figure. 

In [None]:
data[['.STOXX50E', '.V2TX']].plot(subplots=True,  # generate sub-plots per column
        figsize=(10, 6),  # sizing of the figure
        color='blue');  # color to plot the data

<p style="font-family: monospace;">Historical EURO STOXX 50 and VSTOXX index levels.

Similarly, you can visualize the absolute differences (over time) by using the ``diff()`` method and plotting them as a histogram (see the following figure).

In [None]:
data[['.STOXX50E', '.V2TX']].diff().hist(figsize=(10, 6),  # figure sizing
                 color='blue',  # color for the plotted data
                 bins=100);  # number of bins to be used

<p style="font-family: monospace;">Histograms of historical absolute daily differences of EURO STOXX 50 and VSTOXX.

In similar spirit, we can calculate the relative or percent changes for the time series data.

In [None]:
data.pct_change().iloc[-10:]

The code to visualize this kind of result is the same as before (see the following figure).

In [None]:
data[['.STOXX50E', '.V2TX']].pct_change().hist(figsize=(10, 6),
                       color='blue', 
                       bins=100);

<p style="font-family: monospace;">Histograms of historical percentage changes of EURO STOXX 50 and VSTOXX.

The majority of statistical analysis approaches rely on (log) returns and not on absolute time series data. The next few lines of code calculate the log returns for the time series and store them in another `DataFrame` object called `log_rets`.

In [None]:
## fully vectorized calculation of log returns
log_rets = np.log(data / data.shift(1))
log_rets.iloc[:10]
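
To make the relation between percentage changes and log returns concrete, the following is a small sketch on made-up prices (the `prices` series is hypothetical and not part of the data set). It illustrates that the log return equals `log(1 + simple return)` and that log returns add up over time.

```python
import numpy as np
import pandas as pd

# made-up price path for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 101.0])

simple = prices.pct_change()              # simple (percentage) returns
log_r = np.log(prices / prices.shift(1))  # log returns as in the text

# log return = log(1 + simple return)
print(np.allclose(log_r.dropna(), np.log1p(simple.dropna())))

# the sum of the log returns equals the log return over the whole period
total_log = log_r.sum()  # NaN in the first row is skipped
print(np.isclose(total_log, np.log(prices.iloc[-1] / prices.iloc[0])))
```

The additivity over time is one reason why log returns are convenient for statistical work.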

The following figure visualizes the log return time series for both the EURO STOXX 50 and VSTOXX indexes.

In [None]:
log_rets[['.STOXX50E', '.V2TX']].plot(subplots=True,
              figsize=(10, 6),
              color='blue',
              grid=True);

<p style="font-family: monospace;">Log returns of EURO STOXX 50 and VSTOXX over time.

## Correlation Analysis

Equipped with the log returns, a thorough investigation of the correlation between the EURO STOXX 50 and VSTOXX indexes is straightforward. For example, pandas provides the ``corr()`` method to calculate correlations between time series data stored in different ``DataFrame`` columns.

In [None]:
log_rets.corr()

Similar results are obtained by calculating the correlation of the data stored in one pandas `Series` object with another data set in another `Series` object. 

In [None]:
log_rets['.STOXX50E'].corr(log_rets['.V2TX'])
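
As a cross-check of what ``corr()`` reports, the following sketch computes the Pearson correlation coefficient from its definition, covariance divided by the product of the standard deviations. It uses synthetic series `x` and `y` (hypothetical, negatively related by construction), not the index data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)  # fixed seed for reproducibility
x = pd.Series(rng.standard_normal(250))
# y moves against x by construction (synthetic, not market data)
y = -0.8 * x + 0.5 * pd.Series(rng.standard_normal(250))

# covariance over the product of the standard deviations
manual = x.cov(y) / (x.std() * y.std())
print(round(manual, 4), round(x.corr(y), 4))
```

Both numbers should agree up to floating point precision, since the default method of ``corr()`` is the Pearson coefficient.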

The ``plot()`` method of pandas ``DataFrame`` objects knows different types of plots. For example, scatter plots are helpful to visualize return data of two different time series (see the following figure).

In [None]:
## plot log returns as scatter plot
log_rets.plot(x='.STOXX50E', y='.V2TX',
              kind='scatter', figsize=(10, 6));

<p style="font-family: monospace;">Scatter plot of log returns of EURO STOXX 50 and VSTOXX.

The following figure adds the time dimension to the data through coloring to illustrate the relation between the two time series over time (during different "regimes").

In [None]:
log_rets = log_rets.dropna() # delete NaN values
plt.set_cmap('coolwarm');  # set color map
mpl_dates = mpl.dates.date2num(log_rets.index.to_pydatetime())  # conversion
plt.figure(figsize=(10, 6))
plt.scatter(log_rets['.STOXX50E'], log_rets['.V2TX'],
            c=mpl_dates, marker='o', alpha=0.75)  # the actual plot
plt.xlabel('.STOXX50E')
plt.ylabel('.V2TX')
plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),  # adding bar
        format=mpl.dates.DateFormatter('%d %b %y')); 

<p style="font-family: monospace;">Scatter plot of log returns of EURO STOXX 50 and VSTOXX with dates.

Although the negative correlation between the two indexes is almost evident from the previous two figures, let us formally calculate and represent the negative correlation by a linear regression line. The following figure adds such a regression line to the raw log returns in the scatter plot. Negative correlation translates into a negative slope of the regression line.

In [None]:
## conduct linear regression
p = np.polyfit(log_rets['.STOXX50E'].values,
               log_rets['.V2TX'].values,
               deg=1)  # the regression
log_rets.plot(x='.STOXX50E', y='.V2TX',
              kind='scatter', figsize=(10, 6));  # the actual plotting
## plot the regression line
plt.plot(log_rets['.STOXX50E'], np.polyval(p, log_rets['.STOXX50E']),
         'r');  # adding the regression line

<p style="font-family: monospace;">Scatter plot of log returns of EURO STOXX 50 and VSTOXX with regression line.
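
The link between correlation and the regression slope can be checked numerically. For a least-squares line, the slope equals the correlation coefficient times the ratio of the standard deviations of `y` and `x`, so the two always share the same sign. The sketch below uses synthetic arrays `x` and `y` (hypothetical values, not the index returns).

```python
import numpy as np

rng = np.random.default_rng(100)  # fixed seed for reproducibility
x = rng.standard_normal(500) * 0.01             # synthetic "equity" returns
y = -3.0 * x + rng.standard_normal(500) * 0.02  # synthetic "volatility" returns

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares regression line
rho = np.corrcoef(x, y)[0, 1]               # correlation coefficient

# slope implied by the correlation coefficient
slope_from_rho = rho * y.std(ddof=1) / x.std(ddof=1)
print(np.isclose(slope, slope_from_rho))
```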

The seaborn plotting library is specifically developed with statistical applications in mind. It therefore provides multiple useful, high-level plotting capabilities. One of them is the ``jointplot()`` function. The result of applying this function to the log return data is displayed in the following figure. For our purposes, it is the richest and most insightful presentation so far.

In [None]:
import seaborn as sns

In [None]:
sns.jointplot(x=log_rets['.STOXX50E'], y=log_rets['.V2TX'], 
              kind='reg', height=7);

<p style="font-family: monospace;">Scatter plot of log returns of EURO STOXX 50 and VSTOXX with regression line and histograms.

The final analysis in this section considers correlation *over time*. To this end, pandas provides the ``rolling()`` method which allows, among other things, the vectorized calculation of correlation for moving time windows via ``corr()``. We choose a window size of 252 trading days, which represents roughly one year.

In [None]:
data['CORR'] = log_rets['.STOXX50E'].rolling(
                window=252).corr(log_rets['.V2TX'])
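
To illustrate what the rolling calculation does, the following sketch uses short synthetic series `a` and `b` (hypothetical) and a window of 10 instead of 252. Each rolling value is the ordinary correlation over the most recent `window` observations, and the first `window - 1` values are `NaN` because the window is not yet full.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)  # fixed seed for reproducibility
a = pd.Series(rng.standard_normal(30))
b = pd.Series(-0.5 * a + rng.standard_normal(30))

window = 10  # short window for the toy data; the text uses 252
rolling_corr = a.rolling(window=window).corr(b)

# plain correlation over the last `window` rows for comparison
last_direct = a.iloc[-window:].corr(b.iloc[-window:])
print(np.isclose(rolling_corr.iloc[-1], last_direct))
print(rolling_corr.isna().sum())  # number of leading NaN values
```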

The following figure shows the rolling correlation data and illustrates well that correlation fluctuates for different yearly windows but that it is negative for any chosen yearly window.

In [None]:
to_plot = data.dropna()  # drop NaN values
plt.figure(figsize=(10, 6))
plt.plot(to_plot.index, to_plot.CORR)  # plot the cleaned data
plt.ylabel('rolling yearly correlation');

<p style="font-family: monospace;">Rolling yearly correlation of EURO STOXX 50 and VSTOXX.

## Constant Proportion Investment Strategies

One way to make use of the fact that the VSTOXX index is negatively correlated with the EURO STOXX 50 index is to implement a *constant proportion investment strategy*. Respective results are found, for example, in the study by Guobuzaite and Martellini (2012). Basically, a constant (dollar) proportion investment strategy keeps the proportion of money invested in securities of a portfolio over time constant by dynamic re-balancings given the movements in the prices of the single securities.

In the example of this section, we assume that direct investments both in the EURO STOXX 50 and the VSTOXX indexes are possible and that no transaction costs apply. Although this might not be realistic, it simplifies the analysis and illustrates the basic idea pretty well.

To begin with, let us adjust the original data sets for the EURO STOXX 50 and VSTOXX indexes, mainly by normalizing both time series to starting values of 100.

In [None]:
del data['CORR']  # delete correlation data
data = data.dropna()  # drop NaN values
data = data / data.iloc[0] * 100  # normalization

In [None]:
data.head()

First, we analyze a typical *passive investment strategy* allocating 30% of a portfolio to the VSTOXX index and the remaining 70% to the EURO STOXX 50.

In [None]:
invest = 100  # initial investment
cratio = 0.3  # VSTOXX ratio in the beginning
# number of EURO STOXX (fictional) securities
data['Equity'] = (1 - cratio) * invest / data['.STOXX50E'].iloc[0]
# number of VSTOXX (fictional) securities
data['Volatility'] = cratio * invest / data['.V2TX'].iloc[0]

In vectorized fashion, we calculate next the absolute values of such a portfolio over time, i.e. as a time series.

In [None]:
data['Passive'] = (data['Equity'] * data['.STOXX50E']
                + data['Volatility'] * data['.V2TX'])
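
Because the passive strategy holds a fixed number of securities, the portfolio weights drift as prices move. The following sketch shows this on made-up price paths (the `toy` object and its column names are hypothetical): the VSTOXX weight starts at 30% and then moves away from it.

```python
import pandas as pd

# toy price paths, both starting at 100 (made-up numbers)
toy = pd.DataFrame({'equity': [100.0, 110.0, 121.0],
                    'vol': [100.0, 80.0, 60.0]})
ratio = 0.3

# fixed holdings, determined once at the start
units_equity = (1 - ratio) * 100 / toy['equity'].iloc[0]
units_vol = ratio * 100 / toy['vol'].iloc[0]

value = units_equity * toy['equity'] + units_vol * toy['vol']
vol_weight = units_vol * toy['vol'] / value  # drifts away from 30%
print(vol_weight.round(4).tolist())
```

This drift is what the constant proportion strategy removes through re-balancing.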

The following figure illustrates the performance of this particular passive investment strategy in comparison to a passive strategy purely investing in the EURO STOXX 50 index. In times of crises, for example at the beginning of 2020, the strategy performs better. However, over the whole period there is hardly a difference in the end result.

In [None]:
data[['.STOXX50E', 'Passive']].plot(figsize=(10, 6), style=['--', '-']);

<p style="font-family: monospace;">Passive investment strategy (hypothetical) with EURO STOXX 50 and VSTOXX.

Second, we analyze the *active constant proportion investment strategy*, which keeps the dollar proportion invested in the VSTOXX index constant at 30% *over time* through daily re-balancings of the portfolio. In this case, the individual calculations are done step by step by looping over the historical trading dates.

In [None]:
eq, vol = data.columns.get_loc('Equity'), data.columns.get_loc('Volatility')
for i in range(1, len(data)):  # daily re-balancing
    evalue = data['Equity'].iloc[i - 1] * data['.STOXX50E'].iloc[i]
    vvalue = data['Volatility'].iloc[i - 1] * data['.V2TX'].iloc[i]
    tvalue = evalue + vvalue  # total wealth
    # assign via DataFrame.iloc to avoid chained assignment
    data.iloc[i, eq] = (1 - cratio) * tvalue / data['.STOXX50E'].iloc[i]
    data.iloc[i, vol] = cratio * tvalue / data['.V2TX'].iloc[i]

Based on the results, we calculate the absolute performance of this active strategy as before. Over the first few trading days no major performance differences arise.

In [None]:
data['Active'] = (data['Equity'] * data['.STOXX50E']
                + data['Volatility'] * data['.V2TX'])
np.round(data.head(), 2)

Over the whole time period, however, the active strategy &mdash; showing a 600+ percent gain &mdash; significantly outperforms the passive one.

In [None]:
np.round(data.tail(), 2)

Let us briefly verify whether the above implementation indeed yields constant proportions for the two (fictional) securities in the portfolio.

In [None]:
(data['Volatility'] * data['.V2TX'] / data['Active'])[:5]

In [None]:
(data['Equity'] * data['.STOXX50E'] / data['Active'])[:5]
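
A further cross-check follows from the re-balancing rule itself. With constant proportions and no transaction costs, total wealth grows each day by the weighted average of the two simple returns. The sketch below compares an explicit loop with this vectorized form on made-up prices; the `toy` object and all variable names are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# toy price paths (made-up numbers)
toy = pd.DataFrame({'equity': [100.0, 102.0, 97.0, 99.0],
                    'vol': [100.0, 95.0, 120.0, 110.0]})
ratio, wealth0 = 0.3, 100.0

# explicit loop: re-balance at yesterday's prices, value at today's prices
loop_wealth = [wealth0]
for i in range(1, len(toy)):
    units_equity = (1 - ratio) * loop_wealth[-1] / toy['equity'].iloc[i - 1]
    units_vol = ratio * loop_wealth[-1] / toy['vol'].iloc[i - 1]
    loop_wealth.append(units_equity * toy['equity'].iloc[i]
                       + units_vol * toy['vol'].iloc[i])

# vectorized: wealth grows by the weighted simple return each day
rets = toy.pct_change().fillna(0.0)
port_ret = (1 - ratio) * rets['equity'] + ratio * rets['vol']
vec_wealth = wealth0 * (1 + port_ret).cumprod()

print(np.allclose(loop_wealth, vec_wealth.values))
```

If the two paths agree, the loop indeed keeps the proportions constant, and the vectorized form can serve as a faster alternative on long time series.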

Being assured that we indeed have implemented a constant proportion trading strategy, have a look at the following figure which impressively illustrates the outperformance of the active approach over a passive investment in the stock index itself. However, keep in mind that all this rests on a number of simplifying assumptions.

In [None]:
data[['.STOXX50E', 'Active']].plot(figsize=(10, 6), style=['--', '-']);

<p style="font-family: monospace;">Active, constant proportion investment strategy (hypothetical) with EURO STOXX 50 and VSTOXX.

The assumption of 30% invested in the VSTOXX index might seem a bit ad hoc. Therefore, the following derives the optimal allocation for the given time period and data sets by a brute force approach. First, we clean up the ``DataFrame`` object to reduce it again to the original time series data for the two indexes.

In [None]:
## re-initialize DataFrame
data = data[['.STOXX50E', '.V2TX']]
data.info()

The function below calculates the performance of a constant proportion investment strategy for different VSTOXX dollar proportions and different starting and ending dates (both defaulting to the data sets' start and end dates).

In [None]:
def vstoxx_strategy(cratio, start=data.index[0], end=data.index[-1]):
    # work on a copy of the selected date range
    base = data[(data.index >= start) & (data.index <= end)].copy()
    invest = 100  # initial invest
    base['Equity'] = (1 - cratio) * invest / base['.STOXX50E'].iloc[0]
    base['Volatility'] = cratio * invest / base['.V2TX'].iloc[0]
    eq = base.columns.get_loc('Equity')  # column positions for iloc
    vol = base.columns.get_loc('Volatility')
    for i in range(1, len(base)):  # daily re-balancing
        evalue = base['Equity'].iloc[i - 1] * base['.STOXX50E'].iloc[i]
        vvalue = base['Volatility'].iloc[i - 1] * base['.V2TX'].iloc[i]
        tvalue = evalue + vvalue
        # assign via DataFrame.iloc to avoid chained assignment
        base.iloc[i, eq] = (1 - cratio) * tvalue / base['.STOXX50E'].iloc[i]
        base.iloc[i, vol] = cratio * tvalue / base['.V2TX'].iloc[i]
    base['Active'] = (base['Equity'] * base['.STOXX50E']
                + base['Volatility'] * base['.V2TX'])  # wealth position
    print('A con. VSTOXX ratio of %.2f yields a net perform. of %6.1f%%.'
            % (cratio, (base['Active'].iloc[-1] / base['Active'].iloc[0] - 1) * 100))

Equipped with this function, let us calculate the net performance for a VSTOXX dollar ratio of 30% as before.

In [None]:
vstoxx_strategy(0.3)

We do the same for 40%, which yields an even better result.

In [None]:
vstoxx_strategy(0.4)

Applying an approach which is a bit more systematic, we get the following results for different constant dollar proportion assumptions. Over the whole period for which data is available a 50%:50% investment strategy seems to be optimal with a net performance of more than 900%.

In [None]:
for cratio in np.arange(0, 1.01, 0.1):
    vstoxx_strategy(cratio)
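
Reading the best ratio off printed lines works for a handful of values. As a possible alternative, the sketch below returns the net performance as a number so that the maximum can be selected programmatically. It assumes daily re-balancing without transaction costs, in which case wealth grows by the weighted simple return each day. The helper `cp_performance`, the `toy` prices, and the column names are hypothetical and not part of the code above.

```python
import numpy as np
import pandas as pd

def cp_performance(prices, ratio):
    """Net performance (in %) of a daily re-balanced two-asset portfolio.

    `prices` needs the columns 'equity' and 'vol' (hypothetical names).
    """
    rets = prices.pct_change().dropna()
    port_ret = (1 - ratio) * rets['equity'] + ratio * rets['vol']
    return ((1 + port_ret).prod() - 1) * 100

# toy price paths (made-up numbers)
toy = pd.DataFrame({'equity': [100.0, 102.0, 97.0, 99.0],
                    'vol': [100.0, 95.0, 120.0, 110.0]})

ratios = np.arange(0, 1.01, 0.1)
results = pd.Series([cp_performance(toy, r) for r in ratios], index=ratios)
best_ratio = results.idxmax()  # ratio with the highest net performance
print(results.round(2))
```

Keep in mind that a ratio selected this way is optimal only in hindsight for the given period and data.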

Let us implement the same analysis for the time period beginning in January 2017 and ending with the second quarter of 2020. In this case, a constant dollar proportion invested in the VSTOXX of about 60% seems optimal and yields a net performance of about 220%.

In [None]:
for cratio in np.arange(0, 1.01, 0.1):
    vstoxx_strategy(cratio, start='2017-1-1', end='2020-7-1')

In [None]:
for cratio in np.arange(0, 1.01, 0.1):
    vstoxx_strategy(cratio, start='2019-7-1', end='2020-10-31')

## Conclusions

This chapter is about the retrieval and analysis of EURO STOXX 50 and VSTOXX historical data. It shows how to use Python and pandas to retrieve and analyze historical data sets. It also shows how to implement Python code to replicate central stylized facts about stock and volatility indexes, namely their highly negative correlation and the benefits of constant (dollar) proportion investment strategies involving (products based on) equity and volatility indexes.


<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>