<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br><br><br>

# Listed Volatility and Variance Derivatives

**Wiley Finance (2017)**

Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [@dyjh](http://twitter.com/dyjh) | http://books.tpq.io

<img src="http://hilpisch.com/../images/lvvd_cover.png" alt="Listed Volatility and Variance Derivatives" width="30%" align="left" border="0">

# Introduction to Python

Python has become a powerful programming language and has developed a huge ecosystem of helpful libraries over the last couple of years. This chapter provides a concise overview of Python and two of the major pillars of the so-called *scientific stack*:

* NumPy (see http://numpy.org)
* pandas (see http://pandas.pydata.org)

NumPy provides performant array operations on numerical data while pandas is specifically designed to handle more complex data analytics operations, e.g. on (financial) time series data.

Such an introductory chapter, which only addresses selected topics relevant to the contents of this book, can of course not replace a thorough introduction to Python and the libraries covered. However, if you are rather new to Python or programming in general, you might get a first overview and a feeling for what Python is all about. If you are already experienced in another language typically used in quantitative finance (e.g. Matlab, R, C++, VBA), you will see what typical data structures, programming paradigms and idioms look like in Python.

For a comprehensive overview of Python applied to finance see Hilpisch (2014). Other, more general introductions to the language with a scientific and data analysis focus are Haenel et al. (2013), Langtangen (2009) and McKinney (2012).

The book was originally based on Python 2.7. The code in this chapter, as shown here, uses Python 3 syntax (for example, ``print()`` as a function with the ``end`` argument, and ``range`` as a lazily evaluated object).

## Python Basics

This section introduces basic Python data types and structures, control structures and some Python idioms.

### Data Types

It is noteworthy that Python is a *dynamically typed* language, which means that the type of an object is determined at runtime from the value assigned rather than declared in advance. Let us start with numbers.

In [None]:
a = 3  # defining an integer object
type(a)

In [None]:
a.bit_length()  # number of bits used

In [None]:
b = 5.  # defining a float object
type(b)

Python can handle arbitrarily large integers which is quite beneficial for number theoretical applications, for instance.

In [None]:
c = 10 ** 100  # googol number
type(c)

In [None]:
c  # arbitrarily large integer object

In [None]:
c.bit_length()  # number of bits used

Arithmetic operations on these objects work as expected.

In [None]:
3 / 5  # division

In [None]:
a * b  # multiplication

In [None]:
a - b  # difference

In [None]:
b + a  # addition

In [None]:
a ** b  # power

Many frequently used mathematical functions are found in the ``math`` module which is part of Python's standard library.

In [None]:
import math  # importing the library into the namespace

In [None]:
math.log(a)  # natural logarithm

In [None]:
math.exp(a)  # exponential function

In [None]:
math.sin(b)  # sine function

Another important basic data type is the string object.

In [None]:
s = 'Listed Volatility and Variance Derivatives.'
type(s)

This object type has multiple methods attached.

In [None]:
s.lower()  # converting to lower case characters

In [None]:
s.upper()  # converting to upper case characters

String objects can be easily sliced. Note that Python in general uses zero-based numbering and indexing.

In [None]:
s[0:6]

Such objects can also be combined using the ``+`` operator. The index value -1 represents the last character of a string (or last element of a sequence in general).

In [None]:
st = s[0:6] + s[-13:-1]
print(st)

String replacements are often used to parametrize text output.

In [None]:
repl = "My name is %s, I am %d years old and %4.2f m tall."
## replace %s by a string, %d by an integer and
## %4.2f by a float showing 2 decimal values
print(repl % ('Peter', 35, 1.88)) 

A different way to reach the same goal is the following.

In [None]:
repl = "My name is {:s}, I am {:d} years old and {:4.2f} m tall."
print(repl.format('Peter', 35, 1.88))
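
In Python 3.6 and later, formatted string literals (f-strings) offer a third option for the same task. The following is a minimal sketch; the variable names `name`, `age` and `height` are chosen for illustration only.

```python
# illustrative values, mirroring the example above
name, age, height = 'Peter', 35, 1.88

# the expressions in braces are evaluated and formatted in place
msg = f"My name is {name}, I am {age:d} years old and {height:4.2f} m tall."
print(msg)
```

The format specifications after the colon follow the same mini-language as in the ``format()`` example.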

### Data Structures

Tuples are a lightweight data structure. They are immutable collections of other objects and are constructed from objects separated by commas, with or without parentheses.

In [None]:
t1 = (a, b, st)
t1

In [None]:
type(t1)

In [None]:
t2 = st, b, a
t2

In [None]:
type(t2)

Nested structures are also possible. 

In [None]:
t = (t1, t2)

In [None]:
t

In [None]:
t[0][2]  # take 3rd element of 1st element
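
Immutability means that the elements of a tuple cannot be reassigned after construction. The short sketch below illustrates this; the name `t_demo` is hypothetical and used only for this illustration.

```python
t_demo = (3, 5.0, 'Listed')  # illustrative tuple object

try:
    t_demo[0] = 10  # attempt to replace the first element
except TypeError as err:
    # tuples do not support item assignment
    error_message = str(err)
    print(error_message)
```

The tuple itself remains unchanged after the failed assignment.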

List objects are mutable collections of other objects and are generally constructed by providing a comma separated collection of objects in brackets.

In [None]:
l = [a, b, st]
l

In [None]:
type(l)

In [None]:
l.append(s.split()[3])  # append 4th word of string
l

Sorting is a typical operation on list objects which can also be constructed using the ``list`` constructor (here applied to a tuple object).

In [None]:
l = list(('Z', 'Q', 'D', 'J', 'E', 'H'))
l

In [None]:
l.sort()  # in-place sorting
l

Dictionary objects are so-called key-value stores and are generally constructed with curly braces.

In [None]:
d = {'int_obj': a,
    'float_obj': b,
    'string_obj': st}
type(d)

In [None]:
d

In [None]:
d['float_obj']  # look-up of value given key

In [None]:
d['long_obj'] = c // 10 ** 90  # adding new key-value pair (integer division)
d

Keys and values of a dictionary object can be retrieved as view objects, which can be converted to list objects via ``list()``.

In [None]:
d.keys()

In [None]:
d.values()
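
In Python 3, ``keys()`` and ``values()`` return view objects that reflect later changes to the dictionary. The sketch below, using the hypothetical name `d_demo`, shows this behavior and the conversion to list objects.

```python
d_demo = {'int_obj': 3, 'float_obj': 5.0}  # illustrative dictionary

keys_view = d_demo.keys()  # view on the keys, not a copy
d_demo['string_obj'] = 'Listed'  # add a key after creating the view

key_list = list(keys_view)  # the view reflects the new key
pairs = list(d_demo.items())  # (key, value) tuples
print(key_list)
print(pairs)
```

Since Python 3.7, dictionaries preserve insertion order, so the keys appear in the order in which they were added.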

### Control Structures

Iterations are very important operations in programming in general and financial analytics in particular. Many Python objects are iterable which proves rather convenient in many circumstances. Consider the special object constructor ``range``, which in Python 3 yields a lazily evaluated sequence of integers rather than a list object.

In [None]:
range(5)  # all integers from zero to 5 excluded

In [None]:
range(3, 15, 2)  # start at 3, step with 2 until 15 excluded

Such a ``range`` object is often used in the context of a ``for`` loop.

In [None]:
for i in range(5):
    print(i ** 2, end=' ')

However, you can iterate over any sequence.

In [None]:
## iteration over list object
for _ in l:
    print(_, end=' ')

In [None]:
## iteration over string object
for c in st:
    print(c + '|', end=' ')

``while`` loops are similar to their counterparts in other languages.

In [None]:
i = 0  # initialize counter
while i < 5:
    print(i ** 0.5, end=' ')  # output
    i += 1  # increase counter by 1

The ``if-elif-else`` control structure is introduced below in the context of Python function definitions.

### Special Python Idioms 

Python in many places relies on a number of special idioms. Let us start with a rather popular one, the list comprehension.

In [None]:
lc = [i ** 2 for i in range(10)]
lc

As the name suggests, the result is a list object.

In [None]:
type(lc)

So-called lambda or anonymous functions are useful helpers in many places.

In [None]:
f = lambda x: math.cos(x) # returns cos of x
f(5)

List comprehensions can be combined with lambda functions to achieve concise constructions of list objects.

In [None]:
lc = [f(x) for x in range(10)]
lc

However, there is an even more concise way of constructing the same list object &mdash; using functional programming approaches, in the case to follow with ``map``.

In [None]:
list(map(lambda x: math.cos(x), range(10)))
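
As a quick check, the sketch below confirms that the list comprehension and the ``map`` approach yield the same list object. It also shows that ``math.cos`` can be passed to ``map`` directly, without a lambda wrapper. The variable names are illustrative.

```python
import math

via_comprehension = [math.cos(x) for x in range(10)]
via_map_lambda = list(map(lambda x: math.cos(x), range(10)))
via_map_direct = list(map(math.cos, range(10)))  # no lambda needed

print(via_comprehension == via_map_lambda == via_map_direct)
```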

In general, one works with regular Python functions (as opposed to lambda functions) which are constructed as follows.

In [None]:
def f(x):
    return math.exp(x)
f(5)

The general construction looks like this.

In [None]:
def f(*args):  # multiple arguments
    for arg in args:
        print(arg)
        # do something with arguments
    return None  # return result(s) (not necessary)

f(l)

Consider the following function definition which returns different values/strings based on an ``if-elif-else`` control structure.

In [None]:
import random  # import random number library
a = random.randint(0, 999)  # draw random number between 0 and 999
print("Random number is %d" % a)
def number_decide(number):
    if number < 10:
        return "Number is single digit."
    elif 10 <= number < 100:
        return "Number is double digit."
    else:
        return "Number is triple digit."
number_decide(a)

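A specialty of Python are generator objects, which produce their values on demand. A commonly used object that behaves similarly is ``range`` (``xrange`` in Python 2). Strictly speaking it is a lazy sequence rather than a generator, but it likewise avoids creating all of its integers up-front.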

In [None]:
g = range(10)
type(g)  # object type

In [None]:
g  # object instance

In [None]:
for _ in g:
    print(_, end=' ')  # integers are "generated" when needed

Generator objects can in many scenarios replace (typical) list objects and have the major advantage that they are in general much more memory efficient. Consider a financial algorithm that requires 10 mn loops. Iterating over a list of integers from 0 to 9,999,999 is not efficient since the algorithm (in general) does not need to have all these numbers available at the same time. But this is what happens when the ``range`` object is first converted to a list object for such a loop.

Consider the following construction of the respective list object containing all integers.

In [None]:
%time r = list(range(10000000))

This object consumes 80 MB (!) of RAM (10 mn times 8 bytes).

In [None]:
import sys
sys.getsizeof(r)  # size in bytes of object

On the other hand, consider the analogous construction based on a generator (``range``) object. It is much, much faster since no memory has to be allocated, no list object has to be generated up-front, etc.

In [None]:
%time xr = range(10000000)

Memory consumption is also far lower: a few dozen bytes compared to 80 MB.

In [None]:
sys.getsizeof(xr)
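
The same comparison can be sketched on a smaller scale. The example below assumes CPython, where ``sys.getsizeof`` reports only the size of the container itself (for the list object, the array of references, not the integer objects it points to). The names `as_list` and `lazy` are illustrative.

```python
import sys

n = 100000
as_list = list(range(n))  # all integers are materialized
lazy = range(n)  # only start, stop and step are stored

list_bytes = sys.getsizeof(as_list)
lazy_bytes = sys.getsizeof(lazy)
print(list_bytes, lazy_bytes)

# a range object still supports len() and indexing
print(len(lazy), lazy[10], lazy[-1])
```

The size of the ``range`` object does not depend on `n`, while the size of the list object grows linearly with it.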

However, in practical applications the two can often be used interchangeably, so one should resort to the more efficient alternative whenever possible. The following examples calculate, using the functional programming operation ``reduce``, the sum of all integers from 0 to 999,999. Although in this case there is hardly a performance difference, the first operation requires about 8 MB of memory for the list object while the second might require less than 100 bytes.

In [None]:
from functools import reduce

In [None]:
%time reduce(lambda x, y: x + y, list(range(1000000)))

In [None]:
%time reduce(lambda x, y: x + y, range(1000000))

More Pythonic (and faster in general) is to calculate the sum using the built-in ``sum`` function &mdash; in this case a significant performance advantage for the generator approach emerges.

In [None]:
%timeit sum(list(range(1000000)))

In [None]:
%timeit sum(range(1000000))

There is also a way of indirectly constructing a generator object, i.e. by the use of parentheses. The following code results in a generator object for the sine values of the numbers from 0 to 99.

In [None]:
g = (math.sin(x) for x in range(100))
g

Yet another way of constructing a generator object is by a definition style that closely resembles the standard function definition. The difference is that instead of the ``return`` statement, the ``yield`` statement is used.

In [None]:
def g(start, end):
    while start <= end:
        yield start  # yield "next" value
        start += 1  # increase by one

Usage then might be as follows.

In [None]:
go = g(15, 20)
for _ in go:
    print(_, end=' ')
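
A generator object keeps its state between calls and can only be consumed once. The sketch below illustrates this with the built-in ``next()`` function; the generator function `count_range` is a renamed copy of the definition above, used here only to keep the example self-contained.

```python
def count_range(start, end):
    while start <= end:
        yield start  # hand back the current value and pause here
        start += 1  # resume from this line on the next request

gen = count_range(15, 17)
first = next(gen)  # request a single value
rest = list(gen)  # consume the remaining values
leftover = list(gen)  # the generator is now exhausted
print(first, rest, leftover)
```

To iterate a second time, a new generator object has to be created by calling the function again.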

## NumPy

Many operations in computational finance take place over (large) arrays of numerical data. NumPy is a Python library that allows the efficient handling of and operation on such data structures. Although it is quite a mighty library with a wealth of functionality, it suffices for the purposes of this book to cover the basics of NumPy.

In [None]:
import numpy as np

The workhorse is the NumPy ``ndarray`` class which provides the data structure for n-dimensional, mutable array objects. You can generate an ``ndarray`` object e.g. from a list or ``range`` object.

In [None]:
a = np.array(range(24))
a

The power of these objects lies in the management of n-dimensional data structures (e.g. matrices or cubes of data).

In [None]:
b = a.reshape((4, 6))
b

In [None]:
c = a.reshape((2, 3, 4))
c
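
Note that ``reshape`` does not necessarily copy the data. For a contiguous array like the one above, it returns a view on the same memory, so a change made through one object is visible through the other. The following sketch, with the illustrative names `a_demo` and `b_demo`, shows this.

```python
import numpy as np

a_demo = np.arange(24)  # one-dimensional array with 24 integers
b_demo = a_demo.reshape((4, 6))  # two-dimensional view on the same data

b_demo[0, 0] = 100  # change an element via the reshaped object
print(a_demo[0])  # the original array reflects the change
print(np.shares_memory(a_demo, b_demo))
```

If an independent object is needed, an explicit ``copy()`` call after reshaping avoids such side effects.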

So-called *standard arrays* (in contrast to e.g. structured arrays) have a single ``dtype`` (i.e. NumPy data type). Consider the following operation which changes the ``dtype`` parameter of the ``b`` object to float.

In [None]:
b = np.array(b, dtype=float)  # the np.float alias was removed in NumPy 1.24
b

A major strength of NumPy are *vectorized operations*.

In [None]:
2 * b

In [None]:
b ** 2

You can also pass ``ndarray`` objects to lambda or standard Python functions.

In [None]:
f = lambda x: x ** 2 - 2 * x + 0.5
f(a)

In many scenarios, only a (small) part of the data stored in an ``ndarray`` object is of interest. NumPy supports basic and advanced slicing and other selection features.

In [None]:
a[2:6]  # 3rd to 6th element

In [None]:
b[2, 4]  # 3rd row, 5th column

In [None]:
b[1:3, 2:4]  # middle square of numbers

Boolean operations are also supported in many places.

In [None]:
## which numbers are larger than 10?
b > 10

In [None]:
## only those numbers (flat) that are larger than 10
b[b > 10]
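
Boolean arrays can also be used for counting and for conditional replacement. The sketch below rebuilds an array like ``b`` under the illustrative name `b_demo` and uses ``np.where`` to cap all values above 10.

```python
import numpy as np

b_demo = np.arange(24, dtype=float).reshape((4, 6))

mask = b_demo > 10  # Boolean array of the same shape
count = mask.sum()  # True counts as 1, False as 0
capped = np.where(mask, 10.0, b_demo)  # 10.0 where True, else original value
print(count)
print(capped)
```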

Furthermore, ``ndarray`` objects have multiple (convenience) methods already built in.

In [None]:
a.sum()  # sum of all elements

In [None]:
b.mean()  # mean of all elements

In [None]:
b.mean(axis=0)  # mean along 1st axis

In [None]:
b.mean(axis=1)  # mean along 2nd axis
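
The ``axis`` parameter names the axis that is collapsed by the operation. The following sketch, again using an illustrative array `b_demo` with 4 rows and 6 columns, makes this visible via the shapes of the results.

```python
import numpy as np

b_demo = np.arange(24, dtype=float).reshape((4, 6))

col_means = b_demo.mean(axis=0)  # collapses the rows: one mean per column
row_means = b_demo.mean(axis=1)  # collapses the columns: one mean per row
print(col_means.shape, row_means.shape)
print(col_means[0], row_means[0])
```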

In [None]:
c.std()  # standard deviation for all elements

Similarly, there is a wealth of so-called *universal functions* that the NumPy library provides. Universal in the sense that they can be applied in general to NumPy ``ndarray`` objects and to standard numerical Python data types.

In [None]:
np.sum(a)  # sum of all elements

In [None]:
np.mean(b, axis=0)  # mean along 1st axis

In [None]:
np.sin(b).round(2)  # sine of all elements (rounded)

In [None]:
np.sin(4.5)  # sine of Python float object

However, you should be aware that applying NumPy universal functions to standard Python data types generally comes with a significant performance burden.

In [None]:
%time l = [np.sin(x) for x in range(100000)]

In [None]:
import math
%time l = [math.sin(x) for x in range(100000)]

Using the vectorized operations from NumPy is faster than both of the above alternatives which result in list objects.

In [None]:
%time l = np.sin(np.arange(100000))

Here, we use the ``ndarray`` object constructor ``arange`` which yields an ``ndarray`` object of integers. A simple example follows.

In [None]:
ai = np.arange(10)
ai

In [None]:
ai.dtype

Using this constructor, you can also generate ``ndarray`` objects with different ``dtype`` attributes.

In [None]:
af = np.arange(0.5, 9.5, 0.5)  # start, end, step size
af

In [None]:
af.dtype

Also useful in this context is the ``linspace`` function which provides an ``ndarray`` object with evenly spaced numbers.

In [None]:
np.linspace(0, 10, 12)  # start, end, number of elements
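
One difference between the two constructors is worth keeping in mind: ``linspace`` includes the end value by default while ``arange`` excludes it. The sketch below contrasts the two; the variable names are illustrative.

```python
import numpy as np

grid = np.linspace(0, 10, 11)  # 11 points, both end values included
steps = np.arange(0, 10, 1)  # step size 1, end value excluded
grid_open = np.linspace(0, 10, 10, endpoint=False)  # end value excluded

print(grid)
print(steps)
print(np.allclose(grid_open, steps))
```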

In financial analytics one often needs (pseudo-) random numbers. NumPy provides many functions to sample from different distributions. Those needed in this book are the standard normal distribution and the Poisson distribution. The respective functions are found in the sub-library ``numpy.random``. 

In [None]:
np.random.standard_normal(10) 

In [None]:
np.random.poisson(0.5, 10)

Let us generate an ``ndarray`` object which is a bit more "realistic" and with which we work in what follows.

In [None]:
np.random.seed(1000)  # fix the rng seed value
data = np.random.standard_normal((5, 100))

Although this is a slightly larger array, one cannot expect the 500 numbers to match the standard normal distribution exactly, i.e. to have a sample mean of exactly 0 and a sample standard deviation of exactly 1. However, at least this can be easily corrected.

In [None]:
data.mean()  # should be 0.0

In [None]:
data.std()  # should be 1.0

The correction is called moment matching and can be implemented with NumPy by vectorized operations. 

In [None]:
data = data - data.mean()  # correction for the 1st moment
data = data / data.std()  # correction for the 2nd moment

In [None]:
data.mean()  # now really close to 0.0

In [None]:
data.std()  # now really close to 1.0
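
The two correction steps can be wrapped in a small helper function. The following is a minimal sketch; the function name `moment_match` is hypothetical, and the example draws its sample via NumPy's newer ``default_rng`` interface rather than ``np.random.seed``, so the numbers differ from those above.

```python
import numpy as np

def moment_match(x):
    """Return a copy of x with sample mean 0 and sample standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # vectorized over all elements

rng = np.random.default_rng(1000)  # seed value chosen arbitrarily
sample = rng.standard_normal((5, 100))
matched = moment_match(sample)

print(sample.mean(), sample.std())  # close to, but not exactly, 0 and 1
print(matched.mean(), matched.std())  # 0 and 1 up to floating point error
```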

## matplotlib

At this stage, it makes sense to introduce plotting with matplotlib, the plotting workhorse in the Python ecosystem. We use matplotlib (see http://matplotlib.org) with the settings of another library throughout, namely seaborn (see http://stanford.edu/~mwaskom/software/seaborn/), which results in a more modern plotting style.

In [None]:
import matplotlib.pyplot as plt  # import main plotting library
plt.style.use('seaborn-v0_8')  # style name is 'seaborn' before matplotlib 3.6
import matplotlib
## set font to serif
matplotlib.rcParams['font.family'] = 'serif'

A standard plot is the *line plot*. The result of the code below is shown in the following figure.

In [None]:
plt.figure(figsize=(10, 6));  # size of figure
plt.plot(data.cumsum());  # cumulative sum over all elements 

<p style="font-family: monospace;">Standard line plot with matplotlib.</p>

Plots with multiple lines are also easy to generate (see the following figure). The attribute ``T`` stands for the transpose of the ``ndarray`` object ("matrix").

In [None]:
plt.figure(figsize=(10, 6));  # size of figure
## plotting five cumulative sums as lines
plt.plot(data.T.cumsum(axis=0), label='line');
plt.legend(loc=0);  # legend in best location
plt.xlabel('data point');  # x axis label
plt.ylabel('value');  # y axis label
plt.title('random series');  # figure title

<p style="font-family: monospace;">Multiple lines plot with matplotlib.</p>

Other important plotting types are *histograms* and *bar charts*. A histogram for all 500 values of the ``data`` object is shown in the following figure. In the code, the ``flatten()`` method is used to generate a one-dimensional array from the two-dimensional one.

In [None]:
plt.figure(figsize=(10, 6));  # size of figure
plt.hist(data.flatten(), bins=30);

<p style="font-family: monospace;">Histogram with matplotlib.</p>

In [None]:
plt.figure(figsize=(10, 6));  # size of figure
plt.bar(np.arange(1, 12) - 0.25, data[0, :11], width=0.5);

<p style="font-family: monospace;">Bar chart with matplotlib.</p>

To conclude the introduction to matplotlib, consider the ordinary least squares (OLS) regression of the sample data displayed in the following figure. NumPy provides two convenience functions, ``polyfit()`` and ``polyval()``, to implement OLS based on simple monomials, i.e. $x, x^2, x^3, ..., x^n$. For illustration purposes consider linear, quadratic and cubic OLS.

In [None]:
x = np.arange(len(data.cumsum()))
y = data.cumsum()
rg1 = np.polyfit(x, y, 1)  # linear OLS
rg2 = np.polyfit(x, y, 2)  # quadratic OLS
rg3 = np.polyfit(x, y, 3)  # cubic OLS

In [None]:
plt.figure(figsize=(10, 6));
plt.plot(x, y, 'r', label='data');
plt.plot(x, np.polyval(rg1, x), 'b--', label='linear');
plt.plot(x, np.polyval(rg2, x), 'b-.', label='quadratic');
plt.plot(x, np.polyval(rg3, x), 'b:', label='cubic');
plt.legend(loc=0);

<p style="font-family: monospace;">Linear, quadratic and cubic regression.</p>
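
To compare the three regressions numerically, one can compute the mean squared error (MSE) of each fit. The sketch below uses an illustrative random walk (`x_demo`, `y_demo`) instead of the ``data`` object from above. Since the polynomial models are nested, the MSE cannot increase with the degree, up to numerical error.

```python
import numpy as np

rng = np.random.default_rng(1000)  # seed value chosen arbitrarily
y_demo = rng.standard_normal(500).cumsum()  # illustrative random walk
x_demo = np.arange(len(y_demo))

mse = {}
for deg in (1, 2, 3):
    coeffs = np.polyfit(x_demo, y_demo, deg)  # OLS coefficients
    residuals = y_demo - np.polyval(coeffs, x_demo)  # data minus fitted values
    mse[deg] = np.mean(residuals ** 2)

print(mse)
```

A lower in-sample MSE for a higher degree does not by itself indicate a better model, since additional terms always fit the sample at least as well.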

## pandas

pandas is a library with which you can manage and operate on time series data and other tabular data structures. It allows you to implement even sophisticated data analytics tasks on larger data sets. While the focus lies on in-memory operations, there are also multiple options for out-of-memory (on-disk) operations. Although pandas provides a number of different data structures, embodied by powerful classes, the most commonly used structure is the ``DataFrame`` class which resembles a typical table of a relational (SQL) database and is used to manage, for instance, financial time series data. This is what we focus on in this section.

### pandas DataFrame class

In its most basic form, a ``DataFrame`` object is characterized by an index, column names and tabular data. To make this more specific, consider the following sample data set.

In [None]:
np.random.seed(1000)
a = np.random.standard_normal((10, 3)).cumsum(axis=0)

Also, consider the following dates which shall be our index.

In [None]:
index = ['2016-1-31', '2016-2-28', '2016-3-31',
        '2016-4-30', '2016-5-31', '2016-6-30',
        '2016-7-31', '2016-8-31', '2016-9-30',
        '2016-10-31']

Finally, the column names.

In [None]:
columns = ['no1', 'no2', 'no3']

The instantiation of a ``DataFrame`` object then looks as follows.

In [None]:
import pandas as pd
df = pd.DataFrame(a, index=index, columns=columns)

A look at the new object reveals its resemblance to a typical table from a relational database (or e.g. an Excel spreadsheet).

In [None]:
df

DataFrame objects have built in a multitude of basic, advanced and convenience methods, a few of which are illustrated without much commentary below.

In [None]:
df.head()  # first five rows

In [None]:
df.tail()  # last five rows

In [None]:
df.index  # index object

In [None]:
df.columns  # column names

In [None]:
df.info()  # meta information

In [None]:
df.describe()  # typical statistics

Numerical operations are in general as easy with ``DataFrame`` objects as with NumPy ``ndarray`` objects. They are also quite close in terms of syntax.

In [None]:
df * 2  # vectorized multiplication

In [None]:
df.std()  # standard deviation by column

In [None]:
df.mean(axis=1)  # mean by index value

In [None]:
np.mean(df)  # mean via universal function

Pieces of data can be looked up via different mechanisms.

In [None]:
df['no2']  # 2nd column

In [None]:
df.iloc[0]  # 1st row

In [None]:
df.iloc[2:4]  # 3rd & 4th row

In [None]:
df.iloc[2:4, 1]  # 3rd & 4th row, 2nd column

In [None]:
df.no3.iloc[3:7]  # dot look-up for column name

In [None]:
df.loc['2016-3-31']  # row given index value

In [None]:
df.loc['2016-5-31', 'no3']  # single data point

In [None]:
df['no1'] + 3 * df['no3']  # vectorized arithmetic operations

Data selections based on Boolean operations are also a strength of pandas.

In [None]:
df['no3'] > 0.5

In [None]:
df[df['no3'] > 0.5]

In [None]:
df[(df.no3 > 0.5) & (df.no2 > -0.25)]

In [None]:
df[df.index > '2016-4-30']  # string index: lexicographic comparison, '2016-10-31' is excluded
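
Because the index above consists of plain string objects, the last comparison is lexicographic rather than chronological. The sketch below, using a small illustrative data set, shows the difference after converting the index with ``pd.to_datetime``; the names `df_str` and `df_dt` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = ['2016-1-31', '2016-4-30', '2016-9-30', '2016-10-31']
df_str = pd.DataFrame({'no1': np.arange(4.0)}, index=dates)  # string index

df_dt = df_str.copy()
df_dt.index = pd.to_datetime(df_str.index)  # convert to DatetimeIndex

# lexicographic comparison: '2016-10-31' sorts before '2016-4-30'
sel_str = df_str[df_str.index > '2016-4-30']
# chronological comparison on proper date-time values
sel_dt = df_dt[df_dt.index > pd.Timestamp('2016-04-30')]

print(len(sel_str), len(sel_dt))
```

With the ``DatetimeIndex``, both the September and the October rows are selected, as one would expect for dates after the end of April.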

pandas is well integrated with the matplotlib library which makes it really convenient to plot data stored in ``DataFrame`` objects. In general, a single method call does the trick already (see the following figure). 

In [None]:
df.plot(figsize=(10, 6));

<p style="font-family: monospace;">Line plot from pandas DataFrame.</p>

In [None]:
df.hist(figsize=(10, 6));

<p style="font-family: monospace;">Histograms from pandas DataFrame.</p>

### Input-Output Operations

Another strength of pandas is the exporting and importing of data to and from diverse data storage formats. Consider the case of comma separated value (CSV) files.

In [None]:
df.to_csv('data.csv')  # exports to CSV file

Let us have a look at the just saved file with basic Python functionality.

In [None]:
with open('data.csv') as f:  # open file
    for l in f.readlines():  # iterate over all lines
        print(l, end='')  # print line

Reading data from such files is also straightforward.

In [None]:
from_csv = pd.read_csv('data.csv',  # filename
                      index_col=0,  # index column
                      parse_dates=True)  # date index
from_csv.head()
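
The round trip can also be checked without touching the disk. The following sketch writes an illustrative ``DataFrame`` object to an in-memory text buffer (``io.StringIO``) and reads it back. The names `df_out` and `df_in` are hypothetical.

```python
import io
import numpy as np
import pandas as pd

idx = pd.date_range('2016-01-31', periods=3, freq='D')
df_out = pd.DataFrame(np.arange(6.0).reshape((3, 2)),
                      index=idx, columns=['no1', 'no2'])

buffer = io.StringIO()  # in-memory replacement for a file on disk
df_out.to_csv(buffer)  # write CSV text to the buffer
buffer.seek(0)  # rewind to the beginning before reading

df_in = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(df_in)
```

With ``parse_dates=True``, the first column is parsed back into a ``DatetimeIndex`` instead of remaining a column of strings.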

However, in general you would store ``DataFrame`` objects on disk in more efficient binary formats like HDF5 (see http://hdfgroup.org). pandas in this case wraps the functionality of the PyTables library (see http://pytables.org). The constructor function to be used is ``HDFStore()``.

In [None]:
h5 = pd.HDFStore('data.h5', 'w')  # open for writing
h5['df'] = df  # write object to database
h5

Data retrieval is as simple as writing.

In [None]:
from_h5 = h5['df']  # reading from database
h5.close()  # closing the database
from_h5.tail()

In [None]:
!rm data.csv data.h5 # remove the objects from disk

### Financial Analytics Examples

When it comes to financial data, pandas can read data sets directly from remote sources. The following code reads historical end-of-day data for the S&P 500 index and the VIX volatility index from a CSV file at the URL given below.

In [None]:
url = 'https://hilpisch.com/vola_eikon_eod_data.csv'

In [None]:
spx = pd.DataFrame(pd.read_csv(url, index_col=0, parse_dates=True)['.SPX'])

In [None]:
spx.info()

In [None]:
vix = pd.DataFrame(pd.read_csv(url, index_col=0, parse_dates=True)['.VIX'])

In [None]:
vix.info()

Let us combine the ``.SPX`` and ``.VIX`` columns into a single ``DataFrame`` object. Multiple ways are possible to accomplish this goal.

In [None]:
## construction via join
spxvix = pd.DataFrame(spx).join(vix)
spxvix.info()

In [None]:
## construction via merge
spxvix = pd.merge(spx, vix,
                  left_index=True,  # merge on left index
                  right_index=True,  # merge on right index
                )
spxvix.info()

In a case like this, the approach via `dictionary` objects might be the best and most intuitive way.

In [None]:
## construction via dictionary object
spxvix = pd.DataFrame({'SPX': spx.values.flatten(),
                       'VIX': vix.values.flatten()},
                       index=spx.index)
spxvix.info()
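
When both objects share the same index and the column names agree, the three approaches lead to the same result. The sketch below checks this on a tiny data set whose values are made up for illustration only; the names `spx_demo` and `vix_demo` are hypothetical.

```python
import pandas as pd

idx = pd.date_range('2016-01-04', periods=3, freq='D')
# made-up values, for illustration only
spx_demo = pd.DataFrame({'SPX': [2012.7, 2016.7, 1990.3]}, index=idx)
vix_demo = pd.DataFrame({'VIX': [20.7, 19.3, 20.6]}, index=idx)

via_join = spx_demo.join(vix_demo)  # left join on the index
via_merge = pd.merge(spx_demo, vix_demo,
                     left_index=True, right_index=True)  # inner join
via_dict = pd.DataFrame({'SPX': spx_demo['SPX'].values,
                         'VIX': vix_demo['VIX'].values}, index=idx)

print(via_join.equals(via_merge), via_join.equals(via_dict))
```

If the indexes differ, the results can differ too: ``join`` keeps all rows of the left object by default, ``merge`` keeps only the common index values, and the dictionary approach requires arrays of equal length.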

Having available the merged data in a single object makes visual analysis straightforward (see the following figure).

In [None]:
spxvix.plot(figsize=(10, 6), subplots=True, color='b');

<p style="font-family: monospace;">Historical closing data for S&P 500 stock index and VIX volatility index.</p>

pandas also allows vectorized operations on whole ``DataFrame`` objects. The following code calculates the log returns over the two columns of the ``spxvix`` object simultaneously in vectorized fashion. The ``shift()`` method shifts the data set by the number of index values provided (in this particular case by one trading day).

In [None]:
rets = np.log(spxvix / spxvix.shift(1))
rets.head()

There is one row of log returns missing at the very beginning. This row can be deleted via the ``dropna()`` method.

In [None]:
rets = rets.dropna()
rets.head()
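
A useful property of log returns is that they are additive over time: the sum of the daily log returns equals the log return over the whole period. The sketch below verifies this with a few made-up price levels; the names `prices` and `log_rets` are illustrative.

```python
import numpy as np
import pandas as pd

# made-up price levels, for illustration only
prices = pd.DataFrame({'SPX': [100.0, 102.0, 101.0, 105.0]})

log_rets = np.log(prices / prices.shift(1)).dropna()  # first row is NaN
total = log_rets['SPX'].sum()  # sum of the daily log returns

print(len(log_rets))
print(total, np.log(105.0 / 100.0))
```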

Consider the plot in the following figure showing the VIX log returns against the SPX log returns in a scatter plot with a linear regression. It illustrates a strong negative correlation between the two indexes. This is a central result that is replicated in the chapter about the EURO STOXX 50 stock index and the VSTOXX volatility index, respectively.

In [None]:
rets.plot(kind='scatter', x='SPX', y='VIX',
              style='.', figsize=(10, 6));
rg = np.polyfit(rets['SPX'], rets['VIX'], 1)
plt.plot(rets['SPX'], np.polyval(rg, rets['SPX']), 'r.-');

<p style="font-family: monospace;">Daily log returns for S&P 500 stock index and VIX volatility index and regression line.</p>

Having financial time series data stored in a pandas ``DataFrame`` object makes the calculation of typical statistics straightforward.

In [None]:
ret = rets.mean() * 252  # annualized return
ret

In [None]:
vol = rets.std() * math.sqrt(252)  # annualized volatility
vol

In [None]:
(ret - 0.01) / vol  # Sharpe ratio with rf = 0.01
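
The same three statistics can be traced on a small, made-up return series. The sketch assumes 252 trading days per year and a risk-free rate of 0.01, as above. It also shows that the pandas ``std()`` method uses the sample standard deviation (``ddof=1``) by default, whereas the NumPy ``std()`` method defaults to ``ddof=0``.

```python
import math
import numpy as np
import pandas as pd

# made-up daily log returns, for illustration only
daily = pd.Series([0.01, -0.005, 0.002, 0.007, -0.004])

ann_ret = daily.mean() * 252  # annualized return
ann_vol = daily.std() * math.sqrt(252)  # annualized volatility
sharpe = (ann_ret - 0.01) / ann_vol  # Sharpe ratio with rf = 0.01

print(ann_ret, ann_vol, sharpe)
# pandas default corresponds to ddof=1 in NumPy
print(daily.std(), np.std(daily.values, ddof=1), np.std(daily.values))
```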

The *maximum drawdown*, which we only calculate for the S&P 500 index, is a bit more involved. For its calculation, we use the ``cummax()`` method which records the running, historical maximum of the time series up to a certain date. Consider the plot in the following figure which shows the S&P 500 index and the running maximum.

In [None]:
plt.figure(figsize=(10, 6));
spxvix['SPX'].plot(label='S&P 500');
spxvix['SPX'].cummax().plot(label='running maximum');
plt.legend(loc=0);

<p style="font-family: monospace;">S&P 500 stock index and running maximum value.</p>

The maximum drawdown is the largest difference between the running maximum and the current index level &mdash; in our particular case it is 1,148.

In [None]:
adrawdown = spxvix['SPX'].cummax() - spxvix['SPX']
adrawdown.max()

The relative maximum drawdown might sometimes be a bit more meaningful. It is here a drawdown of about 34%.

In [None]:
rdrawdown = (spxvix['SPX'].cummax() - spxvix['SPX']) / spxvix['SPX'].cummax() 
rdrawdown.max()
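
The drawdown calculations are easy to follow on a short, made-up price series for which the results can be checked by hand. The names `prices_demo`, `abs_dd` and `rel_dd` are illustrative.

```python
import pandas as pd

# made-up index levels, for illustration only
prices_demo = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 104.0])

running_max = prices_demo.cummax()  # 100, 120, 120, 120, 130, 130
abs_dd = running_max - prices_demo  # absolute drawdown per data point
rel_dd = abs_dd / running_max  # drawdown relative to the running maximum

print(abs_dd.max())  # largest absolute drawdown: 120 - 90
print(rel_dd.max())  # largest relative drawdown: 30 / 120
```

Note that the largest absolute and the largest relative drawdown need not occur at the same data point in general.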

The longest drawdown period is calculated as follows. The code below selects all those data points where the drawdown is zero. It then calculates the difference between two consecutive index values (i.e. trading dates) for which the drawdown is zero and takes the maximum value. Given the data set we are analyzing, the longest drawdown period is 417 days.

In [None]:
temp = adrawdown[adrawdown == 0]
periods_spx = (temp.index[1:].to_pydatetime()
             - temp.index[:-1].to_pydatetime())
periods_spx[50:60]  # some selected data points

In [None]:
max(periods_spx)
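
The logic can again be traced on a small, made-up series. The sketch below subtracts the two shifted ``DatetimeIndex`` objects directly, which yields pandas ``Timedelta`` values instead of the ``datetime.timedelta`` objects produced via ``to_pydatetime()``. All names are illustrative.

```python
import pandas as pd

dates = pd.date_range('2016-01-01', periods=6, freq='D')
# made-up index levels, for illustration only
prices_demo = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 104.0],
                        index=dates)

drawdown = prices_demo.cummax() - prices_demo
at_high = drawdown[drawdown == 0]  # dates at which a new high is reached

# time between consecutive highs
gaps = at_high.index[1:] - at_high.index[:-1]
longest = gaps.max()
print(longest)
```

This approach only measures completed drawdown periods, i.e. those that end with a new high. A drawdown that is still ongoing at the end of the data set, as in the last data point here, is not captured.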

See Appendix C of Hilpisch (2018) for the handling of date-time information with Python, NumPy and pandas.

## Conclusions

This chapter introduces basic data types and structures as well as certain Python idioms needed for analyses in later chapters of the book. In addition, NumPy and the ``ndarray`` class are introduced which allow the efficient handling of and operating on (numerical) data stored as arrays. Some basic visualization techniques using the matplotlib library are also introduced. However, working with pandas and its powerful ``DataFrame`` class for tabular and time series data makes plotting a bit more convenient &mdash; only a single method call is needed in general. Using pandas and the capabilities of the ``DataFrame`` class, the chapter also illustrates by the means of some basic financial examples how to implement typical interactive financial analytics tasks.  