# Quantitative Finance with Python

### Alan Moreira, University of Rochester Simon Graduate School of Business

# Notebook 3




### Topics covered
* * *
 * Return distributions and moments of a return distribution
 * Annualization of returns
 * Log returns
 * Excess returns and risk premiums
 * Variances and Covariances


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
#GlobalFinMonthly
url="https://www.dropbox.com/s/3k35mt3t57ygff2/GlobalFinMonthly.csv?dl=1"
Data = pd.read_csv(url,na_values=-99)
# tell python Date is date:
Data['Date']=pd.to_datetime(Data['Date'])
# set Date as the index
Data=Data.set_index(['Date'])

Data.info()


#  Distribution of Returns

Let's start by looking at the moments of the distribution of returns for the "market" portfolio.

This is really the portfolio of all the stocks listed in the United States.

To be more precise, it is the value-weighted basket of these stocks.

What does value weighted mean?

$$w_{i,t}=\frac{Price_{i,t}\times NShares_{i,t}}{\sum_{j=1}^I Price_{j,t}\times NShares_{j,t}}$$

What are i and I?
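
As a small illustration of the value-weighting formula, here is a sketch with three made-up stocks. The prices and share counts are hypothetical and do not come from the data set. Each weight is the stock's market capitalization divided by the total market capitalization, so the weights sum to one.

```python
import numpy as np

# Hypothetical prices and shares outstanding for I = 3 stocks at a single date t
price = np.array([50.0, 20.0, 100.0])
n_shares = np.array([1_000.0, 5_000.0, 500.0])

market_cap = price * n_shares              # Price_{i,t} * NShares_{i,t}
weights = market_cap / market_cap.sum()    # w_{i,t}

print(weights)   # weights: 0.25, 0.5, 0.25 (they sum to one)
```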

In [None]:
# looking at its mean
Data['MKT'].mean()

# what does this number mean?

In [None]:
# looking at the market standard deviation

Data['MKT'].std()

# what does this number mean?

In [None]:
# to get a sense of what that means, let's look at the histogram of the distribution of returns

Data.MKT.hist(bins=50)

# what do we see?

# centered around the mean; the amount of variation around the center is captured by the standard deviation

In [None]:
Data.MKT.plot()

### Random variables

* We think of the return realization, the numbers plotted above, as a random variable, i.e., a variable whose realization we are uncertain about.

* A random variable is anything whose value we don't know in advance

  * Outcome of a die roll, value of the stock market at the end of the day, …
  * For example, a die roll is a random variable with the following possible outcomes: (1,2,3,4,5,6)

* This uncertainty is fully described by the probability distribution associated with the random variable.



### What is a probability distribution?

* A probability distribution describes the probability that each outcome is realized.
* It can be described by a Probability Density Function (pdf); for a discrete random variable like a die, this is also called a probability mass function.
* For example, the pdf of a die is (1,1/6), (2,1/6), (3,1/6), (4,1/6), (5,1/6), (6,1/6)
* It can also be described by the Cumulative Distribution Function (cdf), which gives $Prob(x\leq x_i)$
* For example, the cdf of a die is (1,1/6), (2,2/6), (3,3/6), (4,4/6), (5,5/6), (6,6/6)
* A pdf or cdf fully describes the uncertainty we have with respect to a particular random variable
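
The die example can be written out in a few lines of NumPy. This is only a sketch: for a discrete random variable like a die, the "pdf" is strictly a probability mass function, and the cdf is its running sum.

```python
import numpy as np

outcomes = np.arange(1, 7)   # faces of a fair die: 1, ..., 6
pmf = np.full(6, 1 / 6)      # each face has probability 1/6
cdf = np.cumsum(pmf)         # Prob(x <= outcome)

for k, p, c in zip(outcomes, pmf, cdf):
    print(k, round(p, 4), round(c, 4))
```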


### Moments

* One way to summarize the information in a probability distribution is the moments

* Mean or expected value $E[x]=\sum_i x_i\, Prob(x=x_i)$ or $\int x f(x)dx$; we often use $\mu$ as its symbol

* The variance $var(x)=\sum_i (x_i-E[x])^2 Prob(x=x_i)$ or $\int (x-E[x])^2 f(x)dx$
  
  * (Standard deviation $std(x)= \sqrt{var(x)}$)
  * measures average variability around the mean across successive drawings of x.
  * we often use $\sigma$ as a symbol for the standard deviation and $\sigma^2$ for the variance

* Skewness $skew(x)=\frac{1}{\sigma^3}\sum_i (x_i-E[x])^3 Prob(x=x_i)$ or $\frac{1}{\sigma^3}\int (x-E[x])^3 f(x)dx$
  
  * Measures asymmetry in the distribution (zero for a symmetric distribution)

* Kurtosis $kurt(x)=\frac{1}{\sigma^4}\sum_i (x_i-E[x])^4 Prob(x=x_i)$ or $\frac{1}{\sigma^4}\int (x-E[x])^4 f(x)dx$

  * Measures how fat the tails are (a normal has kurtosis 3; pandas' `.kurtosis()` reports excess kurtosis, i.e., kurtosis minus 3)

* Higher order moments…

* Observation: with enough moments you can (under mild conditions) pin down a distribution, but in practice you only need a few
  
  * for example for the normal distribution you only need the first two: Expected value and variance!
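
To make the moment definitions concrete, the sketch below applies them to a fair die, using the standardized third and fourth moments (dividing by $\sigma^3$ and $\sigma^4$). This is the convention behind pandas' `.skew()`, up to a small-sample correction. For a fair die the mean is 3.5, the variance is 35/12, and the skewness is zero because the distribution is symmetric.

```python
import numpy as np

x = np.arange(1, 7)          # outcomes of a fair die
prob = np.full(6, 1 / 6)     # their probabilities

mean = np.sum(x * prob)                          # E[x]
var = np.sum((x - mean) ** 2 * prob)             # var(x)
std = np.sqrt(var)                               # std(x)
skew = np.sum((x - mean) ** 3 * prob) / std**3   # standardized third moment
kurt = np.sum((x - mean) ** 4 * prob) / std**4   # standardized fourth moment (not excess)

print(mean, var, skew, kurt)
```

The kurtosis of a die comes out well below 3: a uniform distribution has thinner tails than the normal.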

### Sample moments

* Moments are **never** truly known in real data

* They must always be estimated from a sample of the data

* We would like to know the "population" moments, i.e., the moments that describe how the population is generated

* For example, to estimate the expected return on the market, i.e., its population mean, we use the sample mean

$$\overline{R_{MKT}}=\frac{\sum_{t=1}^T R_{MKT,t}}{T}$$

* where T is the sample size
* we also call this the sample average

* Note that in the sample mean each observation is weighted equally, i.e., by the frequency of the **realized** observations

* For population means, outcomes are weighted by their **expected** frequency, i.e., their probabilities
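
The gap between sample and population moments can be seen by simulation. The sketch below rolls a simulated fair die T times and computes the sample mean, which weights each realized roll by 1/T. As T grows, it gets close to the population mean of 3.5. The seed and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
population_mean = 3.5            # E[x] for a fair die

for T in [10, 100, 10_000, 1_000_000]:
    rolls = rng.integers(1, 7, size=T)   # T independent rolls (upper bound 7 is exclusive)
    print(T, rolls.mean())               # sample mean: each realized roll gets weight 1/T
```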

### Are returns normal?

* Most of what we do does not depend on the assumption of normality

* But normal distributions are very useful in statistical tests

* And they are also not a bad approximation for return data at low frequency (monthly/yearly)


### The Normal distribution

* The probability that a random draw from a normally distributed random variable $\tilde{x}$ lies within $n=1$ standard deviation of the mean is about 0.6827


$$Prob(E[\tilde{x}]-n\sigma(\tilde{x})\leq \tilde{x}\leq E[\tilde{x}]+n\sigma(\tilde{x}))|_{n=1}=0.6827$$

* $n=2,Prob(\cdot)=0.9545$

* It is convenient to transform a normally distributed r.v. into units of standard deviations from its mean


$$\tilde{z}=\frac{\tilde{x}-E[\tilde{x}]}{\sigma(\tilde{x})}$$

* This follows the "standard" normal distribution, which has mean 0 and standard deviation 1

* Can you show that z indeed has mean zero and standard deviation 1?

* Since any normal can be written as $\tilde{x}=E[\tilde{x}]+\sigma(\tilde{x})\tilde{z}$, the normal distribution is completely characterized by its first two moments

* This means that the investment problem is much more tractable too!

* Only two moments to worry about:

    * The expected return of the portfolio

    * Its variance
    
    * The probability of really bad tail events will follow immediately from these two! 
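
These normal probabilities and the standardization can be checked numerically with `scipy.stats.norm`, which this notebook uses later on. The mean of 1% and standard deviation of 4.5% below are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Prob(|z| <= n) for a standard normal z
for n in [1, 2, 3]:
    print(n, norm.cdf(n) - norm.cdf(-n))   # about 0.6827, 0.9545, 0.9973

# Standardizing draws from a normal with hypothetical mean 1% and std 4.5%
rng = np.random.default_rng(1)
x = rng.normal(loc=0.01, scale=0.045, size=100_000)
z = (x - 0.01) / 0.045     # subtract the true mean, divide by the true std
print(z.mean(), z.std())   # close to 0 and 1
```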

### What is a return?

* So we know what the normal distribution is

* but what exactly is a return?

* Let's say you paid $P_t$ at date t for a stock

* At date t+1 the price is $P_{t+1}$, and you also earn a dividend $D_{t+1}$

* Then we say that your return is

$$R_{t+1}=\frac{P_{t+1}+D_{t+1}-P_t}{P_t}$$

* It is the gain you made, divided by how much you put in
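
A quick numerical example of the return formula, with made-up prices. The helper name `simple_return` is hypothetical, not part of any library.

```python
def simple_return(p_t, p_next, d_next):
    """Net return from buying at p_t, receiving dividend d_next, and selling at p_next."""
    return (p_next + d_next - p_t) / p_t

# Hypothetical numbers: buy at 100, receive a dividend of 2, sell at 105
R = simple_return(100.0, 105.0, 2.0)
print(R)   # 0.07, i.e., a 7% return
```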

> Our data set Data contains the returns from buying an asset, earning any dividends distributed during the month, and then selling at the end of the month.

> As we will see, the return on a portfolio is just a weighted average of the returns of the individual assets
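
A minimal check of the weighted-average claim, using a hypothetical two-asset portfolio: computing the return from the weights and from the dollar amounts gives the same number.

```python
import numpy as np

# Hypothetical portfolio: 60% in asset 1, 40% in asset 2
w = np.array([0.6, 0.4])
R_assets = np.array([0.05, -0.02])   # one-period returns of each asset

R_portfolio = w @ R_assets           # weighted average of asset returns

# Same answer by tracking dollars: invest 100 split according to w
dollars_start = 100 * w
dollars_end = dollars_start * (1 + R_assets)
R_check = dollars_end.sum() / dollars_start.sum() - 1

print(R_portfolio, R_check)          # both approximately 0.022
```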


### How to evaluate whether returns are normal?


The standard approach is to look at higher moments.

Because the normal is entirely described by its first two moments, looking at these higher moments can give us a clue about whether normality really holds for the data at hand.

Here are a few:

* Skewness 

* Kurtosis 

* Frequency of extreme return realizations 

  * 3-sigma events should almost never happen for a normal random variable
  
  * a move of more than 3 standard deviations in either direction has a probability of about 0.27%, i.e., it happens roughly once every 370 periods

  $$Prob(|R-E[R]|>3\sigma(R))\approx 0.0027$$


A nice thing to do is to simulate data generated by a normal with the same sample mean and standard deviation as the data, and compare it with the actual data

In [None]:
# let's look at a simulation to see this more clearly

mu=Data.MKT.mean()
std=Data.MKT.std()
T=Data.MKT.count()
# simulated normal returns with the same mean, std, and number of observations as MKT
X=pd.Series(np.random.normal(mu,std,T))
Data.MKT.hist(bins=50,alpha=0.5,label='MKT')
X.hist(bins=50,alpha=0.5,label='simulated normal')
plt.legend()
# what happens if we increase the mean?

# what happens if we increase the standard deviation?

# for the same mean and standard deviation, what do you notice when you compare the real data and the simulated data?



In [None]:
# To evaluate how close a distribution is to the normal distribution we typically look at

# skewness

[Data.MKT.skew(), X.skew()]


In [None]:
# To evaluate how close a distribution is to the normal distribution we typically look at

# kurtosis (pandas reports excess kurtosis, which is 0 for a normal)
[Data.MKT.kurtosis(),X.kurtosis()]


In [None]:
threshold=3
X=pd.Series(np.random.normal(mu,std,T))
# count left-tail events (more than threshold std below the mean) in the real data
A=((Data.MKT-Data.MKT.mean())<-threshold*Data.MKT.std())
# same count for the simulated data (which we know is normal!)
B=((X-X.mean())<-threshold*X.std())

[A.sum(),B.sum()]



### Log returns (also called continuously compounded returns) and Simple returns

Let's also look at log returns

Insight: if innovations in log returns are i.i.d., then log returns at long enough horizons will be approximately normal (Central Limit Theorem)

$$\begin{aligned}
1+R_{1\to T}&=(1+R_1)(1+R_2)\cdots(1+R_T)\\
\ln(1+R_{1\to T})&=\ln\big((1+R_1)(1+R_2)\cdots(1+R_T)\big)\\
\ln(1+R_{1\to T})&=\ln(1+R_1)+\ln(1+R_2)+\cdots+\ln(1+R_T)\\
r_{1\to T}&=r_1+r_2+\cdots+r_T
\end{aligned}$$



* Thus, if T is large, $r_{1\to T}$ will be approximately normally distributed (Central Limit Theorem).

* $r_t=\ln(1+R_t)$ is the rate of return assuming continuous compounding for period t.

* This turns out to hold up well if you look at horizons longer than a month, but does not work at all at daily frequency (much fatter left tail than the normal distribution predicts)


* Log returns are also very convenient when thinking about long-term investing
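
Both properties above can be sketched numerically. The first part checks that the log of the compounded gross return equals the sum of monthly log returns. The second part is a Central Limit Theorem illustration under an assumed, deliberately skewed distribution for monthly log returns (minus a centered exponential, which has skewness -2): summing 12 i.i.d. draws cuts the skewness by roughly a factor of $\sqrt{12}$. All parameters are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed

# Part 1: log returns add up over time (hypothetical monthly simple returns)
R = rng.normal(0.01, 0.045, size=12)
r = np.log1p(R)                                   # r_t = ln(1 + R_t)
compounded = np.prod(1 + R) - 1                   # R_{1->12}
print(np.isclose(np.log1p(compounded), r.sum()))  # True

# Part 2: CLT illustration with an assumed skewed distribution for monthly log returns
# (minus a centered exponential with scale 0.03, whose skewness is -2)
monthly = -(rng.exponential(0.03, size=(100_000, 12)) - 0.03)
annual = monthly.sum(axis=1)                      # 12-month log returns
print(pd.Series(monthly[:, 0]).skew(), pd.Series(annual).skew())
# the skewness of the 12-month sums is roughly -2 / sqrt(12), i.e., closer to the normal's 0
```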

In [None]:
np.log(1+Data.MKT).hist(bins=50)

In [None]:
print([np.log(1+Data.MKT).skew(),X.skew()])

print([np.log(1+Data.MKT).kurtosis(),X.kurtosis()])

How do we go back and forth between log and simple returns?

Let R be a simple net return per period, i.e., if you invest 10, you get 10(1+R) at the end of the period

R is a number like 5% (0.05)

We say 1+R, a number like 1.05, is a gross return.

Why gross? Because it includes 1, your initial investment.

To get log returns: r=log(1+R)


To get back simple returns

R=exp(r)-1

In [None]:
r=np.log(1+Data.MKT)
R=np.exp(r)-1
pd.concat([R,Data.MKT],axis=1).head()

In [None]:
threshold=2
# now real data in log returns

A=((np.log(Data.MKT+1)-np.log(Data.MKT+1).mean())<-threshold*np.log(Data.MKT+1).std())
B=((X-X.mean())<-threshold*X.std())
[A.sum(),B.sum()]

### Lecture 5 9/9 ended here

### Statistical package

We can also do the computation exactly, without a simulation, which is how we will typically do it.

For that we will import the normal distribution from the SciPy library


SciPy
 - This is a scientific computing library; its `scipy.stats` module contains the statistical distributions
 - It has a lot of stuff in it
 - I will talk about only the stuff we need
 - But feel free to have fun

https://docs.scipy.org/doc/scipy/reference/

https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

In [None]:
# Here I am only importing the normal distribution

from scipy.stats import norm

# I will use the MKT sample moments to calibrate the distribution so we can compare apples to apples
mu,sigma=Data.MKT.mean(),Data.MKT.std()
# and I am creating this p object, a normal distribution with the same mean and standard deviation as MKT
p=norm(mu,sigma)
p

Here is what that looks like

In [None]:
# creating a grid to evaluate the density around the relevant range (-6 to +6 std relative to the mean)
grid=np.linspace(mu-6*sigma,mu+6*sigma,1000)

#Plot the normal density for this range 
plt.plot(grid, p.pdf(grid))

In [None]:
plt.plot(grid, p.cdf(grid))

In [None]:
threshold=3
Below=((Data.MKT-Data.MKT.mean())<-threshold*Data.MKT.std())
Above=((Data.MKT-Data.MKT.mean())>threshold*Data.MKT.std())
T=Data.MKT.count()
[Below.sum(),Above.sum()]

In [None]:
# However, it is often more convenient to work with the standard normal distribution
ps=norm(0,1)
probm=ps.cdf(-threshold)
probp=1-ps.cdf(threshold)
print('The probability of such extreme tail events')
print([probm,probp])
print('The number of such extreme tail events we expect in a sample as large as ours')
print([probm*T,probp*T])


### Confidence Intervals and T-tests

We often will use the opposite operation:

* Given a probability, let's say P=5%, we want to know the value X such that there is less than probability P that some reference distribution produces a value as high as X.

* Given that we observed X, P tells us how unlikely it is to draw a value at least as extreme as X if the data really came from this reference distribution

* This reference distribution is our NULL-Hypothesis

* For example, let's say our null hypothesis is that market returns are normally distributed with a certain mean and standard deviation.

* If you observe X, how low would X have to be for the probability of drawing a value that low from this reference distribution to be less than, say, 1%?

* The function that answers this question is the inverse survival function, which we call using `.isf` (the survival function is one minus the cdf; the inverse cdf itself is `.ppf`)
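
A short sketch of the relationship between `.isf` and `.ppf` for the standard normal, and of how to translate the cutoff back into return units. The monthly mean of 1% and volatility of 4.5% are hypothetical placeholders; in the notebook you would use the MKT sample moments instead.

```python
from scipy.stats import norm

ps = norm(0, 1)      # standard normal reference distribution

P = 0.01
upper = ps.isf(P)    # Prob(z > upper) = P
lower = ps.ppf(P)    # inverse cdf: Prob(z < lower) = P

print(upper, lower)  # symmetric: lower == -upper (about -2.33)

# Translating back into return units for a hypothetical monthly mean and vol
mu, sigma = 0.01, 0.045
cutoff = mu + sigma * lower
print(cutoff)        # a monthly return this low has a 1% probability under the null
```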


In [None]:
# here for the standard normal

# So, the observation would have to be below the mean by the following number of standard deviations
# (isf gives the upper-tail cutoff; by symmetry, the lower-tail cutoff is minus this value)

ps.isf(0.01)


In [None]:
plt.plot(grid, p.pdf(grid))

* This is a "one-sided" test.

* You could also have asked how extreme (in either direction) X would have to be for the probability of drawing a value at least that extreme from a normal with this mean and standard deviation to be below 1%

* This would lead to a two-sided test



In [None]:
ps.isf(0.01/2)


* If we observe such an extreme observation, we would say that we can reject the reference distribution with 99% confidence, i.e., at a p-value of 1%.

* Of course, a formal test of whether a random variable is normally distributed for any mean and standard deviation is much more complicated and less powerful.

* So it is not really used that much with financial data in practice.

> But as practitioners, it is important to be aware of large deviations from normality, especially at higher frequencies: daily returns, minute-to-minute returns

* We will revisit this once we work with daily data

### The choice of frequency and Annualization of returns

* The data that we get is structured at a particular frequency.

* For example, the data set "Data" that we have been working with is at the monthly frequency

* So the returns there tell us what you would have earned if you bought a particular asset at the close of the last trading day of the month, say January 31, and sold it at the close of the last trading day of the following month, say February 28.

* But this choice of frequency is entirely arbitrary, since transactions happen all the time

* In this course we will work at monthly or daily frequency, since that is what most practitioners work with (with the exception, of course, of high-frequency trading funds)

* It also keeps things manageable; as you will quickly see, the data set can get very large once you go to higher frequencies

* One could argue that monthly is too short. Most people have one-year or even multi-year investment plans, so maybe it makes sense to look at the data at lower frequencies. There is merit to this view, but you end up with much less data, so it is harder to make conclusive statements

* What we end up doing, in the academic world and in the industry, is to run our analysis at monthly frequency and then extrapolate the results to yearly and so on.

* Now we will discuss how to do that.

* For us this will be important, because it is much easier to keep the units at the yearly frequency in your head. So we will frequently annualize our results just to get intuition about what they mean

Standard annualization (the quick and dirty way)

* $\hat{\mu}_A=12\times\hat{\mu}_M$
* $\hat{\sigma}^2_A=12\times\hat{\sigma}^2_M$
* $\hat{\sigma}_A=\sqrt{12}\times\hat{\sigma}_M$
* $SR_A=\frac{\hat{\mu}_A}{\hat{\sigma}_A}=\sqrt{12}SR_M$

> This last one is what we call the Sharpe ratio (for William Sharpe), usually computed with excess returns, which we define below. It is a key moment that we will be looking at, and the entire financial industry is organized around this moment.
It measures how much return you get per unit of volatility
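
A sketch of the quick annualization applied to a simulated monthly series: 600 hypothetical months of excess returns drawn from a normal, with a made-up mean and volatility. By construction, the annualized Sharpe ratio is exactly $\sqrt{12}$ times the monthly one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed
# Hypothetical monthly excess returns: mean 0.6%, vol 4.5%, 50 years of data
R = pd.Series(rng.normal(0.006, 0.045, size=600))

mu_M, sigma_M = R.mean(), R.std()
mu_A = 12 * mu_M                  # annualized mean
sigma_A = np.sqrt(12) * sigma_M   # annualized volatility
SR_M = mu_M / sigma_M             # monthly Sharpe ratio
SR_A = mu_A / sigma_A             # annualized Sharpe ratio

print(mu_A, sigma_A, SR_A)
print(np.isclose(SR_A, np.sqrt(12) * SR_M))   # True
```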

* These formulas are exact if monthly returns are i.i.d. and the annual return is the sum of the monthly returns (e.g., log returns)

* However, annual simple returns are given by

$$R_A=(1+R_1)(1+R_2)\cdots(1+R_{12})-1$$

* if returns are i.i.d., the annual mean is

$$\mu_A=(1+\mu_M)^{12}-1$$

* and the variance is uglier still,

$$\sigma_A^2=[\sigma^2_M+(1+\mu_M)^2]^{12}-(1+\mu_M)^{24}$$

* and this still ignores time variation in volatility (a very strong feature of the data!) and autocorrelation in returns (there is a little bit)
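
To see how far apart the quick and the exact i.i.d. formulas are, the sketch below evaluates both for a hypothetical monthly mean of 1% and volatility of 4.5%. It then checks the exact formulas by compounding simulated i.i.d. normal monthly returns. Normality is only an assumption for the simulation; the exact formulas need i.i.d. returns, not normality.

```python
import numpy as np

mu_M, sigma_M = 0.01, 0.045   # hypothetical monthly mean and vol of simple returns

# Quick and dirty annualization
mu_quick = 12 * mu_M
var_quick = 12 * sigma_M**2

# Exact annual moments of compounded simple returns, assuming i.i.d. months
mu_exact = (1 + mu_M) ** 12 - 1
var_exact = (sigma_M**2 + (1 + mu_M) ** 2) ** 12 - (1 + mu_M) ** 24

# Monte Carlo check: compound 12 i.i.d. normal monthly returns, many times
rng = np.random.default_rng(4)
R = rng.normal(mu_M, sigma_M, size=(200_000, 12))
R_A = np.prod(1 + R, axis=1) - 1

print(mu_quick, mu_exact, R_A.mean())
print(var_quick, var_exact, R_A.var())
```

With these inputs, the exact annual mean is above the quick one because compounding adds return on return; the quick formulas remain a convenient approximation.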

#### However, we will always use the standard annualization

* $\hat{\mu}_A=12\times\hat{\mu}_M$
* $\hat{\sigma}^2_A=12\times\hat{\sigma}^2_M$
* $\hat{\sigma}_A=\sqrt{12}\times\hat{\sigma}_M$
* $SR_A=\frac{\hat{\mu}_A}{\hat{\sigma}_A}=\sqrt{12}SR_M$


In [None]:


# simply multiply by number of months in a year
Data.MKT.mean()*12



In [None]:
# the variance grows linearly with the number of periods, so multiply by 12

Data.MKT.var()*12

In [None]:
# for the standard deviation, multiply by the square root of the number of periods
Data.MKT.std()*12**0.5


#### If it is wrong, why do we use it?

* Because it is the standard
* It gives a good idea about annual magnitudes
* It allows you to compare across assets pretty well
* It makes it easy to get t-stats from monthly data
* It is fine as long as you don't compare returns computed at different frequencies, e.g., annual data for real estate and monthly data for stocks

* Doing that would lead to incorrect conclusions

* Let's say that you only have data for one asset at the yearly frequency,

  * then you have to aggregate the monthly data set to yearly and then compute the moments

  * This makes no distributional assumptions, so it is always right







- For every year we want to compute the cumulative return

$$R_{year}=\prod_{t \in year} (1+R_t)-1$$

In the end we want a table that looks like

year|MKT...
--|-
1997|$R_{1997}$
1998|$R_{1998}$
1999|$R_{1999}$

In [None]:
# Here we add 1 to the returns, so we have gross returns which we can compound,
# and we use the year of each observation to group the months of the same year
Datayear=(Data+1).groupby(Data.index.year).prod()-1
# the last step is how we want to aggregate each group:
# we could calculate the standard deviation, the mean, the max, the min, or even apply some custom function
# in our case all we need is the product, and then we subtract 1 to get back to a net return
# (note: a partial first or last year would compound fewer than 12 months)

[Datayear.MKT.mean(),Data.MKT.mean()*12,Datayear.MKT.std(),Data.MKT.std()*12**0.5]

In [None]:
Datayear.head()

- In problem sets I will always ask you about annual numbers

- You should always go for the quick and dirty annualization unless told otherwise

# Excess returns and risk-premiums


- It is convenient to decompose the return earned into what you earn as

  1. compensation for waiting (time value of money)
  2. compensation for bearing risk (risk premium)

We call the return minus the risk-free rate an "excess return"

$$R_i^e=R_i-R_f$$

For the risk-free rate we typically use the return on a 3-month Treasury bill
 
So the excess return of the market is

$$R^e_{MKT}=R_{MKT}-R_f$$

i.e., how much more I would get if I invested in the market instead of in a short-term risk-free U.S. Treasury bill

We call the expected difference the risk premium

$$E[R_i^e]=E[R_i-R_f]$$

It is how much more you expect to get by investing in asset i instead of the risk-free rate

When asset i is the total market portfolio of US equities, we call this the equity risk premium

Equity Risk premium

$$E[R_{MKT}^e]=E[R_{MKT}-R_f]$$


In [None]:
(Data.MKT-Data.RF).mean()

In [None]:
(Data.MKT-Data.RF).mean()*12

What does this mean?

Let's look at how much money one would have if they had invested 1 dollar in the market and kept reinvesting until the end of our sample

Let's then compare it with an investment in the risk-free rate

$$(1+R_1)(1+R_2)\cdots(1+R_T)$$

In [None]:
(1+Data[['RF','MKT']]).cumprod().plot()

This means that someone who invested 1 dollar in the market in 1963 would have about 175 dollars at the end of our sample (the last cell below prints the exact value).

That is a total return of 175/1 - 1 = 17,400%

If you had invested in the risk-free rate you would have about 12.5 dollars, a total return of

12.5/1 - 1 = 1,150%, which is a bit above inflation over the same period

In [None]:
(1+Data[['RF','MKT']]).cumprod().tail(1)

# Risk, Variances, and Covariances

We already know what variances are and how to compute them

$$\text{Data.MKT.std()}=\sqrt{\sum_{t=1}^T\frac{(R_{MKT,t}-\overline{R_{MKT}})^2}{T-1}}$$

where $\overline{R_{MKT}}=\sum_{t=1}^T\frac{R_{MKT,t}}{T}$ is the sample mean. (pandas divides by T-1 rather than T in the standard deviation; with hundreds of monthly observations the difference is negligible.)


So for each series we get a number. We often refer to the standard deviation as the "vol", or the volatility, of an asset.

Variance and volatility have the same content, but volatility is in the same units as returns, not squared returns, so it is easier (for me!) to have intuition about what it means.

For example, if the market has a vol of 30% per year, I know that there is roughly a 2.3% probability that I will lose 60% or more of my investment by the end of the year (a two-standard-deviation loss, assuming normal returns and ignoring the mean)!
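
The tail probability in this example can be computed directly with `scipy.stats.norm`, under the stated assumptions: normal annual returns, 30% volatility, and a mean set to zero.

```python
from scipy.stats import norm

sigma_A = 0.30   # annual vol from the example
mu_A = 0.0       # assumption: ignore the mean (set it to zero)

prob = norm(mu_A, sigma_A).cdf(-0.60)   # Prob(R_A <= -60%) = Prob(z <= -2)
print(prob)                             # about 0.023
```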

So it is a great gauge of risk...at least at the portfolio level

But when thinking about a specific stock, its volatility means very little

Unless your entire portfolio is just that stock, you don't really need to bear the stock's risk: if you have 1% in a stock and the stock drops 20%, that is only a 0.2% drop in your portfolio. You will barely notice either way.


#### But what risk should you care about? What stock should be risky for you?

The great insight from Harry Markowitz was to think of risk in terms of what the stock adds given your portfolio

Just as meat can be good for you if you are not eating any meat, it is terrible if you are already eating a lot of it

What investors should care about, just like eaters, is their final diet. If a given stock brings more of what you already have, it will be bad for you, i.e., risky.

The way to measure this degree of commonality between your portfolio and this particular stock is the covariance


$$Cov(R_{i,t},R_{j,t})={\sum_{t=1}^T\frac{(R_{i,t}-\overline{R_{i}})(R_{j,t}-\overline{R_{j}})}{T}}$$
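
A sketch that computes the sample covariance by hand and compares it with pandas. The two return series are simulated with a common component (hypothetical parameters). pandas' `.cov()` divides by T-1 by default, so the hand-rolled version below uses T-1 as well; dividing by T instead would scale the result by (T-1)/T.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)  # arbitrary seed
# Two hypothetical monthly return series sharing a common component
common = rng.normal(0, 0.03, size=240)
Ri = pd.Series(0.005 + common + rng.normal(0, 0.02, size=240))
Rj = pd.Series(0.004 + 0.8 * common + rng.normal(0, 0.03, size=240))

T = len(Ri)
cov_manual = ((Ri - Ri.mean()) * (Rj - Rj.mean())).sum() / (T - 1)
cov_pandas = Ri.cov(Rj)   # pandas uses ddof=1, i.e., divides by T - 1

print(cov_manual, cov_pandas)
```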


The Covariance matrix of a set of stocks is the matrix where cell(i,j) has the covariance between asset i and asset j:

In [None]:
# here for two assets
Data[['MKT','WorldxUSA']].cov()

Note that the diagonal cell (i,i) has the covariance between asset i and itself, which is just the variance of asset i

In [None]:
Data[['MKT','WorldxUSA']].var()

In [None]:
# here for all the assets except the risk-free rate

Data.drop('RF',axis=1).cov()


Another way of looking at this is the correlation matrix, which normalizes the covariances by the volatilities in each asset:

$$Corr(R_{i,t},R_{j,t})={\frac{Cov(R_{i,t},R_{j,t})}{\sqrt{Var(R_{i,t})Var(R_{j,t})}}}$$
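
The normalization in the correlation formula can be sketched on simulated data: divide each covariance by the product of the two volatilities taken from the diagonal of the covariance matrix. The three assets, their names, and their parameters are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)  # arbitrary seed
# Hypothetical returns for three assets (column names are made up)
returns = pd.DataFrame(rng.normal(0, 0.04, size=(240, 3)), columns=["A", "B", "C"])
returns["B"] = returns["B"] + 0.5 * returns["A"]   # make A and B correlated

cov = returns.cov()
vol = np.sqrt(np.diag(cov))                 # volatilities from the diagonal
corr_manual = cov / np.outer(vol, vol)      # Cov_ij / sqrt(Var_i * Var_j)

print(np.allclose(corr_manual, returns.corr()))   # True
```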

In [None]:
Data.drop('RF',axis=1).corr()

* What is noteworthy about these relationships?

* For a US investor, which is safer: a US bond portfolio or a world bond portfolio?

* For an international investor, which is safer: a US bond portfolio or a world bond portfolio?

* Why did I drop RF, the risk-free rate, to compute the covariances and correlations?

* In what sense is the rate of return of the risk-free asset different from the rates of return of these other assets?
