# 5th Lab of the _"Investment-Management-with-Python-and-Machine-Learning-Specialization"_.

**TOPIC:** Deviations from Normality.

A stock can be volatile in two very different ways:
1. upside volatility, which is good;
2. downside volatility, which means losses for the investor.

In order to quantify the extreme downside, we introduce the **VaR** (_Value at Risk_): the maximum expected loss over a given time period at a given confidence level (for example, the loss that is exceeded in only 1% of months).
Beyond the VaR there is still meaningful information, which is captured by the **CVaR** (_Conditional Value at Risk_): the expected loss given that the loss exceeds the VaR. It is simply the average of the distribution beyond the VaR.
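As a minimal sketch, assuming the historic (empirical) approach, VaR and CVaR can be read directly off a series of past returns. The function names `var_historic` and `cvar_historic` and the `level` parameter (expressed in percent) are illustrative and not necessarily what _erk_ defines. Both functions report losses as positive numbers.

```python
import numpy as np
import pandas as pd


def var_historic(r: pd.Series, level: float = 5) -> float:
    """Historic VaR: the loss exceeded in only `level` percent of periods.

    Hypothetical helper. The percentile of returns is negative for a loss,
    so it is negated to report the VaR as a positive number.
    """
    return -np.percentile(r, level)


def cvar_historic(r: pd.Series, level: float = 5) -> float:
    """Historic CVaR: the average loss over periods at or beyond the VaR."""
    is_beyond = r <= -var_historic(r, level)
    return -r[is_beyond].mean()


# Ten made-up monthly returns, for illustration only
returns = pd.Series([0.03, -0.02, 0.01, -0.08, 0.02, 0.04, -0.01, -0.05, 0.00, 0.02])
print(var_historic(returns, level=10))   # about 0.053
print(cvar_historic(returns, level=10))  # 0.08
```

With ten observations and a 10% level, `np.percentile` interpolates between the worst (-8%) and second-worst (-5%) months, and the CVaR is the average of the single month that lies beyond that threshold.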

In [1]:
import pandas as pd
import edhec_risk_kit as erk

%load_ext autoreload
%autoreload 2

In [2]:
hfi = erk.get_hfi_returns()
hfi.head()

| date | Convertible Arbitrage | CTA Global | Distressed Securities | Emerging Markets | Equity Market Neutral | Event Driven | Fixed Income Arbitrage | Global Macro | Long/Short Equity | Merger Arbitrage | Relative Value | Short Selling | Funds Of Funds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997-01 | 0.0119 | 0.0393 | 0.0178 | 0.0791 | 0.0189 | 0.0213 | 0.0191 | 0.0573 | 0.0281 | 0.015 | 0.018 | -0.0166 | 0.0317 |
| 1997-02 | 0.0123 | 0.0298 | 0.0122 | 0.0525 | 0.0101 | 0.0084 | 0.0122 | 0.0175 | -0.0006 | 0.0034 | 0.0118 | 0.0426 | 0.0106 |
| 1997-03 | 0.0078 | -0.0021 | -0.0012 | -0.012 | 0.0016 | -0.0023 | 0.0109 | -0.0119 | -0.0084 | 0.006 | 0.001 | 0.0778 | -0.0077 |
| 1997-04 | 0.0086 | -0.017 | 0.003 | 0.0119 | 0.0119 | -0.0005 | 0.013 | 0.0172 | 0.0084 | -0.0001 | 0.0122 | -0.0129 | 0.0009 |
| 1997-05 | 0.0156 | -0.0015 | 0.0233 | 0.0315 | 0.0189 | 0.0346 | 0.0118 | 0.0108 | 0.0394 | 0.0197 | 0.0173 | -0.0737 | 0.0275 |
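The body of _erk.get_hfi_returns()_ is not shown in this notebook. As an illustration only, a loader that produces a frame shaped like the one above (monthly periods such as `1997-01` as the index, one column per index) could look like this. The function name `load_monthly_returns`, the CSV layout (dates in the first column, returns already in decimal form) and the inline sample, whose values are copied from the first rows above, are all assumptions.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real data file:
# the first column holds dates, the others hold monthly returns in decimal form.
csv_text = """date,Convertible Arbitrage,CTA Global
1997-01-31,0.0119,0.0393
1997-02-28,0.0123,0.0298
1997-03-31,0.0078,-0.0021
"""


def load_monthly_returns(source) -> pd.DataFrame:
    """Read a returns CSV and re-index it by monthly periods (e.g. 1997-01)."""
    df = pd.read_csv(source, header=0, index_col=0, parse_dates=True)
    df.index = df.index.to_period("M")  # DatetimeIndex -> monthly PeriodIndex
    return df


hfi_sample = load_monthly_returns(io.StringIO(csv_text))
print(hfi_sample)
```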


We are now going to evaluate the **skewness** and **kurtosis** of this dataset.

One quick way of gauging skewness is to compare the mean with the median.
If the **median < mean**, the **data** are **positively skewed** (a long right tail pulls the mean up); if the **median > mean**, they are **negatively skewed**. To make this comparison for every column at once, we can place the three series side by side with the **pd.concat()** function.
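Before applying the rule to the hedge fund indices, it can be checked on a small invented dataset: one column with a single large gain and one with a single large loss. The data are made up, and passing `keys=` to `pd.concat()` is an optional addition that names the columns instead of leaving them as 0, 1, 2.

```python
import pandas as pd

# Made-up returns: one column with a long right tail, one with a long left tail
demo = pd.DataFrame({
    "right_tail": [0.01, 0.01, 0.01, 0.02, 0.10],
    "left_tail": [-0.10, 0.01, 0.02, 0.02, 0.02],
})

comparison = pd.concat(
    [demo.mean(), demo.median(), demo.median() < demo.mean()],
    axis="columns",
    keys=["mean", "median", "median < mean"],  # column labels instead of 0, 1, 2
)
print(comparison)
```

The right-tailed column has mean 0.03 and median 0.01 (True, positive skew); the left-tailed column has mean -0.006 and median 0.02 (False, negative skew).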

In [9]:
pd.concat([hfi.mean(), hfi.median(), hfi.median() < hfi.mean()], axis="columns")

| | 0 (mean) | 1 (median) | 2 (median < mean) |
|---|---|---|---|
| Convertible Arbitrage | 0.005508 | 0.0065 | False |
| CTA Global | 0.004074 | 0.0014 | True |
| Distressed Securities | 0.006946 | 0.0089 | False |
| Emerging Markets | 0.006253 | 0.0096 | False |
| Equity Market Neutral | 0.004498 | 0.0051 | False |
| Event Driven | 0.006344 | 0.0084 | False |
| Fixed Income Arbitrage | 0.004365 | 0.0055 | False |
| Global Macro | 0.005403 | 0.0038 | True |
| Long/Short Equity | 0.006331 | 0.0079 | False |
| Merger Arbitrage | 0.005356 | 0.006 | False |


As we can see from the boolean column, CTA Global, Global Macro and Short Selling returns are positively skewed.

Skewness is defined as
$$ S(R)=\frac{E[(R-E(R))^3]}{\sigma_R^3} $$

Note that SciPy provides a built-in function that computes skewness: **scipy.stats.skew()**.
We now use both this built-in function and the one we wrote in _erk_. They return the same values; only the order differs, because the _erk_ result is sorted.
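The formula above translates almost directly into pandas. The sketch below is a minimal version, which may differ in detail from the function in _erk_. It uses the population standard deviation (`ddof=0`), which is what makes it agree with `scipy.stats.skew` under its default `bias=True`. The two-column sample is simulated, not the hedge fund data.

```python
import numpy as np
import pandas as pd
import scipy.stats


def skewness(r):
    """S(R) = E[(R - E(R))^3] / sigma_R^3, computed column by column.

    Uses the population standard deviation (ddof=0), matching the
    default (bias=True) behaviour of scipy.stats.skew.
    """
    demeaned = r - r.mean()
    sigma = r.std(ddof=0)
    return (demeaned ** 3).mean() / sigma ** 3


rng = np.random.default_rng(0)
# Simulated returns: one symmetric column, one right-skewed (lognormal) column
sample = pd.DataFrame({
    "normal": rng.normal(0.0, 0.02, size=1_000),
    "lognormal": rng.lognormal(0.0, 0.5, size=1_000),
})
print(skewness(sample))
print(scipy.stats.skew(sample))  # same numbers, returned as a NumPy array
```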

In [3]:
erk.skewness(hfi).sort_values()

Fixed Income Arbitrage   -3.940320
Convertible Arbitrage    -2.639592
Equity Market Neutral    -2.124435
Relative Value           -1.815470
Event Driven             -1.409154
Merger Arbitrage         -1.320083
Distressed Securities    -1.300842
Emerging Markets         -1.167067
Long/Short Equity        -0.390227
Funds Of Funds           -0.361783
CTA Global                0.173699
Short Selling             0.767975
Global Macro              0.982922
dtype: float64

In [5]:
import scipy.stats
scipy.stats.skew(hfi)

array([-2.63959223,  0.17369864, -1.30084204, -1.16706749, -2.12443538,
       -1.40915356, -3.94032029,  0.98292188, -0.39022677, -1.32008333,
       -1.81546975,  0.76797484, -0.36178308])

**KURTOSIS** instead is defined as
$$K(R)=\frac{E[(R-E(R))^4]}{\sigma_R^4} $$
We now compute it with both **scipy.stats.kurtosis()** and the function we wrote in _erk_.
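A minimal sketch of the kurtosis formula, in the same style as the skewness sketch and again using `ddof=0`, shows what the number means. A normal sample comes out close to 3, while a fat-tailed one (here a Student-t with 3 degrees of freedom) lands well above it. The data are simulated, and the function is illustrative rather than the _erk_ source.

```python
import numpy as np
import pandas as pd


def kurtosis(r):
    """K(R) = E[(R - E(R))^4] / sigma_R^4, computed column by column (ddof=0)."""
    demeaned = r - r.mean()
    sigma = r.std(ddof=0)
    return (demeaned ** 4).mean() / sigma ** 4


rng = np.random.default_rng(1)
# Simulated returns: normal vs. fat-tailed Student-t with 3 degrees of freedom
sample = pd.DataFrame({
    "normal": rng.normal(0.0, 0.02, size=5_000),
    "student_t": 0.02 * rng.standard_t(3, size=5_000),
})
print(kurtosis(sample))  # roughly 3 for "normal", clearly higher for "student_t"
```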

In [3]:
erk.kurtosis(hfi).sort_values()

CTA Global                 2.952960
Long/Short Equity          4.523893
Global Macro               5.741679
Short Selling              6.117772
Funds Of Funds             7.070153
Distressed Securities      7.889983
Event Driven               8.035828
Merger Arbitrage           8.738950
Emerging Markets           9.250788
Relative Value            12.121208
Equity Market Neutral     17.218555
Convertible Arbitrage     23.280834
Fixed Income Arbitrage    29.842199
dtype: float64

In [6]:
scipy.stats.kurtosis(hfi)

array([20.28083446, -0.04703963,  4.88998336,  6.25078841, 14.21855526,
        5.03582817, 26.84219928,  2.74167945,  1.52389258,  5.73894979,
        9.12120787,  3.11777175,  4.07015278])

Note that _erk.kurtosis_ outputs the actual kurtosis of the data, whereas _scipy.stats.kurtosis()_ outputs the **excess kurtosis**
$$K_{\text{excess}}(R)=K(R)-3$$
where 3 is the kurtosis of a normal distribution. For CTA Global, for instance, 2.952960 - 3 gives the -0.04703963 reported by SciPy.
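SciPy's `fisher` parameter switches between the two conventions: with `fisher=False` it returns the actual (Pearson) kurtosis, the same quantity _erk.kurtosis_ reports, and with the default `fisher=True` it returns the excess kurtosis. A quick check on simulated fat-tailed data:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(2)
x = rng.standard_t(5, size=2_000)  # simulated fat-tailed returns

actual = scipy.stats.kurtosis(x, fisher=False)  # K(R)
excess = scipy.stats.kurtosis(x)                # default fisher=True: K(R) - 3
print(actual, excess, actual - excess)          # the difference is 3 (up to rounding)
```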

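Skewness and kurtosis alone are not enough to decide whether returns are normally distributed: they tell us how far the sample departs from the normal values (0 and 3), but not whether that departure is statistically significant. To assess formally whether **data are normally distributed or not**, we run the **Jarque-Bera test**, which combines sample skewness and kurtosis into a single test statistic.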
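The statistic is short enough to write out:
$$JB=\frac{n}{6}\left(S^2+\frac{(K-3)^2}{4}\right)$$
Under normality it is asymptotically chi-squared with 2 degrees of freedom, so a small p-value means normality is rejected. The sketch below computes it from the biased sample skewness and kurtosis and checks it against `scipy.stats.jarque_bera` on a simulated, clearly non-normal sample. The function name `jarque_bera` is illustrative.

```python
import numpy as np
import scipy.stats


def jarque_bera(x):
    """Return (JB statistic, p-value) built from sample skewness and kurtosis."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    s = scipy.stats.skew(x)                    # biased sample skewness S
    k = scipy.stats.kurtosis(x, fisher=False)  # actual kurtosis K
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    p_value = scipy.stats.chi2.sf(jb, df=2)    # upper tail of chi-squared(2)
    return jb, p_value


rng = np.random.default_rng(3)
x = rng.exponential(size=500)  # simulated, clearly non-normal sample
print(jarque_bera(x))
print(scipy.stats.jarque_bera(x))
```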

In [7]:
scipy.stats.jarque_bera(hfi)

Jarque_beraResult(statistic=25656.585999171326, pvalue=0.0)

As you can see, the function has been applied to the whole dataset, pooling all columns into a single sample, whereas we want to test each column separately. To do so we can use the pandas method **.aggregate(function)**, which applies the function to every column and collects one result per column.
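The notebook does not show how _erk.is_normal_ is written. A plausible minimal version, assuming it wraps the Jarque-Bera test and uses a 1% significance level, could look like this. Both the 1% default and the simulated two-column frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats


def is_normal(r, level=0.01):
    """True if the Jarque-Bera test does NOT reject normality at `level`.

    Hypothetical sketch: level=0.01 (1%) is an assumed default, not necessarily erk's.
    """
    p_value = scipy.stats.jarque_bera(r).pvalue
    return p_value > level


rng = np.random.default_rng(4)
sample = pd.DataFrame({
    "normal": rng.normal(0.0, 0.02, size=1_000),
    "exponential": rng.exponential(0.02, size=1_000),
})
# aggregate() calls is_normal once per column and collects the booleans;
# the exponential column comes out False (normality rejected)
print(sample.aggregate(is_normal))
```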

In [3]:
hfi.aggregate(erk.is_normal)

Convertible Arbitrage     False
CTA Global                 True
Distressed Securities     False
Emerging Markets          False
Equity Market Neutral     False
Event Driven              False
Fixed Income Arbitrage    False
Global Macro              False
Long/Short Equity         False
Merger Arbitrage          False
Relative Value            False
Short Selling             False
Funds Of Funds            False
dtype: bool

We can likewise compute the skewness and kurtosis of Small Cap and Large Cap returns, and test whether they are normally distributed.

In [4]:
ffme = erk.get_ffme_returns()
erk.skewness(ffme)

Small Cap    4.410739
Large Cap    0.233445
dtype: float64

In [5]:
erk.kurtosis(ffme)

Small Cap    46.845008
Large Cap    10.694654
dtype: float64

In [6]:
ffme.aggregate(erk.is_normal)

Small Cap    False
Large Cap    False
dtype: bool