<img style="float: right;" width="120" src="../Images/supplier-logo.png">
<img style="float: left; margin-top: 0" width="80" src="../Images/client-logo.png">
<br><br><br>

# Synopsis

Some of the following content is quite advanced.

Do not panic about this; just be aware that there are some very fast and efficient libraries available for number crunching in Python.

This notebook will explain the following topics and concepts:

- **Built in Statistical Functions** 

- **Correlation & Covariance**

- **Function application**
  - Applying a function to rows of a DataFrame

- **Common Front Office Calculations**
  - Normalized prices
  - the log of returns
  - Daily Percentage Change
  - Cumulative returns
  - macd - Moving Average Convergence/Divergence

- **Measuring Performance**
  - the %timeit magic
  - numpy and numexpr



# Built in Statistical Functions

The following functions can all be applied to a Series.

Because a DataFrame column is a Series, they can also be applied to one or more columns of a DataFrame, or to an entire DataFrame.

- Simple Functions
- Accumulators
- General Purpose Functions

In [None]:
# Load the pandas library
import pandas as pd

## Simple Functions


- count() 
- min() 
- max() 
- sum() 
- mean()
- median() 
- std() 
- describe()

In [None]:
# Create a demonstration Series and call some of its aggregation functions
tmp = pd.Series([13, 2, 4, 24, 9, 25, 6, 50])

# Use print to display the result of each function

print('count:', tmp.count())
print('min:', tmp.min())
print('max:', tmp.max())
print('sum:', tmp.sum())
print('mean:', tmp.mean())
print('median:', tmp.median())
print('std:', tmp.std())

In [None]:
# describe() gives a number of statistical values in one function
tmp.describe()
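
The same functions also work on DataFrame columns and on whole DataFrames. The sketch below uses a small made-up DataFrame (the column names `A` and `B` and the values are hypothetical) to show that a single column gives one value, while a DataFrame gives one value per column.

```python
import pandas as pd

# Hypothetical two-column DataFrame, for illustration only
prices = pd.DataFrame({
    'A': [10.0, 12.0, 11.0, 15.0],
    'B': [100.0, 98.0, 101.0, 105.0],
})

# On a single column (a Series) the result is a single value
print('mean of A:', prices['A'].mean())

# On several columns, or the whole DataFrame, the result is one value per column
col_means = prices.mean()
col_max = prices[['A', 'B']].max()
print(col_means)
print(col_max)

# describe() on a DataFrame summarises every numeric column at once
summary = prices.describe()
print(summary)
```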

## Accumulators

- cumsum()
- cummin()
- cummax()
- cumprod()

In [None]:
# Use print to display the result of each function ('\n' inserts a new line for readability)

print('cumsum:\n', tmp.cumsum())
print('\ncummin:\n', tmp.cummin())
print('\ncummax:\n', tmp.cummax())
print('\ncumprod:\n', tmp.cumprod())

## General Purpose Functions

There are also a few general purpose functions:

- diff()  - difference between adjacent values
- pct_change() - fractional change between adjacent values (multiply by 100 for a percentage)
- idxmin() - index label of the minimum value (with the default integer index, which begins at 0, this is its position)
- idxmax() - index label of the maximum value
- skew() - sample skewness
- kurt() - sample excess kurtosis
- quantile() - value at a given quantile (by default 0.5, the median)

In [None]:
# Use print to display the result of each function ('\n' inserts a new line for readability)

print('diff:\n', tmp.diff())
print('\npct_change:\n', tmp.pct_change())
print('\nidxmin:', tmp.idxmin())
print('idxmax:', tmp.idxmax())
print('skew:', tmp.skew())
print('kurt:', tmp.kurt())
print('quantile:', tmp.quantile())
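
As a rough illustration of how these functions relate to each other, the sketch below rebuilds `pct_change()` from `diff()` and `shift()`, passes a list of quantiles to `quantile()`, and shows that `idxmin()` returns an index label rather than a position. The `labelled` Series and its labels are made up for the example.

```python
import numpy as np
import pandas as pd

tmp = pd.Series([13, 2, 4, 24, 9, 25, 6, 50])

# pct_change() is the difference to the previous value divided by the previous value
manual_pct = tmp.diff() / tmp.shift(1)
print(np.allclose(manual_pct, tmp.pct_change(), equal_nan=True))

# quantile() defaults to 0.5 (the median) but accepts any quantile or list of quantiles
quartiles = tmp.quantile([0.25, 0.5, 0.75])
print(quartiles)

# idxmin() returns the index label, which is only a position for the default index
labelled = pd.Series([13, 2, 4], index=['x', 'y', 'z'])
print(labelled.idxmin())
```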

# Common Front Office Calculations


- Normalized prices
- the log of returns
- Daily Percentage Change
- Cumulative returns
- macd - Moving Average Convergence/Divergence

## Load in the Data

In [None]:
# Load the libraries we'll use
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load three of the famous FAANG stocks (Facebook, Amazon, Apple)
# Use the Date column as the index and sort it ascending
df_FB = pd.read_excel(io='../Data/market_data.xls', sheet_name='FB', parse_dates=True, index_col='Date').sort_index()
df_AMZN = pd.read_excel(io='../Data/market_data.xls', sheet_name='AMZN', parse_dates=True, index_col='Date').sort_index()
df_AAPL = pd.read_excel(io='../Data/market_data.xls', sheet_name='AAPL', parse_dates=True, index_col='Date').sort_index()


## Visualizing Returns

### 1 - Look at the Closing Prices

In [None]:
df = pd.DataFrame()

df['FB'] = df_FB['Close']

df['FB'].plot(grid=True)

### 2 - Look at Normalized Prices

The ratio of each price to the first price: price(t) / price(t0)

This is the same as the cumulative daily return expressed as a growth factor, so the series starts at 1

In [None]:
df['NormdP'] = df['FB']/df.iloc[0]['FB']

df['NormdP'].plot(grid=True)
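
The link between normalized prices and cumulative returns can be checked on a handful of numbers. This is a minimal sketch on made-up closing prices (the `close` values are hypothetical), not the market data used above.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([100.0, 102.0, 99.0, 105.0])

# Normalized price: each price divided by the first price
normalized = close / close.iloc[0]

# Cumulative growth factor built from daily percentage changes
# (the first pct_change is NaN, so the first day's return is treated as 0)
growth = (1 + close.pct_change().fillna(0)).cumprod()

print(normalized.tolist())
print(np.allclose(normalized, growth))
```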

### 3 - Look at the Log of the Daily Returns

- When calculating the return of an investment or position, an accumulation of the log of daily returns is used.

- This allows a direct comparison to be made between different instruments

- When backtesting technical analysis you will be employing this measure to compare a simple trading strategy against market performance.

- This is a very simple value to arrive at: log(price(t) / price(t-1))

- Use a combination of np.log and the time shift function, shift()

- Use the **cumsum()** function to arrive at the payoff

- Where there is a choice between adjusted and unadjusted, use the Adjusted values (e.g. AdjOpen, AdjVolume, etc.)

In [None]:
# Log returns for Facebook
# df['FB'] holds the Close column loaded above; prefer an adjusted close if the data has one
df['Log Returns'] = np.log(df['FB'] / df['FB'].shift(1))

# Accumulate the log returns and apply the exponential function
# to recover the growth of the position relative to the first day
# This pattern is very common in financial analysis
df['Log Returns'].cumsum().apply(np.exp).plot(grid=True)
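
Why accumulate with `cumsum()` and then apply `np.exp`? Log returns are additive, so their running sum is the log of the price ratio since the first day. The sketch below checks this on made-up prices (the `close` values are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([100.0, 102.0, 99.0, 105.0])
log_returns = np.log(close / close.shift(1))

# The sum of the log returns is the log of the total price ratio
# (sum() skips the NaN produced for the first day)
total_log_return = log_returns.sum()
print(np.isclose(total_log_return, np.log(close.iloc[-1] / close.iloc[0])))

# Exponentiating the running sum recovers the normalized price path
payoff = log_returns.cumsum().apply(np.exp)
normalized = close / close.iloc[0]
print(np.allclose(payoff.iloc[1:], normalized.iloc[1:]))
```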

### 4 - Look at Cumulative Returns

The formula for a cumulative daily return is:

$ i_t = (1+r_t) \cdot i_{t-1} $

Each value is reached by multiplying the previous investment value, $i_{t-1}$, by 1 plus the percentage return $r_t$. <BR>
This is easy to calculate with the pandas `cumprod()` method.


In [None]:
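# The 'Pct Chg' column is only created in a later cell,
# so compute the daily percentage change directly here
df['Cumulative Return'] = (1 + df['FB'].pct_change()).cumprod()

df.head()


df['Cumulative Return'].plot(grid=True)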

## Daily Percentage Change

Defined by the following formula: $ r_t = \frac{p_t}{p_{t-1}} -1$ <BR>
    
>
> The percent gain (or loss) if you bought the stock on one day and then sold it the next day. <BR>
> Very useful in analyzing the volatility of the stock. <BR>
> A wide distribution implies the stock is more volatile from one day to the next. <BR>
>
    
Two methods:
- Use `shift()`
- Use built in `pct_change()`

In [None]:
# Using Shift
df['R_t'] = (df['FB'] / df['FB'].shift(1) ) - 1

df['R_t'].plot(grid=True)

In [None]:
# Using pct_change
df['Pct Chg'] = df['FB'].pct_change()

df['Pct Chg'].plot(grid=True)
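
One simple way to put a number on "how wide" the distribution of daily changes is its standard deviation. The sketch below compares two made-up price series (`calm` and `jumpy` are hypothetical names and values); it is an illustration, not a full volatility calculation.

```python
import pandas as pd

# Two hypothetical price series, for illustration only
calm = pd.Series([100.0, 100.5, 100.2, 100.8, 100.6])
jumpy = pd.Series([100.0, 104.0, 97.0, 103.0, 99.0])

# Daily percentage change of each series, side by side
returns = pd.DataFrame({
    'calm': calm.pct_change(),
    'jumpy': jumpy.pct_change(),
})

# std() measures the spread of the daily changes, one value per column
spread = returns.std()
print(spread)
```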

## Visualize returns

In [None]:
df_Returns = pd.DataFrame()

df_Returns['FB'] = df_FB['Close'].pct_change()
df_Returns['AMZN'] = df_AMZN['Close'].pct_change()
df_Returns['AAPL'] = df_AAPL['Close'].pct_change()


### Plot a histogram of the returns

In [None]:
df_Returns['FB'].hist(bins=50)

In [None]:
df_Returns['AAPL'].hist(bins=50)

### Stack the returns on top of each other

In [None]:
num_bins = 100

df_Returns['FB'].hist(bins=num_bins, label='FB', figsize=(10,8), alpha = 0.5)
df_Returns['AMZN'].hist(bins=num_bins, label='AMZN', figsize=(10,8), alpha = 0.5)
df_Returns['AAPL'].hist(bins=num_bins, label='AAPL', figsize=(10,8), alpha = 0.5)


plt.legend()

### Plot a Kernel Density Estimate (KDE)


In [None]:
df_Returns['FB'].plot(kind='kde', label='FB', figsize=(10,8))
df_Returns['AMZN'].plot(kind='kde', label='AMZN', figsize=(10,8))
df_Returns['AAPL'].plot(kind='kde', label='AAPL', figsize=(10,8))

plt.legend()

### Box Plots

In [None]:
df_Returns.plot(kind='box', figsize=(8,11), colormap='coolwarm')

## MACD

- Turns two moving averages into a momentum oscillator by subtracting the longer moving average from the shorter moving average. 
- Results in the best of both worlds: trend following and momentum.
- MACD formula: 12-day EMA - 26-day EMA
- Uses the pandas `ewm()` method followed by `mean()` (exponentially weighted moving average)

In [None]:
# Facebook 2017
df_MACD = pd.DataFrame()
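# Select the 2017 rows with .loc (df_FB['2017'] is not supported for row selection in recent pandas versions)
df_MACD['26 ewm'] = df_FB.loc['2017', 'Close'].ewm(span=26).mean()
df_MACD['12 ewm'] = df_FB.loc['2017', 'Close'].ewm(span=12).mean()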
df_MACD['MACD'] = df_MACD['12 ewm'] - df_MACD['26 ewm']
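
The calculation can be wrapped in a small helper and tried on synthetic data. In the sketch below the function name `macd`, its parameters `fast` and `slow`, and the steadily rising `close` series are all assumptions made for illustration. On a rising series the short average sits above the long one, so the MACD is positive.

```python
import numpy as np
import pandas as pd

def macd(close, fast=12, slow=26):
    """Return the fast EMA minus the slow EMA of a price Series (hypothetical helper)."""
    fast_ema = close.ewm(span=fast).mean()
    slow_ema = close.ewm(span=slow).mean()
    return fast_ema - slow_ema

# Hypothetical steadily rising prices, for illustration only
close = pd.Series(np.linspace(100.0, 160.0, 60))

line = macd(close)
print(line.tail())
```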

# Correlation and Covariance

- Pandas has some convenient built-ins for calculating these.

- We'll use some previous datasets for demonstration.

- Calculate the correlation and covariance between the daily percentage changes of the Close price of Facebook, Apple and Amazon.

- Display the correlation

- Calculate the covariance of the same data

In [None]:
df_CORR = pd.DataFrame()

df_CORR['Facebook'] = df_FB['Close'].pct_change()
df_CORR['Apple'] = df_AAPL['Close'].pct_change()
df_CORR['Amazon'] = df_AMZN['Close'].pct_change()

df_CORR.head()

## Calculate correlation and covariance

- Use the **corr()** function
- Use the **cov()** function



In [None]:
df_CORR.corr()

# OR for a more recent correlation, select the 2017 rows with .loc
df_CORR.loc['2017'].corr()

# Covariance
df_CORR.loc['2017'].cov()
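
Correlation is covariance rescaled by the two standard deviations, which is why it always lies between -1 and 1. The sketch below checks that relationship on made-up returns (the `X` and `Y` columns are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns, for illustration only
returns = pd.DataFrame({
    'X': [0.01, -0.02, 0.015, 0.03, -0.01],
    'Y': [0.02, -0.01, 0.01, 0.025, -0.02],
})

cov_xy = returns.cov().loc['X', 'Y']
corr_xy = returns.corr().loc['X', 'Y']

# corr(X, Y) = cov(X, Y) / (std(X) * std(Y))
manual = cov_xy / (returns['X'].std() * returns['Y'].std())
print(np.isclose(corr_xy, manual))
```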


## Use a scatter plot to display a visual of correlation


In [None]:
from pandas.plotting import scatter_matrix
p = scatter_matrix(df_CORR.loc['2017'], alpha=0.9, hist_kwds={'bins':50}, figsize=(18,6))

## Rolling Correlations

In [None]:
## Rolling Correlations

ax = df_CORR['Facebook'].rolling(window=252).corr(df_CORR['Apple']).plot(figsize=(10, 6))  

# The red horizontal line shows the correlation of Facebook and Apple over the entire time period
# Note how the rolling correlation is much more telling


ax.axhline(df_CORR.corr().iloc[0, 1], c='r');  
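
To see what the rolling window is doing, it helps to try it on a few rows. The sketch below uses random made-up returns and a window of 4 instead of 252 (both are assumptions for illustration). The first `window - 1` results are NaN, and each later value is the correlation of the most recent `window` rows.

```python
import numpy as np
import pandas as pd

# Hypothetical random returns, for illustration only
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0, 0.01, size=(10, 2)), columns=['X', 'Y'])

window = 4
rolling_corr = returns['X'].rolling(window=window).corr(returns['Y'])

# No result is produced until a full window of data is available
print(rolling_corr.isna().sum())

# The last rolling value equals the plain correlation of the last `window` rows
last = returns.iloc[-window:]
print(np.isclose(rolling_corr.iloc[-1], last['X'].corr(last['Y'])))
```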

## Rolling Covariances

In [None]:
## Rolling Covariance

ax = df_CORR['Facebook'].rolling(window=252).cov(df_CORR['Apple']).plot(figsize=(10, 6))  

# The red horizontal line shows the covariance of Facebook and Apple over the entire time period
# Note how the rolling covariance is much more telling


ax.axhline(df_CORR.cov().iloc[0, 1], c='r');  

# Applying functions to Series and DataFrames

- You can easily apply arbitrary functions to DataFrames.

- Use the **apply()** function

- This method can be used to apply a function to a Series, Column, Columns or an entire DataFrame



In [None]:
# Apply np.sqrt (square root) to the Close Column of FB
df = df_FB.copy()
df['Close'].apply(np.sqrt)

# Or apply np.cumsum to a set of columns from 2015 onwards
cols = ['Open', 'High', 'Low', 'Close']
df.loc['2015':, cols].apply(np.cumsum)
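
By default `apply()` on a DataFrame passes each column to the function. Passing `axis=1` passes each row instead. The sketch below uses a made-up OHLC table and a hypothetical helper `daily_range` to compute High minus Low for every row.

```python
import pandas as pd

# Hypothetical OHLC data, for illustration only
ohlc = pd.DataFrame({
    'Open': [10.0, 11.0, 12.0],
    'High': [11.5, 12.0, 12.5],
    'Low': [9.5, 10.5, 11.0],
    'Close': [11.0, 11.5, 12.2],
})

def daily_range(row):
    """Return the High minus the Low of a single row (hypothetical helper)."""
    return row['High'] - row['Low']

# axis=1 calls the function once per row
ohlc['Range'] = ohlc.apply(daily_range, axis=1)

# The same result written as a column operation, which is usually faster
vectorised = ohlc['High'] - ohlc['Low']
print(ohlc)
```

Row-wise `apply()` is convenient for logic that is awkward to express on whole columns; where a column expression exists, it is normally the quicker choice.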


# Measuring Performance 

Quite often, there are a number of ways of using Python to accomplish a particular task.

The choice of which to use is often a factor of complexity, personal preferences and simplicity.

Occasionally the performance of a particular approach is the main consideration.

A fairly common task in financial analysis is to evaluate complex mathematical expressions on large arrays of numbers.

Use the %timeit 'magic' to time how long a routine takes, e.g. to evaluate:

\begin{eqnarray*}
\huge 3\log(x) + \cos^2(x)
\end{eqnarray*}

## First Attempt - Use pure Python

In [None]:
# Import only the functions needed rather than everything in the math module
from math import log, cos

loops = 2500000

a = range(1,loops)

def f(x):
    return 3 * log(x) + cos(x) ** 2

%timeit r = [f(x) for x in a]


## Second Attempt - Use numpy

The same task can be performed using numpy, which has precompiled functions to handle such operations

In [None]:
import numpy as np

loops = 2500000

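# Build a numpy array once, so the timing measures the calculation
# rather than the conversion of a range on every run
a = np.arange(1, loops)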

%timeit r = 3 * np.log(a) + np.cos(a) ** 2

## Third attempt - Use numexpr

To improve even further, use numexpr, short for numerical expressions.

This library compiles expressions for even better performance.

In [None]:
import numpy as np
import numexpr as ne
ne.set_num_threads(1)

loops = 2500000

# Use a numpy array as the input, as in the numpy attempt
a = np.arange(1, loops)

f = '3 * log(a) + cos(a) ** 2'

%timeit r = ne.evaluate(f)

Note the performance improvement at each step.
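
The `%timeit` magic only works inside IPython and Jupyter. In a plain Python script, the standard library `timeit` module does a similar job. The sketch below is a minimal comparison of the pure Python and numpy approaches; the smaller array size `n`, the function names and `number=3` are assumptions chosen to keep the example quick. It also checks that both approaches give the same answer.

```python
import timeit
from math import log, cos

import numpy as np

n = 100_000                 # smaller than the 2,500,000 used above, to keep this quick
values = range(1, n)
arr = np.arange(1, n)

def pure_python():
    # Evaluate the expression one number at a time
    return [3 * log(x) + cos(x) ** 2 for x in values]

def with_numpy():
    # Evaluate the expression on the whole array at once
    return 3 * np.log(arr) + np.cos(arr) ** 2

# Both approaches should produce the same numbers
print(np.allclose(pure_python(), with_numpy()))

# Total time in seconds for 3 calls of each function
t_py = timeit.timeit(pure_python, number=3)
t_np = timeit.timeit(with_numpy, number=3)
print(f'pure Python: {t_py:.3f}s, numpy: {t_np:.3f}s')
```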
