In [None]:
#-*- coding: utf-8 -*-

<img align="right" style="max-width: 200px; height: auto" src="cfds_logo.png">

###  Lab 01 - "Exemplary Financial Data Science Process"

AZEK Seminar on the "Introduction to Financial Data Science", Dec. 5th, 2019, Zurich, Switzerland 

In this introductory lab, we create our first **financial data science process**. The main objective of this lab is to walk you through the general process of implementing and evaluating a simple **trend-following** trading strategy. To achieve this, we will follow the distinct process steps as outlined below:

<img align="middle" style="max-width: 600px; height: auto" src="fds_process.png">

As always, please don't hesitate to ask your questions, either during the lab or by sending us an email via marco.schreyer@fds.ai or damian.borth@fds.ai.

### Lab Objectives:

After today's lab you should be able to:
    
> 1. Implement a **trend-following trading strategy** and apply it to distinct financial instruments.
> 2. Convert the trading strategy results into **trade signals** to be used in a backtest.
> 3. Understand how to use the Python backtesting library **bt** to backtest the implemented strategy.
> 4. Interpret the backtest results using the distinct **backtest performance** measures.

Before we start let's watch a motivational video:

In [None]:
from IPython.display import YouTubeVideo
# Nvidia GTC 2017: "I Am AI" Opening in Keynote
# YouTubeVideo('SUNPrR4o5ZA', width=800, height=600)

### Step 0: Setup of the Analysis Environment

Prior to importing the needed lab libraries, let's install the `quandl` and `bt` libraries and update the `pandas` library, since these are not available in the off-the-shelf MS Azure instance. This can be accomplished, as always, by using the Python package installer `pip`:

In [None]:
!pip install pandas==0.25.3 --user
!pip install quandl==3.4.1 --user
!pip install bt==0.2.7 --user

We need to import a couple of Python libraries that allow for data analysis and data visualization. In this lab we will use the pandas, numpy, quandl, bt and matplotlib libraries. Let's import the libraries by executing the statements below:

In [None]:
# import python utility libraries
import os as os
import datetime as dt

# import python data science libraries
import pandas as pd
import numpy as np

# import the quandl financial data library
import quandl as ql

# import the Python bt backtest library
import bt as bt

# import the matplotlib library
import matplotlib.pyplot as plt

Let's also set a couple of general plot parameters:

In [None]:
# set general plot parameters
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = [10, 5]
plt.rcParams['figure.dpi']= 150

Enable inline Jupyter notebook plotting:

In [None]:
%matplotlib inline

Suppress potential warnings due to recent library enhancements:

In [None]:
import warnings
warnings.filterwarnings('ignore')

Create a **dataset** sub-folder that we will use to store the financial data downloaded:

In [None]:
# create the sub-folder only if it does not exist yet
os.makedirs('./datasets', exist_ok=True)

### Step 1: Acquire the Financial Data

Initialize the "Quandl" financial data download API and set the corresponding API key. If you haven't registered for a Quandl API key yet, visit the Quandl webpage (https://www.quandl.com/) and sign up for an **Academic** account (indicating that you will use the data solely in an academic context).

Once you have registered, you will receive an API key that can be used to download financial data provided by Quandl.

In [None]:
#ql.ApiConfig.api_key = # '<your API key here>'
ql.ApiConfig.api_key = 'xn6g-K_ebmMgSJRTCSUk'

Let's now specify the start date and end date of the data download. We aim to download market price data starting from the **31.12.1990** until the **31.12.2017** in order to extensively test the momentum trading strategy that we aim to implement:

In [None]:
start_date = dt.datetime(1990, 12, 31)
end_date = dt.datetime(2017, 12, 31)

Download the **daily** "International Business Machines" (IBM) market data of the defined timeframe using the Quandl API:

In [None]:
# download ibm market data
ibm_data = ql.get('WIKI/IBM', start_date=start_date, end_date=end_date, collapse='daily')

### Step 2: Pre-Process the Financial Data

Inspect the top 10 records of the "IBM" data downloaded:

In [None]:
ibm_data.head(10)

Visually inspect the **adjusted closing price** of the downloaded "IBM" data:

In [None]:
plt.rcParams['figure.figsize'] = [15, 5]
fig = plt.figure()
ax = fig.add_subplot(111)

# plot ibm stock daily adjusted closing prices
ax.plot(ibm_data.index, ibm_data['Adj. Close'], color='#9b59b6')

# rotate x-ticks
for tick in ax.get_xticklabels():
    tick.set_rotation(45)

# set axis labels
ax.set_xlabel('[time]', fontsize=10)
ax.set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax.set_ylabel('[stock closing price]', fontsize=10)

# set plot title
plt.title('International Business Machines Corporation (IBM) - Historical Stock Prices', fontsize=10);

Save the downloaded "IBM" data to the local directory:

In [None]:
ibm_data.to_csv('./datasets/ibm_data_1990_2017_daily.csv', sep=';', encoding='utf-8')
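
When the saved file is loaded again later, the separator and the date index have to be restored explicitly. The following is a minimal sketch on made-up prices, written to an in-memory buffer instead of a file; the index name `Date` and the toy values are assumptions for illustration only:

```python
import io
import pandas as pd

# toy price frame standing in for the downloaded data (values are made up)
toy_prices = pd.DataFrame(
    {"Adj. Close": [10.0, 10.5, 10.2]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"]),
)
toy_prices.index.name = "Date"  # assumed index name

# write with the same ';' separator as above, here into an in-memory buffer
buffer = io.StringIO()
toy_prices.to_csv(buffer, sep=";")

# read back: same separator, first column parsed as a datetime index
buffer.seek(0)
restored = pd.read_csv(buffer, sep=";", index_col=0, parse_dates=True)
```

Passing a file path such as `'./datasets/ibm_data_1990_2017_daily.csv'` instead of the buffer should work the same way.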

### Step 3: Data Analysis - Moving Average Crossover Strategy Implementation

Let's implement a simple **Moving Average Crossover** trading strategy. In general, **crossover trading** refers to the idea that changes in the market situation can be determined based on price "breakouts". A crossover can be interpreted as another measure of a financial instrument's momentum. In the past, crossover signals have been used extensively to determine when it is time to either buy or sell the underlying asset.

The price crossover signals of a simple **Moving Average Crossover** trading strategy are triggered by the following events:

>- Generate a **short** trading signal once the price of a financial instrument drops below the general price trend, e.g., 100-days moving average band ("Sell Sign Crossover", left image below). 
>- Generate a **long** trading signal once the price of a financial instrument exceeds the general price trend, e.g., 100-days moving average band ("Buy Sign Crossover", right image below).

<img align="middle" style="max-width: 800px; height: auto" src="crossovertrading.png">

An enhancement of the **Moving Average Crossover** is to apply two moving averages to a chart: one long-running moving average (e.g., a 200-day SMAV) and one short-running moving average (e.g., a 20-day SMAV). Once the short-running moving average crosses above the long-running moving average, a **Buy** or **Long** signal is triggered, as it indicates that the trend is shifting up (this is known as a "golden cross"). On the other hand, when the short-running moving average crosses below the long-running moving average, a **Sell** or **Short** signal is triggered, as it indicates that the trend is shifting down (this is known as a "dead/death cross").
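
To make the two crossover events concrete, here is a small sketch on made-up moving average values. It is not part of the lab's strategy code; the series `fast` and `slow` and their values are purely illustrative:

```python
import pandas as pd

# toy moving-average values (made up for illustration)
fast = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0, 2.5])
slow = pd.Series([2.0, 2.0, 2.0, 2.2, 2.2, 2.2])

# 1 where the fast average sits above the slow one, 0 otherwise
position = (fast > slow).astype(int)

# the step-to-step change is +1 at an upward cross and -1 at a downward cross
change = position.diff()
golden_cross = change == 1   # fast crosses above slow
death_cross = change == -1   # fast crosses below slow
```

In this toy series the fast average crosses above the slow one at positions 2 and 5 and below it at position 3.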

Let's start implementing this enhanced trading strategy by setting the distinct moving average window sizes that specify the number of historical daily adjusted closing prices of the IBM stock to be considered in the calculation of the rolling moving average:

In [None]:
cross_mav_days_15 = 15  # set "fast" short-running moving average indicator lookback, days = 15
cross_mav_days_60 = 60  # set "slow" short-running moving average indicator lookback, days = 60
cross_mav_days_200 = 200  # set "trend" long-running moving average indicator lookback, days = 200

Calculate the rolling moving averages of window sizes 15 days, 60 days and 200 days. In general, the **"Simple Moving Average (SMAV)"** of a financial instrument $i$ (e.g., a stock, commodity, fx-rate) is defined as the mean of the previous $n$ prices, formally denoted by:


$$SMA_{i}(t)=\frac{1}{n} \sum_{k=0}^{n-1} p_{i}(t-k)$$

where $t$ denotes the current point in time and $n$ the lookback.

We can calculate the SMAV by just using the Pandas `rolling()` and `mean()` functions:

In [None]:
cross_mav_15 = pd.Series(ibm_data['Adj. Close'].rolling(window = cross_mav_days_15).mean(), name = 'SMAV_15')
cross_mav_60 = pd.Series(ibm_data['Adj. Close'].rolling(window = cross_mav_days_60).mean(), name = 'SMAV_60')
cross_mav_200 = pd.Series(ibm_data['Adj. Close'].rolling(window = cross_mav_days_200).mean(), name = 'SMAV_200')
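
As a sanity check of the formula, the following sketch compares the Pandas rolling mean with a by-hand evaluation on a made-up price series. The values and the lookback of $n = 3$ are chosen only to keep the arithmetic easy to follow:

```python
import numpy as np
import pandas as pd

# toy price series (made up) and a lookback of n = 3
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
n = 3

# rolling mean: the first n - 1 entries are NaN because the window is not yet full
sma = prices.rolling(window=n).mean()

# the formula evaluated by hand at the last time step t = 4
t = 4
sma_by_hand = sum(prices.iloc[t - k] for k in range(n)) / n

# both routes should agree up to floating point precision
matches = np.isclose(sma.iloc[t], sma_by_hand)
```

The leading NaN values explain why a 200-day moving average only becomes available after 200 observations.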

Merge the rolling moving average values with the original market data (adjusted closing prices):

In [None]:
cross_mav_ibm_data = ibm_data.join(cross_mav_15)
cross_mav_ibm_data = cross_mav_ibm_data.join(cross_mav_60)
cross_mav_ibm_data = cross_mav_ibm_data.join(cross_mav_200)

Inspect and validate the daily adjusted closing prices of the IBM stock as well as the derived moving average values starting from the first obtained 200-day moving average market price:

In [None]:
cross_mav_ibm_data[['Adj. Close', 'SMAV_15', 'SMAV_60', 'SMAV_200']].iloc[200:210]

Plot the historical daily adjusted closing prices of the IBM stock (purple) as well as its 15-day (green), 60-day (red) and 200-day (yellow) rolling moving averages:

In [None]:
plt.rcParams['figure.figsize'] = [15, 5]
fig = plt.figure()
ax = fig.add_subplot(111)

# plot adjusted closing prices and moving averages
ax.plot(cross_mav_ibm_data['Adj. Close'], lw=1.0, color='#9b59b6', label='Closing Prices (purple)')
ax.plot(cross_mav_ibm_data['SMAV_15'], color='C1',lw=1.0, label='15-day MAV (green)')
ax.plot(cross_mav_ibm_data['SMAV_60'], color='C2',lw=1.0, label='60-day MAV (red)')
ax.plot(cross_mav_ibm_data['SMAV_200'], color='C4', lw=1.0, label='200-day MAV (yellow)')

# rotate x-tick labels
for tick in ax.get_xticklabels():
    tick.set_rotation(45)
    
# set axis labels
ax.set_xlabel('[time]', fontsize=10)
ax.set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax.set_ylabel('[market price]', fontsize=10)

# set plot legend
plt.legend(loc="upper left", numpoints=1, fancybox=True)

# set plot title
plt.title('International Business Machines Corporation (IBM) - Daily Historical Stock Closing Prices', fontsize=10);

### Step 4: Moving Average Crossover Signal Generation

Derive trading signals from two distinct moving average crossover trading strategy configurations. We will generate a **long-signal** (+1.0) for the time intervals where the fast moving averages are above the 200-day moving average. In addition, we generate a **short-signal** (-1.0) for the time intervals where the fast moving averages are below the 200-day moving average:

In [None]:
# create 'fast' trend-following signals
cross_mav_ibm_data['SIGNAL_15'] = 0.0
cross_mav_ibm_data.loc[cross_mav_ibm_data['SMAV_15'] > cross_mav_ibm_data['SMAV_200'], 'SIGNAL_15'] = 1.0
cross_mav_ibm_data.loc[cross_mav_ibm_data['SMAV_15'] < cross_mav_ibm_data['SMAV_200'], 'SIGNAL_15'] = -1.0

# create 'slow' trend-following signals
cross_mav_ibm_data['SIGNAL_60'] = 0.0
cross_mav_ibm_data.loc[cross_mav_ibm_data['SMAV_60'] > cross_mav_ibm_data['SMAV_200'], 'SIGNAL_60'] = 1.0
cross_mav_ibm_data.loc[cross_mav_ibm_data['SMAV_60'] < cross_mav_ibm_data['SMAV_200'], 'SIGNAL_60'] = -1.0
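
The same signal logic can also be written more compactly as the sign of the spread between the two moving averages. The sketch below uses a made-up frame (the name `toy` and its values are illustrative) and should yield the same signals as the assignments above, including 0.0 during the warm-up phase in which the slow average is still NaN:

```python
import numpy as np
import pandas as pd

# toy moving averages (made up); NaN mimics the warm-up phase of the slow average
toy = pd.DataFrame({
    "SMAV_15": [1.0, 2.0, 3.0, 2.0, 1.0],
    "SMAV_200": [np.nan, 2.0, 2.0, 2.5, 2.5],
})

# sign of the spread: +1.0 above, -1.0 below, 0.0 when equal
spread = toy["SMAV_15"] - toy["SMAV_200"]

# NaN spreads (no slow average yet) are mapped to the neutral signal 0.0
toy["SIGNAL_15"] = np.sign(spread).fillna(0.0)
```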

In addition, let's also prepare a backtest of a **"baseline"** in terms of a simple **buy-and-hold** trading strategy for comparison purposes. Our buy-and-hold strategy sends a "long" (+1.0) signal for each time step: 

In [None]:
cross_mav_ibm_data['SIGNAL_BASE'] = 1.0

Prepare the trading signal data to be utilized in backtesting the long-/short-term moving-average trading strategy:

In [None]:
# convert signals to Pandas DataFrame
cross_mav_ibm_signal_data = pd.DataFrame(cross_mav_ibm_data[['SIGNAL_15', 'SIGNAL_60', 'SIGNAL_BASE']], columns=['SIGNAL_15', 'SIGNAL_60', 'SIGNAL_BASE'])

# convert pandas DataFrame index to datatype: datetime
cross_mav_ibm_signal_data = cross_mav_ibm_signal_data.set_index(pd.to_datetime(ibm_data.index))

Inspect top 10 rows of the prepared trading signals:

In [None]:
cross_mav_ibm_signal_data.head(10)

Inspect some of the exemplary signal deviations between the 15-days and 60-days crossover moving average trading strategies:

In [None]:
cross_mav_ibm_signal_data[cross_mav_ibm_signal_data['SIGNAL_15'] != cross_mav_ibm_signal_data['SIGNAL_60']].head(10)

Visualize the prepared trading signals:

In [None]:
plt.rcParams['figure.figsize'] = [15, 5]
fig, ax = plt.subplots(ncols=1, nrows=3)

ax[0].plot(cross_mav_ibm_signal_data['SIGNAL_15'], lw=1.0, color='C2', label='SMAV 15 (red)')
ax[1].plot(cross_mav_ibm_signal_data['SIGNAL_60'], lw=1.0, color='C1', label='SMAV 60 (green)')
ax[2].plot(cross_mav_ibm_signal_data['SIGNAL_BASE'], lw=1.0, color='C3', label='BASE (purple)')
    
# set axis labels
plt.xlabel('[time]', fontsize=10)
ax[0].set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax[0].set_ylabel('[smav 15 signal]', fontsize=10)
ax[1].set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax[1].set_ylabel('[smav 60 signal]', fontsize=10)
ax[2].set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax[2].set_ylabel('[base signal]', fontsize=10)

for tick in ax[0].get_xticklabels():
    tick.set_rotation(45)
    
for tick in ax[1].get_xticklabels():
    tick.set_rotation(45)
    
for tick in ax[2].get_xticklabels():
    tick.set_rotation(45)

# set plot title
ax[0].set_title('International Business Machines Corporation (IBM) - 15 days Crossover Moving Average Trading Signals', fontsize=10)
ax[1].set_title('International Business Machines Corporation (IBM) - 60 days Crossover Moving Average Trading Signals', fontsize=10)
ax[2].set_title('International Business Machines Corporation (IBM) - Baseline Buy-and-Hold Trading Signals', fontsize=10)

# reset plot layout
plt.tight_layout()

### Step 5: Moving Average Crossover Signal Backtest

Prepare the market data to be utilized in backtesting the crossover moving average trading strategy configurations:

In [None]:
# extract the ibm stock closing prices
ibm_market_data = pd.DataFrame(ibm_data['Adj. Close'], columns=['Adj. Close'])

# rename the 'Adj. Close' column to 'IBM' (since this is the column we want to allocate to in the backtest)
ibm_market_data = ibm_market_data.rename(columns={'Adj. Close': 'IBM'})

# convert pandas DataFrame index to datatype: datetime
ibm_market_data = ibm_market_data.set_index(pd.to_datetime(ibm_data.index))

Inspect top 10 rows of the prepared market data:

In [None]:
ibm_market_data.head(10)

Implement a simple moving average trading strategy by subclassing the `Algo` class of the Python `bt` library:

In [None]:
class MovingAverageStrategy(bt.Algo):
    
    # inits the strategy
    def __init__(self, signals):
        
        # set class signals
        self.signals = signals
        
    # calls the trading strategy
    def __call__(self, target):
        
        # case: current timestep in signals
        if target.now in self.signals.index[1:]:
            
            # get actual signal
            signal = self.signals.loc[target.now]
            
            # set target weights according to signal
            target.temp['weights'] = dict(IBM=signal)
            
        # return 'True' since we want to move on to the next timestep
        return True

Define the moving average trading strategy backtest algorithm stack. 

**Note:** In the Python `bt` library, a trading strategy usually consists of a so-called **stack of algorithms**. For each timestep of our backtest timeframe, the `bt` library executes all algorithms of the stack in sequential order. Each moving average strategy we aim to design and backtest consists of three algorithms in total, briefly described in the following:

> 1. `bt.algos.SelectAll()`: Selects all available stocks for trading except stock prices that correspond to NaN or 0.00.  
> 2. `MovingAverageStrategy()`: Assigns the calculated momentum signal in terms of a weight value to the IBM stock.
> 3. `bt.algos.Rebalance()`: Rebalances the available capital based on the weights assigned to each stock.
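
The idea of a stack of algorithms can be imitated in a few lines of plain Python. The sketch below is only an illustration of the sequential execution order; all names (`ToyTarget`, `run_stack`, and the three functions) are hypothetical and none of this is `bt`'s actual implementation. It assumes that the stack stops as soon as one algorithm returns `False`:

```python
# Plain-Python imitation of a stack of algorithms (hypothetical names only)

class ToyTarget:
    """Carries the scratch space that the algos read from and write to."""

    def __init__(self):
        self.temp = {}


def select_all(target):
    # step 1: choose the tradable instruments
    target.temp["selected"] = ["IBM"]
    return True


def assign_weight(target):
    # step 2: attach a weight (here a fixed long signal) to each selection
    target.temp["weights"] = {name: 1.0 for name in target.temp["selected"]}
    return True


def rebalance(target):
    # step 3: act on the weights; here we only record them
    target.temp["rebalanced"] = dict(target.temp["weights"])
    return True


def run_stack(algos, target):
    # run the algos in order and stop as soon as one returns False
    for algo in algos:
        if not algo(target):
            return False
    return True


toy_target = ToyTarget()
completed = run_stack([select_all, assign_weight, rebalance], toy_target)
```

Each step passes its result to the next one through the shared `temp` dictionary, which mirrors how `MovingAverageStrategy` hands its weights to the rebalancing step via `target.temp['weights']`.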

Define the long-/short-term moving average trading strategy backtest algorithm stack: 

In [None]:
cross_mav_ibm_strategy_15 = bt.Strategy('smav_15', [bt.algos.SelectAll(), MovingAverageStrategy(cross_mav_ibm_signal_data['SIGNAL_15']), bt.algos.Rebalance()])
cross_mav_ibm_strategy_60 = bt.Strategy('smav_60', [bt.algos.SelectAll(), MovingAverageStrategy(cross_mav_ibm_signal_data['SIGNAL_60']), bt.algos.Rebalance()])
cross_mav_ibm_strategy_base = bt.Strategy('base', [bt.algos.SelectAll(), MovingAverageStrategy(cross_mav_ibm_signal_data['SIGNAL_BASE']), bt.algos.Rebalance()])

Having defined the long-/short-term moving average strategies, let's now initialize the corresponding backtests using (1) the strategies as well as (2) the market data that we aim to evaluate during the backtest:

In [None]:
ibm_backtest_cross_mav_15 = bt.Backtest(strategy=cross_mav_ibm_strategy_15, data=ibm_market_data, name='ibm_backtest_smav_15')
ibm_backtest_cross_mav_60 = bt.Backtest(strategy=cross_mav_ibm_strategy_60, data=ibm_market_data, name='ibm_backtest_smav_60')
ibm_backtest_cross_mav_base = bt.Backtest(strategy=cross_mav_ibm_strategy_base, data=ibm_market_data, name='ibm_backtest_smav_base')

Now, let's run the backtest of both configurations of the crossover moving average strategy as well as the defined baseline:

In [None]:
backtest_results_ibm = bt.run(ibm_backtest_cross_mav_15, ibm_backtest_cross_mav_60, ibm_backtest_cross_mav_base)

Inspect the individual backtest results and performance measures:

In [None]:
backtest_results_ibm.display()
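
To build an intuition for two of the reported measures, the following sketch derives a total return and an annualized Sharpe ratio from a made-up equity curve. It assumes a zero risk-free rate and 252 trading days per year; the statistics reported by `bt` may be computed with different conventions, so the sketch is not meant to reproduce them exactly:

```python
import numpy as np
import pandas as pd

# toy equity curve (made up), one value per trading day
equity = pd.Series([100.0, 102.0, 101.0, 104.0, 106.0])

# total return: relative change from the first to the last value
total_return = equity.iloc[-1] / equity.iloc[0] - 1.0

# daily simple returns; the first entry is NaN and is dropped
daily_returns = equity.pct_change().dropna()

# annualized Sharpe ratio, assuming a zero risk-free rate and 252 trading days
sharpe = np.sqrt(252) * daily_returns.mean() / daily_returns.std()
```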

Collect detailed backtest performance per timestep of the fast crossover moving average strategy:

In [None]:
backtest_sma_15_ibm_details = ibm_backtest_cross_mav_15.strategy.data
backtest_sma_15_ibm_details.columns = ['% EQUITY', 'EQUITY', 'CASH', 'FEES']

Inspect detailed backtest results per timestep:

In [None]:
backtest_sma_15_ibm_details.head(10)

Collect detailed backtest performance per timestep of the slow crossover moving average strategy:

In [None]:
backtest_sma_60_ibm_details = ibm_backtest_cross_mav_60.strategy.data
backtest_sma_60_ibm_details.columns = ['% EQUITY', 'EQUITY', 'CASH', 'FEES']

Inspect detailed backtest results per timestep:

In [None]:
backtest_sma_60_ibm_details.head(10)

Collect detailed backtest performance per timestep of the baseline strategy:

In [None]:
backtest_sma_base_ibm_details = ibm_backtest_cross_mav_base.strategy.data
backtest_sma_base_ibm_details.columns = ['% EQUITY', 'EQUITY', 'CASH', 'FEES']

Inspect detailed backtest results per timestep:

In [None]:
backtest_sma_base_ibm_details.head(10)

Visualize each strategy's backtest equity progression over time:

In [None]:
plt.rcParams['figure.figsize'] = [15, 5]
fig = plt.figure()
ax = fig.add_subplot(111)

ax.plot(backtest_sma_15_ibm_details['% EQUITY'], color='C2',lw=1.0, label='15-day SMAV strategy (red)')
ax.plot(backtest_sma_60_ibm_details['% EQUITY'], color='C1',lw=1.0, label='60-day SMAV strategy (green)')
ax.plot(backtest_sma_base_ibm_details['% EQUITY'], color='C3',lw=1.0, label='Base SMAV strategy (purple)')

for tick in ax.get_xticklabels():
    tick.set_rotation(45)
    
# set axis labels
ax.set_xlabel('[time]', fontsize=10)
ax.set_xlim([pd.to_datetime('01-01-1990'), pd.to_datetime('31-12-2017')])
ax.set_ylabel('[equity %]', fontsize=10)

# set plot legend
plt.legend(loc="upper left", numpoints=1, fancybox=True)

# set plot title
plt.title('International Business Machines Corporation (IBM) - Backtest % Equity Progression', fontsize=10);

## Lab Exercises:

We recommend you to try the following exercises as part of the lab:

**1. Evaluation of distinct daily moving average parameters.**

> Evaluate the simple crossover moving average trading strategy using distinct moving average lookbacks, e.g., 10 days, 30 days, 50 days, 200 days and 300 days. Compare the performance of the lookback parametrizations in terms of total return, equity progression and yearly Sharpe ratio. Gain an intuition about the years in which the strategy didn't perform well and the potential reasons for the poor performance.

In [None]:
# ***************************************************
# INSERT YOUR CODE HERE
# ***************************************************

**2. Evaluation of distinct trading frequencies.** 

>Decreasing the lookback in calculating the moving average may result in a significant increase of the corresponding trading frequency (referred to as 'signal changes', and defined as a change of the trading signal from -1 to 1 and vice versa). Calculate the number of signal changes for each of the distinct lookback parametrizations evaluated in the first exercise.
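
As a starting point, the definition of a signal change can be sketched on a made-up signal series. The series below is illustrative only; applying the same idea to the `SIGNAL_*` columns of the lab is left to the exercise:

```python
import pandas as pd

# toy signal series (made up): +1.0 long, -1.0 short, 0.0 no position
signal = pd.Series([0.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0])

# a flip between long and short moves the signal by exactly 2 in absolute terms
signal_changes = int((signal.diff().abs() == 2.0).sum())
```

Note that the initial move from 0.0 to 1.0 is not counted here, since it is not a change between -1 and 1.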

In [None]:
# ***************************************************
# INSERT YOUR CODE HERE
# ***************************************************

**3. Consideration of trading commissions and impact on strategy performance.** 

>Run the backtest of the lookback parametrizations evaluated in exercise 1, but include a commission per trade. Set the trading commission of each backtest to 10 USD per trade (hint: you may want to use the `commissions` argument of `bt.Backtest` or the `set_commissions()` method of the strategy object). Determine the impact of such a trading fee on the performance of each strategy in terms of total return, equity progression and yearly Sharpe ratio.

In [None]:
# ***************************************************
# INSERT YOUR CODE HERE
# ***************************************************

**4. Optimization of the simple crossover moving average trading strategy parameters.**

>The backtest results obtained in this lab notebook for the simple crossover moving average trading strategy reveal that we haven't identified a well-performing strategy parametrization yet. Grid search the parameter space of the strategy to determine a parametrization that results in a positive total return within the time interval 31.12.1990 until 31.12.2017.

In [None]:
# ***************************************************
# INSERT YOUR CODE HERE
# ***************************************************

**5. Implementation of a long-/short-term exponential moving average trading strategy.**

>A potential further enhancement of the simple crossover moving average trading strategy lies in the application of exponentially weighted moving averages. Adapt the crossover trading strategy of this notebook using an exponential moving average instead of the equally weighted moving average (hint: you may want to use the `ewm()` method of the Pandas library). Evaluate the performance of the exponential moving average trading strategy using distinct alpha parameters, e.g., 0.2, 0.5 and 0.8.
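
To illustrate the effect of the alpha parameter, here is a minimal sketch on a made-up price series. It uses `adjust=False`, which is one possible choice and corresponds to the simple recursion given in the comment; the toy prices are illustrative only:

```python
import pandas as pd

# toy price series (made up)
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0])

# exponentially weighted moving averages for the alpha values named above;
# adjust=False uses the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t
ema = {
    alpha: prices.ewm(alpha=alpha, adjust=False).mean()
    for alpha in (0.2, 0.5, 0.8)
}
```

A larger alpha puts more weight on the most recent price, so the average follows the price more closely, similar to a shorter lookback in the simple moving average.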

In [None]:
# ***************************************************
# INSERT YOUR CODE HERE
# ***************************************************

## Lab Summary:

In this lab, a step-by-step implementation and backtest of a basic **moving average trading strategy** using the Python programming language is presented. The implemented strategy trades a specific financial instrument based on its adjusted closing price trend. The degree of success of the implemented strategy is evaluated based on its backtest results, with particular focus on (1) the strategy's **total return** as well as (2) its **equity progression** over time. The code provided in this lab provides a blueprint to develop and backtest more complex trading strategies. Furthermore, it can be tailored to momentum trading of other financial instruments.

You may want to execute the content of your lab outside of the Jupyter notebook environment, e.g., on a compute node or a server. The cell below converts the lab notebook into a standalone and executable Python script. Please note that to convert the notebook, you need to install Python's **nbconvert** library and its extensions:

In [None]:
# installing the nbconvert library
!pip install nbconvert
!pip install jupyter_contrib_nbextensions

Let's now convert the Jupyter notebook into a plain Python script:

In [None]:
!jupyter nbconvert --to script azek_lab_01.ipynb