# Alternative to Python for Finance, Part 2

[Here](http://www.learndatasci.com/python-finance-part-yahoo-finance-api-pandas-matplotlib/) is the original website. The original text is copied from the source.

***

In Python for Finance, Part I, we focused on using Python and Pandas to

* retrieve financial time-series from free online sources (Yahoo),
* format the data by filling missing observations and aligning them,
* calculate some simple indicators such as rolling moving averages and
* visualise the final time-series.

As a reminder, the dataframe containing the three “cleaned” price timeseries has the following format:

In [1]:
# Import modules

from pandas_datareader import data
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import time

In [2]:
# Reload data from Part 1

tickers = ['AAPL', 'MSFT', 'SP']
today = time.strftime("%Y-%m-%d")
start_date = '2000-01-01'
panel_data = data.DataReader(tickers, "google", start_date, today)
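# Select the closing prices (plain item access replaces the removed `.ix` indexer)
adj_close = panel_data['Close']
# Reindex on all business days and forward-fill missing observations
all_weekdays = pd.date_range(start=start_date, end=today, freq='B')
adj_close = adj_close.reindex(all_weekdays)
adj_close = adj_close.ffill()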

In [8]:
adj_close.tail()

| | AAPL | MSFT | SP |
|---|---|---|---|
| 2017-05-22 | 153.99 | 68.45 | 29.2 |
| 2017-05-23 | 153.8 | 68.68 | 29.1 |
| 2017-05-24 | 153.34 | 68.77 | 29.4 |
| 2017-05-25 | 153.87 | 69.62 | 29.25 |
| 2017-05-26 | 153.61 | 69.96 | 29.4 |


We have also calculated the rolling moving averages of these three timeseries as follows. Note that when calculating the $M$-day moving average, the first $M-1$ values are not valid (they are `NaN`), as $M$ prices are required for the first moving average data point.

In [9]:
# Calculating the short-window moving average
short_rolling = adj_close.rolling(window=20).mean()
short_rolling.tail()

| | AAPL | MSFT | SP |
|---|---|---|---|
| 2017-05-22 | 150.1805 | 68.569 | 31.489 |
| 2017-05-23 | 150.644 | 68.607 | 31.269 |
| 2017-05-24 | 151.127 | 68.654 | 30.989 |
| 2017-05-25 | 151.631 | 68.7215 | 30.699 |
| 2017-05-26 | 152.129 | 68.7965 | 30.4465 |


In [10]:
# Calculating the long-window moving average
long_rolling = adj_close.rolling(window=100).mean()
long_rolling.tail()

| | AAPL | MSFT | SP |
|---|---|---|---|
| 2017-05-22 | 137.0479 | 65.2178 | 31.0633 |
| 2017-05-23 | 137.4244 | 65.2788 | 31.0653 |
| 2017-05-24 | 137.7976 | 65.3435 | 31.0663 |
| 2017-05-25 | 138.1702 | 65.4167 | 31.0713 |
| 2017-05-26 | 138.5272 | 65.4879 | 31.0798 |
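
As a small check of the remark about the first $M-1$ points, the sketch below applies the same `rolling(...).mean()` call to a made-up price series. The prices, the `DEMO` name and the window of 4 are illustrative assumptions, not data from this article.

```python
import numpy as np
import pandas as pd

# Synthetic prices on business days (made-up numbers, not market data)
dates = pd.date_range(start="2017-01-02", periods=10, freq="B")
prices = pd.Series(np.arange(100.0, 110.0), index=dates, name="DEMO")

window = 4  # plays the role of M in the text
rolling_mean = prices.rolling(window=window).mean()

# The first M - 1 entries are NaN because fewer than M prices are available
n_invalid = int(rolling_mean.isna().sum())

# The first valid point is the plain average of the first M prices
first_valid = rolling_mean.dropna().iloc[0]

print(rolling_mean)
print("invalid points:", n_invalid)
print("first valid average:", first_valid)
```

With a window of 4, three leading values are missing and the first valid average covers the first four prices. The same logic applies to the 20-day and 100-day windows used above.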


Building on these results, our ultimate goal will be to design a simple yet realistic trading strategy. First, however, we need to go through some of the basic concepts related to quantitative trading strategies, as well as the tools and techniques used in the process.

## General considerations about trading strategies

There are several ways to approach the development of a trading strategy. One is to use the price time-series directly and work with numbers that correspond to some monetary value. For example, a researcher could work with a time-series expressing the price of a given stock, like the time-series we used in the previous article. Similarly, when working with fixed income instruments, e.g. bonds, one could use a time-series expressing the price of the bond as a percentage of a given reference value, in this case the par value of the bond. Working with this type of time-series can be more intuitive, as people are used to thinking in terms of prices. However, price time-series have some drawbacks. Prices are usually only positive, which makes it harder to use models and approaches that require or produce negative numbers. In addition, price time-series are usually non-stationary, that is, their statistical properties are less stable over time.

An alternative approach is to use time-series which correspond not to actual values but to changes in the monetary value of the asset. These time-series can and do assume negative values, and their statistical properties are usually more stable than those of price time-series. The most frequently used forms are relative returns, defined as
$$r_{relative}(t) = \frac{p(t) - p(t - 1)}{p(t - 1)}$$
and log-returns defined as
$$r(t) = \log \Big( \frac{p(t)}{p(t - 1)} \Big) $$

where $p(t)$ is the price of the asset at time $t$. For example, if $p(t)=101$ and $p(t-1)=100$ then $r_{relative}(t)= \frac{101-100}{100} = 1\%$.

There are several reasons why log-returns are used in the industry. Some of them relate to long-standing assumptions about the behaviour of asset returns and are out of our scope. However, two properties are worth pointing out. First, log-returns are additive, which facilitates the treatment of our time-series; relative returns are not. We can see the additivity of log-returns in the following equation.

$$r(t_1) + r(t_2) = \log \Big( \frac{p(t_1)}{p(t_0)} \Big) + \log \Big( \frac{p(t_2)}{p(t_1)} \Big) = \log \Big( \frac{p(t_2)}{p(t_0)} \Big) $$

which is simply the log-return from $t_0$ to $t_2$. Secondly, log-returns are approximately equal to the relative returns for values of $\frac{p(t)}{p(t-1)}$ sufficiently close to $1$. By taking the first-order Taylor expansion of $\log \Big( \frac{p(t)}{p(t - 1)} \Big)$ around $1$, we get

$$\log\left( \frac{p\left(t\right)}{p\left(t-1\right)} \right) \simeq \log\left(1\right) + 
\frac{p\left(t\right)}{p\left(t-1\right)} - 1 = r_{\text{relative}}\left(t\right) $$

Both relative returns and log-returns are easily calculated using Pandas:

In [6]:
# Relative returns
returns = adj_close.pct_change(1)
returns.tail()

| | AAPL | MSFT | SP |
|---|---|---|---|
| 2017-05-22 | 0.006076 | 0.011228 | 0.019197 |
| 2017-05-23 | -0.001234 | 0.00336 | -0.003425 |
| 2017-05-24 | -0.002991 | 0.00131 | 0.010309 |
| 2017-05-25 | 0.003456 | 0.01236 | -0.005102 |
| 2017-05-26 | -0.00169 | 0.004884 | 0.005128 |


In [7]:
# Log returns - first the logarithm of the prices is taken, then the difference of consecutive (log) observations
log_returns = np.log(adj_close).diff()
log_returns.tail()

| | AAPL | MSFT | SP |
|---|---|---|---|
| 2017-05-22 | 0.006058 | 0.011165 | 0.019015 |
| 2017-05-23 | -0.001235 | 0.003354 | -0.003431 |
| 2017-05-24 | -0.002995 | 0.00131 | 0.010257 |
| 2017-05-25 | 0.00345 | 0.012284 | -0.005115 |
| 2017-05-26 | -0.001691 | 0.004872 | 0.005115 |


Since log-returns are additive, we can create the time-series of cumulative log-returns defined as
 
$$c\left(t\right) = \sum_{k=1}^t r\left(k\right)$$

The cumulative log-returns and the total relative returns from 2000/01/01 for the three time-series can be seen below. Note that although log-returns are easy to manipulate, investors are accustomed to using relative returns. For example, a log-return of $1$ does not mean an investor has doubled the value of their portfolio. A relative return of $1 = 100\%$ does! Converting between the cumulative log-return $c\left(t\right)$ and the total relative return $c_{\text{relative}}\left(t\right) = \frac{p\left(t\right) - p\left(t_0\right)}{p\left(t_0\right)}$ is simple:

$$c_{\text{relative}}\left(t\right) = e^{c\left(t\right)} - 1$$

For those who are wondering if this is correct, yes it is. If someone had bought $\$1000$ worth of AAPL shares in January 2000, her/his portfolio would now be worth over $\$30,000$. If only we had a time machine…
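
The conversion between cumulative log-returns and total relative returns can be sanity-checked with a short sketch. It assumes a made-up price series that ends at twice its starting value (the numbers and the `DEMO` name are illustrative, not market data) and compares $e^{c(t)} - 1$ with the total relative return computed directly from prices.

```python
import numpy as np
import pandas as pd

# Synthetic prices (made-up numbers); the last price is double the first
prices = pd.Series([100.0, 104.0, 98.0, 110.0, 200.0], name="DEMO")

# Cumulative log-returns; the first observation has no return, so use 0
log_ret = np.log(prices).diff().fillna(0.0)
cum_log = log_ret.cumsum()

# Conversion from cumulative log-return to total relative return
total_relative = np.exp(cum_log) - 1

# Total relative return computed directly from the prices
direct = (prices - prices.iloc[0]) / prices.iloc[0]

# A log-return of exactly 1 corresponds to a relative return of e - 1
relative_for_log_one = np.exp(1.0) - 1

print(pd.DataFrame({"cum_log": cum_log, "converted": total_relative, "direct": direct}))
print("relative return for a log-return of 1:", relative_for_log_one)
```

Here the doubled price gives a total relative return of $1$ but a cumulative log-return of only $\log 2 \approx 0.693$, while a log-return of $1$ corresponds to a relative return of about $1.718$.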

In [12]:
# Build figure
fig = plt.figure(figsize=[16,9])

ax = fig.add_subplot(2,1,1)

# Plot each ticker's cumulative log-returns as its own labelled line
for c in log_returns:
    ax.plot(log_returns.index, log_returns[c].cumsum(), label=str(c))

ax.set_ylabel('Cumulative log returns')
ax.legend(loc='best')
ax.grid()

ax = fig.add_subplot(2,1,2)

# Convert each ticker's cumulative log-returns to total relative returns in percent
for c in log_returns:
    ax.plot(log_returns.index, 100*(np.exp(log_returns[c].cumsum()) - 1), label=str(c))

ax.set_ylabel('Total relative returns (%)')
ax.legend(loc='best')
ax.grid()

plt.show()

![Graphs supporting the above calculations](finance2.png "Cumulative log returns plus percentages of relative returns.")