<img style="float: right;" width="120" src="../Images/supplier-logo.png">
<img style="float: left; margin-top: 0" width="80" src="../Images/client-logo.png">
<br><br><br>


# Overview

This is a set of Python and pandas routines intended to demonstrate how quick (and easy) it is to pull together some very simple, yet effective, analytics to do reasonably complex tasks.

# Load in the libraries

In addition to the standard libraries of pandas, numpy and matplotlib, we will need some other libraries.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


In [None]:
# the python package to follow a URL and return its results
import requests

# beautiful soup - for web scraping
import bs4 as bs

resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class':'wikitable sortable'})

tickers = []

for row in table.find_all('tr')[1:]:
    # .text includes a trailing newline, so strip it
    ticker = row.find_all('td')[0].text.strip()
    tickers.append(ticker)

tickers

# Financial Data from the internet

There is a plethora of websites and web services that allow users to download all sorts of data in all manner of formats.

e.g. downloads can be manual or automated, in Excel, PDF, Word, JSON, CSV and text formats, and even as **DataFrames!!!**

A very good website is https://www.quandl.com/

This offers good quality data for free and for a fee.

Log on, register and download data in a variety of formats.

To use their APIs you need an API key (`api_key`).

In [None]:
import quandl

quandl.ApiConfig.api_key = "YOUR API KEY HERE" 

In [None]:
quandl.get('NSE/RELIANCE', start_date='2017-01-01', end_date='2019-01-24')
quandl.get('OPEC/ORB', start_date='2009-01-23', end_date='2019-01-24')
quandl.get('LBMA/GOLD', start_date='2018-01-01', end_date='2019-01-23')
quandl.get('WIKI/IBM', start_date='2018-01-01', end_date='2019-01-23')

In [None]:
def getData(start, end, symbol):
    data = 'WIKI/'+ symbol

    return quandl.get(dataset = data, start_date = start, end_date = end)

In [None]:
df = getData("2000-01-01", "2018-12-31","C")

df.head()

In [None]:
df[['Adj. Close']].plot()

# Technical Analysis

A very simple piece of technical analysis based on comparing the 2-month and 1-year rolling averages for a single symbol.

In [None]:
# Load the S&P 500 index data from a local CSV file and sort it by date
df_SPX = pd.read_csv('../Data/Indices/SP500.csv', index_col='Date', parse_dates=True)
df_SPX = df_SPX.sort_index(ascending=True)

# Need to change strings to numerics - quite often needed to be done when loading data from files
df_SPX['Price'] = pd.to_numeric(df_SPX['Price'].str.replace(',',''))

# Calculate Rolling Averages
# Based on a 2 month (42 trading day) and a 1 year (252 trading day) window
x42 = df_SPX['Price'].rolling(42).mean()
x252 = df_SPX['Price'].rolling(252).mean()

df_SPX['42d'] = np.round(x42,2)
df_SPX['252d'] = np.round(x252,2)

# Simple plot
cols = ['Price','42d', '252d']
df_SPX[cols].plot(figsize=(18,9))
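To see what `rolling(n).mean()` returns, here is a minimal sketch on a made-up five-value series with a window of 3. The values and the window are illustrative assumptions, not the S&P 500 data. The first `n - 1` rows are `NaN` because the window is not yet full, which is why the 252-day average only starts after roughly a year of prices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, chosen only to make the arithmetic easy to check
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# A 3-period rolling mean: each value averages the current row and the two before it
rolling_3 = prices.rolling(3).mean()

# The first two rows have no full window, so they are NaN
print(rolling_3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```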

## Plot the Regime

Again a quick eyeball of what we have done aids understanding.

This plot is the beginning of how we generate our buy/sell/hold signals


In [None]:
# Take the difference between the short (42d) and long (252d) rolling averages
df_SPX['Diff'] = df_SPX['42d'] - df_SPX['252d']

# Plot this, this is the basis of our trading regime
df_SPX['Diff'].plot(grid=True)

## Add the rules of a Trading Regime

Add a new column called 'Regime'

The trading regime here is based on the following rules.

- Generate a LONG signal when the 42-day average is more than **3%** above the 252-day average
- Generate a SHORT signal when the 42-day average is more than **3%** below the 252-day average
- Generate a HOLD signal (convert to cash) when the difference is within the **+/- 3% band**

The encoding for the Regime is:
- 1 ==> Long
- -1 ==> Short
- 0 ==> Hold

Note I am using the np.where function here - there are many other ways to do this but the vector arithmetic in numpy fits extremely well with the rows and columns of pandas DataFrames.


In [None]:
# 3 %
threshold = 0.03
    
df_SPX['Regime'] = np.where(df_SPX['Diff'] / df_SPX['252d'] > threshold, 1, 0)
df_SPX['Regime'] = np.where(df_SPX['Diff'] / df_SPX['252d'] < -threshold, -1, df_SPX['Regime'])
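The two-pass `np.where` pattern can be checked on a handful of made-up numbers. This is only a sketch: the `rel_diff` values below are hypothetical stand-ins for `Diff / 252d`. Note that a value exactly at the threshold stays at 0, and that `NaN` rows (the warm-up period of the rolling averages) compare as `False` in both passes, so they also end up as 0.

```python
import numpy as np
import pandas as pd

threshold = 0.03

# Hypothetical relative differences (42d - 252d) / 252d, for illustration only
rel_diff = pd.Series([0.05, 0.01, -0.02, -0.04, 0.03])

# First pass: 1 where the difference is above +3%, otherwise 0
regime = np.where(rel_diff > threshold, 1, 0)
# Second pass: -1 where it is below -3%, otherwise keep the first-pass value
regime = np.where(rel_diff < -threshold, -1, regime)

print(regime.tolist())  # [1, 0, 0, -1, 0]
```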

## Plot the trading signals

This plot basically tells us when to go Long, Short or Hold.

e.g. It says hold in cash from the start of 2006.<br>
Go LONG around early 2007 and stay there until near the end of 2007.<br>
Go to HOLD for a short period, then go LONG, then go to HOLD shortly after.<br>
etc.

Over the period 2006 to 2018, this strategy has generated about 20 trading signals.

In [None]:
df_SPX['Regime'].plot(figsize=(18,6))

## Back test

Add 2 new columns
- 'Market' - how the SP500 performed during this period
- 'Strategy' - how our trading strategy performed during this period

Use the log of the ratio of consecutive prices (the daily log return) to calculate the performance of the market.
Our strategy's performance is the product of the market performance and the previous day's regime.

In [None]:
df_SPX['Market'] = np.log(df_SPX['Price'] / df_SPX['Price'].shift(1))
df_SPX['Strategy'] = df_SPX['Regime'].shift(1) * df_SPX['Market']
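The `shift(1)` on the regime matters: it means today's return is earned with yesterday's signal, so the back test does not peek at information it would not have had. A minimal sketch with made-up prices and signals (both hypothetical) shows the effect row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical prices and regime signals, for illustration only
df = pd.DataFrame({
    'Price': [100.0, 110.0, 99.0, 108.9],
    'Regime': [1, 1, -1, 0],
})

# Daily log return of the market
df['Market'] = np.log(df['Price'] / df['Price'].shift(1))

# Yesterday's regime decides today's position, so shift the signal by one row
df['Strategy'] = df['Regime'].shift(1) * df['Market']

print(df)
```

On the last row the market rises by 10%, but the strategy loses that amount because the previous day's regime was short.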

## Plot the results

A simple plot of market vs strategy shows how well the strategy performed against the market.

The cumulative sum of the log returns is converted back to a cumulative return with np.exp.

The strategy 
- slightly underperforms at the start of the analysis period
- significantly outperforms the market from mid-2008 through to the start of 2013
- slightly outperforms the market from 2013 to 2016
- slightly underperforms from 2016 to the present day


In [None]:
df_SPX[['Market', 'Strategy']].cumsum().apply(np.exp).plot(grid=True)
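Why `cumsum()` followed by `np.exp`? Log returns add up over time, so exponentiating their running total gives the growth of one unit invested at the start. The sketch below, on made-up prices, compares that curve with the one computed directly from the prices. The first row is `NaN` in the log-return version because there is no previous price.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

log_returns = np.log(prices / prices.shift(1))

# Summing log returns and exponentiating gives the growth of 1 unit invested
growth = log_returns.cumsum().apply(np.exp)

# The same curve computed directly from the prices
direct = prices / prices.iloc[0]

print(growth.round(4).tolist())
print(direct.round(4).tolist())
```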

## Refinements

- The 3% threshold is arbitrary <br>
Changing it could make quite a difference. <br>
e.g. Upper is 3%, lower is 2% <br>
The upper and lower thresholds could also vary over time based on other factors


- Add in costs, e.g. transaction (txn) costs<br>
This strategy does not generate many trading signals, so transaction costs are not that significant.


- Operational issues - Trade Execution<br>
This example only works with closing values - it could be expanded to refine when to exit the market (same day, COB close, next day open, etc.)
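The first two refinements can be sketched in a few lines. Everything here is an assumption for illustration: the helper names `regime_with_bands` and `apply_costs`, the 0.1% cost per unit traded, and the toy inputs are not part of the original analysis. Subtracting the cost directly from the log return is only an approximation.

```python
import numpy as np
import pandas as pd

def regime_with_bands(rel_diff, upper=0.03, lower=0.02):
    """Return 1 above +upper, -1 below -lower, otherwise 0 (hypothetical helper)."""
    regime = np.where(rel_diff > upper, 1, 0)
    regime = np.where(rel_diff < -lower, -1, regime)
    return pd.Series(regime, index=rel_diff.index)

def apply_costs(strategy, regime, cost_per_trade=0.001):
    """Subtract an assumed cost on each day the held position changes."""
    position = regime.shift(1).fillna(0)
    # Units traded each day, e.g. 2 when flipping straight from long to short
    traded = position.diff().abs().fillna(0)
    return strategy - traded * cost_per_trade

# Hypothetical relative differences and market log returns
rel_diff = pd.Series([0.05, 0.01, -0.025, -0.04, 0.00])
market = pd.Series([0.00, 0.01, -0.02, 0.01, 0.03])

regime = regime_with_bands(rel_diff)
strategy = regime.shift(1) * market
net = apply_costs(strategy, regime)

print(regime.tolist())  # [1, 0, -1, -1, 0]
print(net.round(4).tolist())
```

With a symmetric 3% band the third value (-2.5%) would have been a HOLD; with the 2% lower band it becomes a SHORT.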




# Pulling it all together

Can we 

- Go to Wikipedia and get all the symbols for the SP500

- Get closing day stock prices for these symbols (or a sub set of them)

- Perform technical analysis and back test for each of these symbols

- Compare symbols against each other.

**Of Course**

##  Write a function that scrapes symbols from Wikipedia

In [None]:
# the python package to follow a URL and return its results
import requests

# beautiful soup - for web scraping
import bs4 as bs

def getSP500Symbols():
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class':'wikitable sortable'})

    tickers = []

    for row in table.find_all('tr')[1:]:
        # .text includes a trailing newline, so strip it
        ticker = row.find_all('td')[0].text.strip()
        tickers.append(ticker)

    return tickers
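The parsing logic can be tried without a network connection. The sketch below runs the same steps on a tiny, hypothetical HTML snippet that stands in for the Wikipedia table; the real page's markup may differ and can change over time. It uses Python's built-in `html.parser` backend so that `lxml` is not needed, and it strips the newline that follows each symbol.

```python
import bs4 as bs

# A hypothetical, heavily shortened stand-in for the Wikipedia table
html = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM
</td><td>3M</td></tr>
  <tr><td>AAPL
</td><td>Apple Inc.</td></tr>
</table>
"""

# 'html.parser' ships with Python, so this sketch does not need lxml
soup = bs.BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'wikitable sortable'})

tickers = []
for row in table.find_all('tr')[1:]:  # skip the header row
    cells = row.find_all('td')
    if cells:  # ignore rows without data cells
        tickers.append(cells[0].text.strip())

print(tickers)  # ['MMM', 'AAPL']
```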

##  Write a function that performs some technical analysis

In [None]:
def perform_TA(start, end, symbol, threshold):
    
    # Load in data from somewhere, e.g. Quandl, Bloomberg, etc.
    df = getData(start, end, symbol)
    
    # Rolling averages over 2 months (42 trading days) and 1 year (252 trading days)
    x42 = df['Adj. Close'].rolling(42).mean()
    x252 = df['Adj. Close'].rolling(252).mean()
    df['42d'] = np.round(x42,2)
    df['252d'] = np.round(x252,2)
    
    # Store the difference between the short term and the long term
    df['42d-252d'] = df['42d'] - df['252d']
    
    # set the regime
    df['Regime'] = np.where(df['42d-252d'] / df['252d']> threshold, 1, 0)
    df['Regime'] = np.where(df['42d-252d'] / df['252d'] < -threshold, -1, df['Regime'])
    
    # Daily log returns of the market, and of the strategy using the previous day's regime
    df['Market'] = np.log(df['Adj. Close'] / df['Adj. Close'].shift(1))
    df['Strategy'] = df['Regime'].shift(1) * df['Market']
    
    # Return only the results columns to the call site
    df_Res = df[['Market', 'Strategy']].copy()
    
    return df_Res
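To experiment with the analysis without an API key, the same steps can be applied to a price series passed in directly. This is a sketch under stated assumptions: `ta_from_prices` is a hypothetical variant of `perform_TA`, it skips the rounding of the averages, and the prices are a synthetic random walk with a fixed seed rather than real market data.

```python
import numpy as np
import pandas as pd

def ta_from_prices(prices, threshold=0.03, short_win=42, long_win=252):
    """Hypothetical variant of perform_TA that takes a price Series directly."""
    df = pd.DataFrame({'Price': prices})
    short_avg = df['Price'].rolling(short_win).mean()
    long_avg = df['Price'].rolling(long_win).mean()
    rel_diff = (short_avg - long_avg) / long_avg

    # 1 = long, -1 = short, 0 = hold (also used while the averages warm up)
    regime = np.where(rel_diff > threshold, 1, 0)
    regime = np.where(rel_diff < -threshold, -1, regime)
    df['Regime'] = regime

    df['Market'] = np.log(df['Price'] / df['Price'].shift(1))
    df['Strategy'] = df['Regime'].shift(1) * df['Market']
    return df[['Market', 'Strategy']]

# Synthetic random-walk prices, standing in for downloaded data
rng = np.random.default_rng(0)
dates = pd.bdate_range('2015-01-01', periods=600)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 600))), index=dates)

res = ta_from_prices(prices)
print(res.tail())
```

The strategy return is zero until the 252-day average exists, because the regime defaults to HOLD during the warm-up period.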

## Pull it all together

In [None]:
ta_results = {}

# Get some symbols / tickers that we want to use

# This will perform a web scrape
#tickers = getSP500Symbols()

# For demonstration purposes, just use a small subset of symbols

tickers = ['GM','AAPL', 'JPM', 'C', 'MMM', 'AMZN', 'GOOGL']

start = "2000-01-01"
end = "2018-02-28"

for ticker in tickers:
    df = perform_TA(start, end, ticker, 0.03)
    ta_results[ticker] = df

## Try it out

In [None]:
df = ta_results['C']

df[['Market', 'Strategy']].cumsum().apply(np.exp).plot(grid=True)

In [None]:
df2 = pd.DataFrame()

df2['GM'] = ta_results['GM']['Market']
df2['AAPL'] = ta_results['AAPL']['Market']
df2['C'] = ta_results['C']['Market']
df2['JPM'] = ta_results['JPM']['Market']
df2['AMZN'] = ta_results['AMZN']['Market']
df2['MMM'] = ta_results['MMM']['Market']
df2['GOOGL'] = ta_results['GOOGL']['Market']

df2.cumsum().apply(np.exp).plot(grid=True, figsize=(18,9))

In [None]:
df2 = pd.DataFrame()

df2['GM'] = ta_results['GM']['Strategy']
df2['AAPL'] = ta_results['AAPL']['Strategy']
df2['C'] = ta_results['C']['Strategy']
df2['JPM'] = ta_results['JPM']['Strategy']
df2['AMZN'] = ta_results['AMZN']['Strategy']
df2['MMM'] = ta_results['MMM']['Strategy']
df2['GOOGL'] = ta_results['GOOGL']['Strategy']

df2.cumsum().apply(np.exp).plot(grid=True, figsize=(18,9))
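The column-by-column assignments above can also be written once as a small helper. This is a sketch only: `collect` and the two-symbol `ta_results` stand-in are hypothetical, with made-up returns in place of the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ta_results: two symbols with made-up log returns
ta_results = {
    'AAA': pd.DataFrame({'Market': [0.01, 0.02], 'Strategy': [0.01, -0.02]}),
    'BBB': pd.DataFrame({'Market': [0.00, 0.03], 'Strategy': [0.00, 0.03]}),
}

def collect(results, column):
    """Gather one column from every symbol's results into a single DataFrame."""
    return pd.DataFrame({symbol: df[column] for symbol, df in results.items()})

market = collect(ta_results, 'Market')
strategy = collect(ta_results, 'Strategy')

# Cumulative growth of 1 unit for each symbol, ready for .plot()
growth = strategy.cumsum().apply(np.exp)
print(growth)
```

With the real results, `collect(ta_results, 'Market')` and `collect(ta_results, 'Strategy')` would give one column per ticker, in the order the tickers were added to the dictionary.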

In [None]:
ta_results['C']['Market'].cumsum().apply(np.exp).plot()