<a href="https://colab.research.google.com/github/tleitch/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/master/assignments/projectOptimization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inputs
There are three ways to supply inputs.

First, you can save strategy returns in a directory named "project", i.e., the path /content/drive/MyDrive/ahfs/project. The sma.ipynb notebook automatically saves to this folder.

Second, the code below loads the Fama-French (FF) five factors and lets you use them as strategies. It is set up to keep only HML. It drops RF and the market factor because we don't want the market in our portfolio (you do remember we want low correlation with the market, right?). If you want, you can add SMB, RMW, or CMA by changing the column drop below.

Third, and most flexible, you can use any ticker you can get from Yahoo Finance. Do not use individual stocks; use ETFs and mutual funds. Be careful not to use anything whose history is shorter than 2014 to now, because the algorithm truncates to the shortest time series in the bunch.

The code builds the returns and then constructs the maximum Sharpe ratio portfolio. The optimizer can allow shorting through its `short` flag (switched off in the call further down) but does not allow leverage (sorry, that needs a complex check for returns at the end that is beyond me getting this done today).

At the bottom, it runs a tear sheet for your overall strategy and then one for each individual strategy. Use these results for your presentations.




# Mean-Variance Optimization

Modern portfolio theory (MPT) solves for the optimal portfolio weights that minimize volatility for a given expected return, or maximize returns for a given level of volatility. The key required inputs are expected asset returns, standard deviations, and the covariance matrix.

Diversification works because the variance of portfolio returns depends on the covariance of the assets and can be reduced below the weighted average of the asset variances by including assets with less than perfect correlation. In particular, given a vector, $\omega$, of portfolio weights and the covariance matrix, $\Sigma$, the portfolio variance, $\sigma^2_{\text{PF}}$, is defined as:
$$\sigma^2_{\text{PF}}=\omega^T\Sigma\omega$$
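
As a quick illustration of the formula above, here is a minimal sketch with two hypothetical assets. The volatilities, correlation, and weights are made-up numbers for illustration, not values from the data used in this notebook.

```python
import numpy as np

# Hypothetical inputs: two assets with 20% and 30% volatility and a correlation of 0.2
vols = np.array([0.20, 0.30])
corr = np.array([[1.0, 0.2],
                 [0.2, 1.0]])
cov = np.outer(vols, vols) * corr   # covariance matrix Sigma

w = np.array([0.5, 0.5])            # equal weights

pf_var = w @ cov @ w                # portfolio variance: w' Sigma w
weighted_avg_var = w @ vols**2      # weighted average of the asset variances

print(f"portfolio variance:        {pf_var:.4f}")
print(f"weighted average variance: {weighted_avg_var:.4f}")
```

Because the correlation is below one, the portfolio variance comes out lower than the weighted average of the two asset variances.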

Markowitz showed that the problem of maximizing the expected portfolio return subject to a target risk has an equivalent dual representation of minimizing portfolio risk subject to a target expected return level, $\mu_{\text{PF}}$. Hence, the optimization problem becomes:
$$
\begin{align}
\min_\omega & \quad\quad\sigma^2_{\text{PF}}= \omega^T\Sigma\omega\\
\text{s.t.} &\quad\quad \mu_{\text{PF}}= \omega^T\mu\\ 
&\quad\quad \lVert\omega\rVert_1 =1
\end{align}
$$

We can calculate an efficient frontier using `scipy.optimize.minimize` and the historical estimates for asset returns, standard deviations, and the covariance matrix. 
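
The following is a minimal sketch of how such a frontier could be traced, assuming three hypothetical assets with made-up annualized means and covariances. The helper `min_variance_weights` is an illustrative name, not part of this notebook, and the sketch is long-only, so the weights simply sum to one.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical annualized inputs for three assets (illustration only)
mu = np.array([0.05, 0.08, 0.12])
cov = np.array([[0.010, 0.002, 0.001],
                [0.002, 0.030, 0.004],
                [0.001, 0.004, 0.060]])

def min_variance_weights(target):
    """Long-only weights with the lowest variance for a target expected return."""
    constraints = [
        {'type': 'eq', 'fun': lambda w: w @ mu - target},  # hit the target return
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},    # fully invested
    ]
    result = minimize(fun=lambda w: w @ cov @ w,
                      x0=np.full(len(mu), 1 / len(mu)),
                      method='SLSQP',
                      bounds=[(0, 1)] * len(mu),
                      constraints=constraints)
    return result.x

# Trace the frontier over a grid of attainable target returns
frontier = []
for target in np.linspace(0.06, 0.11, 6):
    w = min_variance_weights(target)
    frontier.append((target, np.sqrt(w @ cov @ w)))

for target, sd in frontier:
    print(f"target return {target:.2%} -> volatility {sd:.2%}")
```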

## Imports & Settings

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline

import pandas as pd
import numpy as np
from numpy.random import random, uniform, dirichlet, choice
from numpy.linalg import inv

from scipy.optimize import minimize

import pandas_datareader.data as web
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import seaborn as sns
!pip install backtrader
!pip install git+https://github.com/quantopian/pyfolio
!pip install yfinance
!pip install --upgrade tables
import backtrader as bt
from backtrader.feeds import PandasData
import pyfolio as pf
import yfinance as yf
from datetime import datetime
from datetime import date


In [3]:
sns.set_style('whitegrid')
np.random.seed(42)

In [4]:
cmap = sns.diverging_palette(10, 240, n=9, as_cmap=True)

## Prepare Data

In [5]:
from google.colab import drive
drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


We select historical data for tickers included in the S&P500 (according to Wikipedia) from 1998-2017.

In [6]:
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
df = pd.read_html(url, header=0)[0]

In [7]:

df.columns = ['ticker', 'name', 'sec_filings', 'gics_sector', 'gics_sub_industry',
              'location', 'first_added', 'cik', 'founded']
df = df.drop('sec_filings', axis=1).set_index('ticker')

In [8]:

with pd.HDFStore('/content/drive/MyDrive/ahfs/assets.h5') as store:
    store.put('sp500/stocks', df)

In [9]:
with pd.HDFStore('/content/drive/MyDrive/ahfs/assets.h5') as store:
    sp500_stocks = store['sp500/stocks']

In [10]:
sp500_stocks.head()

| ticker | name | gics_sector | gics_sub_industry | location | first_added | cik | founded |
|---|---|---|---|---|---|---|---|
| MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1976-08-09 | 66740 | 1902 |
| ABT | Abbott Laboratories | Health Care | Health Care Equipment | North Chicago, Illinois | 1964-03-31 | 1800 | 1888 |
| ABBV | AbbVie | Health Care | Pharmaceuticals | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| ABMD | Abiomed | Health Care | Health Care Equipment | Danvers, Massachusetts | 2018-05-31 | 815094 | 1981 |
| ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |


In [None]:
with pd.HDFStore('/content/drive/MyDrive/ahfs/assets.h5') as store:
    prices = (store['quandl/wiki/prices']
              .adj_close
              .unstack('ticker')
              .filter(sp500_stocks.index)
              .sample(n=30, axis=1))

## Compute Inputs

In [None]:
startDt='2014-01-01'
endDt='2017-12-31'

In [None]:
ff_factor = 'F-F_Research_Data_5_Factors_2x3_daily'
ff_factor_data = web.DataReader(ff_factor, 'famafrench', start=startDt, end=endDt)[0]
ff_factor_data.info()

In [None]:
import glob
import os

# Collect the strategy return files saved to the project folder
local_path = r'/content/drive/MyDrive/ahfs/project'
filenames = glob.glob(local_path + "/*.csv")

dfs = None
for i, filename in enumerate(filenames):
    # Use the file name (without folder and extension) as the strategy name
    assetName = os.path.splitext(os.path.basename(filename))[0]
    ts = pd.read_csv(filename)
    # The first file starts the frame; later files are inner-joined on the "index" date column
    dfs = ts if dfs is None else dfs.merge(ts, on="index")
    # Column 0 is "index", so the returns of the i-th file sit in column i + 1
    dfs.rename(columns={dfs.columns[i + 1]: assetName}, inplace=True)
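
To make the expected file layout concrete, here is a hedged sketch that writes two hypothetical strategy files and reads them back with the same merge-and-rename logic. It assumes each CSV holds an `index` date column followed by a single returns column, which is what the loader above relies on; the strategy names, the random returns, and the temporary folder are invented for illustration.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical daily returns for two strategies on a tz-aware (UTC) date index
dates = pd.date_range('2014-01-02', periods=5, freq='B', tz='UTC')
rng = np.random.default_rng(0)
strategies = {name: pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)
              for name in ['sma', 'momentum']}

with tempfile.TemporaryDirectory() as folder:
    # Save each strategy as <name>.csv with an "index" date column and one returns column
    for name, rets in strategies.items():
        rets.reset_index().to_csv(os.path.join(folder, f'{name}.csv'), index=False)

    # Read the files back: merge on "index" and name each column after its file
    loaded = None
    for i, filename in enumerate(sorted(glob.glob(folder + '/*.csv'))):
        asset = os.path.splitext(os.path.basename(filename))[0]
        ts = pd.read_csv(filename)
        loaded = ts if loaded is None else loaded.merge(ts, on='index')
        loaded = loaded.rename(columns={loaded.columns[i + 1]: asset})

print(loaded.columns.tolist())
```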




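In [None]:
# Use the "index" column (the dates) as the DatetimeIndex
dfs['Date'] = pd.to_datetime(dfs['index'], utc=False)
dfs = dfs.set_index('Date').drop(columns='index')
# Strip the timezone so the dates line up with the tz-naive factor and price data
# (tz_convert requires a tz-aware index)
dfs = dfs.tz_convert(None)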

In [None]:
dfs=pd.merge(dfs, ff_factor_data/100, left_index=True, right_index=True)

In [None]:
dfs.head()

# Pick tickers here, using mutual funds and ETFs as alternative benchmarks

In [None]:
numCol=len(dfs.columns)
tickers = ["MNA","ICVT","AQMNX"]
i=0
for ticker in tickers:
    data=yf.download(ticker,startDt, endDt)    
    rets=data['Adj Close'].pct_change()
    dfs=pd.merge(dfs, rets, left_index=True, right_index=True)
    dfs.rename(columns={ dfs.columns[numCol+i]: ticker }, inplace = True)
    i=i+1


In [None]:
dfs.head()
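
The reason a short-lived ticker truncates everything is that `pd.merge` performs an inner join by default. The sketch below shows this with two hypothetical constant return series; the names and dates are illustrative only.

```python
import pandas as pd

# Hypothetical return series: one long history, one fund that launched later
long_history = pd.Series(0.001,
                         index=pd.date_range('2014-01-01', '2014-12-31', freq='B'),
                         name='long_history')
short_history = pd.Series(0.002,
                          index=pd.date_range('2014-10-01', '2014-12-31', freq='B'),
                          name='short_history')

# pd.merge defaults to an inner join, so only dates present in both series survive
merged = pd.merge(long_history.to_frame(), short_history,
                  left_index=True, right_index=True)

print(len(long_history), len(short_history), len(merged))
print(merged.index.min().date())
```

The merged frame starts on the first date of the shorter series, so every strategy in the portfolio loses the earlier history.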

### Risk-Free Rate

Use the mean of the Fama-French daily risk-free rate (`RF`) as the risk-free rate:

In [None]:
rf_rate = dfs['RF'].mean()
dfs2=dfs.drop(columns=["RF","Mkt-RF"]) # RF no longer needed and Mkt should not be part of your portfolio

In [None]:
dfs2.head()

# FF Factors removed here, HML left in by default for example

In [None]:
dfs2=dfs2.drop(columns=["SMB","RMW","CMA"]) # HML is kept as an example; add it to this list if it is not part of your strategy

### Compute Returns

The strategy returns are already daily, so we use them directly; the inner merges above keep only dates present in every series:

In [None]:
daily_returns= dfs2
daily_returns.info()


In [None]:
daily_returns.head()

### Set  Parameters

In [None]:
strategies = daily_returns.columns

In [None]:
n_obs, n_assets = daily_returns.shape
n_assets, n_obs

In [None]:
x0 = uniform(0, 1, n_assets)
x0 /= np.sum(np.abs(x0))

### Annualization Factor

In [None]:
periods_per_year = round(daily_returns.resample('A').size().mean())
periods_per_year
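
As a rough check of what this factor measures, the sketch below counts observations per calendar year on a hypothetical business-day index. It groups by year instead of resampling, which gives the same per-year counts when every year has data. Real trading calendars also exclude holidays, so the number from actual return data will be lower.

```python
import pandas as pd

# Hypothetical daily series on every business day from 2014 through 2017
idx = pd.date_range('2014-01-01', '2017-12-31', freq='B')
daily = pd.Series(0.0, index=idx)

# Count observations per calendar year, then average and round
obs_per_year = daily.groupby(daily.index.year).size()
periods = round(obs_per_year.mean())

print(obs_per_year.to_dict())
print(periods)
```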

### Compute Mean Returns, Covariance and Precision Matrix

In [None]:
mean_returns = daily_returns.mean()
cov_matrix = daily_returns.cov()

In [None]:
mean_returns



In [None]:
cov_matrix

The precision matrix is the inverse of the covariance matrix:

In [None]:
precision_matrix = pd.DataFrame(inv(cov_matrix), index=strategies, columns=strategies)
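
One common use of the precision matrix is the closed-form global minimum-variance portfolio for weights that sum to one, $\omega = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T\Sigma^{-1}\mathbf{1})$. The sketch below applies it to a hypothetical two-asset covariance matrix; it is an illustration and not a step in this notebook's optimization.

```python
import numpy as np
from numpy.linalg import inv

# Hypothetical covariance matrix for two assets
cov = np.array([[0.040, 0.012],
                [0.012, 0.090]])

precision = inv(cov)

# Sanity check: the precision matrix times the covariance matrix is the identity
identity_check = precision @ cov

# Closed-form minimum-variance weights (shorting allowed, weights sum to one)
ones = np.ones(len(cov))
w_min_var = precision @ ones / (ones @ precision @ ones)

print(np.round(identity_check, 10))
print(w_min_var)
```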

## Compute Annualized PF Performance

Now we'll set up the quadratic optimization problem to solve for the minimum standard deviation for a given return or the maximum SR. 

To this end, define the functions that measure the key metrics:

In [None]:
def portfolio_std(wt, rt=None, cov=None):
    """Annualized PF standard deviation"""
    return np.sqrt(wt @ cov @ wt * periods_per_year)

In [None]:
def portfolio_returns(wt, rt=None, cov=None):
    """Annualized PF returns"""
    return (wt @ rt + 1) ** periods_per_year - 1

In [None]:
def portfolio_performance(wt, rt, cov):
    """Annualized PF returns & standard deviation"""
    r = portfolio_returns(wt, rt=rt)
    sd = portfolio_std(wt, cov=cov)
    return r, sd
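
To see the two annualization rules side by side, here is a small worked sketch. The weights, daily mean returns, daily covariance matrix, and the choice of 252 periods per year are all assumed values for illustration.

```python
import numpy as np

periods_per_year = 252                 # assumed number of trading days per year
wt = np.array([0.6, 0.4])              # hypothetical portfolio weights
rt = np.array([0.0004, 0.0002])        # hypothetical daily mean returns
cov = np.array([[1.0e-4, 2.0e-5],
                [2.0e-5, 4.0e-5]])     # hypothetical daily covariance matrix

daily_ret = wt @ rt                                    # weighted daily mean return
annual_ret = (1 + daily_ret) ** periods_per_year - 1   # compounded over the year
annual_sd = np.sqrt(wt @ cov @ wt * periods_per_year)  # variance scales linearly with time

print(f"annualized return: {annual_ret:.4f}")
print(f"annualized std:    {annual_sd:.4f}")
```

Returns are compounded, while the standard deviation scales with the square root of the number of periods.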

## Max Sharpe PF

Define a target function that represents the negative SR for scipy's minimize function to optimize, given the constraints that the weights are bounded by [-1, 1], if short trading is permitted, and [0, 1] otherwise, and sum to one in absolute terms.

In [None]:
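def neg_sharpe_ratio(weights, mean_ret, cov):
    r, sd = portfolio_performance(weights, mean_ret, cov)
    # rf_rate is a mean daily rate; annualize it to match the annualized r and sd
    rf_annual = (1 + rf_rate) ** periods_per_year - 1
    return -((r - rf_annual) / sd)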

In [None]:
weight_constraint = {'type': 'eq', 
                     'fun': lambda x: np.sum(np.abs(x))-1}

In [None]:
def max_sharpe_ratio(mean_ret, cov, short=False):   # short=False because you can't short your own strategy
    return minimize(fun=neg_sharpe_ratio,
                    x0=x0,
                    args=(mean_ret, cov),
                    method='SLSQP',
                    bounds=((-1 if short else 0, 1),) * n_assets,
                    constraints=weight_constraint,
                    tol=1e-10,                    # tolerance is a top-level argument, not an SLSQP option
                    options={'maxiter': 10000})   # integer iteration limit
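
The same setup can be tried on toy numbers before running it on real strategies. The sketch below assumes three hypothetical assets with made-up annualized means, covariances, and a 2% risk-free rate, and uses the long-only bounds with the absolute-sum constraint described above.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical annualized inputs (illustration only)
mu = np.array([0.05, 0.08, 0.12])
cov = np.array([[0.010, 0.002, 0.001],
                [0.002, 0.030, 0.004],
                [0.001, 0.004, 0.060]])
rf = 0.02

def neg_sharpe(w):
    """Negative Sharpe ratio, so that minimizing it maximizes the Sharpe ratio."""
    return -(w @ mu - rf) / np.sqrt(w @ cov @ w)

n = len(mu)
equal_weights = np.full(n, 1 / n)

result = minimize(fun=neg_sharpe,
                  x0=equal_weights,
                  method='SLSQP',
                  bounds=[(0, 1)] * n,   # long-only
                  constraints={'type': 'eq',
                               'fun': lambda w: np.sum(np.abs(w)) - 1})
weights = result.x

print("weights:", np.round(weights, 3))
print("Sharpe ratio:", round(-neg_sharpe(weights), 3))
```

With long-only bounds the absolute-sum constraint is the same as requiring the weights to sum to one.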

## Run Calculations

### Get Max Sharpe PF

In [None]:
max_sharpe_pf = max_sharpe_ratio(mean_returns, cov_matrix, short=False)
max_sharpe_perf = portfolio_performance(max_sharpe_pf.x, mean_returns, cov_matrix)

In [None]:
len(mean_returns)

In [None]:
print(max_sharpe_pf.x)

In [None]:
r, sd = max_sharpe_perf
rf_annual = (1 + rf_rate) ** periods_per_year - 1  # annualize the mean daily risk-free rate
pd.Series({'ret': r, 'sd': sd, 'sr': (r - rf_annual) / sd})


In [None]:
# Save weights
pWeights=max_sharpe_pf.x

# Now construct portfolios from weights

In [None]:
myStrat=dfs2[strategies].mul(pWeights)
myStratRets=dfs2[strategies].mul(pWeights).sum(1)
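
The weighting step is a column-wise multiplication followed by a row sum. Here is a minimal sketch with two hypothetical strategies and made-up weights:

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for two strategies
returns = pd.DataFrame({'strat_a': [0.01, -0.02, 0.00],
                        'strat_b': [0.00, 0.01, 0.02]},
                       index=pd.date_range('2014-01-02', periods=3, freq='B'))
weights = np.array([0.75, 0.25])

contributions = returns.mul(weights)     # each column scaled by its weight
portfolio = contributions.sum(axis=1)    # daily portfolio return

print(contributions)
print(portfolio)
```

The row sum is the same as the matrix product `returns @ weights`; keeping the per-strategy contributions around is what allows a separate tear sheet for each strategy later.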

In [None]:
myStratRets.head()

## Run PyFolio Analysis

### Get Benchmark

In [None]:
start = str(myStratRets.index.min().year)
end = str(myStratRets.index.max().year + 1)

In [None]:
myStratRets.head()

In [None]:
benchmark = web.DataReader('SP500', 'fred',
                           start=start,
                           end=end).squeeze()
benchmark = benchmark.pct_change().tz_localize(None)


In [None]:
fig, axes = plt.subplots(ncols=3, figsize=(20,5))
pf.plotting.plot_rolling_returns( myStratRets, ax=axes[0])
axes[0].set_title('Cumulative Returns')
pf.plotting.plot_rolling_sharpe(myStratRets, benchmark,ax=axes[1])
pf.plotting.plot_rolling_beta(myStratRets, benchmark, ax=axes[2])
sns.despine()
fig.tight_layout();

### Create a full tear sheet for the overall portfolio, then one for each strategy

In [None]:
pf.create_full_tear_sheet(myStratRets,
                          estimate_intraday=False)

In [None]:
for strat in strategies:
    print(strat)
    pf.create_full_tear_sheet(myStrat[strat],
                              estimate_intraday=False)