# Project 3: Smart Beta Portfolio and Portfolio Optimization
## Instructions
Each problem consists of a function to implement and instructions on how to implement it. The parts of the function that need to be implemented are marked with a `# TODO` comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our `project_tests` package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.

## Packages
When you implement the functions, you'll only need to use the [Pandas](https://pandas.pydata.org/) and [NumPy](http://www.numpy.org/) packages. Don't import any other packages, otherwise the grader won't be able to run your code.

The other packages that we're importing are `helper` and `project_tests`. These are custom packages built to help you solve the problems. The `helper` module contains utility functions and graph functions. The `project_tests` module contains the unit tests for all the problems.
### Install Packages

In [1]:
import sys
!{sys.executable} -m pip install -r requirements.txt

You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


### Load Packages

In [2]:
import pandas as pd
import numpy as np
import helper
import project_tests

## Market Data
The data source we'll be using is the [Wiki End of Day data](https://www.quandl.com/databases/WIKIP) hosted at [Quandl](https://www.quandl.com). This contains data for many stocks, but we'll just be looking at the S&P 500 stocks. We'll also make things a little easier to solve by narrowing our range of time to 2013-07-01 through 2017-06-30, the dates used in the download cell.
### Set API Key
Set the `quandl.ApiConfig.api_key` variable to your Quandl API key. You can find your Quandl API key [here](https://www.quandl.com/account/api).

In [3]:
import quandl

# TODO: Add your Quandl API Key
quandl.ApiConfig.api_key  = 'wDVaeHgSSAiMBv7k4Pfw'

### Download Data

In [4]:
import os

snp500_file_path = 'data/tickers_SnP500.txt'
wiki_file_path = 'data/WIKI_PRICES.csv'
start_date, end_date = '2013-07-01', '2017-06-30'
use_columns = ['date', 'ticker', 'adj_close', 'adj_volume', 'ex-dividend']

if not os.path.exists(wiki_file_path):
    with open(snp500_file_path) as f:
        tickers = f.read().split()
    
    print('Downloading data...')
    helper.download_quandl_dataset('WIKI', 'PRICES', wiki_file_path, use_columns, tickers, start_date, end_date)
    print('Data downloaded')
else:
    print('Data already downloaded')

Downloading data...
Data downloaded


### Load Data

In [5]:
df = pd.read_csv(wiki_file_path)

### Create the Universe
We'll be selecting stocks with large dollar volume for our stock universe. This universe is similar to large market cap stocks, because both are highly liquid.

In [6]:
percent_top_dollar = 0.2
high_volume_symbols = helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)
df = df[df['ticker'].isin(high_volume_symbols)]

### 2-D Matrices
In the previous projects, we used a [multiindex](https://pandas.pydata.org/pandas-docs/stable/advanced.html) to store all the data in a single dataframe. As you work with larger datasets, it becomes infeasible to store all the data in memory. Starting with this project, we'll be storing all our data as 2-D matrices to match what you can expect in the real world.

In [7]:
close = df.reset_index().pivot(index='ticker', columns='date', values='adj_close')
volume = df.reset_index().pivot(index='ticker', columns='date', values='adj_volume')
ex_dividend = df.reset_index().pivot(index='ticker', columns='date', values='ex-dividend')
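To make the reshaping concrete, here is a minimal sketch of the same `pivot` call on a toy long-format table. The tickers `AAA` and `BBB` and their prices are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, ticker) pair.
toy_long = pd.DataFrame({
    'date': ['2017-01-03', '2017-01-03', '2017-01-04', '2017-01-04'],
    'ticker': ['AAA', 'BBB', 'AAA', 'BBB'],
    'adj_close': [10.0, 20.0, 11.0, 19.0],
})

# Pivot into a 2-D matrix: one row per ticker, one column per date.
toy_matrix = toy_long.pivot(index='ticker', columns='date', values='adj_close')
print(toy_matrix)
```

Each cell of `toy_matrix` now holds the price for one ticker on one date, which is the layout used by `close`, `volume`, and `ex_dividend`.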

### View Data
To see what one of these 2-D matrices looks like, let's take a look at the closing prices matrix.

In [8]:
helper.print_dataframe(close)

# Part 1: Smart Beta Portfolio
In Part 1 of this project, you'll build a smart beta portfolio using dividend yield. To see how well it performs, you'll compare this portfolio to an index.
## Index Weights
After building the smart beta portfolio, we should compare it to a similar strategy or index.

Implement `generate_dollar_volume_weights` to generate the weights for this index. For each date, generate the weights based on dollar volume traded for that date. For example, assume the following is dollar volume traded data:

|          | 10/02/2010 | 10/03/2010 |
|----------|------------|------------|
| **AAPL** |      2     |      2     |
| **BBC**  |      5     |      6     |
| **GGL**  |      1     |      2     |
| **ZGB**  |      6     |      5     |

The weights should be the following:

|          | 10/02/2010 | 10/03/2010 |
|----------|------------|------------|
| **AAPL** |    0.143   |    0.133   |
| **BBC**  |    0.357   |    0.400   |
| **GGL**  |    0.071   |    0.133   |
| **ZGB**  |    0.429   |    0.333   |
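The example can be checked with a few lines of Pandas. This is a sketch that treats the table values directly as dollar volume; in the project itself, dollar volume comes from multiplying close prices by volume.

```python
import pandas as pd

# Dollar volume traded, copied from the example table above.
dollar_volume = pd.DataFrame(
    {'10/02/2010': [2, 5, 1, 6], '10/03/2010': [2, 6, 2, 5]},
    index=['AAPL', 'BBC', 'GGL', 'ZGB'])

# DataFrame.sum() adds down each column (one total per date), so the
# division normalizes every date's weights to sum to 1.
example_weights = dollar_volume / dollar_volume.sum()
print(example_weights.round(3))
```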

In [9]:
def generate_dollar_volume_weights(close, volume):
    """
    Generate dollar volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    dollar_volume_weights : DataFrame
        The dollar volume weights for each ticker and date
    """
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)
    
    #TODO: Implement function
    dollar_volume = close * volume

    return dollar_volume / dollar_volume.sum()

project_tests.test_generate_dollar_volume_weights(generate_dollar_volume_weights)

Tests Passed


### View Data
Let's generate the index weights using `generate_dollar_volume_weights` and view them using a heatmap.

In [10]:
index_weights = generate_dollar_volume_weights(close, volume)
helper.plot_weights(index_weights, 'Index Weights')

## ETF Weights
Now that we have the index weights, it's time to build the weights for the smart beta ETF. Let's build an ETF portfolio that is based on dividends. This is a common factor used to build portfolios. Unlike most portfolios, we'll be using a single factor for simplicity.

Implement `calculate_dividend_weights` to return the weights for each stock based on its total dividend yield over time. This is similar to generating the weights for the index, but it uses dividend data instead.
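As a rough sketch of the idea, the toy example below reads "total dividend over time" as a running sum of each ticker's ex-dividend payments, normalized per date. The tickers, dates, and amounts are made up.

```python
import pandas as pd

# Hypothetical ex-dividend payments: rows are tickers, columns are dates.
toy_ex_dividend = pd.DataFrame(
    [[0.0, 0.5, 0.0],
     [0.2, 0.0, 0.3]],
    index=['AAA', 'BBB'], columns=['day1', 'day2', 'day3'])

# Running total of dividends paid by each ticker up to each date.
toy_cumulative = toy_ex_dividend.cumsum(axis=1)

# Normalize per date so the weights in each column sum to 1.
toy_dividend_weights = toy_cumulative / toy_cumulative.sum()
print(toy_dividend_weights.round(3))
```

On `day1` only `BBB` has paid a dividend, so it takes the full weight. By `day3` both tickers have paid 0.5 in total and the weights are equal.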

In [11]:
def calculate_dividend_weights(ex_dividend):
    """
    Calculate dividend weights.

    Parameters
    ----------
    ex_dividend : DataFrame
        Ex-dividend for each stock and date

    Returns
    -------
    dividend_weights : DataFrame
        Weights for each stock and date
    """
    #TODO: Implement function
    dividend_cumsum_per_ticker = ex_dividend.cumsum(axis=1)

    return dividend_cumsum_per_ticker/dividend_cumsum_per_ticker.sum()

project_tests.test_calculate_dividend_weights(calculate_dividend_weights)

Tests Passed


### View Data
Let's generate the ETF weights using `calculate_dividend_weights` and view them using a heatmap.

In [12]:
etf_weights = calculate_dividend_weights(ex_dividend)
helper.plot_weights(etf_weights, 'ETF Weights')

## Returns
Implement `generate_returns` to generate the returns. Note that these aren't log returns. Since we're not dealing with volatility, we don't have to use log returns.
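For reference, a simple return is the current price divided by the previous price, minus 1. The sketch below applies that to a made-up price series laid out like `close`, with dates along the columns.

```python
import pandas as pd

# Hypothetical prices for one ticker over three dates.
prices = pd.DataFrame(
    [[100.0, 110.0, 99.0]],
    index=['AAA'], columns=['day1', 'day2', 'day3'])

# Simple return: today's price divided by yesterday's price, minus 1.
# Dates run along the columns, so the shift is applied with axis=1.
simple_returns = prices / prices.shift(1, axis=1) - 1
print(simple_returns.round(3))
```

The first date has no previous price, so its return is `NaN`.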

In [13]:
def generate_returns(close):
    """
    Generate returns for ticker and date.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date

    Returns
    -------
    returns : DataFrame
        The returns for each ticker and date
    """
    #TODO: Implement function

    return close / close.shift(1, axis=1) - 1

project_tests.test_generate_returns(generate_returns)

Tests Passed


### View Data
Let's generate the closing returns using `generate_returns` and view them using a heatmap.

In [14]:
returns = generate_returns(close)
helper.plot_returns(returns, 'Close Returns')

## Weighted Returns
With the returns of each stock computed, we can use them to compute the returns for an index or ETF. Implement `generate_weighted_returns` to create weighted returns using returns and weights for an index or ETF.

In [15]:
def generate_weighted_returns(returns, weights):
    """
    Generate weighted returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weights : DataFrame
        Weights for each ticker and date

    Returns
    -------
    weighted_returns : DataFrame
        Weighted returns for each ticker and date
    """
    assert returns.index.equals(weights.index)
    assert returns.columns.equals(weights.columns)
    
    #TODO: Implement function

    return returns * weights

project_tests.test_generate_weighted_returns(generate_weighted_returns)

Tests Passed


### View Data
Let's generate the ETF and index returns using `generate_weighted_returns` and view them using a heatmap.

In [16]:
index_weighted_returns = generate_weighted_returns(returns, index_weights)
etf_weighted_returns = generate_weighted_returns(returns, etf_weights)
helper.plot_returns(index_weighted_returns, 'Index Returns')
helper.plot_returns(etf_weighted_returns, 'ETF Returns')

## Cumulative Returns
Implement `calculate_cumulative_returns` to calculate the cumulative returns over time.
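One way to picture the calculation: add up the weighted returns of all tickers to get the portfolio return for each date, then compound those returns over time. The numbers below are made up for illustration.

```python
import pandas as pd

# Hypothetical weighted returns: rows are tickers, columns are dates.
# The first date has no return because there is no earlier price.
toy_weighted_returns = pd.DataFrame(
    [[float('nan'), 0.05, -0.02],
     [float('nan'), 0.05, 0.02]],
    index=['AAA', 'BBB'], columns=['day1', 'day2', 'day3'])

# Sum across tickers for the portfolio return on each date (NaN counts as 0),
# then compound with a cumulative product.
portfolio_returns = toy_weighted_returns.sum()
toy_cumulative_returns = (portfolio_returns + 1).cumprod()
print(toy_cumulative_returns)
```

A value of 1.1 means the portfolio is worth 10% more than at the start.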

In [17]:
def calculate_cumulative_returns(returns):
    """
    Calculate cumulative returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    cumulative_returns : Pandas Series
        Cumulative returns for each date
    """
    #TODO: Implement function
    
    # Sum across tickers for each date, then compound over time.
    return (returns.sum() + 1).cumprod()

project_tests.test_calculate_cumulative_returns(calculate_cumulative_returns)

Tests Passed


### View Data
Let's generate the ETF and index cumulative returns using `calculate_cumulative_returns` and compare the two.

In [18]:
index_weighted_cumulative_returns = calculate_cumulative_returns(index_weighted_returns)
etf_weighted_cumulative_returns = calculate_cumulative_returns(etf_weighted_returns)
helper.plot_benchmark_returns(index_weighted_cumulative_returns, etf_weighted_cumulative_returns, 'Smart Beta ETF vs Index')

## Tracking Error
To check the performance of the smart beta portfolio, we can compare it against the index. Let's generate the tracking error using the `helper` module's `tracking_error` function and graph it over time.
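The internals of `helper.tracking_error` aren't shown in this notebook. As a rough illustration only, and an assumption rather than the helper's actual code, one simple way to follow divergence over time is the per-date gap between the two cumulative return series. The function name `tracking_difference` and the numbers are hypothetical.

```python
import pandas as pd

dates = ['day1', 'day2', 'day3']
benchmark_cumulative = pd.Series([1.00, 1.02, 1.05], index=dates)
portfolio_cumulative = pd.Series([1.00, 1.03, 1.04], index=dates)


def tracking_difference(benchmark, portfolio):
    """Hypothetical stand-in: per-date gap between two cumulative return series."""
    return benchmark - portfolio


gap = tracking_difference(benchmark_cumulative, portfolio_cumulative)
print(gap.round(3))
```

A gap near zero means the portfolio is following the benchmark closely. The sign shows which of the two is ahead on a given date.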

In [19]:
smart_beta_tracking_error = helper.tracking_error(index_weighted_cumulative_returns, etf_weighted_cumulative_returns)
helper.plot_tracking_error(smart_beta_tracking_error, 'Smart Beta Tracking Error')

# Part 2: Portfolio Optimization
In Part 2, you'll optimize the index you created in Part 1. You'll use `cvxopt` to solve the convex problem of finding the optimal weights for the portfolio. Just like before, we'll compare these results to the index.
## Covariance
Implement `get_covariance` to calculate the covariance of `returns` and `weighted_index_returns`. We'll use this to feed into our convex optimization function. By using covariance, we can prevent the optimizer from going all in on a few stocks.
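To see why these two quantities are useful, here is a small NumPy sketch under made-up data. With returns `X` (tickers by dates) and index returns `y` (one per date), `X @ X.T` and `X @ y` are the ingredients of a least-squares fit of the index by a weighted mix of stocks. Note that they are uncentered products, since no means are subtracted. The variable names below are illustrative.

```python
import numpy as np

# Hypothetical returns: 2 tickers (rows) over 4 dates (columns).
X = np.array([[0.01, -0.02, 0.03, 0.00],
              [0.02, 0.01, -0.01, 0.02]])

# Build an index return per date from known weights so the answer can be checked.
true_weights = np.array([0.3, 0.7])
y = true_weights @ X

toy_xtx = X @ X.T   # (tickers x tickers): how the tickers' returns move together
toy_xty = X @ y     # (tickers,): how each ticker moves with the index

# Without any constraints, the least-squares weights solve toy_xtx @ w = toy_xty.
recovered_weights = np.linalg.solve(toy_xtx, toy_xty)
print(recovered_weights)
```

Because the toy index was built from `true_weights`, the solve step recovers them. The optimizer used later works from the same two inputs but adds constraints.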

In [20]:
def get_covariance(returns, weighted_index_returns):
    """
    Calculate covariance matrices.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weighted_index_returns : DataFrame
        Weighted index returns for each ticker and date

    Returns
    -------
    xtx, xty  : (2 dimensional Ndarray, 1 dimensional Ndarray)
    """
    assert returns.index.equals(weighted_index_returns.index)
    assert returns.columns.equals(weighted_index_returns.columns)
    
    #TODO: Implement function
    returns = returns.fillna(0)
    weighted_index_returns = weighted_index_returns.sum().fillna(0)

    xtx = returns.dot(returns.T)
    xty = returns.dot(weighted_index_returns)

    return xtx.values, xty.values

project_tests.test_get_covariance(get_covariance)

Tests Passed


### View Data
Let's look at the covariance generated from `get_covariance`.

In [21]:
xtx, xty = get_covariance(returns, index_weighted_returns)
xtx = pd.DataFrame(xtx, returns.index, returns.index)
xty = pd.Series(xty, returns.index)
helper.plot_covariance(xty, xtx)

## Quadratic Programming
Now that you have the covariance, you can use it to optimize the weights. Run the following cell to generate optimal weights using the `helper` module's `solve_qp` function.

In [22]:
raw_optim_etf_weights = helper.solve_qp(xtx.values, xty.values)
raw_optim_etf_weights_per_date = np.tile(raw_optim_etf_weights, (len(returns.columns), 1))
optim_etf_weights = pd.DataFrame(raw_optim_etf_weights_per_date.T, returns.index, returns.columns)
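The code of `helper.solve_qp` isn't shown here, so the following is only a sketch of what a quadratic program of this kind looks like, not the helper's implementation. It minimizes `w'Pw - 2q'w` subject to the weights summing to 1, by solving the KKT linear system with NumPy. The function name `solve_budget_qp` is hypothetical, and a real QP solver such as `cvxopt` can also enforce bounds like `w >= 0`, which this sketch does not.

```python
import numpy as np


def solve_budget_qp(P, q):
    """Hypothetical solver: minimize w'Pw - 2q'w subject to sum(w) == 1.

    Solves the KKT linear system directly; unlike a full QP solver it
    does not enforce bounds such as w >= 0.
    """
    n = len(q)
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2 * P      # gradient of the quadratic term
    kkt[:n, n] = 1.0         # Lagrange multiplier column
    kkt[n, :n] = 1.0         # budget constraint row: sum(w) == 1
    rhs = np.append(2 * q, 1.0)
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]      # drop the Lagrange multiplier


# Made-up inputs: q is built from target weights so the result can be checked.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
target = np.array([0.25, 0.75])
q = P @ target

w = solve_budget_qp(P, q)
print(w)
```

Since `q` was constructed from `target`, and `target` already sums to 1, the solver returns `target`.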

## Optimized Portfolio
With our optimized ETF weights built using quadratic programming, let's compare them to the index. Run the next cell to calculate the optimized ETF returns and compare the returns to the index returns.

In [23]:
optim_etf_returns = generate_weighted_returns(returns, optim_etf_weights)
optim_etf_cumulative_returns = calculate_cumulative_returns(optim_etf_returns)
helper.plot_benchmark_returns(index_weighted_cumulative_returns, optim_etf_cumulative_returns, 'Optimized ETF vs Index')

optim_etf_tracking_error = helper.tracking_error(index_weighted_cumulative_returns, optim_etf_cumulative_returns)
helper.plot_tracking_error(optim_etf_tracking_error, 'Optimized ETF Tracking Error')

## Rebalance Portfolio
The optimized ETF portfolio used different weights for each day. Once transaction fees are factored in, this much turnover can reduce the total returns. Let's find the optimal times to rebalance the portfolio instead of doing it every day.

Implement `rebalance_portfolio` to rebalance a portfolio.
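The rebalancing schedule can be pictured as a sliding look-back window: every `shift_size` days, the weights are re-optimized using the previous `chunk_size` days of data. The sketch below only lists the window boundaries, following the loop used in the solution cell below. The helper name `rebalance_windows` is made up for illustration.

```python
def rebalance_windows(date_count, shift_size, chunk_size):
    """Return (start, end) column positions for each look-back window."""
    return [(end - chunk_size, end)
            for end in range(chunk_size, date_count, shift_size)]


# 10 dates, rebalance every 3 days, look back 4 days each time.
windows = rebalance_windows(date_count=10, shift_size=3, chunk_size=4)
print(windows)  # [(0, 4), (3, 7)]
```

Each pair can be used as `returns.iloc[:, start:end]` to select the data for one rebalance.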

In [24]:
def rebalance_portfolio(returns, weighted_index_returns, shift_size, chunk_size):
    """
    Get weights for each rebalancing of the portfolio.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weighted_index_returns : DataFrame
        Weighted index returns for each ticker and date
    shift_size : int
        The number of days between each rebalance
    chunk_size : int
        The number of days to look in the past for rebalancing

    Returns
    -------
    all_rebalance_weights  : list of Ndarrays
        The etf weights for each point they are rebalanced
    """
    assert returns.index.equals(weighted_index_returns.index)
    assert returns.columns.equals(weighted_index_returns.columns)
    assert shift_size > 0
    assert chunk_size >= 0
    
    #TODO: Implement function
    date_len = returns.shape[1]
    all_rebalance_weights = []

    for shift in range(chunk_size, date_len, shift_size):
        start_idx = shift - chunk_size
        xtx, xty = get_covariance(returns.iloc[:, start_idx:shift], weighted_index_returns.iloc[:, start_idx:shift])

        all_rebalance_weights.append(helper.solve_qp(xtx, xty))

    return all_rebalance_weights

project_tests.test_rebalance_portfolio(rebalance_portfolio)

Tests Passed


Run the following cell to rebalance the portfolio using `rebalance_portfolio`.

In [25]:
chunk_size = 250
shift_size = 5
all_rebalance_weights = rebalance_portfolio(returns, index_weighted_returns, shift_size, chunk_size)

## Portfolio Rebalance Cost
With the portfolio rebalanced, we need to use a metric to measure the cost of rebalancing the portfolio. Implement `get_rebalance_cost` to calculate the rebalance cost.
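One possible metric, and the one this notebook's solution uses, is the total absolute change in weights between consecutive rebalances, scaled by `shift_size / rebalance_count`. Here is a toy calculation with made-up weights for two tickers and three rebalances.

```python
import numpy as np

# Hypothetical weights at three consecutive rebalances (two tickers each).
toy_rebalance_weights = [np.array([0.5, 0.5]),
                         np.array([0.6, 0.4]),
                         np.array([0.4, 0.6])]
toy_shift_size = 5
toy_rebalance_count = 3

# Absolute weight change per ticker between consecutive rebalances.
weight_changes = np.abs(np.diff(np.array(toy_rebalance_weights), axis=0))
total_turnover = weight_changes.sum()

# Scale the turnover by shift_size / rebalance_count.
toy_cost = (toy_shift_size / toy_rebalance_count) * total_turnover
print(toy_cost)
```

The first rebalance moves 0.1 of weight in each ticker and the second moves 0.2 in each, for a total turnover of 0.6.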

In [26]:
def get_rebalance_cost(all_rebalance_weights, shift_size, rebalance_count):
    """
    Get the cost of all the rebalancing.

    Parameters
    ----------
    all_rebalance_weights : list of Ndarrays
        ETF weights for each rebalance of the portfolio
    shift_size : int
        The number of days between each rebalance
    rebalance_count : int
        Number of times the portfolio was rebalanced

    Returns
    -------
    rebalancing_cost  : float
        The cost of all the rebalancing
    """
    assert shift_size > 0
    assert rebalance_count > 0
    
    #TODO: Implement function
    all_rebalance_weights_df = pd.DataFrame(np.array(all_rebalance_weights))
    rebalance_total = (all_rebalance_weights_df - all_rebalance_weights_df.shift(-1)).abs().sum().sum()

    return (shift_size / rebalance_count) * rebalance_total

project_tests.test_get_rebalance_cost(get_rebalance_cost)

Tests Passed


Run the following cell to get the rebalance cost from `get_rebalance_cost`.

In [27]:
unconstrained_costs = get_rebalance_cost(all_rebalance_weights, shift_size, returns.shape[1])
print(unconstrained_costs)

0.107399657589


## Submission
Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a pass or not passed grade. You can continue to the next section while you wait for feedback.