# Project 3: Smart Beta Portfolio and Portfolio Optimization
## Instructions
Each problem consists of a function to implement and instructions on how to implement the function.  The parts of the function that need to be implemented are marked with a `# TODO` comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our `project_tests` package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.

## Packages
When you implement the functions, you'll only need to use the [Pandas](https://pandas.pydata.org/) and [Numpy](http://www.numpy.org/) packages. Don't import any other packages, otherwise the grader will not be able to run your code.

The other packages that we're importing are `helper`, `project_helper`, and `project_tests`. These are custom packages built to help you solve the problems. The `helper` and `project_helper` modules contain utility functions and graph functions. The `project_tests` module contains the unit tests for all the problems.
### Install Packages

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

### Load Packages

In [None]:
import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests

## Market Data
The data source we'll be using is the [Wiki End of Day data](https://www.quandl.com/databases/WIKIP) hosted at [Quandl](https://www.quandl.com). This contains data for many stocks, but we'll just be looking at the S&P 500 stocks. We'll also make things a little easier to solve by narrowing the time range to 2013-07-01 through 2017-06-30, the dates used in the download cell below.
### Set API Key
Set the `quandl_api_key` variable to your Quandl API key. You can find your Quandl API key [here](https://www.quandl.com/account/api).

In [None]:
# TODO: Add your Quandl API Key
quandl_api_key  = ''

### Download Data

In [None]:
import os

snp500_file_path = 'data/tickers_SnP500.txt'
wiki_file_path = 'data/WIKI_PRICES.csv'
start_date, end_date = '2013-07-01', '2017-06-30'
use_columns = ['date', 'ticker', 'adj_close', 'adj_volume', 'ex-dividend']

if not os.path.exists(wiki_file_path):
    with open(snp500_file_path) as f:
        tickers = f.read().split()
    
    helper.download_quandl_dataset(quandl_api_key, 'WIKI', 'PRICES', wiki_file_path, use_columns, tickers, start_date, end_date)
else:
    print('Data already downloaded')

### Load Data

In [None]:
df = pd.read_csv(wiki_file_path)

### Create the Universe
We'll be selecting large dollar volume stocks for our stock universe. We use this universe because it is highly liquid.

In [None]:
percent_top_dollar = 0.2
high_volume_symbols = project_helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)
df = df[df['ticker'].isin(high_volume_symbols)]

### 2-D Matrices
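Here we convert `df` into separate DataFrames, one per field (adjusted close, adjusted volume, and ex-dividend), each indexed by date with one column per ticker. We could use a MultiIndex, but that just stacks the columns for each ticker. This layout lets us apply calculations across all tickers without calling `groupby` each time.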
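
As a small illustration of this reshaping step, the sketch below pivots a made-up long-format table into a date-by-ticker matrix. The tickers, dates, and prices are invented for the example and are not taken from the Quandl data.

```python
import pandas as pd

# Toy long-format data: one row per (date, ticker) pair. Values are made up.
long_df = pd.DataFrame({
    'date': ['2013-07-08', '2013-07-08', '2013-07-09', '2013-07-09'],
    'ticker': ['A', 'B', 'A', 'B'],
    'adj_close': [2.0, 2.0, 5.0, 6.0],
})

# Pivot to a 2-D matrix: rows are dates, columns are tickers.
wide_close = long_df.pivot(index='date', columns='ticker', values='adj_close')

# Column-wise operations now apply to every ticker at once, with no groupby.
daily_change = wide_close.diff()
print(wide_close)
```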

In [None]:
close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
volume = df.reset_index().pivot(index='date', columns='ticker', values='adj_volume')
ex_dividend = df.reset_index().pivot(index='date', columns='ticker', values='ex-dividend')

### View Data
To see what one of these 2-d matrices looks like, let's take a look at the closing prices matrix.

In [None]:
project_helper.print_dataframe(close)

# Part 1: Smart Beta Portfolio
In Part 1 of this project, you'll build a smart beta ETF using dividend yield. You'll compare this ETF to an index to see how well it performs. To get the index, let's first generate the weights.
## Index Weights
The index we'll be using is based on large dollar volume stocks. Implement `generate_dollar_volume_weights` to generate the weights for this index. For each date, generate the weights based on dollar volume traded for that date. For example, assume the following are close prices and volume data:
```
                 Prices
               A         B         ...
2013-07-08     2         2         ...
2013-07-09     5         6         ...
2013-07-10     1         2         ...
2013-07-11     6         5         ...
...            ...       ...       ...

                 Volume
               A         B         ...
2013-07-08     100       340       ...
2013-07-09     240       220       ...
2013-07-10     120       500       ...
2013-07-11     10        100       ...
...            ...       ...       ...
```
Treating A and B as the only two tickers, the weights created from the function `generate_dollar_volume_weights` should be the following, with each date's row summing to 1:
```
               A         B
2013-07-08     0.227..   0.772..
2013-07-09     0.476..   0.523..
2013-07-10     0.107..   0.892..
2013-07-11     0.107..   0.892..
...            ...       ...
```
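
A minimal sketch of the per-date normalization, assuming A and B are the only tickers in the example above. The variable names are illustrative, and this is one way to read "for each date", not necessarily the exact form the unit test expects.

```python
import pandas as pd

# Example data from the tables above, assuming A and B are the only tickers.
dates = ['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11']
close_example = pd.DataFrame({'A': [2, 5, 1, 6], 'B': [2, 6, 2, 5]}, index=dates)
volume_example = pd.DataFrame({'A': [100, 240, 120, 10], 'B': [340, 220, 500, 100]}, index=dates)

# Dollar volume traded per ticker per date.
dollar_volume = close_example * volume_example

# Divide each row by its own total so the weights on each date sum to 1.
weights_example = dollar_volume.div(dollar_volume.sum(axis=1), axis=0)
print(weights_example)
```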

In [None]:
def generate_dollar_volume_weights(close, volume):
    """
    Generate dollar volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    dollar_volume_weights : DataFrame
        The dollar volume weights for each ticker and date
    """
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)
    
    #TODO: Implement function

    return None

project_tests.test_generate_dollar_volume_weights(generate_dollar_volume_weights)

### View Data
Let's generate the index weights using `generate_dollar_volume_weights` and view them using a heatmap.

In [None]:
index_weights = generate_dollar_volume_weights(close, volume)
project_helper.plot_weights(index_weights, 'Index Weights')

## ETF Weights
Now that we have the index weights, it's time to build the weights for the smart beta ETF. Let's build a portfolio that is based on dividends. This is a common factor used to build portfolios. Unlike most portfolios, we'll be using a single factor for simplicity.

Implement `calculate_dividend_weights` to return the weights for each stock based on its total dividend yield over time. This is similar to generating the weights for the index, but it uses dividend data instead.
For example, assume the following is ex_dividend data:
```
                 Ex-Dividend
               A         B
2013-07-08     0         0
2013-07-09     0         1
2013-07-10     0.5       0
2013-07-11     0         0
2013-07-12     2         0
...            ...       ...
```
The weights created from the function `calculate_dividend_weights` should be the following:
```
               A         B
2013-07-08     NaN       NaN
2013-07-09     0         1
2013-07-10     0.333..   0.666..
2013-07-11     0.333..   0.666..
2013-07-12     0.714..   0.285..
...            ...       ...
```
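
The example above is consistent with accumulating dividends over time and then normalizing each date's row. A hedged sketch of that reading, using the same toy numbers (the variable names are illustrative):

```python
import pandas as pd

# Ex-dividend example data from the table above.
dates = ['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11', '2013-07-12']
ex_dividend_example = pd.DataFrame({'A': [0, 0, 0.5, 0, 2], 'B': [0, 1, 0, 0, 0]}, index=dates)

# Running total of dividends paid by each ticker up to each date.
cumulative_dividends = ex_dividend_example.cumsum()

# Normalize each date's row by its total; 0 / 0 on the first date gives NaN.
dividend_weights_example = cumulative_dividends.div(cumulative_dividends.sum(axis=1), axis=0)
print(dividend_weights_example)
```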

In [None]:
def calculate_dividend_weights(ex_dividend):
    """
    Calculate dividend weights.

    Parameters
    ----------
    ex_dividend : DataFrame
        Ex-dividend for each stock and date

    Returns
    -------
    dividend_weights : DataFrame
        Weights for each stock and date
    """
    #TODO: Implement function

    return None

project_tests.test_calculate_dividend_weights(calculate_dividend_weights)

### View Data
Just like the index weights, let's generate the ETF weights and view them using a heatmap.

In [None]:
etf_weights = calculate_dividend_weights(ex_dividend)
project_helper.plot_weights(etf_weights, 'ETF Weights')

## Returns
Implement `generate_returns` to generate returns data for all the stocks and dates from price data. You might notice we're implementing returns and not log returns. Since we're not dealing with volatility, we don't have to use log returns.
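
For reference, a simple return compares each price with the previous one: $ (P_{t} - P_{t-1}) / P_{t-1} $. The sketch below applies that definition to made-up prices; it is an illustration of the definition, not the graded solution.

```python
import pandas as pd

# Made-up prices for two tickers over three dates.
dates = ['2013-07-08', '2013-07-09', '2013-07-10']
prices_example = pd.DataFrame({'A': [2.0, 5.0, 1.0], 'B': [2.0, 6.0, 2.0]}, index=dates)

# Previous day's price, aligned to the current date.
previous_prices = prices_example.shift(1)

# Simple return; the first date has no previous price, so it is NaN.
simple_returns = (prices_example - previous_prices) / previous_prices
print(simple_returns)
```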

In [None]:
def generate_returns(prices):
    """
    Generate returns for ticker and date.

    Parameters
    ----------
    prices : DataFrame
        Price for each ticker and date

    Returns
    -------
    returns : DataFrame
        The returns for each ticker and date
    """
    #TODO: Implement function

    return None

project_tests.test_generate_returns(generate_returns)

### View Data
Let's generate the closing returns using `generate_returns` and view them using a heatmap.

In [None]:
returns = generate_returns(close)
project_helper.plot_returns(returns, 'Close Returns')

## Weighted Returns
With the returns of each stock computed, we can use them to compute the returns for an index or ETF. Implement `generate_weighted_returns` to create weighted returns using the returns and weights.
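
A minimal sketch of the idea with made-up numbers: each stock's return is scaled by its weight on the same date, and summing a row then gives the portfolio's return for that date. The names and values are illustrative only.

```python
import pandas as pd

dates = ['2013-07-09', '2013-07-10']
returns_example = pd.DataFrame({'A': [0.10, -0.02], 'B': [0.00, 0.04]}, index=dates)
weights_example = pd.DataFrame({'A': [0.25, 0.50], 'B': [0.75, 0.50]}, index=dates)

# Element-wise product: same shape as the inputs, one value per ticker and date.
weighted_example = returns_example * weights_example

# Summing across tickers gives the portfolio return on each date.
portfolio_return = weighted_example.sum(axis=1)
print(weighted_example)
```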

In [None]:
def generate_weighted_returns(returns, weights):
    """
    Generate weighted returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weights : DataFrame
        Weights for each ticker and date

    Returns
    -------
    weighted_returns : DataFrame
        Weighted returns for each ticker and date
    """
    assert returns.index.equals(weights.index)
    assert returns.columns.equals(weights.columns)
    
    #TODO: Implement function

    return None

project_tests.test_generate_weighted_returns(generate_weighted_returns)

### View Data
Let's generate the ETF and index returns using `generate_weighted_returns` and view them using a heatmap.

In [None]:
index_weighted_returns = generate_weighted_returns(returns, index_weights)
etf_weighted_returns = generate_weighted_returns(returns, etf_weights)
project_helper.plot_returns(index_weighted_returns, 'Index Returns')
project_helper.plot_returns(etf_weighted_returns, 'ETF Returns')

## Cumulative Returns
To compare performance between the ETF and the index, we're going to calculate the tracking error. Before we do that, we first need to calculate the index and ETF cumulative returns. Implement `calculate_cumulative_returns` to calculate the cumulative returns over time given the returns.
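
One common way to build cumulative returns, sketched here under the assumption that the input holds weighted returns per ticker: sum across tickers to get the portfolio return per date, then compound $ 1 + r $ over time. The data is made up.

```python
import pandas as pd

dates = ['2013-07-09', '2013-07-10', '2013-07-11']
weighted_returns_example = pd.DataFrame(
    {'A': [0.05, -0.02, 0.01], 'B': [0.05, 0.02, 0.00]}, index=dates)

# Portfolio return on each date is the sum of the weighted stock returns.
portfolio_returns = weighted_returns_example.sum(axis=1)

# Compound over time: the value of one unit invested at the start.
cumulative_example = (1 + portfolio_returns).cumprod()
print(cumulative_example)
```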

In [None]:
def calculate_cumulative_returns(returns):
    """
    Calculate cumulative returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    cumulative_returns : Pandas Series
        Cumulative returns for each date
    """
    #TODO: Implement function
    
    return None

project_tests.test_calculate_cumulative_returns(calculate_cumulative_returns)

### View Data
Let's generate the ETF and index cumulative returns using `calculate_cumulative_returns` and compare the two.

In [None]:
index_weighted_cumulative_returns = calculate_cumulative_returns(index_weighted_returns)
etf_weighted_cumulative_returns = calculate_cumulative_returns(etf_weighted_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, etf_weighted_cumulative_returns, 'Smart Beta ETF vs Index')

## Tracking Error
In order to check the performance of the smart beta portfolio, we can calculate the tracking error against the index. Implement `tracking_error` to return the tracking error between the ETF and index over time.

For reference, we'll be using the following tracking error function:
$$ TE = \sqrt{\frac{\sum_{i=1}^{N}(R_{P,i} - R_{B,i})^{2}}{N-1}} $$

where $ R_{P,i} $ is the ETF return and $ R_{B,i} $ is the index return for period $ i $, and $ N $ is the number of periods.
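
To make the reference formula concrete, the sketch below evaluates it literally on two short made-up series. Note that `tracking_error` in this project is documented as returning a per-date Series, so this only illustrates the formula and is not a statement of what the unit test expects.

```python
import numpy as np
import pandas as pd

# Made-up ETF and index return series of equal length.
etf_example = pd.Series([1.02, 1.05, 1.04, 1.08])
index_example = pd.Series([1.01, 1.03, 1.05, 1.06])

# Per-period difference between the ETF and the index.
difference = etf_example - index_example

# Sum of squared differences divided by N - 1, then the square root.
n_periods = len(difference)
te_example = np.sqrt((difference ** 2).sum() / (n_periods - 1))
print(te_example)
```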

In [None]:
def tracking_error(index_weighted_cumulative_returns, etf_weighted_cumulative_returns):
    """
    Calculate the tracking error.

    Parameters
    ----------
    index_weighted_cumulative_returns : Pandas Series
        The weighted index Cumulative returns for each date
    etf_weighted_cumulative_returns : Pandas Series
        The weighted ETF Cumulative returns for each date

    Returns
    -------
    tracking_error  : Pandas Series
        The tracking error for each date
    """
    assert index_weighted_cumulative_returns.index.equals(etf_weighted_cumulative_returns.index)
    
    #TODO: Implement function

    return None

project_tests.test_tracking_error(tracking_error)

### View Data
Let's generate the tracking error using `tracking_error` and graph it over time.

In [None]:
smart_beta_tracking_error = tracking_error(index_weighted_cumulative_returns, etf_weighted_cumulative_returns)
project_helper.plot_tracking_error(smart_beta_tracking_error, 'Smart Beta Tracking Error')

# Part 2: Portfolio Optimization
In Part 2, you'll optimize the index you created in Part 1. You'll solve a convex optimization problem to find the optimal weights for this portfolio. Just like before, we'll compare these results to the index.
## Covariance
Implement `get_covariance_returns` to calculate the covariance of the `returns`. We'll use this to feed into our convex optimization function. By using covariance, we can prevent the optimizer from going all in on a few stocks.

_Note: We recommend using NumPy's `cov` function._
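
A hedged sketch of calling `np.cov` on a date-by-ticker returns matrix. Two assumptions are made here: missing values (such as the first row of a returns DataFrame) are replaced with 0, and `rowvar=False` is used so that each column (ticker) is treated as a variable.

```python
import numpy as np
import pandas as pd

# Made-up returns; the first row is NaN, as it would be after differencing prices.
returns_example = pd.DataFrame({
    'A': [np.nan, 0.01, -0.02, 0.03],
    'B': [np.nan, 0.02, 0.00, -0.01],
})

# Assumption: treat missing returns as 0 so np.cov does not propagate NaN.
clean_returns = returns_example.fillna(0)

# rowvar=False: columns are variables (tickers), rows are observations (dates).
covariance_example = np.cov(clean_returns.values, rowvar=False)
print(covariance_example)
```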

In [None]:
def get_covariance_returns(returns):
    """
    Calculate covariance matrices.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    returns_covariance  : 2 dimensional Ndarray
        The covariance of the returns
    """
    #TODO: Implement function
    
    return None

project_tests.test_get_covariance_returns(get_covariance_returns)

### View Data
Let's look at the covariance generated from `get_covariance_returns`.

In [None]:
covariance_returns = get_covariance_returns(returns)
covariance_returns = pd.DataFrame(covariance_returns, returns.columns, returns.columns)

covariance_returns_correlation = np.linalg.inv(np.diag(np.sqrt(np.diag(covariance_returns))))
covariance_returns_correlation = pd.DataFrame(
    covariance_returns_correlation.dot(covariance_returns).dot(covariance_returns_correlation),
    covariance_returns.index,
    covariance_returns.columns)

project_helper.plot_covariance_returns_correlation(
    covariance_returns_correlation,
    'Covariance Returns Correlation Matrix')

## Quadratic Programming
Now that you have the covariance of the returns, we can use it to optimize the weights. Implement `get_optimal_weights` to find $ X $, the optimal weights, by minimizing $ X^{T}PX + s\sqrt{\lambda_{max}((X - I)^{T}(X - I))} $ where $ P $ is the returns covariance matrix, $ I $ is the vector of index weights, and $ s $ is the penalty factor (`scale`) for weights that deviate from the index.

We'll also use the following constraints to generate valid weights for the ETF:
- $ \sum X_{i} = 1 $
- $ X_{i} \geq 0 $
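
The square-root term in the objective is a Euclidean norm in disguise. As a short derivation, for any column vector $ v $ (a symbol used here only for illustration), $ v^{T}v $ is a $ 1 \times 1 $ matrix whose single entry is also its only eigenvalue, so:

$$ \sqrt{\lambda_{max}(v^{T}v)} = \sqrt{\sum_{i} v_{i}^{2}} = \lVert v \rVert_{2} $$

Under the assumption that the penalty applies to the deviation of $ X $ from the index weights (written $ I $ here, corresponding to the `index_weights` argument), the whole problem could then be stated as:

$$ \min_{X} \; X^{T}PX + s\lVert X - I \rVert_{2} \quad \text{subject to} \quad \sum_{i} X_{i} = 1, \quad X_{i} \geq 0 $$

For a non-negative $ s $, both terms are convex in $ X $: the quadratic form because a covariance matrix is positive semidefinite, and the norm because every norm is convex. That is what makes the problem suitable for a convex solver.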

In [None]:
import cvxpy as cvx

def get_optimal_weights(covariance_returns, index_weights, scale=2.0):
    """
    Find the optimal weights.

    Parameters
    ----------
    covariance_returns : 2 dimensional Ndarray
        The covariance of the returns
    index_weights : Pandas Series
        Index weights for all tickers at a period in time
    scale : float
        The penalty factor for weights that deviate from the index

    Returns
    -------
    x : 1 dimensional Ndarray
        The solution for x
    """
    assert len(covariance_returns.shape) == 2
    assert len(index_weights.shape) == 1
    assert covariance_returns.shape[0] == covariance_returns.shape[1]  == index_weights.shape[0]

    #TODO: Implement function
    
    return None

project_tests.test_get_optimal_weights(get_optimal_weights)

## Optimized Portfolio
Using the `get_optimal_weights` function, let's generate the optimal ETF weights without rebalancing. We can do this by feeding in the covariance of the entire history of data. We also need to feed in a set of index weights. We'll go with the average weights of the index over time.
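
The "average weights of the index over time" can be read as a running mean: at each date, the mean of all weights seen so far. The sketch below, on made-up weights, shows that a cumulative sum divided by the row count matches pandas' `expanding().mean()`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up index weights for two tickers over three dates.
weights_example = pd.DataFrame({'A': [0.2, 0.4, 0.6], 'B': [0.8, 0.6, 0.4]})

# Number of observations seen up to and including each row: 1, 2, 3, ...
counts = pd.Series(np.arange(1, len(weights_example) + 1), index=weights_example.index)

# Running mean: cumulative sum divided row-wise by the running count.
running_mean = weights_example.cumsum().div(counts, axis=0)

# The same quantity using pandas' expanding window.
expanding_mean = weights_example.expanding().mean()
print(running_mean)
```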

In [None]:
# The running average of the index weights up to each point in time (a mean, despite the variable name)
median_index_weights = (index_weights.cumsum().T / range(1, len(index_weights)+1)).T

raw_optimal_single_rebalance_etf_weights = get_optimal_weights(covariance_returns.values, median_index_weights.iloc[-1])
optimal_single_rebalance_etf_weights = pd.DataFrame(
    np.tile(raw_optimal_single_rebalance_etf_weights, (len(returns.index), 1)),
    returns.index,
    returns.columns)

With our ETF weights built, let's compare them to the index. Run the next cell to calculate the ETF returns and compare them to the index returns.

In [None]:
optim_etf_returns = generate_weighted_returns(returns, optimal_single_rebalance_etf_weights)
optim_etf_cumulative_returns = calculate_cumulative_returns(optim_etf_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, optim_etf_cumulative_returns, 'Optimized ETF vs Index')

optim_etf_tracking_error = tracking_error(index_weighted_cumulative_returns, optim_etf_cumulative_returns)
project_helper.plot_tracking_error(optim_etf_tracking_error, 'Optimized ETF Tracking Error')

## Rebalance Portfolio Over Time
The single optimized ETF portfolio used the same weights for the entire history. These might not be the optimal weights for the entire period. Let's rebalance the portfolio over the same period instead of using the same weights. Implement `rebalance_portfolio` to rebalance a portfolio.

Rebalance the portfolio every n days, where n is given as `shift_size`. When rebalancing, look back over a fixed number of days of past data, denoted `chunk_size`. Using this data, compute the optimal weights with `get_optimal_weights` and `get_covariance_returns`.
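
A sketch of how the look-back windows might be laid out, using small made-up sizes. It only shows the windowing; in the project each window would be passed on to `get_covariance_returns` and `get_optimal_weights`. Starting at the first date with a full look-back window and excluding the rebalance date itself are assumptions, not requirements stated above.

```python
import numpy as np
import pandas as pd

# Small illustrative sizes, not the values used in the project.
n_days, chunk_size, shift_size = 12, 4, 3
returns_example = pd.DataFrame(
    np.arange(n_days * 2, dtype=float).reshape(n_days, 2), columns=['A', 'B'])

window_bounds = []
# Step through the data every `shift_size` days, starting once a full
# look-back window of `chunk_size` days is available.
for end in range(chunk_size, n_days, shift_size):
    window = returns_example.iloc[end - chunk_size:end]  # the past chunk_size rows
    window_bounds.append((end - chunk_size, end))

print(window_bounds)
```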

In [None]:
def rebalance_portfolio(returns, median_index_weights, shift_size, chunk_size):
    """
    Get weights for each rebalancing of the portfolio.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    median_index_weights : DataFrame
        Median index weight for each ticker and date
    shift_size : int
        The number of days between each rebalance
    chunk_size : int
        The number of days to look in the past for rebalancing

    Returns
    -------
    all_rebalance_weights  : list of Ndarrays
        The ETF weights for each point they are rebalanced
    """
    assert returns.index.equals(median_index_weights.index)
    assert returns.columns.equals(median_index_weights.columns)
    assert shift_size > 0
    assert chunk_size >= 0
    
    #TODO: Implement function
    
    return None

project_tests.test_rebalance_portfolio(rebalance_portfolio)

Run the following cell to rebalance the portfolio using `rebalance_portfolio`.

In [None]:
chunk_size = 250
shift_size = 5
all_rebalance_weights = rebalance_portfolio(returns, median_index_weights, shift_size, chunk_size)

## Portfolio Turnover
With the portfolio rebalanced, we need to use a metric to measure the cost of rebalancing the portfolio. Implement `get_portfolio_turnover` to calculate the annual portfolio turnover. You can calculate this by multiplying the average turnover by the number of rebalances in a year.
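
A hedged sketch of the arithmetic on made-up weights. Turnover at each rebalance is taken here as the sum of absolute weight changes, and the average is taken over consecutive pairs of weight vectors. Both choices are assumptions for illustration; the project function instead receives `rebalance_count` as an argument.

```python
import numpy as np

# Made-up weights at three consecutive rebalance points for two tickers.
weights_per_rebalance = [np.array([0.5, 0.5]), np.array([0.6, 0.4]), np.array([0.4, 0.6])]
shift_size = 5
n_trading_days_in_year = 252

# Absolute weight change per ticker between consecutive rebalances, summed per rebalance.
changes = np.abs(np.diff(np.array(weights_per_rebalance), axis=0)).sum(axis=1)

# Average turnover per rebalance, scaled by the number of rebalances in a year.
average_turnover = changes.mean()
rebalances_per_year = n_trading_days_in_year / shift_size
annual_turnover = average_turnover * rebalances_per_year
print(annual_turnover)
```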

In [None]:
def get_portfolio_turnover(all_rebalance_weights, shift_size, rebalance_count, n_trading_days_in_year=252):
    """
    Calculate portfolio turnover.

    Parameters
    ----------
    all_rebalance_weights : list of Ndarrays
        The ETF weights for each point they are rebalanced
    shift_size : int
        The number of days between each rebalance
    rebalance_count : int
        Number of times the portfolio was rebalanced
    n_trading_days_in_year: int
        Number of trading days in a year

    Returns
    -------
    portfolio_turnover  : float
        The portfolio turnover
    """
    assert shift_size > 0
    assert rebalance_count > 0
    
    #TODO: Implement function
    
    return None

project_tests.test_get_portfolio_turnover(get_portfolio_turnover)

Run the following cell to get the portfolio turnover from `get_portfolio_turnover`.

In [None]:
# rebalance_count is the number of rebalances, not the number of tickers
print(get_portfolio_turnover(all_rebalance_weights, shift_size, len(all_rebalance_weights)))

That's it! You've built a smart beta portfolio in Part 1 and performed portfolio optimization in Part 2. You can now submit your project.

## Submission
Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a grade of passed or not passed. You can continue to the next section while you wait for feedback.