# Project 1: Trading with Momentum
## Instructions
Each problem consists of a function to implement and instructions on how to implement the function.  The parts of the function that need to be implemented are marked with a `# TODO` comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our `project_tests` package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.

## Packages
When you implement the functions, you'll only need to use the packages you've used in the classroom, like [Pandas](https://pandas.pydata.org/) and [Numpy](http://www.numpy.org/). These packages will be imported for you. We recommend that you don't add any import statements; otherwise the grader might not be able to run your code.

The other packages that we're importing are `helper`, `project_helper`, and `project_tests`. These are custom packages built to help you solve the problems. The `helper` and `project_helper` modules contain utility functions and graph functions. The `project_tests` package contains the unit tests for all the problems.

<br>

## Install Packages

<br>

### [Click here to see the instructions for installing packages](install_packages.ipynb)

<br>

### Load Packages

In [2]:
import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests

## Market Data
### Load Data
The data we use for most of the projects is end-of-day data. It contains data for many stocks, but we'll be looking at stocks in the S&P 500. We also made things a little easier to run by narrowing the time period instead of using all of the data.

In [3]:
df = pd.read_csv('eod-quotemedia.csv', parse_dates=['date'], index_col=False)

close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')

print(close)

ticker               A         AAL          AAP         AAPL        ABBV  \
date                                                                       
2013-07-01 29.99418563 16.17609308  81.13821681  53.10917319 34.92447839   
2013-07-02 29.65013670 15.81983388  80.72207258  54.31224742 35.42807578   
2013-07-03 29.70518453 16.12794994  81.23729877  54.61204262 35.44486235   
2013-07-05 30.43456826 16.21460758  81.82188233  54.17338125 35.85613355   
2013-07-08 30.52402098 16.31089385  82.95141667  53.86579916 36.66188936   
...                ...         ...          ...          ...         ...   
2017-06-26 58.57854478 48.36234805 121.52159207 143.57270901 70.35520945   
2017-06-27 58.22256443 48.08474540 121.69121741 141.51491885 70.01668424   
2017-06-28 58.73675827 48.82832394 116.45278767 143.58255490 70.52930812   
2017-06-29 58.27398382 49.19515602 115.79424221 141.46568942 70.10373358   
2017-06-30 58.77942143 49.88916265 116.33305213 141.80044954 70.13275003   


### View Data
Run the cell below to see what the data looks like for `close`.

In [4]:
project_helper.print_dataframe(close)

### Stock Example
Let's see what a single stock looks like from the closing prices. For this example and future display examples in this project, we'll use Apple's stock (AAPL). If we tried to graph all the stocks, it would be too much information.

In [5]:
apple_ticker = 'AAPL'
project_helper.plot_stock(close[apple_ticker], '{} Stock'.format(apple_ticker))

## Resample Adjusted Prices

The trading signal you'll develop in this project does not need to be based on daily prices. For instance, you can use month-end prices to trade once a month. To do this, you must first resample the daily adjusted closing prices into monthly buckets and select the last observation of each month.

Implement the `resample_prices` function to resample `close_prices` at the sampling frequency of `freq`.
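As a rough illustration of month-end resampling, here is a minimal sketch on made-up data. The ticker `XYZ`, the dates, and the prices are hypothetical. A `MonthEnd` offset object is passed instead of a string alias, because newer pandas versions prefer `'ME'` over `'M'` for month-end frequency.

```python
import pandas as pd

# Hypothetical toy data: 45 consecutive calendar days for one made-up ticker.
dates = pd.date_range('2013-07-01', periods=45, freq='D')
toy_close = pd.DataFrame({'XYZ': [float(i) for i in range(45)]}, index=dates)

# Group the daily rows into month-end buckets and keep the last
# observation that falls into each bucket.
toy_monthly = toy_close.resample(pd.offsets.MonthEnd()).last()
print(toy_monthly)
```

Each output row is labelled with the month-end date, even when the last available observation in that month falls on an earlier day.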

In [6]:
def resample_prices(close_prices, freq='M'):
    """
    Resample close prices for each ticker at specified frequency.
    
    Parameters
    ----------
    close_prices : DataFrame
        Close prices for each ticker and date
    freq : str
        What frequency to sample at
        For valid freq choices, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
    
    Returns
    -------
    prices_resampled : DataFrame
        Resampled prices for each ticker and date
    """
    # TODO: Implement Function
    
    prices_resampled = close_prices.resample(freq).last()
    
    return prices_resampled

project_tests.test_resample_prices(resample_prices)

Tests Passed


### View Data
Let's apply this function to `close` and view the results.

In [7]:
monthly_close = resample_prices(close)
project_helper.plot_resampled_prices(
    monthly_close.loc[:, apple_ticker],
    close.loc[:, apple_ticker],
    '{} Stock - Close Vs Monthly Close'.format(apple_ticker))

ValueError: 
    Invalid value of type 'builtins.str' received for the 'mode' property of scatter
        Received value: 'line'

    The 'mode' property is a flaglist and may be specified
    as a string containing:
      - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
        (e.g. 'lines+markers')
        OR exactly one of ['none'] (e.g. 'none')

## Compute Log Returns

Compute log returns ($R_t$) from prices ($P_t$) as your primary momentum indicator:

$$R_t = \log_e(P_t) - \log_e(P_{t-1})$$

Implement the `compute_log_returns` function below so that it accepts a DataFrame (like the one returned by `resample_prices`) and produces a DataFrame of log returns with the same shape. Use NumPy's [log function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html) to help you calculate the log returns.
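To make the formula concrete, the sketch below uses three made-up prices. It shows that the difference of logs equals the log of the price ratio, and that log returns add up over time. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for a single asset over three periods.
toy_prices = pd.Series([100.0, 110.0, 99.0])

# R_t = log(P_t) - log(P_{t-1}); the first entry is NaN because
# there is no previous price.
diff_of_logs = np.log(toy_prices) - np.log(toy_prices.shift(1))

# Equivalent form: R_t = log(P_t / P_{t-1}).
log_of_ratio = np.log(toy_prices / toy_prices.shift(1))

# Log returns are additive: their sum equals the log of the
# total price ratio from the first to the last period.
total_log_return = diff_of_logs.sum()
print(diff_of_logs)
print(total_log_return, np.log(99.0 / 100.0))
```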

In [8]:
def compute_log_returns(prices):
    """
    Compute log returns for each ticker.
    
    Parameters
    ----------
    prices : DataFrame
        Prices for each ticker and date
    
    Returns
    -------
    log_returns : DataFrame
        Log returns for each ticker and date
    """
    # TODO: Implement Function
    
    log_returns = np.log(prices/prices.shift(1))
    
    return log_returns

project_tests.test_compute_log_returns(compute_log_returns)

Tests Passed


### View Data
Using the same data returned from `resample_prices`, we'll generate the log returns.

In [9]:
monthly_close_returns = compute_log_returns(monthly_close)
project_helper.plot_returns(
    monthly_close_returns.loc[:, apple_ticker],
    'Log Returns of {} Stock (Monthly)'.format(apple_ticker))


## Shift Returns
Implement the `shift_returns` function to shift the log returns forward or backward in the time series, so that each date lines up with previous or future returns. For example, if the parameter `shift_n` is 2 and `returns` is the following:

```
                           Returns
               A         B         C         D
2013-07-08     0.015     0.082     0.096     0.020     ...
2013-07-09     0.037     0.095     0.027     0.063     ...
2013-07-10     0.094     0.001     0.093     0.019     ...
2013-07-11     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...
```

the output of the `shift_returns` function would be:
```
                        Shift Returns
               A         B         C         D
2013-07-08     NaN       NaN       NaN       NaN       ...
2013-07-09     NaN       NaN       NaN       NaN       ...
2013-07-10     0.015     0.082     0.096     0.020     ...
2013-07-11     0.037     0.095     0.027     0.063     ...
...            ...       ...       ...       ...
```
Using the same `returns` data as above, the `shift_returns` function should generate the following with `shift_n` as -2:
```
                        Shift Returns
               A         B         C         D
2013-07-08     0.094     0.001     0.093     0.019     ...
2013-07-09     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...       ...
...            ...       ...       ...       ...       ...
...            NaN       NaN       NaN       NaN       ...
...            NaN       NaN       NaN       NaN       ...
```
_Note: The "..." represents data points we're not showing._
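The behaviour described above can be reproduced on a small frame. This sketch reuses the values of columns A and B from the example tables; the variable names are illustrative.

```python
import pandas as pd

toy_returns = pd.DataFrame(
    {'A': [0.015, 0.037, 0.094, 0.092],
     'B': [0.082, 0.095, 0.001, 0.057]},
    index=pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11']))

# Positive shift: each date receives the value from 2 rows earlier,
# so the first 2 rows become NaN.
shifted_back = toy_returns.shift(2)

# Negative shift: each date receives the value from 2 rows later,
# so the last 2 rows become NaN.
shifted_forward = toy_returns.shift(-2)

print(shifted_back)
print(shifted_forward)
```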

In [10]:
def shift_returns(returns, shift_n):
    """
    Generate shifted returns
    
    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    shift_n : int
        Number of periods to move, can be positive or negative
    
    Returns
    -------
    shifted_returns : DataFrame
        Shifted returns for each ticker and date
    """
    # TODO: Implement Function
    
    shifted_returns = returns.shift(shift_n)
    
    return shifted_returns

project_tests.test_shift_returns(shift_returns)

Tests Passed


### View Data
Let's get the previous month's and next month's returns.

In [95]:
prev_returns = shift_returns(monthly_close_returns, 1)
lookahead_returns = shift_returns(monthly_close_returns, -1)

project_helper.plot_shifted_returns(
    prev_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Previous Returns of {} Stock'.format(apple_ticker))
project_helper.plot_shifted_returns(
    lookahead_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Lookahead Returns of {} Stock'.format(apple_ticker))


## Generate Trading Signal

A trading signal is a sequence of trading actions, or results that can be used to take trading actions. A common form is to produce a "long" and "short" portfolio of stocks on each date (e.g. end of each month, or whatever frequency you desire to trade at). This signal can be interpreted as rebalancing your portfolio on each of those dates, entering long ("buy") and short ("sell") positions as indicated.

Here's a strategy that we will try:
> For each month-end observation period, rank the stocks by _previous_ returns, from the highest to the lowest. Select the top performing stocks for the long portfolio, and the bottom performing stocks for the short portfolio.

Implement the `get_top_n` function to get the top performing stocks for each month. Mark the top performing stocks from `prev_returns` by assigning them a value of 1. Give all other stocks a value of 0. For example, using the following `prev_returns`:

```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0.015     0.082     0.096     0.020     0.075     0.043     0.074
2013-07-09     0.037     0.095     0.027     0.063     0.024     0.086     0.025
...            ...       ...       ...       ...       ...       ...       ...
```

The function `get_top_n` with `top_n` set to 3 should return the following:
```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0         1         1         0         1         0         0
2013-07-09     0         1         0         1         0         1         0
...            ...       ...       ...       ...       ...       ...       ...
```
*Note: You may have to use Pandas' [`DataFrame.iterrows`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html) with [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) in order to implement the function. This is one of those cases where creating a vectorized solution is too difficult.*
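As a minimal sketch of the row-by-row approach, the code below applies `iterrows` and `nlargest` to the example values above. Starting from an all-zero frame is one possible design, not necessarily the intended solution, and the names `toy_prev` and `toy_top` are illustrative.

```python
import pandas as pd

toy_prev = pd.DataFrame(
    [[0.015, 0.082, 0.096, 0.020, 0.075, 0.043, 0.074],
     [0.037, 0.095, 0.027, 0.063, 0.024, 0.086, 0.025]],
    index=pd.to_datetime(['2013-07-08', '2013-07-09']),
    columns=list('ABCDEFG'))

# Start with all zeros, then mark the 3 largest values of each row with 1.
toy_top = pd.DataFrame(0, index=toy_prev.index, columns=toy_prev.columns)
for date, row in toy_prev.iterrows():
    # nlargest ignores NaN values, so an all-NaN row would stay all zeros.
    toy_top.loc[date, row.nlargest(3).index] = 1

print(toy_top)
```

Writing through a single `.loc[row, columns]` call avoids chained indexing, which can end up assigning to a temporary copy instead of the original frame.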

In [87]:
# Original data

prev_returns

date,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,HP,HPE,HPQ,HRB,HRL,HRS,HS,HSIC,HST,HSY
2013-07-31,,,,,,,,,,,...,,,,,,,,,,
2013-08-31,,,,,,,,,,,...,,,,,,,,,,
2013-09-30,0.04181412,-0.18015337,-0.02977582,0.08044762,-0.0651837,-0.01609335,-0.09440968,-0.0213619,-0.03289558,-0.05749847,...,0.00487149,,-0.13933369,-0.1187778,-0.02196317,-0.0077397,,-0.0271395,-0.04758708,-0.02609874
2013-10-31,0.09657861,0.15979244,0.03282284,-0.02171531,0.04855545,0.07086509,-0.00420927,0.01905603,0.12689741,0.01650096,...,0.08961216,,-0.05585991,-0.03841603,0.01651743,0.05337608,,0.02651328,0.04354002,0.00596369
2013-11-30,-0.00960698,0.14734639,0.18195865,0.09201927,0.08860637,0.06693948,0.10058564,0.01124304,0.04296064,0.04671322,...,0.11754129,,0.14930673,0.06463229,0.03522949,0.04387971,,0.08020208,0.0486015,0.07033251
2013-12-31,0.05388057,0.06647111,0.01828314,0.06772063,0.0,0.07997603,0.04389252,0.05260536,0.04613431,-0.02215021,...,-0.0006416,,0.11536366,-0.0195284,0.03526586,0.04691021,,0.01386764,-0.00757579,-0.01891288
2014-01-31,0.06769609,0.07267716,0.09197258,0.00886237,0.08616823,-0.00312412,0.00365918,0.05950782,0.05314171,0.06162558,...,0.0879633,,0.02808773,0.04738926,0.00332631,0.07895702,,0.0022781,0.06113468,0.00350299
2014-02-28,0.01664682,0.28421071,0.03663543,-0.11394918,-0.06220213,-0.04494321,-0.03893724,-0.02887307,-0.01157325,-0.05364189,...,0.04602253,,0.03580586,0.04576842,0.01031219,-0.00675533,,0.0054986,-0.05552576,0.02207281
2014-03-31,-0.02120344,0.09598737,0.10373913,0.05588347,0.03355617,0.01278407,0.08167803,0.0425231,0.14797715,0.05873111,...,0.12190871,,0.02989353,0.03997953,0.04328375,0.06260776,,0.03548442,0.06728759,0.06708056
2014-04-30,-0.01790035,-0.00897599,-0.00629368,0.01975642,0.0095788,-0.03387614,-0.03244633,-0.04452811,-0.04302219,0.04463996,...,0.08545743,,0.08457817,-0.04023358,0.03763885,-0.00337563,,0.00276834,0.03563695,-0.01350986


In [88]:
# Print each row as a Series (one ticker per line)

for date, row in prev_returns.iterrows():
    print(date, row)

2013-07-31 00:00:00 ticker
A      nan
AAL    nan
AAP    nan
AAPL   nan
ABBV   nan
        ..
HRS    nan
HS     nan
HSIC   nan
HST    nan
HSY    nan
Name: 2013-07-31 00:00:00, Length: 233, dtype: float64
2013-08-31 00:00:00 ticker
A      nan
AAL    nan
AAP    nan
AAPL   nan
ABBV   nan
        ..
HRS    nan
HS     nan
HSIC   nan
HST    nan
HSY    nan
Name: 2013-08-31 00:00:00, Length: 233, dtype: float64
2013-09-30 00:00:00 ticker
A       0.04181412
AAL    -0.18015337
AAP    -0.02977582
AAPL    0.08044762
ABBV   -0.06518370
           ...    
HRS    -0.00773970
HS             nan
HSIC   -0.02713950
HST    -0.04758708
HSY    -0.02609874
Name: 2013-09-30 00:00:00, Length: 233, dtype: float64
2013-10-31 00:00:00 ticker
A       0.09657861
AAL     0.15979244
AAP     0.03282284
AAPL   -0.02171531
ABBV    0.04855545
           ...    
HRS     0.05337608
HS             nan
HSIC    0.02651328
HST     0.04354002
HSY     0.00596369
Name: 2013-10-31 00:00:00, Length: 233, dtype: float64
2013-11-30 0

In [111]:
# Long: top 10 stocks by monthly return

long_chart = prev_returns.copy()

for date, row in prev_returns.iterrows():
    
    row_long = row.nlargest(10)
    
    # Assign with a single .loc call; chained .loc[...].loc[...] may write to a copy
    long_chart.loc[date, :] = 0
    long_chart.loc[date, row_long.index] = 1

long_chart.astype(np.int64)

date,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,HP,HPE,HPQ,HRB,HRL,HRS,HS,HSIC,HST,HSY
2013-07-31,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2013-08-31,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2013-09-30,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2013-10-31,0,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2013-11-30,0,1,1,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
2013-12-31,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
2014-01-31,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2014-02-28,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2014-03-31,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2014-04-30,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,0,0,0,0,0,0


In [115]:
# Short: bottom 10 stocks by monthly return

short_chart = prev_returns.copy()

for date, row in prev_returns.iterrows():
    
    row_short = (-1*row).nlargest(10)
    # row_nsmallest = row.nsmallest(10) # same result

    # Assign with a single .loc call; chained .loc[...].loc[...] may write to a copy
    short_chart.loc[date, :] = 0
    short_chart.loc[date, row_short.index] = 1

short_chart

date,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,HP,HPE,HPQ,HRB,HRL,HRS,HS,HSIC,HST,HSY
2013-07-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2013-08-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2013-09-30,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2013-10-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2013-11-30,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2013-12-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2014-01-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2014-02-28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2014-03-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2014-04-30,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [93]:
def get_top_n(prev_returns, top_n):
    """
    Select the top performing stocks
    
    Parameters
    ----------
    prev_returns : DataFrame
        Previous shifted returns for each ticker and date
    top_n : int
        The number of top performing stocks to get
    
    Returns
    -------
    top_stocks : DataFrame
        Top stocks for each ticker and date marked with a 1
    """
    # TODO: Implement Function
    
    top_stocks = prev_returns.copy()
    
    for i, row in prev_returns.iterrows():
        row_nlargest = row.nlargest(top_n)
        # Assign with a single .loc call; chained .loc[i].loc[...] may write to a copy
        top_stocks.loc[i, :] = 0
        top_stocks.loc[i, row_nlargest.index] = 1
    
    result = top_stocks.astype(np.int64)
    
    return result

project_tests.test_get_top_n(get_top_n)

Tests Passed


### View Data
We want to get the best performing and worst performing stocks. To get the best performing stocks, we'll use the `get_top_n` function. To get the worst performing stocks, we'll also use the `get_top_n` function. However, we pass in `-1*prev_returns` instead of just `prev_returns`. Multiplying by negative one will flip all the positive returns to negative and negative returns to positive. Thus, it will return the worst performing stocks.

In [94]:
top_bottom_n = 50
df_long = get_top_n(prev_returns, top_bottom_n)
df_short = get_top_n(-1*prev_returns, top_bottom_n)
project_helper.print_top(df_long, 'Longed Stocks')
project_helper.print_top(df_short, 'Shorted Stocks')

10 Most Longed Stocks:
EXPE, AVGO, DLR, DAL, CNC, AMD, BBY, AYI, EA, FB
10 Most Shorted Stocks:
CHK, FCX, COG, EQT, GPS, APA, DVN, CXO, COP, HOG


## Projected Returns
It's now time to check if your trading signal has the potential to become profitable!

We'll start by computing the net returns this portfolio would earn. For simplicity, we'll assume every stock gets an equal dollar amount of investment. This makes it easier to compute a portfolio's returns as the simple arithmetic average of the individual stock returns.

Implement the `portfolio_returns` function to compute the expected portfolio returns. Using `df_long` to indicate which stocks to long and `df_short` to indicate which stocks to short, calculate the returns using `lookahead_returns`. To help with the calculation, we've provided you with `n_stocks`, the number of stocks we're investing in during a single period.
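The equal-weight arithmetic can be checked by hand on a tiny made-up example. The tickers, positions, and returns below are hypothetical, and this is a sketch of the idea rather than the graded solution.

```python
import pandas as pd

idx = pd.to_datetime(['2013-09-30'])
cols = ['A', 'B', 'C', 'D']  # hypothetical tickers

toy_long = pd.DataFrame([[1, 0, 0, 1]], index=idx, columns=cols)   # long A and D
toy_short = pd.DataFrame([[0, 1, 0, 0]], index=idx, columns=cols)  # short B
toy_lookahead = pd.DataFrame([[0.04, -0.02, 0.10, 0.01]], index=idx, columns=cols)
toy_n_stocks = 3  # total number of positions held in the period

# +1 for longs, -1 for shorts, 0 for stocks we don't hold.
toy_positions = toy_long - toy_short

# Each position gets an equal 1/n share, so a shorted stock that falls
# contributes a positive return and an unheld stock contributes nothing.
toy_contributions = toy_positions * toy_lookahead / toy_n_stocks

# Summing across tickers gives the portfolio return for the period:
# (0.04 + 0.02 + 0.01) / 3
toy_period_return = toy_contributions.sum(axis=1)
print(toy_contributions)
print(toy_period_return)
```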

In [117]:
df_long - df_short

date,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,HP,HPE,HPQ,HRB,HRL,HRS,HS,HSIC,HST,HSY
2013-07-31,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2013-08-31,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2013-09-30,1,-1,0,1,-1,0,-1,0,0,0,...,0,0,-1,-1,0,0,0,0,0,0
2013-10-31,1,1,0,-1,0,0,-1,0,1,0,...,1,0,-1,-1,0,0,0,0,0,0
2013-11-30,-1,1,1,1,1,0,1,0,0,0,...,1,0,1,0,0,0,0,0,0,0
2013-12-31,1,1,0,1,0,1,0,1,0,-1,...,0,0,1,-1,0,0,0,0,0,0
2014-01-31,1,1,1,0,1,-1,0,0,0,0,...,1,0,0,0,0,1,0,-1,0,0
2014-02-28,0,1,1,-1,0,0,0,0,0,0,...,1,0,1,1,0,0,0,0,0,0
2014-03-31,-1,1,1,0,0,-1,0,0,1,0,...,1,0,0,0,0,0,0,0,0,0
2014-04-30,0,0,0,0,0,-1,-1,-1,-1,1,...,1,0,1,-1,0,0,0,0,0,0


In [120]:
lookahead_returns

date,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,HP,HPE,HPQ,HRB,HRL,HRS,HS,HSIC,HST,HSY
2013-07-31,0.04181412,-0.18015337,-0.02977582,0.08044762,-0.0651837,-0.01609335,-0.09440968,-0.0213619,-0.03289558,-0.05749847,...,0.00487149,,-0.13933369,-0.1187778,-0.02196317,-0.0077397,,-0.0271395,-0.04758708,-0.02609874
2013-08-31,0.09657861,0.15979244,0.03282284,-0.02171531,0.04855545,0.07086509,-0.00420927,0.01905603,0.12689741,0.01650096,...,0.08961216,,-0.05585991,-0.03841603,0.01651743,0.05337608,,0.02651328,0.04354002,0.00596369
2013-09-30,-0.00960698,0.14734639,0.18195865,0.09201927,0.08860637,0.06693948,0.10058564,0.01124304,0.04296064,0.04671322,...,0.11754129,,0.14930673,0.06463229,0.03522949,0.04387971,,0.08020208,0.0486015,0.07033251
2013-10-31,0.05388057,0.06647111,0.01828314,0.06772063,0.0,0.07997603,0.04389252,0.05260536,0.04613431,-0.02215021,...,-0.0006416,,0.11536366,-0.0195284,0.03526586,0.04691021,,0.01386764,-0.00757579,-0.01891288
2013-11-30,0.06769609,0.07267716,0.09197258,0.00886237,0.08616823,-0.00312412,0.00365918,0.05950782,0.05314171,0.06162558,...,0.0879633,,0.02808773,0.04738926,0.00332631,0.07895702,,0.0022781,0.06113468,0.00350299
2013-12-31,0.01664682,0.28421071,0.03663543,-0.11394918,-0.06220213,-0.04494321,-0.03893724,-0.02887307,-0.01157325,-0.05364189,...,0.04602253,,0.03580586,0.04576842,0.01031219,-0.00675533,,0.0054986,-0.05552576,0.02207281
2014-01-31,-0.02120344,0.09598737,0.10373913,0.05588347,0.03355617,0.01278407,0.08167803,0.0425231,0.14797715,0.05873111,...,0.12190871,,0.02989353,0.03997953,0.04328375,0.06260776,,0.03548442,0.06728759,0.06708056
2014-02-28,-0.01790035,-0.00897599,-0.00629368,0.01975642,0.0095788,-0.03387614,-0.03244633,-0.04452811,-0.04302219,0.04463996,...,0.08545743,,0.08457817,-0.04023358,0.03763885,-0.00337563,,0.00276834,0.03563695,-0.01350986
2014-03-31,-0.03182502,-0.04270218,-0.04205794,0.09476126,0.02214224,-0.00627057,0.01187986,0.01803285,-0.06358573,-0.03543414,...,0.01008288,,0.021401,-0.06041762,-0.02840954,0.00490865,,-0.04401395,0.0580638,-0.0813846
2014-04-30,0.05227357,0.13552541,0.02346722,0.07577509,0.04229556,0.11924492,0.03225676,0.01521647,0.04516334,0.02805861,...,0.0176681,,0.01322135,0.0467439,0.03137512,0.05493198,,0.04635685,0.02849457,0.01634889


In [121]:
def portfolio_returns(df_long, df_short, lookahead_returns, n_stocks):
    """
    Compute expected returns for the portfolio, assuming equal investment in each long/short stock.
    
    Parameters
    ----------
    df_long : DataFrame
        Top stocks for each ticker and date marked with a 1
    df_short : DataFrame
        Bottom stocks for each ticker and date marked with a 1
    lookahead_returns : DataFrame
        Lookahead returns for each ticker and date
    n_stocks: int
        The number of stocks chosen for each month
    
    Returns
    -------
    portfolio_returns : DataFrame
        Expected portfolio returns for each ticker and date
    """
    # TODO: Implement Function
    
    df_long_short = df_long - df_short
    result = lookahead_returns * df_long_short / n_stocks
    
    return result

project_tests.test_portfolio_returns(portfolio_returns)

Tests Passed


### View Data
Time to see how the portfolio did.

In [122]:
expected_portfolio_returns = portfolio_returns(df_long, df_short, lookahead_returns, 2*top_bottom_n)
project_helper.plot_returns(expected_portfolio_returns.T.sum(), 'Portfolio Returns')


## Statistical Tests
### Annualized Rate of Return

In [123]:
expected_portfolio_returns_by_date = expected_portfolio_returns.T.sum().dropna()
portfolio_ret_mean = expected_portfolio_returns_by_date.mean()
portfolio_ret_ste = expected_portfolio_returns_by_date.sem()
portfolio_ret_annual_rate = (np.exp(portfolio_ret_mean * 12) - 1) * 100

print("""
Mean:                       {:.6f}
Standard Error:             {:.6f}
Annualized Rate of Return:  {:.2f}%
""".format(portfolio_ret_mean, portfolio_ret_ste, portfolio_ret_annual_rate))


Mean:                       0.001859
Standard Error:             0.001868
Annualized Rate of Return:  2.26%



The annualized rate of return allows you to compare the rate of return from this strategy to other quoted rates of return, which are usually quoted on an annual basis. 
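Because the monthly figures are log returns, annualizing means multiplying the mean by 12 and then converting back to a simple rate with the exponential. The sketch below redoes that arithmetic with the rounded mean printed above; the variable names are illustrative.

```python
import numpy as np

# Mean monthly log return, taken from the printed output above (rounded).
monthly_log_return = 0.001859

# Log returns add over time, so 12 months of the mean return sum to 12 * mean.
annual_log_return = monthly_log_return * 12

# Convert the annual log return to a simple percentage rate.
annual_rate_pct = (np.exp(annual_log_return) - 1) * 100
print(round(annual_rate_pct, 2))  # 2.26
```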

### T-Test
Our null hypothesis ($H_0$) is that the actual mean return from the signal is zero. We'll perform a one-sample, one-sided t-test on the observed mean return, to see if we can reject $H_0$.

We'll need to first compute the t-statistic, and then find its corresponding p-value. The p-value will indicate the probability of observing a t-statistic equally or more extreme than the one we observed if the null hypothesis were true. A small p-value means that the chance of observing the t-statistic we observed under the null hypothesis is small, and thus casts doubt on the null hypothesis. It's good practice to set a desired level of significance or alpha ($\alpha$) _before_ computing the p-value, and then reject the null hypothesis if $p < \alpha$.

For this project, we'll use $\alpha = 0.05$, since it's a common value to use.

Implement the `analyze_alpha` function to perform a t-test on the sample of portfolio returns. We've imported the `scipy.stats` module for you to perform the t-test.

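Note: [`scipy.stats.ttest_1samp`](https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.stats.ttest_1samp.html) performs a two-sided test, so divide the p-value by 2 to get the one-sided p-value. Halving is valid when the t-statistic is positive, that is, when the sample mean lies in the direction of the alternative hypothesis.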
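To show where the t-statistic and the one-sided p-value come from, here is a small sketch on a made-up sample of monthly returns. The sample values are hypothetical, and the handling of a negative t-statistic is an added assumption that goes beyond the note above.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of monthly portfolio returns.
toy_sample = np.array([0.010, -0.004, 0.006, 0.012, -0.002, 0.008])

# t = sample mean / standard error, using the sample standard deviation (ddof=1).
standard_error = toy_sample.std(ddof=1) / np.sqrt(len(toy_sample))
t_manual = toy_sample.mean() / standard_error

# SciPy computes the same statistic together with a two-sided p-value.
t_scipy, p_two_sided = stats.ttest_1samp(toy_sample, 0)

# One-sided p-value for the alternative "mean > 0": halve the two-sided
# value when t is positive, otherwise take the complement of the half.
p_one_sided = p_two_sided / 2 if t_scipy > 0 else 1 - p_two_sided / 2

print(t_manual, t_scipy, p_one_sided)
```

The one-sided value matches the upper-tail probability of a t-distribution with `len(toy_sample) - 1` degrees of freedom, which is what `stats.t.sf` returns.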

In [124]:
expected_portfolio_returns_by_date

date
2013-07-31    0.00000000
2013-08-31    0.00000000
2013-09-30   -0.00824577
2013-10-31    0.00024467
2013-11-30   -0.00164563
2013-12-31   -0.00157677
2014-01-31    0.01063992
2014-02-28   -0.00274485
2014-03-31   -0.00916623
2014-04-30   -0.00097729
2014-05-31    0.00154189
2014-06-30    0.00653246
2014-07-31    0.00466940
2014-08-31    0.01548226
2014-09-30    0.01306488
2014-10-31    0.02614420
2014-11-30    0.00873356
2014-12-31    0.01298594
2015-01-31   -0.02091933
2015-02-28    0.01907311
2015-03-31    0.01096467
2015-04-30    0.01516592
2015-05-31   -0.00939801
2015-06-30    0.02639169
2015-07-31   -0.00462733
2015-08-31    0.03994362
2015-09-30   -0.00997470
2015-10-31    0.00348545
2015-11-30   -0.01629301
2015-12-31   -0.01923761
2016-01-31   -0.00627867
2016-02-29   -0.00897029
2016-03-31   -0.01585080
2016-04-30    0.00663439
2016-05-31   -0.00193083
2016-06-30   -0.00358726
2016-07-31   -0.02363142
2016-08-31   -0.00151126
2016-09-30    0.00524655
2016-10-31    0.0082

In [129]:
from scipy import stats

t_test_result = stats.ttest_1samp(expected_portfolio_returns_by_date, 0)
t_value = t_test_result[0]
p_value = t_test_result[1] / 2

print(t_value, p_value)

0.9955909491207204 0.16227360889710507


In [130]:
from scipy import stats

def analyze_alpha(expected_portfolio_returns_by_date):
    """
    Perform a t-test with the null hypothesis being that the expected mean return is zero.
    
    Parameters
    ----------
    expected_portfolio_returns_by_date : Pandas Series
        Expected portfolio returns for each date
    
    Returns
    -------
    t_value
        T-statistic from t-test
    p_value
        Corresponding p-value
    """
    # TODO: Implement Function
    
    # stats.ttest_1samp(sample, population mean)
    
    t_test_result = stats.ttest_1samp(expected_portfolio_returns_by_date, 0)
    t_value = t_test_result[0]
    p_value = t_test_result[1] / 2

    return t_value, p_value

project_tests.test_analyze_alpha(analyze_alpha)

Tests Passed


### View Data
Let's see what values we get with our portfolio. After you run this, make sure to answer the question below.

In [131]:
t_value, p_value = analyze_alpha(expected_portfolio_returns_by_date)
print("""
Alpha analysis:
 t-value:        {:.3f}
 p-value:        {:.6f}
""".format(t_value, p_value))


Alpha analysis:
 t-value:        0.996
 p-value:        0.162274



### Question: What p-value did you observe? And what does that indicate about your signal?

*#TODO: Put Answer In this Cell*

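The observed p-value is 0.162274. This means that, if the null hypothesis ($H_0$: the mean return is zero) were true, there would be about a 16.2% chance of observing a t-statistic at least as large as the one we observed.
Because this is greater than the significance level $\alpha = 0.05$, the observed mean return could plausibly be due to chance.
We therefore fail to reject the null hypothesis: the signal does not show statistically significant evidence of a positive mean return.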

## Submission
Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a pass or not passed grade. You can continue to the next section while you wait for feedback.