# Engineering Predictive Alpha Factors

This notebook illustrates the following steps:

1. Select the adjusted open, high, low, and close prices as well as the volume for all tickers from the Quandl Wiki data that you downloaded and simplified for the last milestone for the 2007-2016 time period. Looking ahead, we will use 2014-2016 as our 'out-of-sample' period to test the performance of a strategy based on a machine learning model selected using data from preceding periods.
2. Compute the dollar volume as the product of closing price and trading volume; then select the stocks with at least eight years of data and the lowest average daily rank for this metric. 
3. Compute daily returns and keep only 'inliers' with values between -100% and + 100% as a basic check against data error.
4. Now we're ready to compute financial features. The Alpha Factory Library listed among the resources below illustrates how to compute a broad range of those using pandas and TA-Lib. We will list a few examples; feel free to explore and evaluate the various TA-Lib indicators.
    - Compute **historical returns** for various time ranges such as 1, 3, 5, 10, 21 trading days, as well as longer periods like 2, 3, 6 and 12 months.
    - Use TA-Lib's **Bollinger Band** indicator to create features that anticipate **mean-reversion**.
    - Select some indicators from TA-Lib's **momentum** indicators family such as
        - the Average Directional Movement Index (ADX), 
        - the Moving Average Convergence Divergence (MACD), 
        - the Relative Strength Index (RSI), 
        - the Balance of Power (BOP) indictor, or 
        - the Money Flow Index (MFI).
    - Compute TA-Lib **volume** indicators like On Balance Volume (OBV) or the Chaikin A/D Oscillator (ADOSC)
    - Create volatility metrics such as the Normalized Average True Range (NATR).
    - Compute rolling factor betas using the five Fama-French risk factors for different rolling windows of three and 12 months (see resources below).
    - Compute the outcome variable that we will aim to predict, namely the 1-day forward returns.

## Usage tips

- If you experience resource constraints (suddenly restarting Kernel), increase the memory available for Docker Desktop (> Settings > Advanced). If this not possible or you experienced prolonged execution times, reduce the scope of the exercise. The easiest way to do so is to select fewer stocks or a shorter time period, or both.
- You may want to persist intermediate results so you can recover quickly in case something goes wrong. There's an example under the first 'Persist Results' subsection.

## Imports & Settings

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline

from pathlib import Path
import numpy as np
import pandas as pd
import pandas_datareader.data as web

import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS
from sklearn.preprocessing import scale
import talib

import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
sns.set_style('whitegrid')
idx = pd.IndexSlice
deciles = np.arange(.1, 1, .1).round(1)

## Load Data

In [4]:
DATA_STORE = Path('..', 'data', 'stock_data.h5')

In [5]:
DATA_STORE

WindowsPath('../data/stock_data.h5')

In [6]:
stock_data =  pd.read_hdf(DATA_STORE)
stock_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A,2000-01-03,53.726454,53.85608,45.969377,49.121329,3343600.0
A,2000-01-04,46.481058,46.992738,44.175084,45.369006,3408500.0
A,2000-01-05,45.198445,45.23938,41.828176,41.998737,4119200.0
A,2000-01-06,42.046493,42.298923,39.658651,40.934441,1812900.0
A,2000-01-07,40.293135,44.986951,40.2522,44.345645,2016900.0


## Select 500 most-traded stocks prior to 2017

In [7]:
stock_data = stock_data[(stock_data.index.get_level_values(1)>='2007-01-01') & (stock_data.index.get_level_values(1)<='2016-12-31')]

Compute the dollar volume as the product of the adjusted close price and the adjusted volume:

In [8]:
stock_data['dollar_volume'] = stock_data['close'] * stock_data['volume']

Include stocks that are in at least 8 years

In [9]:
stock_data['year'] = stock_data.index.get_level_values(1).year

In [10]:
def extract_ticker_in_year(df, yr):
    df_filtered = df[df['year']==yr]
    tickers = list(set(df_filtered.index.get_level_values(0).to_list()))
    return tickers

year_count_dict = {}

# get a list of tickers for each year and count how many years each ticker appear in
for yr in range(2007,2017):
    tickers = extract_ticker_in_year(stock_data, yr)
    for ticker in tickers:
        if ticker not in year_count_dict.keys():
            year_count_dict[ticker]=1
        else:
            year_count_dict[ticker] +=1

# grab a list of tickers appearing in at least 8 years
eight_years_list = [key for key,value in year_count_dict.items() if value >=8]


In [11]:
# include only stocks in the eight_years_list
stock_data_eight_yr = stock_data[stock_data.index.get_level_values(0).isin(eight_years_list)]

In [12]:
# mean_dollar_volume
avg_dollar_volume = stock_data_eight_yr.groupby(stock_data_eight_yr.index.get_level_values(0))\
                                            .mean('dollar_volume')['dollar_volume']


In [13]:
avg_dollar_volume

ticker
A       8.375852e+07
AAL     2.702878e+08
AAN     1.093269e+07
AAON    2.207403e+06
AAP     8.899146e+07
            ...     
ZIXI    1.468193e+06
ZLC     9.750222e+06
ZMH     1.116683e+08
ZQK     9.730574e+06
ZUMZ    1.134560e+07
Name: dollar_volume, Length: 2586, dtype: float64

In [14]:
# grab ticker list with highest avg ranks (=<500)
top_500_df = avg_dollar_volume.sort_values( ascending=False)\
                    .iloc[:500]
top_500_list = top_500_df.index.to_list()

Include only tickets in the top_500_list:

In [15]:
stock_data_eight_yr_top_500 = stock_data_eight_yr[stock_data_eight_yr.index.get_level_values(0).isin(top_500_list)]

## Remove outliers based on daily returns

Pivot data to make the tickers columns and keeping only close prices


In [16]:
stock_data_returns = stock_data_eight_yr_top_500['close'].pct_change(1)

# only keep those between 0.00001 and +0.99999 quantiles

outliers = stock_data_returns[(stock_data_returns < stock_data_returns.quantile(.00001)) |
                         (stock_data_returns > stock_data_returns.quantile(.99999))]

In [17]:
# 26 tickers to remove
len(outliers)

26

In [18]:
# select only those in keep list
stock_data_keep = stock_data_eight_yr_top_500[~stock_data_eight_yr_top_500.index.get_level_values(0).isin(outliers)]
stock_data_keep = stock_data_keep.drop('year',axis=1)
stock_data_keep.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
A,2007-01-03,23.871602,24.2059,23.230295,23.400856,2574600.0,60247840.0
A,2007-01-04,23.400856,23.605528,22.827773,23.475902,2073700.0,48681980.0
A,2007-01-05,23.400856,23.46908,23.196183,23.257585,2676600.0,62251250.0
A,2007-01-08,23.182539,23.250763,22.977866,23.175716,1557200.0,36089230.0
A,2007-01-09,23.250763,23.4145,22.943754,23.203006,1386200.0,32164010.0


## Compute returns

In [19]:
# 1 to 252 trading day lags
trading_day_lags = [1, 3, 5, 10, 21, 42, 63, 126, 252]

for lag in trading_day_lags:
    stock_data_keep[f'return_{lag}_days'] = stock_data_keep['close']\
                                            .pct_change(lag)

### Forward Returns

In [20]:
stock_data_keep['return_fwd'] = stock_data_keep['return_1_days'].shift(-1)
# shift returns back in time (tomorrow's returns are today's fwd returns)

## Bollinger Bands

In [21]:
# to see documentation
?talib.BBANDS

__Note: before the time period ends at the beginning, the bands are not meaningful because the bands would be calculated as really wide band__

In [22]:
stock_data_keep['BBands_high'], stock_data_keep['BBands_mid'], stock_data_keep['BBands_low']\
    =talib.BBANDS(stock_data_keep.groupby(level='ticker')['close'].shift(0),timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)

stock_data_keep['BB_revert'] = stock_data_keep\
    .apply(lambda stock_data_keep: 1 if stock_data_keep['close']>=stock_data_keep['BBands_high'] or \
                                                 stock_data_keep['close']<=stock_data_keep['BBands_low'] else 0, axis=1)

In [23]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,return_21_days,return_42_days,return_63_days,return_126_days,return_252_days,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,-0.935507,-0.935713,-0.922110,-0.934486,-0.928068,0.000118,12.558877,11.428552,10.298226,0
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,0.011337,-0.936097,-0.921940,-0.935271,-0.926153,-0.009558,12.562829,11.422704,10.282580,0
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,-0.020079,-0.936477,-0.921762,-0.936466,-0.926814,0.002502,12.569633,11.415572,10.261511,0
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,-0.010582,-0.936464,-0.921967,-0.936423,-0.924638,0.023767,12.576108,11.407090,10.238072,0
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,0.007956,-0.935997,-0.918553,-0.934291,-0.921489,0.000348,12.521004,11.365837,10.210669,0
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,0.047559,-0.009516,0.037604,0.279393,0.102591,0.006351,117.486874,112.000826,106.514777,0
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,0.048931,0.019712,0.042254,0.266195,0.122163,-0.004264,117.941406,112.281812,106.622217,0
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,0.046518,0.025209,0.029978,0.249977,0.097648,-0.000257,118.256141,112.543539,106.830937,0
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,0.047282,0.031796,0.045960,0.233970,0.111886,-0.007796,118.459815,112.850204,107.240592,0


## Momentum Indicators

TA-Lib offers the following choices - feel free to experiment with as many as you like (but you don't have to..):

|Function|             Name|
|:---|:---|
|PLUS_DM|              Plus Directional Movement|
|MINUS_DM|             Minus Directional Movement|
|PLUS_DI|              Plus Directional Indicator|
|MINUS_DI|             Minus Directional Indicator|
|DX|                   Directional Movement Index|
|ADX|                  Average Directional Movement Index|
|ADXR|                 Average Directional Movement Index Rating|
|APO|                  Absolute Price Oscillator|
|PPO|                  Percentage Price Oscillator|
|AROON|                Aroon|
|AROONOSC|             Aroon Oscillator|
|BOP|                  Balance Of Power|
|CCI|                  Commodity Channel Index|
|CMO|                  Chande Momentum Oscillator|
|MACD|                 Moving Average Convergence/Divergence|
|MACDEXT|              MACD with controllable MA type|
|MACDFIX|              Moving Average Convergence/Divergence Fix 12/26|
|MFI|                  Money Flow Index|
|MOM|                  Momentum|
|RSI|                  Relative Strength Index|
|STOCH|                Stochastic|
|STOCHF|               Stochastic Fast|
|STOCHRSI|             Stochastic Relative Strength Index|
|TRIX|                 1-day Rate-Of-Change (ROC) of a Triple Smooth EMA|
|ULTOSC|               Ultimate Oscillator|
|WILLR|                Williams' %R|

### Average Directional Movement Index (ADX)

The ADX combines of two other indicators, namely the positive and directional indicators (PLUS_DI and MINUS_DI), which in turn build on the positive and directional movement (PLUS_DM and MINUS_DM). For additional details see [Wikipdia](https://en.wikipedia.org/wiki/Average_directional_movement_index) and [Investopedia](https://www.investopedia.com/articles/trading/07/adx-trend-indicator.asp).

In [24]:
?talib.ADX

In [25]:
stock_data_keep['ADX'] = talib.ADX(
                                high=stock_data_keep.groupby(level='ticker')['high'].shift(0),
                                low=stock_data_keep.groupby(level='ticker')['low'].shift(0),
                                close=stock_data_keep.groupby(level='ticker')['close'].shift(0),
                                timeperiod=14
)

For some reason, ADX doesn't work for all...

In [26]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,return_42_days,return_63_days,return_126_days,return_252_days,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert,ADX
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,-0.935713,-0.922110,-0.934486,-0.928068,0.000118,12.558877,11.428552,10.298226,0,74.891454
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-0.936097,-0.921940,-0.935271,-0.926153,-0.009558,12.562829,11.422704,10.282580,0,75.743961
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,-0.936477,-0.921762,-0.936466,-0.926814,0.002502,12.569633,11.415572,10.261511,0,76.535575
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,-0.936464,-0.921967,-0.936423,-0.924638,0.023767,12.576108,11.407090,10.238072,0,77.274581
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,-0.935997,-0.918553,-0.934291,-0.921489,0.000348,12.521004,11.365837,10.210669,0,77.855301
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,-0.009516,0.037604,0.279393,0.102591,0.006351,117.486874,112.000826,106.514777,0,23.960773
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,0.019712,0.042254,0.266195,0.122163,-0.004264,117.941406,112.281812,106.622217,0,24.229878
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,0.025209,0.029978,0.249977,0.097648,-0.000257,118.256141,112.543539,106.830937,0,24.203927
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,0.031796,0.045960,0.233970,0.111886,-0.007796,118.459815,112.850204,107.240592,0,24.179830


### MACD

In [27]:
?talib.MACD

In [28]:
stock_data_keep['MACD'],_,_ = talib.MACD(
                                real=stock_data_keep.groupby(level='ticker')['close'].shift(0)
)

In [29]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,return_63_days,return_126_days,return_252_days,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert,ADX,MACD
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,-0.922110,-0.934486,-0.928068,0.000118,12.558877,11.428552,10.298226,0,74.891454,-26.634295
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-0.921940,-0.935271,-0.926153,-0.009558,12.562829,11.422704,10.282580,0,75.743961,-25.064889
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,-0.921762,-0.936466,-0.926814,0.002502,12.569633,11.415572,10.261511,0,76.535575,-23.557961
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,-0.921967,-0.936423,-0.924638,0.023767,12.576108,11.407090,10.238072,0,77.274581,-22.106700
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,-0.918553,-0.934291,-0.921489,0.000348,12.521004,11.365837,10.210669,0,77.855301,-20.697240
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,0.037604,0.279393,0.102591,0.006351,117.486874,112.000826,106.514777,0,23.960773,1.501988
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,0.042254,0.266195,0.122163,-0.004264,117.941406,112.281812,106.622217,0,24.229878,1.546350
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,0.029978,0.249977,0.097648,-0.000257,118.256141,112.543539,106.830937,0,24.203927,1.524092
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,0.045960,0.233970,0.111886,-0.007796,118.459815,112.850204,107.240592,0,24.179830,1.486920





### Absolute Price Oscillator (APO)

The absolute Price Oscillator (APO) is computed as the difference between two exponential moving averages (EMA) of price series, expressed as an absolute value. The EMA windows usually contain 26 and 12 data points, respectively.

### Percentage Price Oscillator (PPO)

The Percentage Price Oscillator (APO) is computed as the difference between two exponential moving averages (EMA) of price series, expressed as a percentage value and thus comparable across assets. The EMA windows usually contain 26 and 12 data points, respectively. 

### Aroon Oscillator

#### Aroon Up/Down Indicator

The indicator measures the time between highs and the time between lows over a time period. It computes an AROON_UP and an AROON_DWN indicator as follows:

$$
\begin{align*}
\text{AROON_UP}&=\frac{T-\text{Periods since T period High}}{T}\times 100\\
\text{AROON_DWN}&=\frac{T-\text{Periods since T period Low}}{T}\times 100
\end{align*}
$$

#### Aroon Oscillator

The Aroon Oscillator is simply the difference between the Aroon Up and Aroon Down indicators.

### Balance Of Power (BOP)

The Balance of Power (BOP) intends to measure the strength of buyers relative to sellers in the market by assessing the ability of each side to drive prices. It is computer as the difference between the close and the open price, divided by the difference between the high and the low price: 

$$
\text{BOP}_t= \frac{P_t^\text{Close}-P_t^\text{Open}}{P_t^\text{High}-P_t^\text{Low}}
$$

In [30]:
?talib.BOP

In [31]:
stock_data_keep['BOP']= talib.BOP(
                                open=stock_data_keep.groupby(level='ticker')['open'].shift(0)
                                ,high=stock_data_keep.groupby(level='ticker')['high'].shift(0)
                                ,low=stock_data_keep.groupby(level='ticker')['low'].shift(0)
                                ,close=stock_data_keep.groupby(level='ticker')['close'].shift(0)
)

In [32]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,return_126_days,return_252_days,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert,ADX,MACD,BOP
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,-0.934486,-0.928068,0.000118,12.558877,11.428552,10.298226,0,74.891454,-26.634295,-0.973856
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-0.935271,-0.926153,-0.009558,12.562829,11.422704,10.282580,0,75.743961,-25.064889,0.406452
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,-0.936466,-0.926814,0.002502,12.569633,11.415572,10.261511,0,76.535575,-23.557961,-0.279070
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,-0.936423,-0.924638,0.023767,12.576108,11.407090,10.238072,0,77.274581,-22.106700,-0.186335
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,-0.934291,-0.921489,0.000348,12.521004,11.365837,10.210669,0,77.855301,-20.697240,0.590106
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,0.279393,0.102591,0.006351,117.486874,112.000826,106.514777,0,23.960773,1.501988,1.000000
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,0.266195,0.122163,-0.004264,117.941406,112.281812,106.622217,0,24.229878,1.546350,0.564885
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,0.249977,0.097648,-0.000257,118.256141,112.543539,106.830937,0,24.203927,1.524092,-0.418364
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,0.233970,0.111886,-0.007796,118.459815,112.850204,107.240592,0,24.179830,1.486920,0.394644


### Commodity Channel Index (CCI)

The Commodity Channel Index (CCI) measures the difference between the current *typical* price, computed as the average of current low, high and close price and the historical average price. A positive (negative) CCI indicates that price is above (below) the historic average. When CCI is below zero, the price is below the hsitoric average. It is computed as:

$$
\begin{align*}
\bar{P_t}&=\frac{P_t^H+P_t^L+P_t^C}{3}\\
\text{CCI}_t & =\frac{\bar{P_t} - \text{SMA}(T)_t}{0.15\sum_{t=i}^T |\bar{P_t}-\text{SMA}(N)_t|/T}
\end{align*}
$$

### Moving Average Convergence/Divergence (MACD)

Moving Average Convergence Divergence (MACD) is a trend-following (lagging) momentum indicator that shows the relationship between two moving averages of a security’s price. It is calculated by subtracting the 26-period Exponential Moving Average (EMA) from the 12-period EMA.

The TA-Lib implementation returns the MACD value and its signal line, which is the 9-day EMA of the MACD. In addition, the MACD-Histogram measures the distance between the indicator and its signal line.

### Chande Momentum Oscillator (CMO)

The Chande Momentum Oscillator (CMO) intends to measure momentum on both up and down days. It is calculated as the difference between the sum of gains and losses over at time period T, divided by the sum of all price movement over the same period. It oscillates between +100 and -100.

### Money Flow Index

The Money Flow Index (MFI) incorporates price and volume information to identify overbought or oversold conditions.  The indicator is typically calculated using 14 periods of data. An MFI reading above 80 is considered overbought and an MFI reading below 20 is considered oversold.

In [33]:
?talib.MFI

In [34]:
stock_data_keep['MFI'] = talib.MFI(
                                high=stock_data_keep.groupby(level='ticker')['high'].shift(0)
                                ,low=stock_data_keep.groupby(level='ticker')['low'].shift(0)
                                ,close=stock_data_keep.groupby(level='ticker')['close'].shift(0)
                                ,volume=stock_data_keep.groupby(level='ticker')['volume'].shift(0)
                                ,timeperiod=14
)

In [35]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,return_252_days,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert,ADX,MACD,BOP,MFI
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,-0.928068,0.000118,12.558877,11.428552,10.298226,0,74.891454,-26.634295,-0.973856,24.320706
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-0.926153,-0.009558,12.562829,11.422704,10.282580,0,75.743961,-25.064889,0.406452,25.561447
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,-0.926814,0.002502,12.569633,11.415572,10.261511,0,76.535575,-23.557961,-0.279070,17.867111
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,-0.924638,0.023767,12.576108,11.407090,10.238072,0,77.274581,-22.106700,-0.186335,19.095607
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,-0.921489,0.000348,12.521004,11.365837,10.210669,0,77.855301,-20.697240,0.590106,29.325943
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,0.102591,0.006351,117.486874,112.000826,106.514777,0,23.960773,1.501988,1.000000,87.660569
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,0.122163,-0.004264,117.941406,112.281812,106.622217,0,24.229878,1.546350,0.564885,87.472519
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,0.097648,-0.000257,118.256141,112.543539,106.830937,0,24.203927,1.524092,-0.418364,82.064134
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,0.111886,-0.007796,118.459815,112.850204,107.240592,0,24.179830,1.486920,0.394644,77.749384


### Relative Strength Index

RSI compares the magnitude of recent price changes across stocks to identify stocks as overbought or oversold. A high RSI (usually above 70) indicates overbought and a low RSI (typically below 30) indicates oversold. It first computes the average price change for a given number (often 14) of prior trading days with rising and falling prices, respectively as $\text{up}_t$ and $\text{down}_t$. Then, the RSI is computed as:
$$
\text{RSI}_t=100-\frac{100}{1+\frac{\text{up}_t}{\text{down}_t}}
$$



In [36]:
?talib.RSI

In [37]:
stock_data_keep['RSI'] = talib.RSI(
                                real=stock_data_keep.groupby(level='ticker')['close'].shift(0)
)

In [38]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,return_fwd,BBands_high,BBands_mid,BBands_low,BB_revert,ADX,MACD,BOP,MFI,RSI
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,0.000118,12.558877,11.428552,10.298226,0,74.891454,-26.634295,-0.973856,24.320706,7.384486
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-0.009558,12.562829,11.422704,10.282580,0,75.743961,-25.064889,0.406452,25.561447,7.387507
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,0.002502,12.569633,11.415572,10.261511,0,76.535575,-23.557961,-0.279070,17.867111,7.366548
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,0.023767,12.576108,11.407090,10.238072,0,77.274581,-22.106700,-0.186335,19.095607,7.439867
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,0.000348,12.521004,11.365837,10.210669,0,77.855301,-20.697240,0.590106,29.325943,8.185217
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,0.006351,117.486874,112.000826,106.514777,0,23.960773,1.501988,1.000000,87.660569,65.182567
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,-0.004264,117.941406,112.281812,106.622217,0,24.229878,1.546350,0.564885,87.472519,67.751025
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,-0.000257,118.256141,112.543539,106.830937,0,24.203927,1.524092,-0.418364,82.064134,64.299536
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,-0.007796,118.459815,112.850204,107.240592,0,24.179830,1.486920,0.394644,77.749384,64.088572


#### Stochastic RSI (STOCHRSI)

The Stochastic Relative Strength Index (STOCHRSI) is based on the RSI just described and intends to identify crossovers as well as overbought and oversold conditions. It compares the distance of the current RSI to the lowest RSI over a given time period T to the maximum range of values the RSI has assumed for this period. It is computed as follows:

$$
\text{STOCHRSI}_t= \frac{\text{RSI}_t-\text{RSI}_t^L(T)}{\text{RSI}_t^H(T)-\text{RSI}_t^L(T)}
$$

The TA-Lib implementation offers more flexibility than the original "Unsmoothed stochastic RSI" version by Chande and Kroll (1993). To calculate the original indicator, keep the `timeperiod` and `fastk_period` equal. 

The return value `fastk` is the unsmoothed RSI. The `fastd_period` is used to compute a smoothed STOCHRSI, which  is returned as `fastd`. If you do not care about STOCHRSI smoothing, just set `fastd_period` to 1 and ignore the `fastd` output.

Reference: "Stochastic RSI and Dynamic Momentum Index" by Tushar Chande and Stanley Kroll Stock&Commodities V.11:5 (189-199)


### Stochastic (STOCH)

A stochastic oscillator is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time. Stochastic oscillators are based on the idea that closing prices should confirm the trend.

For stochastic (STOCH), there are four different lines: `FASTK`, `FASTD`, `SLOWK` and `SLOWD`. The `D` is the signal line usually drawn over its corresponding `K` function.

$$
\begin{align*}
& K^\text{Fast}(T_K) & = &\frac{P_t-P_{T_K}^L}{P_{T_K}^H-P_{T_K}^L}* 100 \\
& D^\text{Fast}(T_{\text{FastD}}) & = & \text{MA}(T_{\text{FastD}})[K^\text{Fast}]\\
& K^\text{Slow}(T_{\text{SlowK}}) & = &\text{MA}(T_{\text{SlowK}})[K^\text{Fast}]\\
& D^\text{Slow}(T_{\text{SlowD}}) & = &\text{MA}(T_{\text{SlowD}})[K^\text{Slow}]
\end{align*}
$$
  

The $P_{T_K}^L$, $P_{T_K}^H$, and $P_{T_K}^L$ are the extreme values among the last $T_K$ period.
 $K^\text{Slow}$ and $D^\text{Fast}$ are equivalent when using the same period. 

### Ultimate Oscillator (ULTOSC)

The Ultimate Oscillator (ULTOSC), developed by Larry Williams, measures the average difference of the current close to the previous lowest price over three time frames (default: 7, 14, and 28) to avoid overreacting to short-term price changes and incorporat short, medium, and long-term market trends. It first computes the buying pressure, $\text{BP}_t$, then sums it over the three periods $T_1, T_2, T_3$, normalized by the True Range ($\text{TR}_t$.
$$
\begin{align*}
\text{BP}_t & = P_t^\text{Close}-\min(P_{t-1}^\text{Close}, P_t^\text{Low})\\ 
\text{TR}_t & = \max(P_{t-1}^\text{Close}, P_t^\text{High})-\min(P_{t-1}^\text{Close}, P_t^\text{Low})
\end{align*}
$$

ULTOSC is then computed as a weighted average over the three periods as follows:
$$
\begin{align*}
\text{Avg}_t(T) & = \frac{\sum_{i=0}^{T-1} \text{BP}_{t-i}}{\sum_{i=0}^{T-1} \text{TR}_{t-i}}\\
\text{ULTOSC}_t & = 100*\frac{4\text{Avg}_t(7) + 2\text{Avg}_t(14) + \text{Avg}_t(28)}{4+2+1}
\end{align*}
$$

### Williams' %R (WILLR)

Williams %R, also known as the Williams Percent Range, is a momentum indicator that moves between 0 and -100 and measures overbought and oversold levels to identify entry and exit points. It is similar to the Stochastic oscillator and compares the current closing price $P_t^\text{Close}$ to the range of highest ($P_T^\text{High}$) and lowest ($P_T^\text{Low}$) prices over the last T periods (typically 14). The indicators is computed as:

$$
\text{WILLR}_t = \frac{P_T^\text{High}-P_t^\text{Close}}{P_T^\text{High}-P_T^\text{Low}}
$$


## Volume Indicators

|Function|             Name|
|:---|:---|
|AD|                   Chaikin A/D Line|
|ADOSC|                Chaikin A/D Oscillator|
|OBV|                  On Balance Volume|

### Chaikin A/D Line

The Chaikin Advance/Decline or Accumulation/Distribution Line (AD) is a volume-based indicator designed to measure the cumulative flow of money into and out of an asset. The indicator assumes that the degree of buying or selling pressure can be determined by the location of the close, relative to the high and low for the period. There is buying (sellng) pressure when a stock closes in the upper (lower) half of a period's range. The intention is to signal a change in direction when the indicator diverges from the security price.

The Accumulation/Distribution Line is a running total of each period's Money Flow Volume. It is calculated as follows:

1. The Money Flow Multiplier (MFI) is the relationship of the close to the high-low range:
2. The MFI is multiplied by the period's volume $V_t$ to come up with a Money Flow Volume (MFV). 
3. A running total of the Money Flow Volume forms the Accumulation Distribution Line:
$$
\begin{align*}
&\text{MFI}_t&=\frac{P_t^\text{Close}-P_t^\text{Low}}{P_t^\text{High}-P_t^\text{Low}}\\
&\text{MFV}_t&=\text{MFI}_t \times V_t\\
&\text{AD}_t&=\text{AD}_{t-1}+\text{MFV}_t
\end{align*}
$$

### Chaikin A/D Oscillator (ADOSC)

The Chaikin A/D Oscillator (ADOSC) is the Moving Average Convergence Divergence indicator (MACD) applied to the Chaikin A/D Line. The Chaikin Oscillator intends to predict changes in the Accumulation/Distribution Line.

It is computed as the difference between the 3-day exponential moving average and the 10-day exponential moving average of the Accumulation/Distribution Line.

In [39]:
?talib.ADOSC

In [40]:
stock_data_keep['ADOSC'] = talib.ADOSC(
                                    high=stock_data_keep.groupby(level='ticker')['high'].shift(0)
                                    ,low=stock_data_keep.groupby(level='ticker')['low'].shift(0)
                                    ,close=stock_data_keep.groupby(level='ticker')['close'].shift(0)
                                    ,volume=stock_data_keep.groupby(level='ticker')['volume'].shift(0)
                                    ,fastperiod=3
                                    ,slowperiod=10
)

In [41]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,BBands_high,BBands_mid,BBands_low,BB_revert,ADX,MACD,BOP,MFI,RSI,ADOSC
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,12.558877,11.428552,10.298226,0,74.891454,-26.634295,-0.973856,24.320706,7.384486,-2.693946e+08
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,12.562829,11.422704,10.282580,0,75.743961,-25.064889,0.406452,25.561447,7.387507,-2.250299e+08
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,12.569633,11.415572,10.261511,0,76.535575,-23.557961,-0.279070,17.867111,7.366548,-2.324685e+08
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,12.576108,11.407090,10.238072,0,77.274581,-22.106700,-0.186335,19.095607,7.439867,-1.729521e+08
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,12.521004,11.365837,10.210669,0,77.855301,-20.697240,0.590106,29.325943,8.185217,-6.181412e+07
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,117.486874,112.000826,106.514777,0,23.960773,1.501988,1.000000,87.660569,65.182567,3.730864e+06
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,117.941406,112.281812,106.622217,0,24.229878,1.546350,0.564885,87.472519,67.751025,7.060825e+06
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,118.256141,112.543539,106.830937,0,24.203927,1.524092,-0.418364,82.064134,64.299536,5.230437e+06
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,118.459815,112.850204,107.240592,0,24.179830,1.486920,0.394644,77.749384,64.088572,3.672289e+06


### On Balance Volume (OBV)

The On Balance Volume indicator (OBV) is a cumulative momentum indicator that relates volume to price change. It assumes that OBV changes precede price changes because smart money can be seen flowing into the security by a rising OBV. When the public then moves into the security, both the security and OBV will rise.

The current OBV is computed by adding (subtracting) the current volume to the last OBV if the security closes higher (lower) than the previous close.

$$
\text{OBV}_t = 
\begin{cases}
\text{OBV}_{t-1}+V_t & \text{if }P_t>P_{t-1}\\
\text{OBV}_{t-1}-V_t & \text{if }P_t<P_{t-1}\\
\text{OBV}_{t-1} & \text{otherwise}
\end{cases}
$$

In [42]:
?talib.OBV

In [43]:
stock_data_keep['OBV'] = talib.OBV(
                                    real=stock_data_keep.groupby(level='ticker')['close'].shift(0)
                                    ,volume=stock_data_keep.groupby(level='ticker')['volume'].shift(0)
)

In [44]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,BBands_mid,BBands_low,BB_revert,ADX,MACD,BOP,MFI,RSI,ADOSC,OBV
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,11.428552,10.298226,0,74.891454,-26.634295,-0.973856,24.320706,7.384486,-2.693946e+08,1.915037e+08
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,11.422704,10.282580,0,75.743961,-25.064889,0.406452,25.561447,7.387507,-2.250299e+08,3.468862e+08
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,11.415572,10.261511,0,76.535575,-23.557961,-0.279070,17.867111,7.366548,-2.324685e+08,2.021731e+08
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,11.407090,10.238072,0,77.274581,-22.106700,-0.186335,19.095607,7.439867,-1.729521e+08,4.182715e+08
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,11.365837,10.210669,0,77.855301,-20.697240,0.590106,29.325943,8.185217,-6.181412e+07,6.849778e+08
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,112.000826,106.514777,0,23.960773,1.501988,1.000000,87.660569,65.182567,3.730864e+06,1.066289e+10
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,112.281812,106.622217,0,24.229878,1.546350,0.564885,87.472519,67.751025,7.060825e+06,1.068119e+10
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,112.543539,106.830937,0,24.203927,1.524092,-0.418364,82.064134,64.299536,5.230437e+06,1.066028e+10
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,112.850204,107.240592,0,24.179830,1.486920,0.394644,77.749384,64.088572,3.672289e+06,1.064524e+10


## Volatility Indicators

|Function|             Name|
|:---|:---|
|TRANGE|               True Range|
|ATR|                  Average True Range|
|NATR|                 Normalized Average True Range|

### ATR

The Average True Range indicator (ATR) shows volatility of the market. It was introduced by Welles Wilder (1978)  and has been used as a component of numerous other indicators since. It aims to anticipate changes in trend such that the higher its value, the higher the probability of a trend change; the lower the indicator’s value, the weaker the current trend.

It is computed as the simple moving average for a period T of the True Range (TRANGE), which measures volatility as the absolute value of the largest recent trading range:
$$
\text{TRANGE}_t = \max\left[P_t^\text{High} - P_t^\text{low}, \left| P_t^\text{High} - P_{t-1}^\text{Close}\right|, \left| P_t^\text{low} - P_{t-1}^\text{Close}\right|\right]
$$

### NATR

The Normalized Average True Range (NATR) is a normalized version of the ATR computed as follows:

$$
\text{NATR}_t = \frac{\text{ATR}_t(T)}{P_t^\text{Close}} * 100
$$

Normalization make the ATR function more relevant in the folllowing scenarios:
- Long term analysis where the price changes drastically.
- Cross-market or cross-security ATR comparison.

## Rolling Factor Betas

From _Advances in Financial Machine Learning_ by Marcos Lopez de Prado:

... The five Fama-French factors, namely market risk, size, value, operating profitability, and investment, have been shown empirically to explain asset returns. They are commonly used to assess the exposure of a portfolio to well-known drivers of risk and returns, where the unexplained portion is then attributed to the manager's idiosyncratic skill. Hence, it is natural to include past factor exposures as financial features in models that aim to predict future returns...

In [45]:
factors = ['Mkt-RF', 'SMB', 'HML', 'RMW','CMA']

# note: the factors are recorded on a monthly basis
factor_data = web.DataReader('F-F_Research_Data_5_Factors_2x3', 'famafrench',start='2000')[0]
factor_data.index=factor_data.index.to_timestamp()
factor_data.index.name='date'

In [46]:
factor_data

Unnamed: 0_level_0,Mkt-RF,SMB,HML,RMW,CMA,RF
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2000-01-01,-4.74,4.44,-1.91,-6.31,4.75,0.41
2000-02-01,2.45,18.34,-9.70,-18.73,-0.36,0.43
2000-03-01,5.20,-15.35,8.17,11.82,-1.65,0.47
2000-04-01,-6.40,-5.01,7.26,7.66,5.65,0.46
2000-05-01,-4.42,-3.84,4.81,4.17,1.30,0.50
...,...,...,...,...,...,...
2022-10-01,7.83,1.86,8.05,3.07,6.52,0.23
2022-11-01,4.60,-2.67,1.38,6.01,3.11,0.29
2022-12-01,-6.41,-0.16,1.32,0.09,4.19,0.33
2023-01-01,6.65,4.43,-4.05,-2.62,-4.53,0.35


In [47]:
# convert monthly factors to daily

factor_data['date']=pd.to_datetime(factor_data.index).to_period('m')
factor_data = factor_data.set_index('date').resample('d').ffill().to_timestamp()
factor_data

Unnamed: 0_level_0,Mkt-RF,SMB,HML,RMW,CMA,RF
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2000-01-01,-4.74,4.44,-1.91,-6.31,4.75,0.41
2000-01-02,-4.74,4.44,-1.91,-6.31,4.75,0.41
2000-01-03,-4.74,4.44,-1.91,-6.31,4.75,0.41
2000-01-04,-4.74,4.44,-1.91,-6.31,4.75,0.41
2000-01-05,-4.74,4.44,-1.91,-6.31,4.75,0.41
...,...,...,...,...,...,...
2023-02-24,-2.58,0.59,-0.80,0.92,-1.53,0.34
2023-02-25,-2.58,0.59,-0.80,0.92,-1.53,0.34
2023-02-26,-2.58,0.59,-0.80,0.92,-1.53,0.34
2023-02-27,-2.58,0.59,-0.80,0.92,-1.53,0.34


In [48]:
?RollingOLS

In [49]:
# exog needs sm.add_constant to include an intercept in the OLS model
# below modified from solutions
# use return_fwd because we want to predict the return of the following day
windows = [21,63]
ret = 'return_fwd'
for window in windows:
    print(window)
    betas = []
    for ticker, df in stock_data_keep.groupby('ticker', group_keys=False):
        model_data = df[[ret]].merge(factor_data, on='date').dropna()
        model_data[ret] -= model_data.RF

        rolling_ols = RollingOLS(endog=model_data[ret], 
                                 exog=sm.add_constant(model_data[factors]), window=window)
        factor_model = rolling_ols.fit(params_only=True).params.rename(columns={'const':'ALPHA'})
        result = factor_model.assign(ticker=ticker).set_index('ticker', append=True).swaplevel()
        betas.append(result)
    betas = pd.concat(betas).rename(columns=lambda x: f'{x}_{window}')
    stock_data_keep = stock_data_keep.join(betas)

21
63


## Time Period Flags

In [50]:
dates = stock_data_keep.index.get_level_values('date')
stock_data_keep['month'] = dates.month.values.astype(np.uint8) - 1
stock_data_keep['weekday'] = dates.weekday.values.astype(np.uint8)

In [51]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,open,high,low,close,volume,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,...,RMW_21,CMA_21,ALPHA_63,Mkt-RF_63,SMB_63,HML_63,RMW_63,CMA_63,month,weekday
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-01,11.081757,11.086897,10.890271,10.890271,166085500.0,1.808716e+09,-0.011548,-0.013963,-0.017507,-0.048613,...,-1.187500,-0.250000,,,,,,,1,3
AAPL,2007-02-02,10.810592,10.955813,10.756617,10.891556,155382500.0,1.692357e+09,0.000118,-0.009351,-0.007379,-0.042373,...,-4.000000,-1.000000,,,,,,,1,4
AAPL,2007-02-05,10.833725,10.953243,10.787460,10.787460,144713100.0,1.561087e+09,-0.009558,-0.020880,-0.023272,-0.032838,...,-0.250000,0.000000,,,,,,,1,0
AAPL,2007-02-06,10.853002,10.855572,10.648665,10.814448,216098400.0,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,...,0.250000,0.343750,,,,,,,1,1
AAPL,2007-02-07,10.856857,11.101034,10.737339,11.071476,266706300.0,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,...,0.250000,0.485107,,,,,,,1,2
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-12-23,114.162295,115.080808,114.162295,115.080808,14249484.0,1.639842e+09,0.001978,-0.003677,0.004743,0.022554,...,-0.006958,-0.037354,0.027344,0.009003,-0.008331,-0.026123,0.030823,0.053345,11,4
AAPL,2016-12-27,115.080808,116.344998,115.051178,115.811668,18296855.0,2.118989e+09,0.006351,0.001709,0.005316,0.034951,...,-0.005737,-0.035950,0.025146,0.009979,-0.009003,-0.026123,0.031006,0.053589,11,1
AAPL,2016-12-28,116.068456,116.558923,114.764760,115.317843,20905892.0,2.410822e+09,-0.004264,0.004042,-0.001625,0.013630,...,-0.006752,-0.036278,0.028320,0.009521,-0.008728,-0.026611,0.030823,0.054565,11,2
AAPL,2016-12-29,115.011672,115.663027,114.962290,115.288214,15039519.0,1.733879e+09,-0.000257,0.001802,-0.002819,0.013369,...,-0.006233,-0.035172,0.020508,0.009247,-0.008514,-0.024170,0.028625,0.051025,11,3


## Persist results

In [52]:
stock_data_keep = (stock_data_keep
        .drop(['open', 'high', 'low', 'close', 'volume'], axis=1)
        .replace((np.inf, -np.inf), np.nan))

In [53]:
stock_data_keep[(stock_data_keep.index.get_level_values(0) == 'AAPL') \
                & (stock_data_keep.index.get_level_values(1) > '2007-01-31')
                & (stock_data_keep['CMA_21']>0)
              ]

Unnamed: 0_level_0,Unnamed: 1_level_0,dollar_volume,return_1_days,return_3_days,return_5_days,return_10_days,return_21_days,return_42_days,return_63_days,return_126_days,return_252_days,...,RMW_21,CMA_21,ALPHA_63,Mkt-RF_63,SMB_63,HML_63,RMW_63,CMA_63,month,weekday
ticker,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
AAPL,2007-02-06,2.336985e+09,0.002502,-0.006962,-0.016365,-0.018086,-0.010582,-0.936464,-0.921967,-0.936423,-0.924638,...,0.250000,0.343750,,,,,,,1,1
AAPL,2007-02-07,2.952832e+09,0.023767,0.016519,0.004899,-0.006344,0.007956,-0.935997,-0.918553,-0.934291,-0.921489,...,0.250000,0.485107,,,,,,,1,2
AAPL,2007-02-08,1.880123e+09,0.000348,0.026686,0.016993,-0.000812,-0.069029,-0.937232,-0.919324,-0.933329,-0.921489,...,1.125000,0.812500,,,,,,,1,3
AAPL,2007-02-09,2.302238e+09,-0.033767,-0.010458,-0.017463,-0.024713,-0.141546,-0.939262,-0.922122,-0.935638,-0.922536,...,0.281250,0.517426,,,,,,,1,4
AAPL,2007-02-12,1.974591e+09,0.019335,-0.014742,0.011198,-0.012334,-0.113987,-0.937756,-0.922121,-0.934417,-0.920947,...,1.125000,0.614258,,,,,,,1,0
AAPL,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AAPL,2016-10-25,5.591934e+09,0.005100,0.010166,0.006640,0.016767,0.047573,0.099284,0.154800,0.222791,0.047662,...,0.018902,0.324219,-0.018225,0.002823,-0.003066,-0.000895,-0.002191,-0.003750,9,1
AAPL,2016-10-26,7.511053e+09,-0.022495,-0.008662,-0.013064,-0.014914,0.022106,0.080886,0.113785,0.232972,0.030621,...,0.018782,0.297852,-0.015663,0.003106,-0.002235,-0.001114,-0.000206,-0.004627,9,2
AAPL,2016-10-27,3.887616e+09,-0.009603,-0.026944,-0.022040,-0.021371,0.004651,0.071709,0.104466,0.235331,-0.019670,...,0.016577,0.312012,-0.014038,0.003127,-0.001610,-0.001252,0.001112,-0.005231,9,3
AAPL,2016-10-28,4.230492e+09,-0.006639,-0.038309,-0.024700,-0.033240,0.013728,0.072830,0.078098,0.228441,-0.036358,...,0.017742,0.277832,-0.014183,0.003217,-0.001530,-0.001093,0.001049,-0.005358,9,4


In [54]:
stock_data_keep.info(null_counts=True)

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1239662 entries, ('A', Timestamp('2007-01-03 00:00:00')) to ('ZMH', Timestamp('2015-06-26 00:00:00'))
Data columns (total 36 columns):
 #   Column           Non-Null Count    Dtype  
---  ------           --------------    -----  
 0   dollar_volume    1239662 non-null  float64
 1   return_1_days    1239661 non-null  float64
 2   return_3_days    1239659 non-null  float64
 3   return_5_days    1239657 non-null  float64
 4   return_10_days   1239652 non-null  float64
 5   return_21_days   1239641 non-null  float64
 6   return_42_days   1239620 non-null  float64
 7   return_63_days   1239599 non-null  float64
 8   return_126_days  1239536 non-null  float64
 9   return_252_days  1239410 non-null  float64
 10  return_fwd       1239661 non-null  float64
 11  BBands_high      1239643 non-null  float64
 12  BBands_mid       1239643 non-null  float64
 13  BBands_low       1239643 non-null  float64
 14  BB_revert        1239662 non-null  int64  

In [55]:
with pd.HDFStore('../data/stock_prices.h5') as store:
    store.put('model_data', stock_data_keep)