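The `Fama-French` factors are the base data the `empyrical` package uses for portfolio risk and performance analysis. The data shipped with `empyrical` is drawn from the US market and is not suitable for analyzing A-share (mainland China) portfolios. This article uses `Pipeline` to compute the corresponding data on a daily schedule, based on the [Fama-French three-factor model](https://www.bogleheads.org/wiki/Fama_and_French_three-factor_model) and following the [construction method](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html) published by the paper's authors. With a modified `empyrical` that reads factor data derived from the A-share market, the package becomes usable for A-share portfolio analysis. The [code](https://www.quantopian.com/posts/computing-the-fama-french-factors-with-pipeline-1) and [materials](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html) referenced here come mainly from the `quantopian` forum.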

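The `pipeline` API recently released by Quantopian can run computations quickly over a large universe of stocks. This opens up many possibilities, one of which is the [Fama-French Three Factor Model](https://www.bogleheads.org/wiki/Fama_and_French_three-factor_model). Computing these factors requires sorting a universe of thousands of stocks into groups, which was not possible on the Quantopian platform before `Pipeline`. Now it is.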

## Original five-factor data

### `quantopian/empyrical` data

In [None]:
# Load the data
import pandas as pd

In [None]:
fama_french = pd.read_pickle('fama_french_5.pkl')

In [None]:
fama_french.tail()

## Background

### Three factors

According to the [factor construction method](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html):

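+ The Fama-French factors are constructed from 6 value-weight portfolios formed on size (market equity) and book-to-market. See the description of the 6 size/book-to-market portfolios.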

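+ SMB (Small Minus Big) is the average return on the three small-cap portfolios minus the average return on the three large-cap portfolios. $$SMB = \frac{1}{3}(\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3}(\text{Big Value} + \text{Big Neutral} + \text{Big Growth})$$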

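+ HML (High Minus Low) is the average return on the two value portfolios minus the average return on the two growth portfolios. $$HML = \frac{1}{2}(\text{Small Value} + \text{Big Value}) - \frac{1}{2}(\text{Small Growth} + \text{Big Growth})$$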

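+ `Rm-Rf` is the excess return on the market: the market stock return minus the one-month Treasury bill rate.

Stock universe:
 + `Rm-Rf` includes all stocks.
 + `SMB` and `HML` require positive book equity for year `t-1`, and market equity data for December of year `t-1` and June of year `t`.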
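
As a minimal sketch of the two formulas above, the function below computes SMB and HML from the returns of the six size/book-to-market portfolios. The function name, the dictionary keys, and the sample returns are hypothetical and chosen only for illustration; they are not part of `empyrical` or `zipline`.

```python
def smb_hml(r):
    """Compute (SMB, HML) from six portfolio returns.

    `r` maps a portfolio name to its return over one period.
    The key names are illustrative assumptions.
    """
    small = (r["small_value"] + r["small_neutral"] + r["small_growth"]) / 3
    big = (r["big_value"] + r["big_neutral"] + r["big_growth"]) / 3
    value = (r["small_value"] + r["big_value"]) / 2
    growth = (r["small_growth"] + r["big_growth"]) / 2
    return small - big, value - growth


# Hypothetical monthly returns, in percent.
example = {
    "small_value": 2.0, "small_neutral": 1.5, "small_growth": 1.0,
    "big_value": 1.2, "big_neutral": 0.9, "big_growth": 0.6,
}
smb, hml = smb_hml(example)
print(round(smb, 4), round(hml, 4))
```

With these made-up inputs, the small portfolios average 1.5 and the big ones 0.9, so SMB is about 0.6; the value portfolios average 1.6 and the growth ones 0.8, so HML is about 0.8.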

### Five factors

[Factor construction method](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_5developed.html):

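+ All returns include dividends and capital gains and are not continuously compounded. `Market` is the return on a region's value-weight market portfolio minus the one-month Treasury bill rate.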

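The `Fama/French 5 factors (2x3)` are constructed from three sets of portfolios: (1) the 6 value-weight portfolios formed on size and book-to-market; (2) the 6 value-weight portfolios formed on size and operating profitability; (3) the 6 value-weight portfolios formed on size and investment.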

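To construct the `SMB`, `HML`, `RMW`, and `CMA` factors, stocks in a region are sorted at the end of each June into two market-cap groups and three groups each for book-to-market (B/M), operating profitability (OP), and investment (INV). Big stocks are those in the top 90% of June market cap for the region, and small stocks are those in the bottom 10%. The B/M, OP, and INV breakpoints for a region are the 30th and 70th percentiles of the respective ratios among the region's big stocks.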
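
The sorting rule above can be sketched with `pandas` as follows. This is only an illustration under stated assumptions: "top 90% of market cap" is read here as the largest stocks that together make up 90% of aggregate market cap, and the function name `assign_2x3`, the column names `market_cap` and `bm`, and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd


def assign_2x3(df, ratio_col="bm", cap_col="market_cap", big_share=0.90):
    """Label each stock with a size group and a ratio group (2x3 sort)."""
    ranked = df.sort_values(cap_col, ascending=False).copy()
    # Share of aggregate market cap held by all stocks ranked above this one.
    cum_share = ranked[cap_col].cumsum() / ranked[cap_col].sum()
    prev_share = cum_share.shift(1, fill_value=0.0)
    # Assumption: a stock is "big" while the larger stocks hold < 90% of the total.
    ranked["size"] = np.where(prev_share < big_share, "big", "small")
    # Breakpoints come from the big stocks only, then apply to every stock.
    big_ratio = ranked.loc[ranked["size"] == "big", ratio_col]
    lo, hi = big_ratio.quantile(0.30), big_ratio.quantile(0.70)
    ranked["group"] = np.select(
        [ranked[ratio_col] <= lo, ranked[ratio_col] > hi],
        ["low", "high"],
        default="mid",
    )
    return ranked, (lo, hi)


# Hypothetical universe of six stocks.
stocks = pd.DataFrame(
    {"market_cap": [500.0, 300.0, 150.0, 30.0, 15.0, 5.0],
     "bm": [0.2, 0.5, 0.8, 1.0, 0.3, 1.5]},
    index=["A", "B", "C", "D", "E", "F"],
)
labeled, (lo, hi) = assign_2x3(stocks)
print(labeled[["size", "group"]])
```

In this toy universe the three largest stocks are labeled big, and their B/M values alone set the 30th and 70th percentile breakpoints that are then applied to the small stocks as well.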

SMB (Small Minus Big) is the average return on the nine small-stock portfolios minus the average return on the nine big-stock portfolios:

$$SMB_{(B/M)} = \frac{1}{3}(\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3}(\text{Big Value} + \text{Big Neutral} + \text{Big Growth})$$
$$SMB_{(OP)} = \frac{1}{3}(\text{Small Robust} + \text{Small Neutral} + \text{Small Weak}) - \frac{1}{3}(\text{Big Robust} + \text{Big Neutral} + \text{Big Weak})$$
$$SMB_{(INV)} = \frac{1}{3}(\text{Small Conservative} + \text{Small Neutral} + \text{Small Aggressive}) - \frac{1}{3}(\text{Big Conservative} + \text{Big Neutral} + \text{Big Aggressive})$$
$$SMB = \frac{1}{3}(SMB_{(B/M)} + SMB_{(OP)} + SMB_{(INV)})$$

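`HML (High Minus Low)` is the average return on the two value portfolios minus the average return on the two growth portfolios:

$$HML = \frac{1}{2}(\text{Small Value} + \text{Big Value}) - \frac{1}{2}(\text{Small Growth} + \text{Big Growth})$$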

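`RMW (Robust Minus Weak)` is the average return on the two robust operating profitability portfolios minus the average return on the two weak operating profitability portfolios: $$RMW = \frac{1}{2}(\text{Small Robust} + \text{Big Robust}) - \frac{1}{2}(\text{Small Weak} + \text{Big Weak})$$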

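`CMA (Conservative Minus Aggressive)` is the average return on the two conservative investment portfolios (portfolios of stocks whose issuing companies invest conservatively) minus the average return on the two aggressive investment portfolios: $$CMA = \frac{1}{2}(\text{Small Conservative} + \text{Big Conservative}) - \frac{1}{2}(\text{Small Aggressive} + \text{Big Aggressive})$$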
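
The five-factor formulas above can be checked with a short sketch. It assumes the returns of the 18 portfolios are already available as three dictionaries keyed by `(size, group)`; the function names, key names, and sample returns are hypothetical.

```python
def five_factors(bm, op, inv):
    """Compute SMB, HML, RMW and CMA from three 2x3 sorts.

    Each argument maps (size, group) -> portfolio return, where size is
    "small" or "big". The key names are illustrative assumptions.
    """
    def size_spread(sort):
        # Average of the three small portfolios minus the three big ones.
        small = sum(v for (size, _), v in sort.items() if size == "small") / 3
        big = sum(v for (size, _), v in sort.items() if size == "big") / 3
        return small - big

    def long_short(sort, long, short):
        # Average of the two "long" portfolios minus the two "short" ones.
        return ((sort[("small", long)] + sort[("big", long)]) / 2
                - (sort[("small", short)] + sort[("big", short)]) / 2)

    return {
        "SMB": (size_spread(bm) + size_spread(op) + size_spread(inv)) / 3,
        "HML": long_short(bm, "value", "growth"),
        "RMW": long_short(op, "robust", "weak"),
        "CMA": long_short(inv, "conservative", "aggressive"),
    }


def make_sort(high, low):
    # Hypothetical returns, identical across the three sorts for simplicity.
    return {
        ("small", high): 3.0, ("small", "neutral"): 2.0, ("small", low): 1.0,
        ("big", high): 2.0, ("big", "neutral"): 1.0, ("big", low): 0.0,
    }


factors = five_factors(
    make_sort("value", "growth"),
    make_sort("robust", "weak"),
    make_sort("conservative", "aggressive"),
)
print(factors)
```

Because the made-up returns are the same in all three sorts, each of the three SMB components equals 1.0, so the overall SMB is 1.0, and each long-short factor equals 2.0.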

### Sampling details

+ Detail for Portfolios Formed on Size: stocks with ME < 0 are excluded.

## Constructing the factors

In [1]:
#%%prun
from zipline.pipeline import Pipeline
from zipline.pipeline.fundamentals import Fundamentals



### Helper functions

### Three factors

### Five factors

In [None]:
import pandas as pd
import numpy as np
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import morningstar

# time frame on which we want to compute Fama-French
normal_days = 31
# approximate the number of trading days in that period
# this is the number of trading days we'll look back on,
# on every trading day.
business_days = int(0.69 * normal_days)

class Returns(CustomFactor):
    """
    this factor outputs the returns over the period defined by 
    business_days, ending on the previous trading day, for every security.
    """
    window_length = business_days
    inputs = [USEquityPricing.close]
    def compute(self,today,assets,out,price):
        out[:] = (price[-1] - price[0]) / price[0] * 100

class MarketEquity(CustomFactor):
    """
    this factor outputs the market cap of every security on the day.
    """
    window_length = business_days
    inputs = [morningstar.valuation.market_cap]
    def compute(self,today,assets,out,mcap):
        out[:] = mcap[0]

class BookEquity(CustomFactor):
    """
    this factor outputs the book value of every security on the day.
    """
    window_length = business_days
    inputs = [morningstar.balance_sheet.tangible_book_value]
    def compute(self,today,assets,out,book):
        out[:] = book[0]
                                        
class CommonStock(CustomFactor):
    """
    this factor outputs 1.0 for all securities that are either common stock or SPY,
    and outputs 0.0 for all other securities. This is to filter out ETFs and other
    types of share that we do not wish to consider.
    """
    window_length = business_days
    inputs = [morningstar.share_class_reference.is_primary_share]
    def compute(self,today,assets,out, share_class):
        out[:] = ((share_class[-1].astype(bool)) | (assets == 8554)).astype(float)                                     
        
def initialize(context):
    """
    use our factors to add our pipes and screens.
    """
    pipe = Pipeline()
    attach_pipeline(pipe, 'ff_example')
    
    common_stock = CommonStock()
    # filter down to securities that are either common stock or SPY
    pipe.set_screen(common_stock.eq(1))
    mkt_cap = MarketEquity()
    pipe.add(mkt_cap,'market_cap')
    
    book_equity = BookEquity()
    # book equity over market equity
    be_me = book_equity/mkt_cap
    pipe.add(be_me,'be_me')

    returns = Returns()
    pipe.add(returns,'returns')
    schedule_function(func=print_fama_french, date_rule=date_rules.every_day())

def print_fama_french(context, data):
    # print the Fama-French factors for the period defined by business_days
    # ending on the previous trading day.
    print(context.rm_rf, context.smb, context.hml)
    
def before_trading_start(context,data):
    """
    every trading day, we use our pipes to construct the Fama-French
    portfolios, and then calculate the Fama-French factors appropriately.
    """
    spy = sid(8554)
    
    factors = pipeline_output('ff_example')
    
    # get the data we're going to use
    returns = factors['returns']
    # DataFrame.sort was removed from pandas; sort_values is the replacement
    mkt_cap = factors.sort_values(['market_cap'], ascending=True)
    be_me = factors.sort_values(['be_me'], ascending=True)
    
    # to compose the six portfolios, split our universe into portions
    half = int(len(mkt_cap)*0.5)
    small_caps = mkt_cap[:half]
    big_caps = mkt_cap[half:]
    
    thirty = int(len(be_me)*0.3)
    seventy = int(len(be_me)*0.7)
    growth = be_me[:thirty]
    neutral = be_me[thirty:seventy]
    value = be_me[seventy:]
    
    # now use the portions to construct the portfolios.
    # note: these portfolios are just lists (indices) of equities
    small_value = small_caps.index.intersection(value.index)
    small_neutral = small_caps.index.intersection(neutral.index)
    small_growth = small_caps.index.intersection(growth.index)
    
    big_value = big_caps.index.intersection(value.index)
    big_neutral = big_caps.index.intersection(neutral.index)
    big_growth = big_caps.index.intersection(growth.index)
    
    # take the mean to get the portfolio return, assuming uniform
    # allocation to its constituent equities.
    sv = returns[small_value].mean()
    sn = returns[small_neutral].mean()
    sg = returns[small_growth].mean()
    
    bv = returns[big_value].mean()
    bn = returns[big_neutral].mean()
    bg = returns[big_growth].mean()
    
    # computing Rm-Rf (Market Returns - Risk-Free Returns). we take the 
    # rate of risk-free returns to be zero, so this is simply SPY's returns.
    # have to set an initial dummy value
    context.rm_rf = float('nan')
    if spy in returns.index:
        context.rm_rf = returns.loc[spy]
    
    # computing SMB
    context.smb = (sv + sn + sg)/3 - (bv + bn + bg)/3
    
    # computing HML
    context.hml = (sv + bv)/2 - (sg + bg)/2

In [None]:
from zipline.api import symbol, sid, get_datetime

import pandas as pd
import numpy as np
from zipline.api import attach_pipeline, pipeline_output
from zipline.pipeline import Pipeline
from zipline.pipeline import CustomFactor
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.fundamentals import Fundamentals

# time frame on which we want to compute Fama-French
normal_days = 31
# approximate the number of trading days in that period
# this is the number of trading days we'll look back on,
# on every trading day.
business_days = int(0.69 * normal_days)

# The custom factors below take the value at the start of the window
class Returns(CustomFactor):
    """
    Return of each stock over the trailing "business_days" window, computed on each trading day
    """
    window_length = business_days
    inputs = [USEquityPricing.close]
    def compute(self, today, assets, out, price):
        out[:] = (price[-1] - price[0]) / price[0] * 100

class MarketEquity(CustomFactor):
    """
    Total market value of each stock on each trading day
    """
    window_length = business_days
    inputs = [USEquityPricing.tmv]
    def compute(self,today,assets,out,mcap):
        out[:] = mcap[0]

class BookEquity(CustomFactor):
    """
    Book value (owners' equity) of each stock on each trading day
    """
    window_length = business_days
    inputs = [Fundamentals.balance_sheet.A107]
    def compute(self, today, assets, out, book):
        out[:] = book[0]
                                                                             
def initialize(context):
    """
    use our factors to add our pipes and screens.
    """
    pipe = Pipeline()
    mkt_cap = MarketEquity()
    pipe.add(mkt_cap,'market_cap')
    
    book_equity = BookEquity()
    # book equity over market equity
    be_me = book_equity / mkt_cap
    pipe.add(be_me,'be_me')

    returns = Returns()
    pipe.add(returns,'returns')

    attach_pipeline(pipe, 'ff_example')

    
def before_trading_start(context,data):
    """
    every trading day, we use our pipes to construct the Fama-French
    portfolios, and then calculate the Fama-French factors appropriately.
    """
     
    factors = pipeline_output('ff_example')
    
    # get the data we're going to use
    returns = factors['returns']
    mkt_cap = factors.sort_values(['market_cap'], ascending=True)
    be_me = factors.sort_values(['be_me'], ascending=True)
    
    # to compose the six portfolios, split our universe into portions
    half = int(len(mkt_cap) * 0.5)
    small_caps = mkt_cap[:half]
    big_caps = mkt_cap[half:]
    
    thirty = int(len(be_me) * 0.3)
    seventy = int(len(be_me) * 0.7)
    growth = be_me[:thirty]
    neutral = be_me[thirty:seventy]
    value = be_me[seventy:]
    
    # now use the portions to construct the portfolios.
    # note: these portfolios are just lists (indices) of equities
    small_value = small_caps.index.intersection(value.index)
    small_neutral = small_caps.index.intersection(neutral.index)
    small_growth = small_caps.index.intersection(growth.index)
    
    big_value = big_caps.index.intersection(value.index)
    big_neutral = big_caps.index.intersection(neutral.index)
    big_growth = big_caps.index.intersection(growth.index)
    
    # take the mean to get the portfolio return, assuming uniform
    # allocation to its constituent equities.
    sv = returns[small_value].mean()
    sn = returns[small_neutral].mean()
    sg = returns[small_growth].mean()
    
    bv = returns[big_value].mean()
    bn = returns[big_neutral].mean()
    bg = returns[big_growth].mean()
    
    # computing Rm-Rf (Market Returns - Risk-Free Returns).  we take the
    # rate of risk-free returns to be zero, so this is simply SPY's returns.
    # have to set an initial dummy value
    #context.rm_rf = float('nan')
    #if spy in returns.index:
        #context.rm_rf = returns.loc[spy]
    
    # computing SMB
    context.smb = (sv + sn + sg) / 3 - (bv + bn + bg) / 3
    
    # computing HML
    context.hml = (sv + bv) / 2 - (sg + bg) / 2

def handle_data(context, data):
    # print the Fama-French factors for the period defined by business_days
    # ending on the previous trading day.
    print(context.smb, context.hml)
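
One detail worth checking in the code above: `sort_values` places rows with a missing `be_me` last, so they fall into the `value` slice, and stocks with non-positive book equity are not excluded even though the construction rules require positive book equity. The sketch below is one possible stand-alone variant, not the original author's method. It assumes a DataFrame with the same three columns as the pipeline output (`market_cap`, `be_me`, `returns`), drops unusable rows first, uses quantile breakpoints instead of row counts, and can optionally weight by market cap. The function name and sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def smb_hml_from_frame(factors, value_weighted=False):
    """Compute (SMB, HML) from a frame of market_cap, be_me and returns."""
    df = factors.dropna(subset=["market_cap", "be_me", "returns"])
    # Keep only stocks with positive market equity and positive book equity.
    df = df[(df["market_cap"] > 0) & (df["be_me"] > 0)]

    size_break = df["market_cap"].median()
    lo, hi = df["be_me"].quantile([0.3, 0.7])
    size = np.where(df["market_cap"] <= size_break, "small", "big")
    style = np.select(
        [df["be_me"] <= lo, df["be_me"] > hi], ["growth", "value"],
        default="neutral",
    )

    # Equal weights by default; market-cap weights if requested.
    weights = df["market_cap"] if value_weighted else pd.Series(1.0, index=df.index)
    weighted_sum = (df["returns"] * weights).groupby([size, style]).sum()
    port = weighted_sum / weights.groupby([size, style]).sum()

    smb = port.loc["small"].mean() - port.loc["big"].mean()
    hml = ((port.loc[("small", "value")] + port.loc[("big", "value")]) / 2
           - (port.loc[("small", "growth")] + port.loc[("big", "growth")]) / 2)
    return smb, hml


# Hypothetical data: six usable stocks plus two rows that should be dropped.
sample = pd.DataFrame({
    "market_cap": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 5.0, 6.0],
    "be_me": [0.1, 0.5, 0.9, 0.2, 0.6, 1.0, np.nan, -0.5],
    "returns": [1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 100.0, -100.0],
})
smb, hml = smb_hml_from_frame(sample)
print(smb, hml)
```

In this toy sample each of the six portfolios holds exactly one stock, so the equal-weighted and value-weighted results coincide; the two rows with a missing or negative `be_me` do not affect the result.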

In [2]:
import numpy as np

In [3]:
x = [1,2,3,4]

In [4]:
np.average(x)

2.5

In [5]:
x = [1,2,0,3,4]

In [6]:
np.average(x)

2.0
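
The two cells above show that a zero entry pulls the average down from 2.5 to 2.0. The same effect would appear in the portfolio means if missing returns were filled with 0 instead of being left as NaN. The sketch below is an illustration of that point, not something the original notebook states; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical returns with one missing observation.
returns = np.array([1.0, 2.0, np.nan, 3.0, 4.0])

# Filling the gap with 0 counts it as a real observation.
zero_filled = np.where(np.isnan(returns), 0.0, returns)
mean_zero_filled = np.average(zero_filled)

# np.nanmean ignores the missing entry instead.
mean_ignoring_nan = np.nanmean(returns)

print(mean_zero_filled, mean_ignoring_nan)
```

The pandas `Series.mean()` used in the factor code skips NaN by default, so it behaves like `np.nanmean` here.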