# Capital Asset Pricing Model

## Alpha and Beta

According to the Capital Asset Pricing Model (CAPM) of Sharpe (1964), Lintner (1965), and Mossin (1966), the expected return of any security is equal to the return on the riskless security plus the security’s market beta multiplied by the market risk premium. Expressed mathematically, the CAPM is

$$
E\left[R_{i, t}\right] - R_{f, t} = \beta_i\left(E\left[R_{m, t}\right]-R_{f, t}\right)
$$

where $E\left[R_{i, t}\right]$ is the expected return on stock $i$, $R_{f, t}$ is the risk-free rate, and $E\left[R_{m, t}\right]$ is the expected return on the stock market.

The security’s beta is given by

$$
\beta_i=\frac{\operatorname{Cov}\left(R_{i, t}, R_{m, t}\right)}{\operatorname{Var}\left(R_{m, t}\right)}
$$
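
As a quick numerical check of this definition, the sketch below simulates returns and confirms that the covariance-to-variance ratio coincides with the slope of an OLS regression of the stock return on the market return. The simulated data, the seed, and the "true" beta of 1.3 are illustrative assumptions and are not taken from the dataset used later.

```python
import numpy as np

# Simulated monthly returns (illustrative assumption, not real data).
rng = np.random.default_rng(0)
r_m = rng.normal(0.01, 0.05, size=120)                    # market return
r_i = 0.002 + 1.3 * r_m + rng.normal(0, 0.02, size=120)   # stock return

# Beta as Cov(R_i, R_m) / Var(R_m), both with the same degrees of freedom.
cov_im = np.cov(r_i, r_m, ddof=1)[0, 1]
var_m = np.var(r_m, ddof=1)
beta_cov = cov_im / var_m

# Beta as the slope of an OLS regression of r_i on r_m.
beta_ols = np.polyfit(r_m, r_i, 1)[0]

print(beta_cov, beta_ols)
```

The two numbers agree up to floating-point error, which is why the rolling regressions used later recover the same quantity as the formula above.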

1. The CAPM predicts that cross-sectional variation in the expected returns of different securities is driven only by cross-sectional variation in their betas. Two groups of empirical tests follow from this prediction.

    - The first group of tests examines the cross-sectional ability of beta to predict future excess returns.

    - The second group of tests examines the cross-sectional ability of other variables to predict future excess returns. If the CAPM holds, none of them should have predictive power once beta is accounted for. Typical candidates are:
         - Size (market capitalization)
         - EP (earnings-to-price ratio)
         - To (turnover)

2.  The second empirically testable prediction of the CAPM is that average excess returns, after accounting for the effect of beta, should be zero: if beta fully explains the cross-section of stock returns, no extra return, that is no $\alpha$, remains. To test this hypothesis, researchers frequently examine the intercept term of cross-sectional regressions of security excess returns on estimates of beta.
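
Both predictions can be written as restrictions on a cross-sectional regression. The symbols $\gamma_{0,t}$, $\gamma_{1,t}$, $\hat{\beta}_{i,t}$ and $\varepsilon_{i,t+1}$ are notation introduced here for illustration; they do not appear in the original text.

$$
R_{i,t+1} - R_{f,t+1} = \gamma_{0,t} + \gamma_{1,t}\,\hat{\beta}_{i,t} + \varepsilon_{i,t+1}
$$

If the CAPM holds, the average intercept should be zero and the average slope should equal the market risk premium:

$$
E\left[\gamma_{0,t}\right] = 0, \qquad E\left[\gamma_{1,t}\right] = E\left[R_{m,t+1} - R_{f,t+1}\right]
$$

Under the same hypothesis, adding another characteristic such as size to the right-hand side should produce a slope of zero on that characteristic.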

## Constructing Portfolios Sorted on Beta

Do stocks with a high beta in month $t$ earn higher returns in month $t+1$ than stocks with a low beta?


In [1]:
import numpy as np # core numerical library
import pandas as pd # core data-handling library
import scipy.stats as stats # statistics
import scipy
# import pymysql  # database access

from datetime import datetime # dates and times
import statsmodels.formula.api as smf  # OLS regression

# import pyreadr # read RDS file

from matplotlib import style
import matplotlib.pyplot as plt  # plotting
import matplotlib.dates as mdates

from matplotlib.font_manager import FontProperties # Chinese fonts in plots
from pylab import mpl
#mpl.rcParams['font.sans-serif'] = ['SimHei']
#plt.rcParams['font.family'] = 'Times New Roman'

# render figures as vector graphics (SVG)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

from IPython.core.interactiveshell import InteractiveShell # controls how Jupyter displays cell output
# display the result of every expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = 'all'

# uncomment to display an unlimited number of rows
#pd.set_option('display.max_rows',None)

# display an unlimited number of columns
pd.set_option('display.max_columns', None)

In [10]:
cross = pd.read_csv('datasets/ret_mon_python2023.csv')
from pandas.tseries.offsets import MonthEnd
cross['month'] = pd.to_datetime(cross['month'], format='%Y-%m-%d') + MonthEnd(1)
# pad stock codes to six digits by adding leading zeros
cross['Stkcd'] = cross['Stkcd'].apply(lambda x: '{:0>6}'.format(x)) # six-digit stock code
cross['w'] = 1
cross = cross.dropna(subset=['ret'])
cross

Unnamed: 0,Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w
1,000001,1991-05-31,38.34,2,24.0,1.016010e+09,1.859497e+09,20.739149,21.343572,-0.122253,0.006092,-0.128345,-0.119551,1
2,000001,1991-06-30,33.99,3,23.0,9.007350e+08,1.648521e+09,20.618722,21.223144,-0.113459,0.006092,-0.119551,-0.137013,1
3,000001,1991-07-31,29.54,4,16.0,7.828100e+08,1.432695e+09,20.478401,21.082823,-0.130921,0.006092,-0.137013,-0.417680,1
4,000001,1991-08-31,15.00,5,15.0,6.748338e+08,1.346275e+09,20.329977,21.020607,-0.411588,0.006092,-0.417680,-0.039425,1
5,000001,1991-09-30,14.50,6,24.0,6.523394e+08,1.301399e+09,20.296075,20.986706,-0.033333,0.006092,-0.039425,0.849080,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
752023,605599,2023-08-31,13.56,24,23.0,7.955351e+09,1.054667e+10,22.797111,23.079076,-0.025862,0.001241,-0.027103,0.071030,1
752024,605599,2023-09-30,14.54,25,20.0,8.530295e+09,1.130889e+10,22.866890,23.148855,0.072271,0.001241,0.071030,-0.048696,1
752025,605599,2023-10-31,13.85,26,17.0,8.125488e+09,1.077222e+10,22.818272,23.100237,-0.047455,0.001241,-0.048696,-0.027956,1
752026,605599,2023-11-30,13.48,27,22.0,7.908417e+09,1.048444e+10,22.791193,23.073159,-0.026715,0.001241,-0.027956,0.103358,1


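# Importing Market Return Data
Note that this series differs from the return computed from the Shanghai Composite Index, although both are market-level data. The market factor is the market return in excess of the risk-free rate:

$$
MKT_t = R_{m,t} - R_{f,t}
$$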
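
The definition can be checked against the first three rows of the `Market_ret` table displayed in the next cell. The sketch assumes that `MarketR` is the raw market return and `rfmonth` the monthly risk-free rate; this reading is inferred from the numbers rather than stated in the data file.

```python
# First three rows of the Market_ret table (values copied from the output).
market_r = [0.029998, 0.010203, -0.099663]   # MarketR: raw market return
rf_month = [0.006930, 0.006930, 0.006930]    # rfmonth: monthly risk-free rate
mkt_reported = [0.023068, 0.003273, -0.106593]  # MKT column as reported

# MKT_t = R_{m,t} - R_{f,t}
mkt = [round(r - rf, 6) for r, rf in zip(market_r, rf_month)]
print(mkt)
```

The computed values match the reported `MKT` column to six decimals, which supports this reading of the columns.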


In [11]:
Market_ret = pd.read_excel('datasets/Marketret_mon_stock2023.xlsx')
Market_ret['month'] = pd.to_datetime(Market_ret['month'],format='%b %Y') + MonthEnd(1)
Market_ret.rename(columns={'ret':'MKT'}, inplace=True)
Market_ret

Unnamed: 0,month,MarketR,MarketR_e,rfmonth,MKT,ret_e,marketret3,marketret6,marketret12,Q
0,1991-01-31,0.029998,0.036554,0.006930,0.023068,0.029624,-0.084127,-0.305662,0.254049,1991 Q1
1,1991-02-28,0.010203,0.021860,0.006930,0.003273,0.014930,-0.183573,-0.384745,0.241492,1991 Q1
2,1991-03-31,-0.099663,-0.060425,0.006930,-0.106593,-0.067355,-0.252928,-0.445049,0.288857,1991 Q1
3,1991-04-30,-0.079652,-0.031449,0.006651,-0.086303,-0.038100,-0.234776,-0.394937,0.691749,1991 Q2
4,1991-05-31,-0.074521,0.005375,0.006092,-0.080613,-0.000717,-0.236294,0.181673,1.542701,1991 Q2
...,...,...,...,...,...,...,...,...,...,...
391,2023-08-31,-0.055570,-0.041538,0.001241,-0.056811,-0.042779,-0.089387,,,2023 Q3
392,2023-09-30,-0.005689,0.000357,0.001241,-0.006930,-0.000884,-0.033855,,,2023 Q3
393,2023-10-31,-0.026321,-0.009959,0.001241,-0.027562,-0.011200,-0.047817,,,2023 Q4
394,2023-11-30,0.001790,0.034280,0.001241,0.000549,0.033039,,,,2023 Q4


In [12]:
cross = pd.merge(cross, Market_ret[['month','MKT']], on='month', how='left')
cross

Unnamed: 0,Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w,MKT
0,000001,1991-05-31,38.34,2,24.0,1.016010e+09,1.859497e+09,20.739149,21.343572,-0.122253,0.006092,-0.128345,-0.119551,1,-0.080613
1,000001,1991-06-30,33.99,3,23.0,9.007350e+08,1.648521e+09,20.618722,21.223144,-0.113459,0.006092,-0.119551,-0.137013,1,-0.085440
2,000001,1991-07-31,29.54,4,16.0,7.828100e+08,1.432695e+09,20.478401,21.082823,-0.130921,0.006092,-0.137013,-0.417680,1,-0.088189
3,000001,1991-08-31,15.00,5,15.0,6.748338e+08,1.346275e+09,20.329977,21.020607,-0.411588,0.006092,-0.417680,-0.039425,1,-0.090277
4,000001,1991-09-30,14.50,6,24.0,6.523394e+08,1.301399e+09,20.296075,20.986706,-0.033333,0.006092,-0.039425,0.849080,1,-0.031046
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
709878,605599,2023-08-31,13.56,24,23.0,7.955351e+09,1.054667e+10,22.797111,23.079076,-0.025862,0.001241,-0.027103,0.071030,1,-0.056811
709879,605599,2023-09-30,14.54,25,20.0,8.530295e+09,1.130889e+10,22.866890,23.148855,0.072271,0.001241,0.071030,-0.048696,1,-0.006930
709880,605599,2023-10-31,13.85,26,17.0,8.125488e+09,1.077222e+10,22.818272,23.100237,-0.047455,0.001241,-0.048696,-0.027956,1,-0.027562
709881,605599,2023-11-30,13.48,27,22.0,7.908417e+09,1.048444e+10,22.791193,23.073159,-0.026715,0.001241,-0.027956,0.103358,1,0.000549


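# Estimating Stock-Level $\beta$
We estimate each stock's $\beta$ with rolling regressions that use the monthly data of the 60 months up to month $t$.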

In [16]:
import statsmodels.api as sm

# `data` is a DataFrame of monthly returns with (at least) four columns:
# 'month' (month-end date), 'Stkcd' (stock code), 'ret' (stock excess return)
# and 'MKT' (market excess return).

# Rolling-regression beta for every stock-month, together with the number of
# observations used in each regression.
def rolling_beta_per_stock(data, window_months=60):
    betas = []
    months = []
    Stkcds = []
    data_counts = []
    
    # group by stock
    grouped = data.groupby('Stkcd')
    
    for Stkcd, group in grouped:
        group = group.set_index('month').sort_index()
        end_months = group.index.unique()
        
        for end_month in end_months:
            # .loc slicing on dates is inclusive at both ends, so a full
            # window can hold up to window_months + 1 observations
            start_month = end_month - pd.DateOffset(months=window_months)
            window_data = group.loc[start_month:end_month]
            
            # no minimum sample size is imposed here; betas from very short
            # windows are noisy and can be screened later with data_count
            if len(window_data) > 0:
                X = sm.add_constant(window_data['MKT'])
                y = window_data['ret']
                model = sm.OLS(y, X).fit()
                
                beta = model.params['MKT']
                betas.append(beta)
                months.append(end_month)
                Stkcds.append(Stkcd)
                data_counts.append(len(window_data))  # number of observations used
    
    return pd.DataFrame({'Stkcd': Stkcds, 'month': months, 'beta': betas, 'data_count': data_counts})

# rolling beta and observation count for every stock
rolling_betas = rolling_beta_per_stock(cross)

# print the result
print(rolling_betas)

         Stkcd      month       beta  data_count
0       000001 1991-05-31   1.592113           1
1       000001 1991-06-30  -1.821836           2
2       000001 1991-07-31   0.807037           3
3       000001 1991-08-31  23.378197           4
4       000001 1991-09-30   3.358613           5
...        ...        ...        ...         ...
709878  605599 2023-08-31   1.040511          23
709879  605599 2023-09-30   1.038189          24
709880  605599 2023-10-31   1.052476          25
709881  605599 2023-11-30   1.047631          26
709882  605599 2023-12-31   1.017220          27

[709883 rows x 4 columns]
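
Running one OLS fit per stock-month is slow on a panel of this size. A faster alternative, sketched below, computes beta directly as rolling covariance over rolling variance. It is not the notebook's method: the function name `rolling_beta_fast` and the `min_obs=24` threshold are hypothetical, the window counts rows rather than calendar months (so it assumes no missing months), and the toy data are simulated.

```python
import numpy as np
import pandas as pd

def rolling_beta_fast(group, window=60, min_obs=24):
    """Rolling beta over the last `window` rows of one stock's data.

    `group` must be sorted by month and hold the columns 'ret' and 'MKT'.
    Both moments use ddof=1, so their ratio equals the OLS slope.
    """
    cov = group['ret'].rolling(window, min_periods=min_obs).cov(group['MKT'])
    var = group['MKT'].rolling(window, min_periods=min_obs).var()
    return cov / var

# Toy data for one hypothetical stock (simulated, not from the dataset).
rng = np.random.default_rng(42)
toy = pd.DataFrame({'MKT': rng.normal(0.0, 0.08, size=80)})
toy['ret'] = 1.2 * toy['MKT'] + rng.normal(0.0, 0.05, size=80)

toy['beta'] = rolling_beta_fast(toy)
print(toy['beta'].tail())
```

On the full panel this could be applied per stock with `groupby('Stkcd')`. Requiring a minimum number of observations also avoids extreme estimates such as the beta of 23.378197 obtained from four observations in the output above.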


In [18]:
# save rolling_betas
rolling_betas.to_csv('datasets/rolling_betas.csv', index=False)

In [20]:
cross_beta = pd.merge(cross, rolling_betas, on=['Stkcd', 'month'], how='left')
cross_beta

Unnamed: 0,Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w,MKT,beta,data_count
0,000001,1991-05-31,38.34,2,24.0,1.016010e+09,1.859497e+09,20.739149,21.343572,-0.122253,0.006092,-0.128345,-0.119551,1,-0.080613,1.592113,1
1,000001,1991-06-30,33.99,3,23.0,9.007350e+08,1.648521e+09,20.618722,21.223144,-0.113459,0.006092,-0.119551,-0.137013,1,-0.085440,-1.821836,2
2,000001,1991-07-31,29.54,4,16.0,7.828100e+08,1.432695e+09,20.478401,21.082823,-0.130921,0.006092,-0.137013,-0.417680,1,-0.088189,0.807037,3
3,000001,1991-08-31,15.00,5,15.0,6.748338e+08,1.346275e+09,20.329977,21.020607,-0.411588,0.006092,-0.417680,-0.039425,1,-0.090277,23.378197,4
4,000001,1991-09-30,14.50,6,24.0,6.523394e+08,1.301399e+09,20.296075,20.986706,-0.033333,0.006092,-0.039425,0.849080,1,-0.031046,3.358613,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
709878,605599,2023-08-31,13.56,24,23.0,7.955351e+09,1.054667e+10,22.797111,23.079076,-0.025862,0.001241,-0.027103,0.071030,1,-0.056811,1.040511,23
709879,605599,2023-09-30,14.54,25,20.0,8.530295e+09,1.130889e+10,22.866890,23.148855,0.072271,0.001241,0.071030,-0.048696,1,-0.006930,1.038189,24
709880,605599,2023-10-31,13.85,26,17.0,8.125488e+09,1.077222e+10,22.818272,23.100237,-0.047455,0.001241,-0.048696,-0.027956,1,-0.027562,1.052476,25
709881,605599,2023-11-30,13.48,27,22.0,7.908417e+09,1.048444e+10,22.791193,23.073159,-0.026715,0.001241,-0.027956,0.103358,1,0.000549,1.047631,26


In [21]:
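# monthly cross-sectional decile breakpoints of beta (10th to 90th percentile);
# the name fenweishu is pinyin for "quantiles"
fenweishu = pd.DataFrame(
    cross_beta.groupby(['month'])['beta'].quantile([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]))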
fenweishu = fenweishu.reset_index()
fenweishu = fenweishu.pivot_table(index='month',columns='level_1',values='beta')
fenweishu.columns = ['one','two','three','four','five','six','seven','eight','nine']
fenweishu

month,one,two,three,four,five,six,seven,eight,nine
1991-01-31,-0.088044,0.124328,0.289232,0.454136,0.967466,1.480796,1.942908,2.405020,2.973231
1991-02-28,-11.631195,-4.154010,-1.697873,-0.793130,0.498030,1.528760,2.494322,5.404324,11.773877
1991-03-31,0.172545,0.364977,0.494600,0.671041,0.820313,0.927383,1.022277,1.125327,1.356849
1991-04-30,-0.154109,0.114161,0.553247,0.751669,0.840796,0.937789,0.988732,1.058314,1.346014
1991-05-31,-0.544016,-0.090488,0.737830,0.863803,0.920035,0.933762,0.953147,1.166812,1.463361
...,...,...,...,...,...,...,...,...,...
2023-08-31,0.520065,0.716899,0.843055,0.942821,1.030354,1.125748,1.236319,1.379716,1.605872
2023-09-30,0.507240,0.702471,0.832118,0.931681,1.014984,1.117847,1.230089,1.384121,1.610116
2023-10-31,0.503186,0.705417,0.832456,0.929876,1.016549,1.116539,1.228589,1.381621,1.612616
2023-11-30,0.509124,0.697798,0.826000,0.922128,1.012127,1.109470,1.227813,1.385341,1.613584


In [22]:
portfolio = pd.merge(cross_beta,fenweishu,on='month')
portfolio

Unnamed: 0,Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w,MKT,beta,data_count,one,two,three,four,five,six,seven,eight,nine
0,000001,1991-05-31,38.34,2,24.0,1.016010e+09,1.859497e+09,20.739149,21.343572,-0.122253,0.006092,-0.128345,-0.119551,1,-0.080613,1.592113,1,-0.544016,-0.090488,0.737830,0.863803,0.920035,0.933762,0.953147,1.166812,1.463361
1,000001,1991-06-30,33.99,3,23.0,9.007350e+08,1.648521e+09,20.618722,21.223144,-0.113459,0.006092,-0.119551,-0.137013,1,-0.085440,-1.821836,2,-0.727482,-0.706277,-0.487221,0.322595,0.414984,0.546725,0.679746,0.919591,1.413585
2,000001,1991-07-31,29.54,4,16.0,7.828100e+08,1.432695e+09,20.478401,21.082823,-0.130921,0.006092,-0.137013,-0.417680,1,-0.088189,0.807037,3,-0.778594,-0.408574,0.217481,0.274415,0.322818,0.623425,0.849288,1.392277,1.720269
3,000001,1991-08-31,15.00,5,15.0,6.748338e+08,1.346275e+09,20.329977,21.020607,-0.411588,0.006092,-0.417680,-0.039425,1,-0.090277,23.378197,4,-0.916648,-0.380853,-0.197028,-0.072535,-0.065783,0.114942,0.739487,1.071835,1.405071
4,000001,1991-09-30,14.50,6,24.0,6.523394e+08,1.301399e+09,20.296075,20.986706,-0.033333,0.006092,-0.039425,0.849080,1,-0.031046,3.358613,5,-0.282719,-0.196051,-0.171487,-0.113894,-0.040992,0.328480,0.572807,1.169300,1.657930
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
709878,605599,2023-08-31,13.56,24,23.0,7.955351e+09,1.054667e+10,22.797111,23.079076,-0.025862,0.001241,-0.027103,0.071030,1,-0.056811,1.040511,23,0.520065,0.716899,0.843055,0.942821,1.030354,1.125748,1.236319,1.379716,1.605872
709879,605599,2023-09-30,14.54,25,20.0,8.530295e+09,1.130889e+10,22.866890,23.148855,0.072271,0.001241,0.071030,-0.048696,1,-0.006930,1.038189,24,0.507240,0.702471,0.832118,0.931681,1.014984,1.117847,1.230089,1.384121,1.610116
709880,605599,2023-10-31,13.85,26,17.0,8.125488e+09,1.077222e+10,22.818272,23.100237,-0.047455,0.001241,-0.048696,-0.027956,1,-0.027562,1.052476,25,0.503186,0.705417,0.832456,0.929876,1.016549,1.116539,1.228589,1.381621,1.612616
709881,605599,2023-11-30,13.48,27,22.0,7.908417e+09,1.048444e+10,22.791193,23.073159,-0.026715,0.001241,-0.027956,0.103358,1,0.000549,1.047631,26,0.509124,0.697798,0.826000,0.922128,1.012127,1.109470,1.227813,1.385341,1.613584


In [24]:
# Assign each stock to a beta decile: P1 holds the lowest betas, Pmax the highest.
# np.select returns the label of the first condition that is true, which
# reproduces a chain of nested np.where calls.
breakpoints = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
labels = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'P9']
conditions = [portfolio['beta'] <= portfolio[col] for col in breakpoints]
portfolio['sort'] = np.select(conditions, labels, default='Pmax')
portfolio = portfolio.dropna(subset=['floatingvalue','next_ret','beta'])
portfolio

Unnamed: 0,Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w,MKT,beta,data_count,one,two,three,four,five,six,seven,eight,nine,sort
0,000001,1991-05-31,38.34,2,24.0,1.016010e+09,1.859497e+09,20.739149,21.343572,-0.122253,0.006092,-0.128345,-0.119551,1,-0.080613,1.592113,1,-0.544016,-0.090488,0.737830,0.863803,0.920035,0.933762,0.953147,1.166812,1.463361,Pmax
1,000001,1991-06-30,33.99,3,23.0,9.007350e+08,1.648521e+09,20.618722,21.223144,-0.113459,0.006092,-0.119551,-0.137013,1,-0.085440,-1.821836,2,-0.727482,-0.706277,-0.487221,0.322595,0.414984,0.546725,0.679746,0.919591,1.413585,P1
2,000001,1991-07-31,29.54,4,16.0,7.828100e+08,1.432695e+09,20.478401,21.082823,-0.130921,0.006092,-0.137013,-0.417680,1,-0.088189,0.807037,3,-0.778594,-0.408574,0.217481,0.274415,0.322818,0.623425,0.849288,1.392277,1.720269,P7
3,000001,1991-08-31,15.00,5,15.0,6.748338e+08,1.346275e+09,20.329977,21.020607,-0.411588,0.006092,-0.417680,-0.039425,1,-0.090277,23.378197,4,-0.916648,-0.380853,-0.197028,-0.072535,-0.065783,0.114942,0.739487,1.071835,1.405071,Pmax
4,000001,1991-09-30,14.50,6,24.0,6.523394e+08,1.301399e+09,20.296075,20.986706,-0.033333,0.006092,-0.039425,0.849080,1,-0.031046,3.358613,5,-0.282719,-0.196051,-0.171487,-0.113894,-0.040992,0.328480,0.572807,1.169300,1.657930,Pmax
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
709877,605599,2023-07-31,13.92,23,21.0,8.166555e+09,1.082667e+10,22.823313,23.105278,0.053747,0.001241,0.052506,-0.027103,1,0.035528,1.059668,22,0.537803,0.726667,0.852566,0.952938,1.042133,1.137512,1.251751,1.398082,1.634288,P6
709878,605599,2023-08-31,13.56,24,23.0,7.955351e+09,1.054667e+10,22.797111,23.079076,-0.025862,0.001241,-0.027103,0.071030,1,-0.056811,1.040511,23,0.520065,0.716899,0.843055,0.942821,1.030354,1.125748,1.236319,1.379716,1.605872,P6
709879,605599,2023-09-30,14.54,25,20.0,8.530295e+09,1.130889e+10,22.866890,23.148855,0.072271,0.001241,0.071030,-0.048696,1,-0.006930,1.038189,24,0.507240,0.702471,0.832118,0.931681,1.014984,1.117847,1.230089,1.384121,1.610116,P6
709880,605599,2023-10-31,13.85,26,17.0,8.125488e+09,1.077222e+10,22.818272,23.100237,-0.047455,0.001241,-0.048696,-0.027956,1,-0.027562,1.052476,25,0.503186,0.705417,0.832456,0.929876,1.016549,1.116539,1.228589,1.381621,1.612616,P6
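
The decile assignment can also be done without merging breakpoints, by ranking beta within each month. The sketch below is an alternative, not the notebook's procedure: `assign_deciles` is a hypothetical helper, the top group is labelled `P10` instead of `Pmax`, group sizes are made as equal as possible instead of following quantile breakpoints, and the input is assumed to contain no missing betas.

```python
import pandas as pd

def assign_deciles(df, by='month', col='beta'):
    """Label each row P1..P10 by the rank of `col` within its `by` group."""
    ranks = df.groupby(by)[col].rank(method='first')   # 1 = lowest beta
    counts = df.groupby(by)[col].transform('count')    # stocks per month
    bucket = ((ranks - 1) * 10 // counts).astype(int) + 1
    return 'P' + bucket.astype(str)

# Toy cross-section: ten hypothetical stocks in a single month.
toy = pd.DataFrame({
    'month': ['2020-01-31'] * 10,
    'beta': [0.2, 1.9, 0.8, 1.1, 0.5, 1.4, 0.9, 1.0, 1.6, 0.7],
})
toy['sort'] = assign_deciles(toy)
print(toy.sort_values('beta'))
```

Because the two approaches treat ties and group boundaries differently, portfolio membership near the breakpoints may differ slightly from the sort above.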


In [33]:
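portfolio = portfolio.dropna(subset=['next_ret','floatingvalue','beta'])
# value-weighted next-month return of each beta portfolio in each month,
# using float market capitalization (floatingvalue) as weights
portfolio_beta =  pd.DataFrame(
    portfolio.groupby(['month','sort']).apply(lambda x: np.average(x['next_ret'],weights = x['floatingvalue']),include_groups=False))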
portfolio_beta

month,sort,0
1991-01-31,P1,-0.041567
1991-01-31,P2,0.010496
1991-01-31,P4,0.038284
1991-01-31,P6,0.081305
1991-01-31,P8,0.028134
...,...,...
2023-11-30,P6,-0.029253
2023-11-30,P7,-0.022432
2023-11-30,P8,-0.025132
2023-11-30,P9,-0.023712
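
To make the weighting explicit, the sketch below compares an equal-weighted and a value-weighted portfolio return for three hypothetical stocks. The numbers are made up for illustration.

```python
import numpy as np

# Three hypothetical stocks in one portfolio (made-up numbers).
next_ret = np.array([0.10, -0.02, 0.01])      # next-month excess returns
floatingvalue = np.array([1e9, 3e9, 6e9])     # float market capitalization

equal_weighted = next_ret.mean()
value_weighted = np.average(next_ret, weights=floatingvalue)

print(equal_weighted, value_weighted)
```

The small stock's 10% return dominates the equal-weighted average but carries only a tenth of the weight in the value-weighted one, which is why value weighting limits the influence of small stocks.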


In [34]:
portfolio_beta = portfolio_beta.reset_index()
portfolio_beta.columns = ['month', 'sort', 'p']
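# label each return with the month in which it is realized (t+1)
# rather than with the portfolio formation month t
portfolio_beta['month'] = portfolio_beta['month'] + MonthEnd(1)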
portfolio_beta

Unnamed: 0,month,sort,p
0,1991-02-28,P1,-0.041567
1,1991-02-28,P2,0.010496
2,1991-02-28,P4,0.038284
3,1991-02-28,P6,0.081305
4,1991-02-28,P8,0.028134
...,...,...,...
3941,2023-12-31,P6,-0.029253
3942,2023-12-31,P7,-0.022432
3943,2023-12-31,P8,-0.025132
3944,2023-12-31,P9,-0.023712


In [35]:
portfolio_beta = portfolio_beta.pivot_table(index='month',
                                            columns='sort',
                                            values='p')
# long-short portfolio: highest-beta decile minus lowest-beta decile
portfolio_beta['My_portfolio'] = portfolio_beta['Pmax'] - portfolio_beta['P1']
portfolio_beta

month,P1,P2,P3,P4,P5,P6,P7,P8,P9,Pmax,My_portfolio
1991-02-28,-0.041567,0.010496,,0.038284,,0.081305,,0.028134,,-0.002070,0.039497
1991-03-31,-0.133253,-0.133293,-0.070455,-0.044343,-0.125264,0.070941,-0.069140,0.023904,-0.133504,-0.059149,0.074104
1991-04-30,0.087227,0.120035,-0.000595,-0.097744,-0.103582,-0.123391,-0.133193,-0.077512,0.070484,-0.122736,-0.209963
1991-05-31,0.202117,0.238572,0.080639,-0.120582,-0.130814,-0.129118,-0.031989,0.055289,-0.006657,-0.036993,-0.239110
1991-06-30,0.201555,0.190283,-0.460637,0.190006,0.213185,-0.115097,0.184748,0.177858,-0.309901,-0.119551,-0.321106
...,...,...,...,...,...,...,...,...,...,...,...
2023-08-31,-0.036448,-0.052865,-0.068313,-0.054478,-0.067242,-0.059165,-0.054825,-0.059189,-0.064142,-0.070448,-0.034000
2023-09-30,0.013735,0.014267,-0.002743,-0.009549,-0.005910,-0.002246,-0.016275,-0.035363,-0.017042,-0.019017,-0.032751
2023-10-31,-0.018309,-0.046179,-0.034002,-0.044583,-0.016691,-0.031060,-0.041582,-0.038645,-0.019552,0.002772,0.021081
2023-11-30,0.011395,0.010197,0.011212,0.012365,0.002974,-0.023128,0.026235,-0.017067,-0.018870,-0.007422,-0.018816


In [37]:
portfolio_beta = portfolio_beta['1995-01':'2023-12']
portfolio_beta

month,P1,P2,P3,P4,P5,P6,P7,P8,P9,Pmax,My_portfolio
1995-01-31,-0.121734,-0.083400,-0.138382,-0.099037,-0.113427,-0.133506,-0.133194,-0.142164,-0.150120,-0.149496,-0.027762
1995-02-28,-0.017841,-0.008338,-0.009659,-0.017700,-0.004894,-0.023509,-0.032170,-0.035273,-0.043114,-0.045020,-0.027179
1995-03-31,0.092645,0.058423,0.077588,0.103556,0.102440,0.137761,0.136026,0.166411,0.205497,0.189397,0.096752
1995-04-30,-0.121164,-0.107233,-0.122560,-0.095786,-0.122881,-0.127712,-0.103340,-0.100386,-0.118406,-0.114404,0.006761
1995-05-31,0.108133,0.085052,0.123498,0.106382,0.159479,0.189940,0.203092,0.249722,0.241654,0.230451,0.122318
...,...,...,...,...,...,...,...,...,...,...,...
2023-08-31,-0.036448,-0.052865,-0.068313,-0.054478,-0.067242,-0.059165,-0.054825,-0.059189,-0.064142,-0.070448,-0.034000
2023-09-30,0.013735,0.014267,-0.002743,-0.009549,-0.005910,-0.002246,-0.016275,-0.035363,-0.017042,-0.019017,-0.032751
2023-10-31,-0.018309,-0.046179,-0.034002,-0.044583,-0.016691,-0.031060,-0.041582,-0.038645,-0.019552,0.002772,0.021081
2023-11-30,0.011395,0.010197,0.011212,0.012365,0.002974,-0.023128,0.026235,-0.017067,-0.018870,-0.007422,-0.018816


In [40]:
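# test whether the mean high-minus-low return differs from zero,
# using Newey-West (HAC) standard errors with 6 lags
model = smf.ols('My_portfolio ~ 1',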
                 data=portfolio_beta['1995-01':'2023-12']).fit(
                     cov_type='HAC', cov_kwds={'maxlags': 6})
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:           My_portfolio   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Tue, 26 Nov 2024   Prob (F-statistic):                nan
Time:                        13:34:54   Log-Likelihood:                 499.54
No. Observations:                 348   AIC:                            -997.1
Df Residuals:                     347   BIC:                            -993.2
Df Model:                           0                                         
Covariance Type:                  HAC                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0073      0.003     -2.680      0.0
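
Regressing a return series on a constant with HAC standard errors amounts to a Newey-West test of its mean. The sketch below spells out that calculation with Bartlett kernel weights. It is intended to mirror what the HAC option computes for an intercept-only regression without a small-sample correction, but that correspondence is an assumption and is not verified here. The function name `newey_west_mean_test` and the four-point input are illustrative.

```python
import numpy as np

def newey_west_mean_test(x, maxlags):
    """Mean of x, its Newey-West standard error, and the z-statistic."""
    x = np.asarray(x, dtype=float)
    T = x.size
    mean = x.mean()
    e = x - mean                                # residuals of the constant-only regression
    lrv = np.dot(e, e) / T                      # lag-0 autocovariance
    for lag in range(1, maxlags + 1):
        gamma = np.dot(e[lag:], e[:-lag]) / T   # autocovariance at this lag
        weight = 1.0 - lag / (maxlags + 1.0)    # Bartlett kernel weight
        lrv += 2.0 * weight * gamma
    se = np.sqrt(lrv / T)
    return mean, se, mean / se

mean, se, z = newey_west_mean_test([1.0, 2.0, 3.0, 4.0], maxlags=1)
print(mean, se, z)
```

With `maxlags=0` the formula reduces to the usual standard error of a mean (with divisor $T$), so the lag terms are exactly the correction for serial correlation in the return series.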

## Does $\alpha$ Exist?


In [43]:
# keep only stock-months whose beta is estimated from at least 48 observations
cross_beta2 = cross_beta[cross_beta['data_count'] >= 48].copy()
cross_beta2 = cross_beta2.set_index(['Stkcd', 'month']) # set a (Stkcd, month) MultiIndex
cross_beta2


Stkcd,month,price,Rank,Freq,floatingvalue,totalvalue,sizef,sizet,Return,rfmonth,ret,next_ret,w,MKT,beta,data_count
000001,1995-04-30,9.69,49,20.0,2.881531e+09,4.177055e+09,21.781588,22.152872,-0.099442,0.008719,-0.108161,0.002633,1,-0.113023,0.632808,48
000001,1995-05-31,9.80,50,22.0,2.914242e+09,4.224473e+09,21.792876,22.164160,0.011352,0.008719,0.002633,-0.073005,1,0.166736,0.628513,49
000001,1995-06-30,9.17,51,22.0,2.726898e+09,3.952899e+09,21.726430,22.097715,-0.064286,0.008719,-0.073005,-0.011991,1,-0.094741,0.628738,50
000001,1995-07-31,9.14,52,21.0,2.717977e+09,3.939967e+09,21.723154,22.094438,-0.003272,0.008719,-0.011991,0.048174,1,0.059671,0.628341,51
000001,1995-08-31,9.66,53,21.0,2.872610e+09,4.164123e+09,21.778487,22.149772,0.056893,0.008719,0.048174,0.000984,1,0.041040,0.628447,52
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
603999,2023-08-31,6.76,93,23.0,3.893760e+09,3.893760e+09,22.082641,22.082641,0.057903,0.001241,0.056662,-0.103312,1,-0.056811,0.987871,61
603999,2023-09-30,6.07,94,20.0,3.496320e+09,3.496320e+09,21.974977,21.974977,-0.102071,0.001241,-0.103312,-0.022658,1,-0.006930,0.998184,61
603999,2023-10-31,5.94,95,17.0,3.421440e+09,3.421440e+09,21.953327,21.953327,-0.021417,0.001241,-0.022658,0.087985,1,-0.027562,1.013549,61
603999,2023-11-30,6.47,96,22.0,3.726720e+09,3.726720e+09,22.038794,22.038794,0.089226,0.001241,0.087985,0.060583,1,0.000549,1.031205,61


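To install `linearmodels` on Windows, press Win+R, type `cmd`, and run the following in the command prompt:

pip install linearmodels

With Anaconda, run instead:

conda install linearmodels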

In [47]:
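from linearmodels import FamaMacBeth
model = FamaMacBeth.from_formula('next_ret ~ 1 + beta', data=cross_beta2.dropna(subset=['next_ret','beta']))
## Fama-MacBeth results usually report Newey-West adjusted t-statistics; the settings passed to .fit() below request them.
## `bandwidth` is the Newey-West lag length; a common rule of thumb is lag = 4(T/100)^(2/9).
## For unadjusted standard errors, call .fit() without any of these arguments.
# the bandwidth is fixed at 6 lags here rather than chosen automatically
res = model.fit(cov_type= 'kernel',debiased = False,bandwidth=6)
print(res.summary)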

                            FamaMacBeth Estimation Summary                           
Dep. Variable:                    next_ret   R-squared:                       -0.0006
Estimator:                     FamaMacBeth   R-squared (Between):             -0.0768
No. Observations:                   492716   R-squared (Within):               0.0001
Date:                     Tue, Nov 26 2024   R-squared (Overall):             -0.0006
Time:                             13:38:25   Log-likelihood                  2.71e+05
Cov. Estimator:    Fama-MacBeth Kernel Cov                                           
                                             F-statistic:                     -309.64
Entities:                             3786   P-value                           1.0000
Avg Obs:                            130.14   Distribution:                F(1,492714)
Min Obs:                            1.0000                                           
Max Obs:                            348.00   F-statist
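
The rule of thumb quoted in the cell above, lag = 4(T/100)^(2/9), can be evaluated directly. The sketch assumes the result is rounded down to an integer, a convention the notebook does not state; `newey_west_lag` is a hypothetical helper name.

```python
import math

def newey_west_lag(T):
    """Rule-of-thumb lag length 4 * (T / 100) ** (2 / 9), rounded down."""
    return math.floor(4 * (T / 100) ** (2 / 9))

# T = 348 monthly cross-sections, as reported in the regression output.
print(newey_west_lag(348))
```

For $T = 348$ the unrounded value is about 5.28, so the rule gives 5 after rounding down, while the cell above sets the bandwidth to 6.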

# My Code for Fama MacBeth


In [51]:
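import statsmodels.api as sm 

# First stage of Fama-MacBeth: one cross-sectional OLS regression per month.
def Fama_MacBeth(data, yvar, xvars):
    Y = data[yvar]
    X = data[xvars].copy()  # copy, so that adding a column does not write into a slice of `data`
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params

# monthly coefficient estimates, one row per month
coef = cross_beta2.dropna(subset=['next_ret','beta']).groupby('month').apply(Fama_MacBeth, 'next_ret', ['beta'])
coef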

month,beta,intercept
1994-12-31,-0.002211,-0.155057
1995-01-31,0.083434,-0.117265
1995-02-28,-0.102031,0.211704
1995-03-31,-0.146752,0.030613
1995-04-30,0.197278,0.046645
...,...,...
2023-07-31,-0.014799,-0.033648
2023-08-31,-0.017810,0.022992
2023-09-30,0.015563,-0.025496
2023-10-31,-0.002676,0.035143
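
The full two-step procedure can be illustrated on a small panel where the answer is known in advance. The sketch below is a simplified stand-in for the code in this section: `fama_macbeth_sketch` is a hypothetical helper, it handles a single regressor, it reports plain rather than Newey-West t-statistics, and the toy panel is constructed so that returns are exactly linear in beta.

```python
import numpy as np
import pandas as pd

def fama_macbeth_sketch(panel, yvar, xvar):
    """Two-step Fama-MacBeth with a single regressor and plain t-statistics."""
    rows = []
    for month, g in panel.groupby('month'):
        # Step 1: cross-sectional OLS within each month.
        slope, intercept = np.polyfit(g[xvar], g[yvar], 1)
        rows.append({'month': month, 'intercept': intercept, xvar: slope})
    coefs = pd.DataFrame(rows).set_index('month')
    # Step 2: time-series mean of the monthly coefficients and its t-statistic.
    mean = coefs.mean()
    tstat = mean / (coefs.std(ddof=1) / np.sqrt(len(coefs)))
    return coefs, mean, tstat

# Toy panel: 5 hypothetical stocks over 3 months.
betas = np.array([0.5, 0.8, 1.0, 1.2, 1.5])
alphas = {'2020-01-31': 0.001, '2020-02-29': -0.001, '2020-03-31': 0.0}
premia = {'2020-01-31': 0.01, '2020-02-29': 0.02, '2020-03-31': 0.03}
toy = pd.concat(
    [pd.DataFrame({'month': m, 'beta': betas,
                   'next_ret': alphas[m] + premia[m] * betas})
     for m in premia],
    ignore_index=True,
)

coefs, mean, tstat = fama_macbeth_sketch(toy, 'next_ret', 'beta')
print(coefs)
print(mean)
```

The monthly slopes recover the premia 0.01, 0.02 and 0.03 built into the toy data, and their average of 0.02 is the Fama-MacBeth estimate of the price of beta risk.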


In [57]:
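# second stage: test whether the time-series mean of the monthly intercepts
# (the alpha) is zero, using Newey-West (HAC) standard errors with 6 lags
model_alpha = smf.ols('intercept ~ 1',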
                 data=coef['1994-12':'2023-11']).fit(
                     cov_type='HAC', cov_kwds={'maxlags': 6})
print(model_alpha.summary())

                            OLS Regression Results                            
Dep. Variable:              intercept   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Tue, 26 Nov 2024   Prob (F-statistic):                nan
Time:                        13:41:16   Log-Likelihood:                 321.50
No. Observations:                 348   AIC:                            -641.0
Df Residuals:                     347   BIC:                            -637.1
Df Model:                           0                                         
Covariance Type:                  HAC                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0169      0.007      2.562      0.0

In [58]:
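# second stage: test whether the time-series mean of the monthly slopes on beta
# (the price of beta risk) is zero, using Newey-West (HAC) standard errors with 6 lags
model_beta = smf.ols('beta ~ 1',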
                 data=coef['1994-12':'2023-11']).fit(
                     cov_type='HAC', cov_kwds={'maxlags': 6})
print(model_beta.summary())

                            OLS Regression Results                            
Dep. Variable:                   beta   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Tue, 26 Nov 2024   Prob (F-statistic):                nan
Time:                        13:41:39   Log-Likelihood:                 588.86
No. Observations:                 348   AIC:                            -1176.
Df Residuals:                     347   BIC:                            -1172.
Df Model:                           0                                         
Covariance Type:                  HAC                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0032      0.003     -1.246      0.2