<h1 style="text-align:center; color:#e64500ff;">Capital Asset Pricing Model (CAPM)</h1>

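- Investors are rational and risk-averse: they make decisions according to the mean-variance criterion. For a given level of risk (variance) they seek the highest expected return (mean), and for a given expected return they seek the lowest risk. Under this assumption, CAPM states that an asset's expected excess return is proportional to the market's expected excess return:  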
$$E[R_{i,t}] - R_{f,t} = \beta_i \left(E[R_{m,t}] - R_{f,t}\right)$$

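- $E[R_{i,t}]$ is the expected return of individual stock $i$  
- $R_{f,t}$ is the risk-free rate
- $E[R_{m,t}]$ is the expected return of the market
- $\beta_i$ (beta) is the security's systematic risk coefficient.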

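A security's $\beta$ measures its systematic risk, i.e. the risk tied to movements of the overall market. It gives the expected change, in percentage points, of the asset's return when the market return changes by 1 percentage point.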
$$\beta_i = \frac{\text{Cov}(R_{i,t}, R_{m,t})}{\text{Var}(R_{m,t})}$$
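- $\beta > 1$: the asset moves more than one-for-one with the market (for example, technology stocks). It is regarded as an aggressive, higher-risk asset.  
- $0 < \beta < 1$: the asset moves less than one-for-one with the market (for example, utility stocks). It is regarded as a defensive, lower-risk asset.
- $\beta < 0$: the asset's return moves against the market (very rare; examples sometimes cited are gold or certain hedge funds during recessions).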
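
As a quick numerical check of the formula above, the following sketch simulates monthly excess returns with a known beta and recovers it two ways: from the covariance/variance ratio and as an OLS slope. Every number here (true beta of 1.3, the risk-free rate, the market premium) is an illustrative assumption, not a value from the data set used later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly excess returns: the asset is built with a true beta of 1.3
mkt_excess = rng.normal(0.008, 0.05, size=600)
asset_excess = 1.3 * mkt_excess + rng.normal(0.0, 0.03, size=600)

# Beta from the definition Cov(R_i, R_m) / Var(R_m)
cov_im = np.cov(asset_excess, mkt_excess, ddof=1)[0, 1]
var_m = np.var(mkt_excess, ddof=1)
beta_cov = cov_im / var_m

# Beta as the slope of an OLS regression of asset on market excess returns
beta_ols = np.polyfit(mkt_excess, asset_excess, deg=1)[0]

# CAPM-implied expected return for an assumed risk-free rate and market premium
rf, market_premium = 0.002, 0.006
expected_return = rf + beta_cov * market_premium

print(f"beta (cov/var): {beta_cov:.4f}")
print(f"beta (OLS):     {beta_ols:.4f}")
print(f"CAPM expected monthly return: {expected_return:.4%}")
```

The two estimates agree up to floating-point error because the OLS slope of a one-regressor model with an intercept is exactly the covariance/variance ratio.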

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import scipy
from datetime import datetime
import statsmodels.formula.api as smf

from matplotlib import style
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.font_manager import FontProperties
from pylab import mpl
import platform

# Pick a font that can render Chinese characters, depending on the operating system
system = platform.system()
if system == 'Windows':
    plt.rcParams['font.sans-serif'] = ['SimHei']  # SimHei on Windows
    plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
elif system == 'Darwin':  # macOS
    plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']  # Arial Unicode MS on macOS
    plt.rcParams['axes.unicode_minus'] = False
else:  # Linux
    plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei']  # WenQuanYi Micro Hei on Linux
    plt.rcParams['axes.unicode_minus'] = False

# Render figures as vector graphics (SVG)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Configure pandas display options
pd.set_option('display.max_columns', None)

print(f"Operating system: {system}")
print(f"Font setting: {plt.rcParams['font.sans-serif']}")

Operating system: Windows
Font setting: ['SimHei']


In [2]:
from pandas.tseries.offsets import MonthEnd

cross = pd.read_csv('./TRD_Mnth202509.csv')
# Parse the trading month and move it to the month-end date
cross['month'] = pd.to_datetime(cross['Trdmnt'], format='%Y-%m') + MonthEnd(1)
# Left-pad stock codes with zeros to 6 characters
cross['Stkcd'] = cross['Stkcd'].apply(lambda x: '{:0>6}'.format(x))
# Rename columns to the names used in the rest of the notebook
cross.rename(columns={'Mretwd': 'Return', 'Msmvosd': 'floatingvalue', 'Msmvttl': 'totalvalue'}, inplace=True)

# Load the risk-free rate
rf_data = pd.read_csv('./Marketret_mon_stock2024.csv')
# Parse the month and move it to the month-end date
rf_data['month'] = pd.to_datetime(rf_data['month'], format='%b %Y') + MonthEnd(1)
# Keep only the columns that are needed
rf_data = rf_data[['month', 'rfmonth']]

# Merge the risk-free rate into cross
cross = pd.merge(cross, rf_data, on='month', how='left')

# Running count of months per stock: which month since first appearing in the data
cross = cross.sort_values(by=['Stkcd', 'month'])
cross['list_month'] = cross.groupby('Stkcd').cumcount() + 1

# Excess return = raw return minus the risk-free rate
cross['ret'] = cross['Return'] - cross['rfmonth']
# Scale market values by 1,000 (the raw columns are assumed to be in thousands of yuan)
cross['floatingvalue'] = cross['floatingvalue'] * 1000
cross['totalvalue'] = cross['totalvalue'] * 1000

cross

Unnamed: 0,Stkcd,Trdmnt,Opndt,Mopnprc,Clsdt,Mclsprc,Mnshrtrd,Mnvaltrd,floatingvalue,totalvalue,Ndaytrd,Return,Mretnd,Markettype,Capchgdt,Ahshrtrd_M,Ahvaltrd_M,month,rfmonth,list_month,ret
0,000001,1991-04,3,49.00,30,43.68,13400,6.150000e+05,1.157520e+09,2.118487e+09,20,,,4,1991-04-03,,,1991-04-30,0.006651,1,
1,000001,1991-05,2,43.46,31,38.34,187800,7.675000e+06,1.016010e+09,1.859497e+09,24,-0.122253,0.235714,4,1991-04-03,,,1991-05-31,0.006092,2,-0.128345
2,000001,1991-06,1,38.53,28,33.99,30600,1.094000e+06,9.007350e+08,1.648521e+09,23,-0.113459,-0.113459,4,1991-04-03,,,1991-06-30,0.006092,3,-0.119551
3,000001,1991-07,1,33.65,31,29.54,6100,1.940430e+05,7.828100e+08,1.432695e+09,16,-0.130921,-0.130921,4,1991-04-03,,,1991-07-31,0.006092,4,-0.137013
4,000001,1991-08,1,29.39,31,15.00,3243100,4.957624e+07,6.748338e+08,1.346275e+09,15,-0.411588,-0.411587,4,1991-08-01,,,1991-08-31,0.006092,5,-0.417680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886908,920992,2025-05,6,20.00,30,20.46,92609592,1.922463e+09,5.707967e+08,1.979115e+09,19,0.051312,0.051715,64,2023-04-25,0.0,0.0,2025-05-31,,32,
886909,920992,2025-06,3,20.40,30,21.26,73152031,1.580767e+09,5.931152e+08,2.056500e+09,20,0.039101,0.039101,64,2023-04-25,0.0,0.0,2025-06-30,,33,
886910,920992,2025-07,1,21.13,31,22.44,96064221,2.126994e+09,6.260351e+08,2.170642e+09,23,0.055503,0.055503,64,2023-04-25,0.0,0.0,2025-07-31,,34,
886911,920992,2025-08,1,22.12,29,21.65,63953451,1.432608e+09,6.039955e+08,2.094225e+09,21,-0.035205,-0.035205,64,2023-04-25,0.0,0.0,2025-08-31,,35,
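
The cells in this notebook use both `MonthEnd(1)` and `MonthEnd(0)` to reach a month-end date. The toy example below (its dates are chosen only for illustration) shows that the two agree when starting from the first day of a month, which is what `'%Y-%m'` parsing produces, but differ when the date is already a month-end.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# '%Y-%m' strings parse to the first day of the month
first_of_month = pd.to_datetime(pd.Series(["1991-04", "2024-02"]), format="%Y-%m")

# From the 1st, MonthEnd(1) and MonthEnd(0) both land on the same month-end
plus_one = first_of_month + MonthEnd(1)
plus_zero = first_of_month + MonthEnd(0)

# From a date that is already a month-end, MonthEnd(1) jumps to the NEXT month-end,
# while MonthEnd(0) leaves the date unchanged
already_end = pd.Timestamp("1991-04-30")
next_end = already_end + MonthEnd(1)
same_end = already_end + MonthEnd(0)

print(plus_one.tolist())
print(next_end, same_end)
```

This is why adding `MonthEnd(1)` to a month-end column is a convenient way to relabel a row with the following month.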


In [4]:
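# Build next month's return

# Method 1: complete every stock's month sequence first, then use shift
# All unique months and all unique stock codes
all_months = pd.DataFrame(cross['month'].unique(), columns=['month'])
all_stocks = pd.DataFrame(cross['Stkcd'].unique(), columns=['Stkcd'])
# Cartesian product: every stock paired with every month
full_index = all_stocks.merge(all_months, how='cross')

# Merge the original data onto the complete grid
cross_full = full_index.merge(cross, on=['Stkcd', 'month'], how='left')
# Sort by stock code and month
cross_full = cross_full.sort_values(['Stkcd', 'month'])
# Shift within each stock to get next month's excess return
cross_full['next_ret'] = cross_full.groupby('Stkcd')['ret'].shift(-1)

# Merge next_ret back onto cross. Note that how='right' keeps every (stock, month)
# pair of the full grid, so months without original data remain as all-NaN rows;
# switch to how='left' to keep only rows present in the original data
cross = cross.merge(cross_full[['Stkcd', 'month', 'next_ret']], 
                    on=['Stkcd', 'month'], how='right')

# Sum of trading days over the past year (rolling 12-row window per stock)
cross['Cumsum_tradingday'] = cross.groupby('Stkcd')['Ndaytrd'].transform(lambda x: x.rolling(window=12, min_periods=1).sum())
print("Method 1 done: shift applied after completing the month sequence")

cross = cross[(cross['month'] >= '1995-01-31') & (cross['month'] <= '2024-12-31')]
cross


Method 1 done: shift applied after completing the month sequence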


Unnamed: 0,Stkcd,Trdmnt,Opndt,Mopnprc,Clsdt,Mclsprc,Mnshrtrd,Mnvaltrd,floatingvalue,totalvalue,Ndaytrd,Return,Mretnd,Markettype,Capchgdt,Ahshrtrd_M,Ahvaltrd_M,month,rfmonth,list_month,ret,next_ret_method2,next_ret,Cumsum_tradingday
49,000001,1995-01,3.0,10.50,27.0,10.54,17488100.0,1.867690e+08,3.134297e+09,4.543463e+09,19.0,0.005725,0.005725,4.0,1994-09-02,,,1995-01-31,0.008719,46.0,-0.002994,-0.000180,-0.000180,247.0
50,000001,1995-02,13.0,10.60,28.0,10.63,8544200.0,9.192790e+07,3.161060e+09,4.582260e+09,12.0,0.008539,0.008539,4.0,1994-09-02,,,1995-02-28,0.008719,47.0,-0.000180,0.003511,0.003511,244.0
51,000001,1995-03,1.0,10.63,31.0,10.76,18810600.0,2.059094e+08,3.199719e+09,4.638299e+09,23.0,0.012230,0.012230,4.0,1994-09-02,,,1995-03-31,0.008719,48.0,0.003511,-0.108161,-0.108161,244.0
52,000001,1995-04,3.0,10.78,28.0,9.69,13235500.0,1.380604e+08,2.881531e+09,4.177055e+09,20.0,-0.099442,-0.099442,4.0,1994-09-02,,,1995-04-30,0.008719,49.0,-0.108161,0.002633,0.002633,243.0
53,000001,1995-05,2.0,9.69,31.0,9.80,53197900.0,6.049730e+08,2.914242e+09,4.224473e+09,22.0,0.011352,0.011352,4.0,1994-09-02,,,1995-05-31,0.008719,50.0,0.002633,-0.073005,-0.073005,244.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2453228,920992,2024-08,1.0,8.50,30.0,8.19,14436632.0,1.185912e+08,2.284861e+08,7.922264e+08,22.0,-0.037603,-0.037603,64.0,2023-04-25,0.0,0.0,2024-08-31,0.001118,23.0,-0.038721,0.276049,0.276049,242.0
2453229,920992,2024-09,2.0,8.15,30.0,10.46,18109819.0,1.566140e+08,2.918149e+08,1.011806e+09,19.0,0.277167,0.277167,64.0,2023-04-25,0.0,0.0,2024-09-30,0.001118,24.0,0.276049,0.389895,0.389895,241.0
2453230,920992,2024-10,8.0,13.02,31.0,14.55,86774237.0,1.145494e+09,4.059185e+08,1.407435e+09,18.0,0.391013,0.391013,64.0,2023-04-25,0.0,0.0,2024-10-31,0.001118,25.0,0.389895,0.241494,0.241494,242.0
2453231,920992,2024-11,1.0,14.69,29.0,18.08,130783392.0,2.346534e+09,5.043990e+08,1.748895e+09,21.0,0.242612,0.242612,64.0,2023-04-25,0.0,0.0,2024-11-30,0.001118,26.0,0.241494,-0.243928,-0.243928,241.0
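
To see why the month grid is completed before shifting, consider a toy stock with one missing month. The data below are made up for illustration. A plain `shift(-1)` pairs February with April's return, whereas shifting on the completed grid correctly leaves February without a next-month return.

```python
import pandas as pd

# Hypothetical stock with no record for March (e.g. a trading suspension)
toy = pd.DataFrame({
    "Stkcd": ["000001"] * 3,
    "month": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-04-30"]),
    "ret": [0.01, 0.02, 0.04],
})

# Naive shift: February is paired with April's return, which is wrong
toy["next_ret_naive"] = toy.groupby("Stkcd")["ret"].shift(-1)

# Complete the month-end grid first, then shift
grid = pd.DataFrame({"month": pd.to_datetime(
    ["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"])})
full = grid.merge(toy[["month", "ret"]], on="month", how="left")
full["next_ret"] = full["ret"].shift(-1)

# Bring the corrected column back onto the original rows only
fixed = toy.merge(full[["month", "next_ret"]], on="month", how="left")
print(fixed)
```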


In [None]:
# Load the market return data
from pandas.tseries.offsets import MonthEnd  # month-end offset
Market_ret = pd.read_csv('./Marketret_mon_stock2024.csv')
Market_ret['month'] = pd.to_datetime(Market_ret['month'], format='%b %Y') + MonthEnd(0)
Market_ret.set_index('month', inplace=True)
Market_ret.sort_index(inplace=True)
Market_ret = Market_ret.drop(columns=['Unnamed: 0'])
Market_ret.rename(columns={'ret': 'MKT'}, inplace=True)
Market_ret

month,MarketR,MarketR_e,rfmonth,MKT,ret_e,marketret3,marketret6,marketret12,Q
1991-01-31,0.029998,0.036554,0.006930,0.023068,0.029624,-0.084127,-0.305662,0.254049,1991 Q1
1991-02-28,0.010203,0.021860,0.006930,0.003273,0.014930,-0.183573,-0.384745,0.241492,1991 Q1
1991-03-31,-0.099663,-0.060425,0.006930,-0.106593,-0.067355,-0.252928,-0.445049,0.288857,1991 Q1
1991-04-30,-0.079652,-0.031449,0.006651,-0.086303,-0.038100,-0.234776,-0.394937,0.691749,1991 Q2
1991-05-31,-0.074521,0.005375,0.006092,-0.080613,-0.000717,-0.236294,0.181673,1.542701,1991 Q2
...,...,...,...,...,...,...,...,...,...
2024-08-31,-0.031468,-0.033685,0.001118,-0.032586,-0.034803,0.151050,,,2024 Q3
2024-09-30,0.195992,0.227078,0.001118,0.194874,0.225960,0.200147,,,2024 Q3
2024-10-31,-0.003409,0.070220,0.001118,-0.004527,0.069102,0.000228,,,2024 Q4
2024-11-30,0.009724,0.045312,0.001118,0.008606,0.044194,,,,2024 Q4


In [None]:
# Merge the market excess return (MKT) into the stock-level data
cross = pd.merge(cross,Market_ret[['MKT']],left_on='month',right_on='month',how='left')
cross

Unnamed: 0,Stkcd,Trdmnt,Opndt,Mopnprc,Clsdt,Mclsprc,Mnshrtrd,Mnvaltrd,floatingvalue,totalvalue,Ndaytrd,Return,Mretnd,Markettype,Capchgdt,Ahshrtrd_M,Ahvaltrd_M,month,rfmonth,list_month,ret,next_ret_method2,next_ret,Cumsum_tradingday,MKT
0,000001,1995-01,3.0,10.50,27.0,10.54,17488100.0,1.867690e+08,3.134297e+09,4.543463e+09,19.0,0.005725,0.005725,4.0,1994-09-02,,,1995-01-31,0.008719,46.0,-0.002994,-0.000180,-0.000180,247.0,-0.125017
1,000001,1995-02,13.0,10.60,28.0,10.63,8544200.0,9.192790e+07,3.161060e+09,4.582260e+09,12.0,0.008539,0.008539,4.0,1994-09-02,,,1995-02-28,0.008719,47.0,-0.000180,0.003511,0.003511,244.0,-0.022778
2,000001,1995-03,1.0,10.63,31.0,10.76,18810600.0,2.059094e+08,3.199719e+09,4.638299e+09,23.0,0.012230,0.012230,4.0,1994-09-02,,,1995-03-31,0.008719,48.0,0.003511,-0.108161,-0.108161,244.0,0.122163
3,000001,1995-04,3.0,10.78,28.0,9.69,13235500.0,1.380604e+08,2.881531e+09,4.177055e+09,20.0,-0.099442,-0.099442,4.0,1994-09-02,,,1995-04-30,0.008719,49.0,-0.108161,0.002633,0.002633,243.0,-0.113023
4,000001,1995-05,2.0,9.69,31.0,9.80,53197900.0,6.049730e+08,2.914242e+09,4.224473e+09,22.0,0.011352,0.011352,4.0,1994-09-02,,,1995-05-31,0.008719,50.0,0.002633,-0.073005,-0.073005,244.0,0.166736
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2112835,920992,2024-08,1.0,8.50,30.0,8.19,14436632.0,1.185912e+08,2.284861e+08,7.922264e+08,22.0,-0.037603,-0.037603,64.0,2023-04-25,0.0,0.0,2024-08-31,0.001118,23.0,-0.038721,0.276049,0.276049,242.0,-0.032586
2112836,920992,2024-09,2.0,8.15,30.0,10.46,18109819.0,1.566140e+08,2.918149e+08,1.011806e+09,19.0,0.277167,0.277167,64.0,2023-04-25,0.0,0.0,2024-09-30,0.001118,24.0,0.276049,0.389895,0.389895,241.0,0.194874
2112837,920992,2024-10,8.0,13.02,31.0,14.55,86774237.0,1.145494e+09,4.059185e+08,1.407435e+09,18.0,0.391013,0.391013,64.0,2023-04-25,0.0,0.0,2024-10-31,0.001118,25.0,0.389895,0.241494,0.241494,242.0,-0.004527
2112838,920992,2024-11,1.0,14.69,29.0,18.08,130783392.0,2.346534e+09,5.043990e+08,1.748895e+09,21.0,0.242612,0.242612,64.0,2023-04-25,0.0,0.0,2024-11-30,0.001118,26.0,0.241494,-0.243928,-0.243928,241.0,0.008606


In [None]:
from tqdm import tqdm

import statsmodels.api as sm
# Estimate each stock's beta (method 1: standard rolling regression)
# The function returns the rolling beta and the number of observations behind each estimate
def rolling_beta_per_stock(data, window_months=60):
    # Minimum number of observations: two thirds of the window. With a 60-month
    # window, at least 40 months must have both ret and MKT before a regression is run.
    min_periods = max(1, int(np.ceil(window_months * 2 / 3)))
    print("Beta estimation settings:")
    print(f"  Window: {window_months} months")
    print(f"  Minimum required: {min_periods} months with data")
    print("="*60)
    
    betas = []
    months = []
    Stkcds = []
    data_counts = []
    
    # Group by stock code (Stkcd) so each stock's history is processed independently
    grouped = data.groupby('Stkcd')
    
    # Wrap the loop in tqdm to show a progress bar
    for Stkcd, group in tqdm(grouped, desc="Estimating beta", total=len(grouped)):
        group = group.set_index('month').sort_index()
        end_months = group.index.unique()
        
        for end_month in end_months:  # core loop: one regression per window end month
            # Start (window_months - 1) month-ends back so the inclusive slice spans
            # exactly window_months months; DateOffset(months=window_months) would
            # usually include one extra month
            start_month = end_month - pd.offsets.MonthEnd(window_months - 1)
            window_data = group.loc[start_month:end_month]
            
            # Only regress when the window has enough rows
            if len(window_data) >= min_periods:
                # Drop rows with missing values
                window_data_clean = window_data.dropna(subset=['ret', 'MKT'])
                
                # Check the row count again after cleaning
                if len(window_data_clean) >= min_periods:
                    X = sm.add_constant(window_data_clean['MKT'])  # X: market excess return
                    y = window_data_clean['ret']  # y: stock excess return
                    model = sm.OLS(y, X).fit()
                    
                    beta = model.params['MKT']  # beta is the coefficient on MKT
                    betas.append(beta)
                    months.append(end_month)
                    Stkcds.append(Stkcd)
                    data_counts.append(len(window_data_clean))
    
    result_df = pd.DataFrame({'Stkcd': Stkcds, 'month': months, 'beta': betas, 'data_count': data_counts})
    
    print("\nEstimation finished:")
    print(f"  Total observations: {len(result_df)}")
    print(f"  Average number of data points: {result_df['data_count'].mean():.1f}")
    print("  Data point statistics:")
    print(result_df['data_count'].describe())
    
    return result_df

# Compute the rolling beta and observation count for every stock
rolling_betas = rolling_beta_per_stock(cross)

# Show the result
rolling_betas
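
Because a one-regressor OLS slope equals covariance divided by variance, a stock without gaps can have its rolling beta computed directly with pandas rolling windows rather than one regression per window. The sketch below uses simulated data, and `mkt`, `ret`, and the 60/40 window settings are stand-ins chosen to mirror the function above. It is an illustration of the identity, not a drop-in replacement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, window, min_periods = 120, 60, 40

# Hypothetical single-stock monthly series with no gaps
mkt = pd.Series(rng.normal(0.005, 0.06, n))
ret = 0.9 * mkt + pd.Series(rng.normal(0.0, 0.04, n))

# Rolling slope = rolling covariance / rolling variance (both default to ddof=1)
roll_beta = (ret.rolling(window, min_periods=min_periods).cov(mkt)
             / mkt.rolling(window, min_periods=min_periods).var())

# Cross-check the last full window against an explicit least-squares fit
slope_last = np.polyfit(mkt.iloc[-window:], ret.iloc[-window:], deg=1)[0]
print(roll_beta.iloc[-1], slope_last)
```

With missing returns the two rolling calls may use different subsets of rows, so on real data `mkt` would first need to be masked wherever `ret` is missing before relying on this shortcut.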

## Estimation method 2: direct groupby

In [None]:
from tqdm import tqdm  # can be removed if already imported above

def rolling_beta_vectorized(data, window_months=60):
    """
    Compute rolling betas stock by stock, dropping missing rows once up front.
    Despite the function name, each window is still fit with its own OLS call.
    Intended for cases where each stock has a long history.
    """
    min_periods = max(1, int(np.ceil(window_months * 2 / 3)))
    print("Beta estimation settings (vectorized):")
    print(f"  Window: {window_months} months")
    print(f"  Minimum required: {min_periods} months with data")
    print("="*60)

    data = data.sort_values(['Stkcd', 'month']).dropna(subset=['ret', 'MKT', 'Stkcd', 'month'])

    def calc_beta_for_stock(group):
        group = group.set_index('month').sort_index()
        betas = []
        months = []
        for end_month in group.index:
            # Inclusive slice covering exactly window_months month-ends
            start_month = end_month - pd.offsets.MonthEnd(window_months - 1)
            window_data = group.loc[start_month:end_month]
            if len(window_data) >= min_periods:
                try:
                    X = sm.add_constant(window_data['MKT'])
                    y = window_data['ret']
                    model = sm.OLS(y, X).fit()
                    betas.append(model.params['MKT'])
                    months.append(end_month)
                except Exception:
                    # Skip windows where the regression fails
                    pass
        if not betas:  # return an empty frame so the later concat does not raise a FutureWarning
            return pd.DataFrame(columns=['month', 'beta'])
        return pd.DataFrame({'month': months, 'beta': betas})

    result_list = []
    for stkcd, group in tqdm(data.groupby('Stkcd'), desc="Estimating beta"):
        stock_result = calc_beta_for_stock(group)
        if not stock_result.empty:
            stock_result['Stkcd'] = stkcd
            result_list.append(stock_result)

    if result_list:
        result_df = pd.concat(result_list, ignore_index=True)
    else:
        result_df = pd.DataFrame(columns=['Stkcd', 'month', 'beta'])

    print("\nEstimation finished:")
    print(f"  Total observations: {len(result_df)}")
    return result_df[['Stkcd', 'month', 'beta']]


In [None]:
rolling_betas2 = rolling_beta_vectorized(cross, window_months=60)
rolling_betas2

## Estimation method 3: parallel processing

In [None]:
import pandas as pd
import numpy as np
import multiprocessing

def process_single_stock(args):
    """
    Estimate rolling betas for a single stock (worker for parallel processing)
    """
    import statsmodels.api as sm
    
    Stkcd, group_data, window_months, min_periods = args
    
    results = []
    group = group_data.set_index('month').sort_index()
    end_months = group.index.unique()
    
    for end_month in end_months:
        # Inclusive slice covering exactly window_months month-ends
        start_month = end_month - pd.offsets.MonthEnd(window_months - 1)
        window_data = group.loc[start_month:end_month]
        
        if len(window_data) >= min_periods:
            window_data_clean = window_data.dropna(subset=['ret', 'MKT'])
            
            if len(window_data_clean) >= min_periods:
                try:
                    X = sm.add_constant(window_data_clean['MKT'])
                    y = window_data_clean['ret']
                    model = sm.OLS(y, X).fit()
                    
                    results.append({
                        'Stkcd': Stkcd,
                        'month': end_month,
                        'beta': model.params['MKT'],
                        'data_count': len(window_data_clean)
                    })
                except Exception:
                    # Skip windows where the regression fails
                    continue
    
    return results


def rolling_beta_parallel(data, window_months=60, n_jobs=None):
    """
    Compute rolling betas in parallel with multiple processes (written for macOS)
    
    Parameters:
        n_jobs: number of worker processes; defaults to the CPU core count minus 1
    """
    
    # Use the 'fork' start method. It exists only on POSIX systems (Linux, macOS);
    # on Windows get_context('fork') raises ValueError
    ctx = multiprocessing.get_context('fork')
    
    min_periods = max(1, int(np.ceil(window_months * 2 / 3)))
    
    if n_jobs is None:
        n_jobs = max(1, multiprocessing.cpu_count() - 1)
    
    print("Beta estimation settings (parallel, macOS):")
    print(f"  Window: {window_months} months")
    print(f"  Minimum required: {min_periods} months with data")
    print(f"  Worker processes: {n_jobs}")
    print("  Start method: fork")
    print("="*60)
    
    # Build the task list: one task per stock
    grouped = data.groupby('Stkcd')
    tasks = [(Stkcd, group.reset_index(drop=True), window_months, min_periods) 
             for Stkcd, group in grouped]
    
    all_results = []
    
    try:
        # Process pool created from the fork context
        with ctx.Pool(processes=n_jobs) as pool:
            # imap_unordered yields results as they finish, which lets tqdm track progress
            results_iter = pool.imap_unordered(process_single_stock, tasks, chunksize=10)
            
            for results in tqdm(results_iter, total=len(tasks), desc="Estimating beta"):
                all_results.extend(results)
                
    except Exception as e:
        print(f"Parallel processing failed: {e}")
        print("Hint: if the problem persists, use method 2 (vectorized) instead")
        return pd.DataFrame()
    
    result_df = pd.DataFrame(all_results)
    
    if len(result_df) > 0:
        result_df = result_df.sort_values(['Stkcd', 'month']).reset_index(drop=True)
        print("\nEstimation finished:")
        print(f"  Total observations: {len(result_df)}")
        if 'data_count' in result_df.columns:
            print(f"  Average number of data points: {result_df['data_count'].mean():.1f}")
    else:
        print("\nWarning: no results were produced")
    
    return result_df


In [None]:
rolling_betas3 = rolling_beta_parallel(cross, window_months=60, n_jobs=4)
rolling_betas3

In [None]:
try:
    from numba import jit
    
    @jit(nopython=True)
    def calc_beta_numba(x, y):
        """Numba-accelerated beta: Cov(x, y) / Var(x)"""
        n = len(x)
        if n < 2:
            return np.nan
        
        x_mean = np.mean(x)
        y_mean = np.mean(y)
        
        cov = np.sum((x - x_mean) * (y - y_mean)) / (n - 1)
        var = np.sum((x - x_mean) ** 2) / (n - 1)
        
        if var == 0:
            return np.nan
        
        return cov / var
    
    
    def rolling_beta_numba(data, window_months=60):
        """
        Numba-accelerated rolling beta.
        The fastest but most bare-bones variant: no intercept or other
        regression statistics are returned.
        """
        min_periods = max(1, int(np.ceil(window_months * 2 / 3)))
        
        print("Beta estimation settings (Numba):")
        print(f"  Window: {window_months} months")
        print(f"  Minimum required: {min_periods} months with data")
        print("="*60)
        
        results = []
        
        grouped = data.groupby('Stkcd')
        for Stkcd, group in tqdm(grouped, desc="Estimating beta"):
            group = group.set_index('month').sort_index()
            group_clean = group.dropna(subset=['ret', 'MKT'])
            
            # Note: the window here is the last window_months non-missing rows,
            # not a calendar window, so it can span more than window_months months
            for i in range(len(group_clean)):
                start_idx = max(0, i - window_months + 1)
                window = group_clean.iloc[start_idx:i+1]
                
                if len(window) >= min_periods:
                    beta = calc_beta_numba(
                        window['MKT'].values, 
                        window['ret'].values
                    )
                    
                    if not np.isnan(beta):
                        results.append({
                            'Stkcd': Stkcd,
                            'month': window.index[-1],
                            'beta': beta
                        })
        
        result_df = pd.DataFrame(results)
        print(f"\nEstimation finished: {len(result_df)} observations in total")
        
        return result_df
    
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False
    print("⚠️ Numba is not installed; the Numba-accelerated variant is unavailable")

In [None]:
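# Only run the Numba variant when Numba imported successfully; otherwise leave the
# result as None instead of raising NameError on an undefined variable
rolling_betas4 = rolling_beta_numba(cross, window_months=60) if NUMBA_AVAILABLE else None

rolling_betas4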

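## Data filters
1. Drop stock-months with too few trading days in the month
2. Drop stocks whose price is too low (below 5 yuan)
3. Drop stocks listed for less than 6 months
4. Drop stocks with fewer than 100 trading days over the previous year
5. Drop ST stocks (not handled yet)
6. Keep stocks whose codes start with 0, 3, or 6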

In [None]:
# Merge the beta estimates, matching on stock code and month
cross_beta = pd.merge(cross, rolling_betas2, on=['Stkcd', 'month'], how='left')

# Drop rows without a beta: keep only stock-months where beta was estimated
cross_beta = cross_beta.dropna(subset=['beta'])
cross_beta

In [None]:
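# Require at least 7 trading days in the month to ensure data quality
cross_beta = cross_beta[cross_beta['Ndaytrd'] >= 7]
# Drop stocks priced below 5 yuan. The filter uses the month-end closing price
# (Mclsprc); Clsdt is the day of the month's last trade, not a price
cross_beta = cross_beta[cross_beta['Mclsprc'] >= 5]
# Drop stocks in their first 6 months of listing
cross_beta = cross_beta[cross_beta['list_month'] > 6]
# Drop stocks with fewer than 100 trading days over the previous year
cross_beta = cross_beta[cross_beta['Cumsum_tradingday'] >= 100]
# Markettype codes (CSMAR convention, assumed here): 1 = Shanghai A-shares,
# 4 = Shenzhen A-shares, 16 = ChiNext
# This excludes B-shares, the STAR Market, and other special boards
cross_beta = cross_beta[cross_beta['Markettype'].isin([1, 4, 16])]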
cross_beta

In [None]:
# Compute the monthly beta decile breakpoints (10%, 20%, ..., 90% quantiles)
fenweishu = pd.DataFrame(
    cross_beta.groupby(['month'])['beta'].quantile([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]))
fenweishu = fenweishu.reset_index()
fenweishu = fenweishu.pivot_table(index='month',columns='level_1',values='beta')
fenweishu.columns = ['one','two','three','four','five','six','seven','eight','nine']
fenweishu

In [None]:
# Merge the decile breakpoints into the stock-level data
portfolio = pd.merge(cross_beta,fenweishu,on='month')
portfolio

In [None]:
# Sort stocks into ten groups by their month-t beta (P1 = lowest, Pmax = highest)
portfolio['sort'] = np.where(
    portfolio['beta'] <= portfolio['one'], 'P1',
    np.where(
        portfolio['beta'] <= portfolio['two'], 'P2',
        np.where(
            portfolio['beta'] <= portfolio['three'], 'P3',
            np.where(
                portfolio['beta'] <= portfolio['four'], 'P4',
                np.where(
                    portfolio['beta'] <= portfolio['five'], 'P5',
                    np.where(
                        portfolio['beta'] <= portfolio['six'], 'P6',
                        np.where(
                            portfolio['beta'] <= portfolio['seven'], 'P7',
                            np.where(
                                portfolio['beta'] <= portfolio['eight'], 'P8',
                                np.where(
                                    portfolio['beta'] <= portfolio['nine'],
                                    'P9', 'Pmax')))))))))
# Keep only rows that have a float market value, a next-month return, and a beta
portfolio = portfolio.dropna(subset=['floatingvalue','next_ret','beta'])
portfolio
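
The nested `np.where` assignment above can also be expressed with `pd.qcut` applied within each month. The sketch below runs on a simulated panel (the `toy` frame and its sizes are assumptions for illustration) and reuses the `P1` ... `Pmax` labels; `pd.qcut` bins are right-closed, matching the `<=` comparisons in the original.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical panel: 2 months x 50 stocks with random betas
toy = pd.DataFrame({
    "month": pd.to_datetime(["2020-01-31"] * 50 + ["2020-02-29"] * 50),
    "beta": rng.normal(1.0, 0.4, 100),
})

labels = [f"P{i}" for i in range(1, 10)] + ["Pmax"]

# Within each month, assign decile numbers 0..9 from the beta quantiles
decile = toy.groupby("month")["beta"].transform(
    lambda s: pd.qcut(s, 10, labels=False))

# Map decile numbers to the portfolio labels used in this notebook
toy["sort"] = decile.astype(int).map(dict(enumerate(labels)))

counts = toy.groupby(["month", "sort"]).size()
print(counts.unstack("sort"))
```

With 50 distinct betas per month, each of the ten groups receives exactly five stocks.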

In [None]:
# Value-weighted average next-month return of each portfolio (weights: float market value)
portfolio = portfolio.dropna(subset=['next_ret','floatingvalue','beta'])
portfolio_beta =  pd.DataFrame(
    portfolio.groupby(['month','sort']).apply(lambda x: np.average(x['next_ret'],weights = x['floatingvalue']),include_groups=False))
portfolio_beta

In [None]:
# Reset the index and move the month forward by one, so each return is dated in the month it is earned
portfolio_beta = portfolio_beta.reset_index()
portfolio_beta.columns = ['month', 'sort', 'p']
portfolio_beta['month'] = portfolio_beta['month'] + MonthEnd(1)
portfolio_beta

In [None]:
# Pivot the groups from rows to columns and build the long-short strategy (lowest beta minus highest beta)
portfolio_beta = portfolio_beta.pivot_table(index='month',
                                            columns='sort',
                                            values='p')
portfolio_beta['My_portfolio'] = portfolio_beta['P1'] - portfolio_beta['Pmax']
portfolio_beta

In [None]:
portfolio_beta = portfolio_beta['1995-01':'2024-12']
portfolio_beta

In [None]:
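# Regression test: is the mean return of the beta strategy significantly different from 0?
# The strategy is long the lowest-beta decile (P1) and short the highest-beta decile (Pmax)
# Model: My_portfolio = α + ε
# where α is the intercept (the mean return) and ε is the error term
model = smf.ols('My_portfolio ~ 1',
                 data=portfolio_beta['1995-01':'2024-12']).fit(
                     cov_type='HAC', cov_kwds={'maxlags': 6})
print(model.summary())
# Interpretation:
# - A significantly positive intercept means the low-beta portfolio earns more than
#   the high-beta portfolio, which contradicts CAPM (CAPM predicts higher returns for higher beta)
# - A significantly negative intercept is the sign CAPM predicts
# - An insignificant intercept means the return gap between the extreme beta deciles
#   cannot be distinguished from zero; that alone does not confirm CAPM, which
#   predicts a negative gap here (assuming a positive market premium)
# - HAC standard errors account for autocorrelation and heteroskedasticity in the time series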
In [None]:
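# Test whether the mean of the beta coefficient is significant
# - A significantly positive intercept means beta has significant positive
#   predictive power for returns, in line with CAPM (higher beta, higher return)
# NOTE: `coef` is not defined in this section. It is assumed to be a month-indexed
# DataFrame created in an earlier step, with a 'beta' column holding one
# cross-sectional coefficient per month
model_beta = smf.ols('beta ~ 1',
                 data=coef['1994-12':'2024-11']).fit(
                     cov_type='HAC', cov_kwds={'maxlags': 6})
print(model_beta.summary())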
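
The `coef` frame used in the last cell is not constructed anywhere in this section. The test reads like the second stage of a Fama-MacBeth procedure, so one plausible reading (an assumption, not something the source confirms) is that `coef` holds the slope from a monthly cross-sectional regression of `next_ret` on `beta`. A minimal sketch on simulated data, where the panel and its true slope of 0.01 are invented for illustration:

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

rng = np.random.default_rng(4)

# Hypothetical panel: 24 months x 200 stocks, next-month return rises with beta
months = pd.to_datetime(
    [f"{y}-{m:02d}-01" for y in (2020, 2021) for m in range(1, 13)]) + MonthEnd(0)
rows = []
for month in months:
    beta = rng.normal(1.0, 0.4, 200)
    next_ret = 0.01 * beta + rng.normal(0.0, 0.05, 200)
    rows.append(pd.DataFrame({"month": month, "beta": beta, "next_ret": next_ret}))
panel = pd.concat(rows, ignore_index=True)

# Stage 1: one cross-sectional regression of next_ret on beta per month
records = []
for month, g in panel.groupby("month"):
    slope, intercept = np.polyfit(g["beta"], g["next_ret"], deg=1)
    records.append({"month": month, "Intercept": intercept, "beta": slope})
coef = pd.DataFrame(records).set_index("month")

# Stage 2: the time-series mean of the monthly slopes estimates the beta premium
print(coef["beta"].mean())
```

A frame shaped like this `coef` could then be passed to the `'beta ~ 1'` regression, which tests whether the average monthly slope differs from zero.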