# Chapter 1: Preparation

In [1]:
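import numpy as np   # core library for numerical computing
import pandas as pd  # core library for tabular data processing

from IPython.core.interactiveshell import InteractiveShell  # controls how Jupyter displays cell output

# Display the value of every expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = 'all'

# Remove the limit on the number of displayed rows (left disabled)
# pd.set_option('display.max_rows', None)

# Remove the limit on the number of displayed columns
pd.set_option('display.max_columns', None)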

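# Chapter 2: Data Import

## Common Jupyter/IPython magic commands (quick reference)

- Magic commands come in two kinds: line magics (prefixed with `%`, applied to a single line) and cell magics (prefixed with `%%`, applied to the whole cell). `%lsmagic` lists every magic available in the current environment, `%quickref` shows a quick-reference card, and appending `?` to any magic shows its usage.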

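### Timing and profiling
- %time / %%time: measure the run time of a single line or a whole cell; a quick one-shot timing.
- %timeit / %%timeit: run the code many times automatically and report statistics (mean and standard deviation); better suited to reliable micro-benchmarks.
- %prun: function-level profiling based on cProfile; shows how run time is split across functions.
- Optional extensions: %lprun (line-by-line timing, from the line_profiler extension) and %memit (peak memory, from the memory_profiler extension). Run `%load_ext line_profiler` or `%load_ext memory_profiler` first.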

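### Debugging and exceptions
- %debug: enter the interactive debugger (pdb) after an exception has occurred.
- %pdb: toggle automatic entry into the debugger whenever an exception is raised.
- %xmode: set how detailed exception tracebacks are (Plain/Context/Verbose).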

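### Running, loading/saving, and history
- %run script.py: run an external Python script or notebook in the current kernel (arguments are supported).
- %load path-or-URL: load an external script or web snippet into the current cell for editing and running.
- %%writefile filename: write the entire cell content to a file (handy for quickly generating scripts or config files).
- %history: show the input history, with options for filtering and for writing to a file.
- %save: save selected input lines from the history to a file.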

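### Variables and session management
- %who / %who_ls / %whos: list the variables in the current namespace; `%whos` adds type and size details.
- %reset / %xdel: `%reset` clears the interactive namespace (optionally only part of it); `%xdel` deletes one variable and tries to drop every reference IPython holds to it so the memory can be released.
- %store variable: persist a variable across sessions, and restore it later with `%store -r variable` (a lightweight way to pass data between sessions).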

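### Environment and system
- %pwd / %cd: show or change the current working directory.
- %env: show or set environment variables (for example `%env MY_VAR=value`).
- %config: view or set options on IPython's configurable classes (for example `%config InlineBackend.figure_format = 'retina'`).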

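### Display and visualization
- %matplotlib inline (or widget, and so on): select the Matplotlib backend (inline embeds figures in the cell output).
- %%html / %%latex / %%javascript / %%svg: render HTML, LaTeX, JavaScript, or SVG content directly in the output.
- %%capture: capture the cell's standard output and standard error (optionally into a variable for later processing).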

### Shell, scripts, and other languages
- %%bash / %%sh / %%script: run the whole cell with the specified interpreter (bash and sh are the most common).

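### Extensions and ecosystem
- %load_ext / %unload_ext / %reload_ext: load, unload, or reload an IPython extension. Many third-party extensions add magics of their own (such as line_profiler and memory_profiler above).

Tip: the available magics can differ slightly between kernels and environments, so run `%lsmagic` first to check what the current session supports.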
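
Outside a notebook, a comparable measurement is available from the standard-library `timeit` module, which `%timeit` builds on. The sketch below is only an illustration: the timed statement and the repeat counts are arbitrary choices, not something taken from this notebook.

```python
import timeit

# Time a small statement: 5 rounds of 1000 executions each
timings = timeit.repeat(
    stmt="sum(range(1000))",
    repeat=5,
    number=1000,
)

# Report the best round as time per execution, in microseconds
best_per_call_us = min(timings) / 1000 * 1e6
print(f"best of 5: {best_per_call_us:.2f} us per loop")
```

Taking the minimum over rounds is a common convention because slower rounds mostly reflect interference from other processes.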

In [2]:
%%time
data = pd.read_csv('datasets/000001.csv')
data['Day'] = pd.to_datetime(data['Day'], format='%Y/%m/%d')
data.set_index('Day', inplace=True)
data = data.sort_index()  # sort chronologically by the Day index and keep the result
data

CPU times: user 11.7 ms, sys: 5.71 ms, total: 17.5 ms
Wall time: 42.8 ms


Day,Preclose,Open,Highest,Lowest,Close
1990-12-19,,96.050,99.980,95.790,99.980
1990-12-20,99.98,104.300,104.390,99.980,104.390
1990-12-21,104.39,109.070,109.130,103.730,109.130
1990-12-24,109.13,113.570,114.550,109.130,114.550
1990-12-25,114.55,120.090,120.250,114.550,120.250
...,...,...,...,...,...
2025-08-25,3825.759,3848.163,3883.562,3839.972,3883.562
2025-08-26,3883.562,3871.471,3888.599,3859.758,3868.382
2025-08-27,3868.382,3869.612,3887.198,3800.350,3800.350
2025-08-28,3800.35,3796.711,3845.087,3761.422,3843.597


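# Chapter 3: Return Calculation Basics

Stock returns are generally computed from the **closing price**.


* Raw return: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}}$
* Log return: $r_t = \log(p_t) - \log(p_{t-1})$

The natural-log return and the raw return are related by:

$r_t = \log(1 + R_t)$

* In real data, returns are usually computed from closing prices. Why?
* How do log returns differ from raw returns, and in which situations is each one used?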
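
The relationship $r_t = \log(1 + R_t)$ can be checked numerically. The prices below are hypothetical values chosen for illustration; the sketch also shows that the raw return is never smaller than the log return.

```python
import numpy as np

# Hypothetical closing prices, chosen only for illustration
prices = np.array([100.0, 102.0, 99.96, 104.958])

raw_returns = prices[1:] / prices[:-1] - 1              # R_t
log_returns = np.log(prices[1:]) - np.log(prices[:-1])  # r_t

# r_t = log(1 + R_t) holds element by element
assert np.allclose(log_returns, np.log1p(raw_returns))

# The gap R_t - r_t is non-negative and grows with |R_t|
gap = raw_returns - log_returns
print(np.round(raw_returns, 4))
print(np.round(log_returns, 4))
print(gap)
```

For daily moves of a few percent the two measures are nearly identical, which is why they are often used interchangeably at daily frequency.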

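## Compound interest
Suppose an asset has initial value $C$ and a nominal annual interest rate $r$, with interest paid in $m$ installments over the year. If every payment were simply $C \frac{r}{m}$, the final value would be

$C+C \frac{r}{m} \times m=C(1+r)$.

However, interest that is paid early stays in the account and earns interest itself. From the second payment onward each payment exceeds $C \frac{r}{m}$, so the value after one year is higher than $C(1+r)$. The value after one year is

$C\left(1+\frac{r}{m}\right)^{m}$

As $m \rightarrow \infty$, the limit $\lim _{x \rightarrow+\infty}\left(1+\frac{1}{x}\right)^{x}=e$ gives

$\lim _{m \rightarrow \infty} C\left(1+\frac{r}{m}\right)^{m}=\lim _{m \rightarrow \infty} C\left[\left(1+\frac{r}{m}\right)^{\frac{m}{r}}\right]^{r}=C e^{r}$

In this limit $r$ is called the continuously compounded rate; it is also quoted per unit of time (usually one year). $R=e^{r}-1$ is the effective rate corresponding to the continuously compounded rate $r$, and the two are related by

$R=e^{r}-1, \quad r=\ln (1+R)$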
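
The convergence of $C\left(1+\frac{r}{m}\right)^{m}$ to $Ce^{r}$ can be seen numerically. In this sketch the values of `C` and `r` and the helper name `value_after_one_year` are illustrative assumptions.

```python
import math

C = 100.0   # initial value (illustrative)
r = 0.05    # nominal annual rate (illustrative)

def value_after_one_year(m: int) -> float:
    """Year-end value when interest is compounded m times per year."""
    return C * (1 + r / m) ** m

continuous = C * math.exp(r)  # limiting value C * e^r

for m in (1, 2, 12, 365, 1_000_000):
    print(f"m={m:>9}: {value_after_one_year(m):.6f}")
print(f"continuous: {continuous:.6f}")

# Effective rate R implied by the continuously compounded rate r, and back again
R = math.exp(r) - 1
r_recovered = math.log(1 + R)
```

The printed values increase with `m` and approach the continuous limit from below.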

# Chapter 4: Daily Return Calculation

## 4.1 Data preparation


In [3]:
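# Keep January 1995 through August 2025; copy so that new columns do not touch `data`
data_new = data['1995-01':'2025-08'].copy()
# Make sure the price columns are numeric
data_new['Close'] = pd.to_numeric(data_new['Close'])
data_new['Preclose'] = pd.to_numeric(data_new['Preclose'])
data_new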

Day,Preclose,Open,Highest,Lowest,Close
1995-01-03,647.870,637.720,647.710,630.530,639.880
1995-01-04,639.880,641.900,655.510,638.860,653.810
1995-01-05,653.810,655.380,657.520,645.810,646.890
1995-01-06,646.890,642.750,643.890,636.330,640.760
1995-01-09,640.760,637.520,637.550,625.040,626.000
...,...,...,...,...,...
2025-08-25,3825.759,3848.163,3883.562,3839.972,3883.562
2025-08-26,3883.562,3871.471,3888.599,3859.758,3868.382
2025-08-27,3868.382,3869.612,3887.198,3800.350,3800.350
2025-08-28,3800.350,3796.711,3845.087,3761.422,3843.597


## 4.2 Several ways to compute daily returns


In [4]:
# Daily returns of the SSE Composite Index (000001). Method 1: vectorized column arithmetic (recommended)
data_new['Raw_return'] = data_new['Close'] / data_new['Preclose'] - 1
data_new['Log_return'] = np.log(data_new['Close']) - np.log(data_new['Preclose'])

# Method 2: pandas pct_change (for time-series data)
# Note: the data must already be sorted by date; the first row is NaN because it has no previous row
data_new['Pct_change_return'] = data_new['Close'].pct_change()

# Method 3: row-wise apply (not recommended because it is slow)
data_new['Apply_return'] = data_new.apply(lambda row: row['Close'] / row['Preclose'] - 1, axis=1)

# Method 4: diff() divided by the lagged Close (another vectorized form)
data_new['Diff_div_return'] = data_new['Close'].diff() / data_new['Close'].shift(1)

# Compare the results of the different methods
data_new

Day,Preclose,Open,Highest,Lowest,Close,Raw_return,Log_return,Pct_change_return,Apply_return,Diff_div_return
1995-01-03,647.870,637.720,647.710,630.530,639.880,-0.012333,-0.012409,,-0.012333,
1995-01-04,639.880,641.900,655.510,638.860,653.810,0.021770,0.021536,0.021770,0.021770,0.021770
1995-01-05,653.810,655.380,657.520,645.810,646.890,-0.010584,-0.010641,-0.010584,-0.010584,-0.010584
1995-01-06,646.890,642.750,643.890,636.330,640.760,-0.009476,-0.009521,-0.009476,-0.009476,-0.009476
1995-01-09,640.760,637.520,637.550,625.040,626.000,-0.023035,-0.023305,-0.023035,-0.023035,-0.023035
...,...,...,...,...,...,...,...,...,...,...
2025-08-25,3825.759,3848.163,3883.562,3839.972,3883.562,0.015109,0.014996,0.015109,0.015109,0.015109
2025-08-26,3883.562,3871.471,3888.599,3859.758,3868.382,-0.003909,-0.003916,-0.003909,-0.003909,-0.003909
2025-08-27,3868.382,3869.612,3887.198,3800.350,3800.350,-0.017587,-0.017743,-0.017587,-0.017587,-0.017587
2025-08-28,3800.350,3796.711,3845.087,3761.422,3843.597,0.011380,0.011315,0.011380,0.011380,0.011380
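
In the table above, `Pct_change_return` and `Diff_div_return` are empty on 1995-01-03 while the `Preclose`-based columns are not. The sketch below reproduces that difference on a hypothetical three-day sample whose first `Preclose` comes from a day outside the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day sample; Preclose on day 1 belongs to a day outside the sample
toy = pd.DataFrame(
    {"Preclose": [100.0, 102.0, 99.96], "Close": [102.0, 99.96, 104.958]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

from_preclose = toy["Close"] / toy["Preclose"] - 1   # uses the stored previous close
from_pct_change = toy["Close"].pct_change()          # uses the previous row's Close

# First row: only the Preclose-based version has a value
print(from_preclose.iloc[0], from_pct_change.iloc[0])

# Remaining rows agree because Preclose equals the previous row's Close
assert np.allclose(from_preclose.iloc[1:], from_pct_change.iloc[1:])
```

When a file ships a reliable `Preclose` column, using it keeps the first observation of any sub-sample; `pct_change` always loses it.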


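## 4.3 Loops versus vectorized methods


* Python uses indentation to determine block structure, so pay attention to the indentation when running the code below.
* Avoid `for` and `while` loops whenever a vectorized alternative exists.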

In [5]:
# Method 5: compute returns with a for loop (not recommended: inefficient)
# Very slow on large datasets; shown for teaching purposes only

# Create the result column if it does not exist yet
if 'Loop_return' not in data_new.columns:
    data_new['Loop_return'] = np.nan

# Look up the column positions once instead of on every iteration
close_pos = data_new.columns.get_loc('Close')
preclose_pos = data_new.columns.get_loc('Preclose')
loop_pos = data_new.columns.get_loc('Loop_return')

# Row-by-row calculation
for i in range(len(data_new)):
    data_new.iloc[i, loop_pos] = data_new.iloc[i, close_pos] / data_new.iloc[i, preclose_pos] - 1

# Method 6: iterate over paired values with zip (more Pythonic than positional indexing)
close_values = data_new['Close'].values
preclose_values = data_new['Preclose'].values
loop_return_values = []

for close, preclose in zip(close_values, preclose_values):
    # Guard against division by zero and a missing previous close
    if preclose != 0 and not np.isnan(preclose):
        loop_return_values.append(close / preclose - 1)
    else:
        loop_return_values.append(np.nan)

data_new['Loop_return2'] = loop_return_values

# Method 7: vectorized arithmetic on the underlying NumPy arrays (efficient and concise)
data_new['Numpy_return'] = (data_new['Close'].values / data_new['Preclose'].values) - 1

# Show the results
data_new

Day,Preclose,Open,Highest,Lowest,Close,Raw_return,Log_return,Pct_change_return,Apply_return,Diff_div_return,Loop_return,Loop_return2,Numpy_return
1995-01-03,647.870,637.720,647.710,630.530,639.880,-0.012333,-0.012409,,-0.012333,,-0.012333,-0.012333,-0.012333
1995-01-04,639.880,641.900,655.510,638.860,653.810,0.021770,0.021536,0.021770,0.021770,0.021770,0.021770,0.021770,0.021770
1995-01-05,653.810,655.380,657.520,645.810,646.890,-0.010584,-0.010641,-0.010584,-0.010584,-0.010584,-0.010584,-0.010584,-0.010584
1995-01-06,646.890,642.750,643.890,636.330,640.760,-0.009476,-0.009521,-0.009476,-0.009476,-0.009476,-0.009476,-0.009476,-0.009476
1995-01-09,640.760,637.520,637.550,625.040,626.000,-0.023035,-0.023305,-0.023035,-0.023035,-0.023035,-0.023035,-0.023035,-0.023035
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-08-25,3825.759,3848.163,3883.562,3839.972,3883.562,0.015109,0.014996,0.015109,0.015109,0.015109,0.015109,0.015109,0.015109
2025-08-26,3883.562,3871.471,3888.599,3859.758,3868.382,-0.003909,-0.003916,-0.003909,-0.003909,-0.003909,-0.003909,-0.003909,-0.003909
2025-08-27,3868.382,3869.612,3887.198,3800.350,3800.350,-0.017587,-0.017743,-0.017587,-0.017587,-0.017587,-0.017587,-0.017587,-0.017587
2025-08-28,3800.350,3796.711,3845.087,3761.422,3843.597,0.011380,0.011315,0.011380,0.011380,0.011380,0.011380,0.011380,0.011380
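
To get a feel for the cost of looping, the sketch below times a vectorized calculation against a pure-Python loop. The prices are synthetic, generated only to have something to time; absolute timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np
import pandas as pd

# Synthetic prices, generated only for the timing comparison
rng = np.random.default_rng(0)
n = 20_000
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
df = pd.DataFrame({"Preclose": np.r_[100.0, close[:-1]], "Close": close})

# Vectorized version
t0 = time.perf_counter()
vectorized = df["Close"] / df["Preclose"] - 1
t_vectorized = time.perf_counter() - t0

# Pure-Python loop over paired values
t0 = time.perf_counter()
looped = [c / p - 1 for c, p in zip(df["Close"].to_numpy(), df["Preclose"].to_numpy())]
t_loop = time.perf_counter() - t0

# Both approaches give the same numbers
assert np.allclose(vectorized.to_numpy(), looped)
print(f"vectorized: {t_vectorized * 1e3:.2f} ms, loop: {t_loop * 1e3:.2f} ms")
```

In a notebook, `%timeit` on each expression would give a more reliable comparison than a single `perf_counter` measurement.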


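# Chapter 5: Monthly Returns

## 5.1 Monthly return calculation

Monthly returns are among the most commonly used periodic returns in financial analysis. The return for month $t$ is computed from the closing price at the end of month $t$ and the closing price at the end of the previous month $t-1$:

* Raw return: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}}$
* Log return: $r_t = \log(p_t) - \log(p_{t-1})$

Several ways to compute monthly returns follow.


### The resample method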

In [6]:
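# Method 1: resample to monthly log returns, then convert to raw returns
# This works because log returns within a period can simply be summed
# 'ME' (month end) requires pandas 2.2 or later; older versions use 'M'
Month_data1 = data_new.resample('ME')['Log_return'].sum().to_frame(name='Log_return')
Month_data1['Raw_Return'] = np.exp(Month_data1['Log_return']) - 1

# Add year and month columns for later analysis
Month_data1['Year'] = Month_data1.index.year
Month_data1['Month'] = Month_data1.index.month

# Show the results
Month_data1.head()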

Day,Log_return,Raw_Return,Year,Month
1995-01-31,-0.141139,-0.131631,1995,1
1995-02-28,-0.023979,-0.023694,1995,2
1995-03-31,0.163651,0.177803,1995,3
1995-04-30,-0.109315,-0.103552,1995,4
1995-05-31,0.188901,0.207922,1995,5


In [7]:
# Method 2: resample to month-end prices, then compute monthly returns
# Working directly from month-end prices is closer to common financial practice
Month_data2 = data_new.resample('ME')['Close'].last().to_frame()
Month_data2['Preclose'] = Month_data2['Close'].shift(1)
Month_data2['Raw_return'] = Month_data2['Close'] / Month_data2['Preclose'] - 1
Month_data2['Log_return'] = np.log(Month_data2['Close']) - np.log(Month_data2['Preclose'])

# Add year and month columns
Month_data2['Year'] = Month_data2.index.year
Month_data2['Month'] = Month_data2.index.month

# Show the results
Month_data2.head()

Day,Close,Preclose,Raw_return,Log_return,Year,Month
1995-01-31,562.59,,,,1995,1
1995-02-28,549.26,562.59,-0.023694,-0.023979,1995,2
1995-03-31,646.92,549.26,0.177803,0.163651,1995,3
1995-04-30,579.93,646.92,-0.103552,-0.109315,1995,4
1995-05-31,700.51,579.93,0.207922,0.188901,1995,5
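
The two outputs above agree from February 1995 onward and differ only in the first month, where the month-end price method has no earlier month to compare against. The sketch below checks the same equivalence on synthetic data. It groups by calendar month with `to_period('M')` instead of `resample('ME')`, a substitution made here only so that the sketch does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes on business days (illustrative data, not the index file)
rng = np.random.default_rng(1)
days = pd.bdate_range("2024-01-01", "2024-06-28")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(days)))), index=days)
log_ret = np.log(close).diff()  # the first day has no previous close, so it is NaN

month = close.index.to_period("M")

# Approach A: sum the daily log returns within each month, then convert
monthly_a = np.exp(log_ret.groupby(month).sum()) - 1

# Approach B: last close of each month, then percentage change
monthly_b = close.groupby(month).last().pct_change()

# The two agree for every month after the first
assert np.allclose(monthly_a.iloc[1:], monthly_b.iloc[1:])
print(pd.DataFrame({"from_logs": monthly_a, "from_prices": monthly_b}))
```

The agreement relies on the first daily log return of each month being measured against the last close of the previous month.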


### The groupby method

In [8]:
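# With a datetime index (dates such as "1990-12-12"), the year and month can be read off directly
data_new2 = data_new.copy()
data_new2['year'] = data_new2.index.year
data_new2['month'] = data_new2.index.month
data_new2
# Alternative when dates are stored as strings: slice them (first four characters for the year, characters 6-7 for the month) and convert the resulting strings to numbers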

Day,Preclose,Open,Highest,Lowest,Close,Raw_return,Log_return,Pct_change_return,Apply_return,Diff_div_return,Loop_return,Loop_return2,Numpy_return,year,month
1995-01-03,647.870,637.720,647.710,630.530,639.880,-0.012333,-0.012409,,-0.012333,,-0.012333,-0.012333,-0.012333,1995,1
1995-01-04,639.880,641.900,655.510,638.860,653.810,0.021770,0.021536,0.021770,0.021770,0.021770,0.021770,0.021770,0.021770,1995,1
1995-01-05,653.810,655.380,657.520,645.810,646.890,-0.010584,-0.010641,-0.010584,-0.010584,-0.010584,-0.010584,-0.010584,-0.010584,1995,1
1995-01-06,646.890,642.750,643.890,636.330,640.760,-0.009476,-0.009521,-0.009476,-0.009476,-0.009476,-0.009476,-0.009476,-0.009476,1995,1
1995-01-09,640.760,637.520,637.550,625.040,626.000,-0.023035,-0.023305,-0.023035,-0.023035,-0.023035,-0.023035,-0.023035,-0.023035,1995,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-08-25,3825.759,3848.163,3883.562,3839.972,3883.562,0.015109,0.014996,0.015109,0.015109,0.015109,0.015109,0.015109,0.015109,2025,8
2025-08-26,3883.562,3871.471,3888.599,3859.758,3868.382,-0.003909,-0.003916,-0.003909,-0.003909,-0.003909,-0.003909,-0.003909,-0.003909,2025,8
2025-08-27,3868.382,3869.612,3887.198,3800.350,3800.350,-0.017587,-0.017743,-0.017587,-0.017587,-0.017587,-0.017587,-0.017587,-0.017587,2025,8
2025-08-28,3800.350,3796.711,3845.087,3761.422,3843.597,0.011380,0.011315,0.011380,0.011380,0.011380,0.011380,0.011380,0.011380,2025,8


In [9]:
# Method 3: group by year and month with groupby
# First extract the year and month
data_new3 = data_new.copy()
data_new3['year'] = data_new3.index.year
data_new3['month'] = data_new3.index.month

# Group by (year, month) and sum the log returns within each group
Month_data3 = data_new3.groupby(['year', 'month'])['Log_return'].sum().to_frame()
Month_data3['Raw_Return'] = np.exp(Month_data3['Log_return']) - 1

# Show the results
Month_data3

year,month,Log_return,Raw_Return
1995,1,-0.141139,-0.131631
1995,2,-0.023979,-0.023694
1995,3,0.163651,0.177803
1995,4,-0.109315,-0.103552
1995,5,0.188901,0.207922
...,...,...,...
2025,4,-0.017148,-0.017002
2025,5,0.020662,0.020877
2025,6,0.028547,0.028959
2025,7,0.036707,0.037389


In [10]:
# Method 4: apply with a lambda for more flexible per-group calculations
# This form allows arbitrary operations on each month's data
Month_data4 = pd.DataFrame(
    data_new3.groupby(['year', 'month'])['Log_return'].apply(lambda x: x.sum()))
Month_data4.columns = ['Log_return']
Month_data4['Raw_Return'] = np.exp(Month_data4['Log_return']) - 1

# Method 5: agg computes several statistics at once
Month_data5 = data_new3.groupby(['year', 'month']).agg({
    'Log_return': ['sum', 'mean', 'std', 'count'],
    'Raw_return': ['mean', 'std']
})

# Show the results
print("Method 4 result:")
print(Month_data4.head())
print("\nMethod 5 result (several statistics):")
print(Month_data5.head())

Method 4 result:
            Log_return  Raw_Return
year month                        
1995 1       -0.141139   -0.131631
     2       -0.023979   -0.023694
     3        0.163651    0.177803
     4       -0.109315   -0.103552
     5        0.188901    0.207922

Method 5 result (several statistics):
           Log_return                           Raw_return          
                  sum      mean       std count       mean       std
year month                                                          
1995 1      -0.141139 -0.007428  0.016251    19  -0.007277  0.016140
     2      -0.023979 -0.001411  0.033003    17  -0.000891  0.033609
     3       0.163651  0.007115  0.023204    23   0.007401  0.023470
     4      -0.109315 -0.005466  0.020374    20  -0.005255  0.020169
     5       0.188901  0.008586  0.077844    22   0.011645  0.083211


# Chapter 6: Quarterly Returns

In [11]:
# Quarterly returns
# Method 1: resample with the 'QE' (quarter end) alias
Quarter_data1 = data_new.resample('QE')['Log_return'].sum().to_frame(name='Log_return')
Quarter_data1['Raw_Return'] = np.exp(Quarter_data1['Log_return']) - 1
Quarter_data1['Year'] = Quarter_data1.index.year
Quarter_data1['Quarter'] = Quarter_data1.index.quarter

# Method 2: compute from quarter-end prices
Quarter_data2 = data_new.resample('QE')['Close'].last().to_frame()
Quarter_data2['Preclose'] = Quarter_data2['Close'].shift(1)
Quarter_data2['Raw_return'] = Quarter_data2['Close'] / Quarter_data2['Preclose'] - 1
Quarter_data2['Log_return'] = np.log(Quarter_data2['Close']) - np.log(Quarter_data2['Preclose'])

# Show the results
print("Quarterly returns from summed log returns:")
Quarter_data1
print("\nQuarterly returns from quarter-end prices:")
Quarter_data2


Quarterly returns from summed log returns:


Day,Log_return,Raw_Return,Year,Quarter
1995-03-31,-0.001467,-0.001466,1995,1
1995-06-30,-0.025583,-0.025258,1995,2
1995-09-30,0.135980,0.145660,1995,3
1995-12-31,-0.263130,-0.231358,1995,4
1996-03-31,0.001979,0.001981,1996,1
...,...,...,...,...
2024-09-30,0.117234,0.124383,2024,3
2024-12-31,0.004565,0.004575,2024,4
2025-03-31,-0.004790,-0.004779,2025,1
2025-06-30,0.032061,0.032580,2025,2



Quarterly returns from quarter-end prices:


Day,Close,Preclose,Raw_return,Log_return
1995-03-31,646.920,,,
1995-06-30,630.580,646.9200,-0.025258,-0.025583
1995-09-30,722.430,630.5800,0.145660,0.135980
1995-12-31,555.290,722.4300,-0.231358,-0.263130
1996-03-31,556.390,555.2900,0.001981,0.001979
...,...,...,...,...
2024-09-30,3336.497,2967.4028,0.124383,0.117234
2024-12-31,3351.763,3336.4970,0.004575,0.004565
2025-03-31,3335.746,3351.7630,-0.004779,-0.004790
2025-06-30,3444.426,3335.7460,0.032580,0.032061


# Chapter 7: Annual Returns

In [12]:
# Annual returns
# Method 1: resample with the 'YE' (year end) alias
Year_data1 = data_new.resample('YE')['Log_return'].sum().to_frame(name='Log_return')
Year_data1['Raw_Return'] = np.exp(Year_data1['Log_return']) - 1

# Method 2: compute from year-end prices
Year_data2 = data_new.resample('YE')['Close'].last().to_frame()
Year_data2['Preclose'] = Year_data2['Close'].shift(1)
Year_data2['Raw_return'] = Year_data2['Close'] / Year_data2['Preclose'] - 1
Year_data2['Log_return'] = np.log(Year_data2['Close']) - np.log(Year_data2['Preclose'])

# Method 3: group by year with groupby
data_new4 = data_new.copy()
data_new4['year'] = data_new4.index.year
Year_data3 = data_new4.groupby('year')['Log_return'].sum().to_frame()
Year_data3['Raw_Return'] = np.exp(Year_data3['Log_return']) - 1

# Show the results
print("Annual returns from summed log returns:")
Year_data1
print("\nAnnual returns from year-end prices:")
Year_data2
print("\nAnnual returns from groupby:")
Year_data3


Annual returns from summed log returns:


Day,Log_return,Raw_Return
1995-12-31,-0.1542,-0.142899
1996-12-31,0.501639,0.651425
1997-12-31,0.264019,0.302153
1998-12-31,-0.040505,-0.039695
1999-12-31,0.175423,0.19175
2000-12-31,0.416917,0.517277
2001-12-31,-0.230898,-0.20618
2002-12-31,-0.192575,-0.175167
2003-12-31,0.097735,0.10267
2004-12-31,-0.167233,-0.153997



Annual returns from year-end prices:


Day,Close,Preclose,Raw_return,Log_return
1995-12-31,555.29,,,
1996-12-31,917.02,555.29,0.651425,0.501639
1997-12-31,1194.1,917.02,0.302153,0.264019
1998-12-31,1146.7,1194.1,-0.039695,-0.040505
1999-12-31,1366.58,1146.7,0.19175,0.175423
2000-12-31,2073.48,1366.58,0.517277,0.416917
2001-12-31,1645.97,2073.48,-0.20618,-0.230898
2002-12-31,1357.65,1645.97,-0.175167,-0.192575
2003-12-31,1497.04,1357.65,0.10267,0.097735
2004-12-31,1266.5,1497.04,-0.153997,-0.167233



Annual returns from groupby:


year,Log_return,Raw_Return
1995,-0.1542,-0.142899
1996,0.501639,0.651425
1997,0.264019,0.302153
1998,-0.040505,-0.039695
1999,0.175423,0.19175
2000,0.416917,0.517277
2001,-0.230898,-0.20618
2002,-0.192575,-0.175167
2003,0.097735,0.10267
2004,-0.167233,-0.153997


# Chapter 8: Rolling Returns

In [13]:
# Rolling returns over the past 5, 10, 20, 30 and 60 observations
# (a window counts rows, that is trading days, not calendar days)
# Common in financial analysis for comparing performance across horizons

# Method 1: rolling sum of log returns
rolling_returns = pd.DataFrame()
for window in [5, 10, 20, 30, 60]:
    # Sum of log returns inside the rolling window
    rolling_log_return = data_new['Log_return'].rolling(window=window).sum()
    # Convert to a raw return
    rolling_returns[f'Rolling_{window}d_Return'] = np.exp(rolling_log_return) - 1

# Method 2: pct_change over the same number of periods
rolling_price_returns = pd.DataFrame()
for window in [5, 10, 20, 30, 60]:
    rolling_price_returns[f'Rolling_{window}d_Price_Return'] = data_new['Close'].pct_change(periods=window)

# Show the results
print("Rolling returns (from summed log returns):")
print(rolling_returns.tail())
print("\nRolling returns (from price changes):")
print(rolling_price_returns.tail())


Rolling returns (from summed log returns):
            Rolling_5d_Return  Rolling_10d_Return  Rolling_20d_Return  \
Day                                                                     
2025-08-25           0.041720            0.064705            0.079386   
2025-08-26           0.037854            0.055229            0.071660   
2025-08-27           0.009065            0.031732            0.051064   
2025-08-28           0.019225            0.048318            0.075671   
2025-08-29           0.008408            0.043594            0.083702   

            Rolling_30d_Return  Rolling_60d_Return  
Day                                                 
2025-08-25            0.103394            0.160143  
2025-08-26            0.103676            0.150627  
2025-08-27            0.084644            0.125628  
2025-08-28            0.092916            0.135781  
2025-08-29            0.091511            0.139592  

Rolling returns (from price changes):
            Rolling_5d_Price_Return  Rolling_10d_Price_Return  \
Day             
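
The two rolling methods should give the same numbers wherever both are defined. A minimal check on synthetic prices (the series and the 20-row window are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Synthetic closing prices, generated only for this check
rng = np.random.default_rng(2)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))))
log_ret = np.log(close).diff()

window = 20
rolling_from_logs = np.exp(log_ret.rolling(window=window).sum()) - 1
rolling_from_prices = close.pct_change(periods=window)

# Compare only the rows where the price-based version is defined
valid = rolling_from_prices.notna()
assert np.allclose(rolling_from_logs[valid], rolling_from_prices[valid])
print(valid.sum(), "comparable rows")
```

A window of `window` rows of returns spans `window + 1` prices, which is why `pct_change(periods=window)` is the matching price-based call.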

In [14]:
# Cumulative returns
# These track long-run investment performance, accumulated from a chosen starting point

# Method 1: cumulative sum of log returns, then convert
# Summation is numerically well behaved, which helps over long horizons
cumulative_returns = pd.DataFrame()
cumulative_returns['Cumulative_Log_Return'] = data_new['Log_return'].cumsum()
cumulative_returns['Cumulative_Return'] = np.exp(cumulative_returns['Cumulative_Log_Return']) - 1

# Method 2: cumulative product of (1 + R_t) with cumprod
# Also common in financial practice
cumulative_returns['Cumulative_Return_Prod'] = (1 + data_new['Raw_return']).cumprod() - 1

# Method 3: the same calculation written with chained pandas methods
cumulative_returns['Cumulative_Return_Alt'] = data_new['Raw_return'].add(1).cumprod().sub(1)

# Show the results
print("Cumulative returns from the different methods:")
cumulative_returns


Cumulative returns from the different methods:


Day,Cumulative_Log_Return,Cumulative_Return,Cumulative_Return_Prod,Cumulative_Return_Alt
1995-01-03,-0.012409,-0.012333,-0.012333,-0.012333
1995-01-04,0.009127,0.009169,0.009169,0.009169
1995-01-05,-0.001514,-0.001513,-0.001513,-0.001513
1995-01-06,-0.011035,-0.010974,-0.010974,-0.010974
1995-01-09,-0.034340,-0.033757,-0.033757,-0.033757
...,...,...,...,...
2025-08-25,1.790819,4.994361,4.994361,4.994361
2025-08-26,1.786903,4.970931,4.970931,4.970931
2025-08-27,1.769160,4.865922,4.865922,4.865922
2025-08-28,1.780475,4.932675,4.932675,4.932675
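
All three columns measure the price relative to the close just before the sample starts. That is consistent with the last row above: 3843.597 / 647.870 - 1 is about 4.9327. A minimal check on hypothetical prices:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first Preclose is the close before the sample starts
toy = pd.DataFrame({
    "Preclose": [100.0, 102.0, 99.96],
    "Close": [102.0, 99.96, 104.958],
})
raw = toy["Close"] / toy["Preclose"] - 1
log = np.log(toy["Close"]) - np.log(toy["Preclose"])

cum_from_logs = np.exp(log.cumsum()) - 1                         # sum of logs, then convert
cum_from_prod = (1 + raw).cumprod() - 1                          # product of gross returns
cum_from_prices = toy["Close"] / toy["Preclose"].iloc[0] - 1     # price relative to the base

assert np.allclose(cum_from_logs, cum_from_prod)
assert np.allclose(cum_from_logs, cum_from_prices)
print(cum_from_prices.round(5).tolist())
```

The price-based shortcut only holds when each `Preclose` equals the previous row's `Close`, as it does in this toy sample.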


# Summary of Return Calculation Methods

## Daily return methods
1. **Raw return**
   - Formula: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$
   - Implementations:
     - Vectorized arithmetic: `data['Raw_return'] = data['Close'] / data['Preclose'] - 1`
     - pct_change: `data['Close'].pct_change()`
     - diff with a lagged divisor: `data['Close'].diff() / data['Close'].shift(1)`

2. **Log return**
   - Formula: $r_t = \ln(p_t) - \ln(p_{t-1}) = \ln(\frac{p_t}{p_{t-1}}) = \ln(1 + R_t)$
   - Implementation: `data['Log_return'] = np.log(data['Close']) - np.log(data['Preclose'])`

## Returns over different horizons
1. **Monthly/quarterly/annual returns**
   - Summing log returns: `data.resample('ME')['Log_return'].sum()`
   - Period-end prices: `data.resample('ME')['Close'].last().pct_change()`
   - Grouped aggregation: `data.groupby(['year', 'month'])['Log_return'].sum()`

2. **Rolling returns**
   - Summing log returns: `data['Log_return'].rolling(window=30).sum()`
   - Price change: `data['Close'].pct_change(periods=30)`

3. **Cumulative returns**
   - Summing log returns: `np.exp(data['Log_return'].cumsum()) - 1`
   - Compounding: `(1 + data['Raw_return']).cumprod() - 1`

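## Annualizing returns
1. **From a total log return**: `np.exp(total_log_return / total_years) - 1`, where `total_log_return` is the sum of log returns over the sample and `total_years` is the sample length in years
2. **From a cumulative return**: `(1 + total_return) ** (1 / total_years) - 1`
3. **From the geometric-mean daily return**: `daily_return_factor ** 252 - 1`, where `daily_return_factor` is the geometric mean of the daily gross returns `1 + R_t` and 252 is the assumed number of trading days per year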
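
The three annualization formulas give the same number when `total_years` is measured as the number of observations divided by 252. The sketch below illustrates this on simulated daily returns; the data, the seed, and the 252-day convention are assumptions made for the example.

```python
import numpy as np

# Hypothetical daily raw returns; 252 trading days per year is an assumed convention
rng = np.random.default_rng(3)
daily_raw = rng.normal(0.0004, 0.01, 252 * 3)
trading_days_per_year = 252

total_years = len(daily_raw) / trading_days_per_year
total_log_return = np.log1p(daily_raw).sum()
total_return = np.prod(1 + daily_raw) - 1
daily_return_factor = np.exp(np.log1p(daily_raw).mean())  # geometric mean of 1 + R_t

annual_from_log = np.exp(total_log_return / total_years) - 1
annual_from_cum = (1 + total_return) ** (1 / total_years) - 1
annual_from_geo = daily_return_factor ** trading_days_per_year - 1

assert np.isclose(annual_from_log, annual_from_cum)
assert np.isclose(annual_from_log, annual_from_geo)
print(f"annualized return: {annual_from_log:.4%}")
```

If `total_years` is instead measured in calendar time, the third formula can differ slightly from the first two because actual trading days per year vary.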

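## Differences between raw and log returns
1. **Additivity**: log returns add across time; raw returns must be compounded by multiplying the factors `1 + R_t`
2. **Symmetry**: log returns are more symmetric between gains and losses
3. **Typical uses**:
   - Log returns suit long-horizon analysis and theoretical work
   - Raw returns are more intuitive and suit short-horizon analysis and practical applications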
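
The additivity and symmetry points can be made concrete with a small calculation. The starting value and the 10% moves below are illustrative.

```python
import math

start = 100.0
after_gain = start * (1 + 0.10)       # +10% raw return
after_loss = after_gain * (1 - 0.10)  # then -10% raw return

# Raw returns of +10% and -10% sum to zero, yet the position ends below its start
sum_of_raw = 0.10 + (-0.10)
print(sum_of_raw, after_loss)

# The log returns of the same two moves sum to the log of the actual outcome
sum_of_logs = math.log(1.10) + math.log(0.90)
total_from_logs = math.exp(sum_of_logs) - 1

# Log returns of +x and -x cancel exactly
x = math.log(1.10)
after_round_trip = start * math.exp(x) * math.exp(-x)
```

Here `total_from_logs` matches the realized loss of about 1%, while the naive sum of raw returns suggests no change.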