# Introduction
Provides utilities for processing and combining multiple factors.

## orthogonalize
- ` jaqs.research.signaldigger.multi_factor.orthogonalize(factors_dict=None,standardize_type="z_score",winsorization=False,index_member=None) `

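**Brief description:**

- When the factors are strongly correlated (largely homogeneous), orthogonalize them with the Gram-Schmidt method and use the resulting orthogonalized residuals as the new factors.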
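
The Gram-Schmidt step described above can be sketched on a single cross-section as follows. This is a minimal illustration in plain NumPy, not the JAQS implementation: the helper name `gram_schmidt_residuals` and the synthetic `pb`/`pe` data are assumptions, and the library may differ in factor ordering and in how it handles missing values.

```python
import numpy as np

def gram_schmidt_residuals(X):
    """Orthogonalize the columns of X (assets x factors) in column order.

    Column j is replaced by its residual after projecting out the
    already-orthogonalized columns 0..j-1 (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    Q = np.zeros_like(X)
    for j in range(X.shape[1]):
        v = X[:, j].copy()
        for k in range(j):
            denom = Q[:, k] @ Q[:, k]
            if denom > 0:
                # Remove the component of v that lies along Q[:, k]
                v -= (Q[:, k] @ v) / denom * Q[:, k]
        Q[:, j] = v
    return Q

# Synthetic cross-section: "pe" is deliberately built to correlate with "pb"
rng = np.random.default_rng(0)
pb = rng.normal(size=200)
pe = 0.8 * pb + 0.2 * rng.normal(size=200)
X = np.column_stack([pb, pe])
X = X - X.mean(axis=0)  # demean so that orthogonal also means uncorrelated

resid = gram_schmidt_residuals(X)
corr_before = np.corrcoef(X.T)[0, 1]
corr_after = np.corrcoef(resid.T)[0, 1]
print(f"correlation before: {corr_before:.3f}, after: {corr_after:.3e}")
```

The first factor is left unchanged and every later factor keeps only the part not explained by the earlier ones, so the order of the factors matters.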

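**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factors_dict|Yes|dict of pandas.DataFrame|A dict of factors of the form {"factor_name_1": factor_1, "factor_name_2": factor_2}. Each factor is a pd.DataFrame whose index is the date and whose columns are the assets.|
|standardize_type|No|string|Standardization method: "rank" (rank standardization) or "z_score" (z-score standardization). Defaults to "z_score".|
|winsorization|No|bool|Whether to winsorize the factors (clip extreme values). Defaults to False.|
|index_member|No|pandas.DataFrame of bool|Index membership flags: a two-dimensional bool table with dates as the index and securities as the columns, where True means the security is an index constituent on that date. If provided, only the securities that are index constituents in each cross-section are included in the sample when the factors are standardized or winsorized. Defaults to None.|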
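
To make the `standardize_type`, `winsorization` and `index_member` options more concrete, here is a minimal pandas sketch of cross-sectional standardization and clipping. The helper names, the percentile-rank convention, and the median ± n × MAD clipping rule are assumptions chosen for illustration; JAQS may use different conventions internally.

```python
import numpy as np
import pandas as pd

def standardize_cross_section(factor, standardize_type="z_score", index_member=None):
    """Standardize each row (date) of a date x asset DataFrame (hypothetical helper)."""
    # Keep only index constituents; everything else becomes NaN and is ignored
    data = factor if index_member is None else factor.where(index_member)
    if standardize_type == "z_score":
        mean = data.mean(axis=1)
        std = data.std(axis=1)
        return data.sub(mean, axis=0).div(std, axis=0)
    if standardize_type == "rank":
        # Percentile rank within each date, in (0, 1]
        return data.rank(axis=1, pct=True)
    raise ValueError("standardize_type must be 'z_score' or 'rank'")

def winsorize_cross_section(factor, n_mad=5.0):
    """Clip each row to median +/- n_mad * MAD (one common convention, assumed here)."""
    median = factor.median(axis=1)
    mad = factor.sub(median, axis=0).abs().median(axis=1)
    return factor.clip(lower=median - n_mad * mad, upper=median + n_mad * mad, axis=0)

dates = pd.Index([20170502, 20170503], name="trade_date")
factor = pd.DataFrame([[1.0, 2.0, 3.0, 100.0], [2.0, 4.0, 6.0, 8.0]],
                      index=dates, columns=["A", "B", "C", "D"])
member = pd.DataFrame(True, index=dates, columns=factor.columns)
member.loc[20170502, "D"] = False  # "D" is not an index constituent on the first date

z = standardize_cross_section(factor, "z_score", index_member=member)
r = standardize_cross_section(factor, "rank")
w = winsorize_cross_section(factor, n_mad=3.0)
print(z, r, w, sep="\n\n")
```

With `index_member` supplied, the outlier in column "D" on the first date neither distorts the mean and standard deviation nor receives a standardized value.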


**Returns:**

The new factors obtained after orthogonalization, as a dict of pandas.DataFrame.

**Example:**

```python
from jaqs.data import DataView
from jaqs.research.signaldigger.multi_factor import orthogonalize

# Load the dataview dataset
dv = DataView()
dataview_folder = './data'
dv.load_dataview(dataview_folder)

# Orthogonalize the factors
factors_dict = {signal: dv.get_ts(signal) for signal in ["pb", "pe"]}
new_factors = orthogonalize(factors_dict=factors_dict,
                            standardize_type="z_score",
                            winsorization=False,
                            index_member=None)
```

Output:

```
Dataview loaded successfully.
```

```python
print(new_factors.keys())
new_factors["pe"].head()
```

Output:

```
dict_keys(['pb', 'pe'])
```


|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|20170502|-0.661167|-0.321008|0.1376|0.117712|-0.57073|-0.776328|-0.262312|-0.630645|-0.220092|-0.444949|...|-0.673545|-0.996365|-0.388348|-0.395778|-0.658705|0.429739|3.959548|0.406094|0.19767|0.081429|
|20170503|-0.657938|-0.326229|0.134788|0.112238|-0.571492|-0.77789|-0.267336|-0.639522|-0.206775|-0.447106|...|-0.671494|-1.002554|-0.425115|-0.389544|-0.65717|0.444861|3.968815|0.413581|0.206661|0.160738|
|20170504|-0.662247|-0.319974|0.132233|0.112591|-0.570338|-0.779594|-0.27766|-0.641221|-0.204759|-0.444954|...|-0.671135|-1.00123|-0.404384|-0.391661|-0.658434|0.410504|3.948198|0.40208|0.204508|0.137293|
|20170505|-0.66487|-0.319372|0.097919|0.09068|-0.571798|-0.782509|-0.275168|-0.636404|-0.208189|-0.448794|...|-0.670085|-0.997497|-0.409683|-0.401536|-0.659185|0.395936|3.909582|0.397496|0.228932|0.127569|
|20170508|-0.665537|-0.328892|0.075998|0.078145|-0.576013|-0.777366|-0.263822|-0.619676|-0.227388|-0.448982|...|-0.667077|-0.982399|-0.452894|-0.410854|-0.656776|0.412096|4.041803|0.375715|0.217575|0.144361|


## get_factors_ic_df
- ` jaqs.research.signaldigger.multi_factor.get_factors_ic_df(*args, **kwargs) `

**Brief description:**

- Computes the matrix of IC (information coefficient) time series for multiple factors.

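**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factors_dict|Yes|dict of pandas.DataFrame|A dict of factors of the form {"factor_name_1": factor_1, "factor_name_2": factor_2}. Each factor is a pd.DataFrame whose index is the date and whose columns are the assets.|
|price|Yes|pandas.DataFrame|Price data for the stocks covered by the factors, used as entry and exit prices when computing returns. Dates are the index and stocks are the columns.|
|benchmark_price|No|pandas.DataFrame or pandas.Series|Benchmark prices, indexed by date. Takes effect when price is provided, and is used to compute the holding-period **relative return** (relative to the benchmark) of the stocks covered by the factors. Empty by default, in which case **absolute returns** are computed.|
|high|No|pandas.DataFrame|High prices of the stocks covered by the factors, used to compute the potential maximum upside return over the holding period. Dates are the index and stocks are the columns. Empty by default.|
|low|No|pandas.DataFrame|Low prices of the stocks covered by the factors, used to compute the potential maximum downside return over the holding period. Dates are the index and stocks are the columns. Empty by default.|
|group|No|pandas.DataFrame|Grouping (industry classification) of the stocks covered by the factors. Dates are the index and stocks are the columns. Empty by default.|
|period|No|int|Holding period. Defaults to 5, i.e. a 5-day holding period.|
|n_quantiles|No|int|Number of groups into which the stocks are split each day by factor value. Defaults to 5, i.e. five groups per day.|
|mask|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock should be filtered out on a given date during factor analysis. True means the stock is **filtered out** (excluded from the analysis). Empty by default, in which case no filtering is applied.|
|can_enter|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock can be bought (entered) on a given date. True means it can be bought. Empty by default, in which case any stock can be bought at any time.|
|can_exit|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock can be sold (exited) on a given date. True means it can be sold. Empty by default, in which case any stock can be sold at any time.|
|forward|No|bool|Return alignment. With forward=True, the current factor value is aligned with the return realized over the next period; with forward=False, the currently realized return is aligned with the previous period's factor value. Defaults to True.|
|commission|No|float|Commission rate charged on each rebalance. Defaults to 0.0008 (8 basis points).|
|ret_type|No|string|Which return type the IC is computed against. Supported values are return, upside_ret and downside_ret, meaning the fixed-rebalance return, the potential maximum upside return and the potential maximum downside return respectively. Defaults to return.|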
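
As a rough illustration of how `price`, `period`, `benchmark_price` and `mask` feed into an IC series, the sketch below computes a per-date rank IC (Spearman correlation between factor values and forward returns) with pandas only. The helper names and the toy data are hypothetical, commission and entry/exit constraints are ignored, and nothing here is confirmed to match the JAQS internals.

```python
import numpy as np
import pandas as pd

def forward_returns(price, period=5, benchmark_price=None):
    """Return over the next `period` rows, aligned to the date the factor is observed."""
    ret = price.shift(-period) / price - 1.0
    if benchmark_price is not None:
        # Assumes benchmark_price is a Series indexed by date
        bench = benchmark_price.shift(-period) / benchmark_price - 1.0
        ret = ret.sub(bench, axis=0)  # return relative to the benchmark
    return ret

def daily_rank_ic(factor, ret, mask=None):
    """Spearman rank correlation between factor and forward return, per date."""
    if mask is not None:
        factor = factor.where(~mask)  # True in mask means "filter out"
    ic = {}
    for date in factor.index:
        pair = pd.concat([factor.loc[date], ret.loc[date]], axis=1).dropna()
        if len(pair) < 3:
            ic[date] = np.nan
            continue
        ranked = pair.rank()
        # Pearson correlation of ranks equals the Spearman correlation
        ic[date] = ranked.iloc[:, 0].corr(ranked.iloc[:, 1])
    return pd.Series(ic)

# Toy data: assets with a higher factor value grow faster, so the IC should be 1
dates = pd.date_range("2017-05-02", periods=6, freq="B")
assets = ["A", "B", "C", "D"]
growth = np.array([1.01, 1.02, 1.03, 1.04])
price = pd.DataFrame([10.0 * growth ** i for i in range(6)], index=dates, columns=assets)
factor = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]] * 6, index=dates, columns=assets)

ret = forward_returns(price, period=2)
ic = daily_rank_ic(factor, ret)
print(ic)
```

The last `period` dates have no forward return yet, so their IC is NaN; this is why the examples in this document call `dropna(how="all")` before inspecting results.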

**Returns:**

ic_df: the matrix of IC time series for the factors.
A pd.DataFrame whose index is datetime and whose columns are the factor names, matching the keys of factors_dict.
For example:

```
                  BP       CFP        EP  ILLIQUIDITY    REVS20      SRMI     VOL20
date
2016-06-24  0.165260  0.002198  0.085632    -0.078074  0.173832  0.214377  0.068445
2016-06-27  0.165537  0.003583  0.063299    -0.048674  0.180890  0.202724  0.081748
2016-06-28  0.135215  0.010403  0.059038    -0.034879  0.111691  0.122554  0.042489
2016-06-29  0.068774  0.019848  0.058476    -0.049971  0.042805  0.053339  0.079592
2016-06-30  0.039431  0.012271  0.037432    -0.027272  0.010902  0.077293 -0.050667
```
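
A returned IC matrix of this shape is typically condensed into a few per-factor statistics. The sketch below rebuilds the first two columns of the sample above and computes the mean IC, its standard deviation, the IC_IR (mean divided by standard deviation) and the share of positive ICs. The `summary` layout is an assumption made for illustration and is not a JAQS output format.

```python
import pandas as pd

# First two columns of the sample ic_df shown above
ic_df = pd.DataFrame(
    {
        "BP": [0.165260, 0.165537, 0.135215, 0.068774, 0.039431],
        "CFP": [0.002198, 0.003583, 0.010403, 0.019848, 0.012271],
    },
    index=pd.to_datetime(
        ["2016-06-24", "2016-06-27", "2016-06-28", "2016-06-29", "2016-06-30"]
    ),
)
ic_df.index.name = "date"

summary = pd.DataFrame(
    {
        "ic_mean": ic_df.mean(),
        "ic_std": ic_df.std(),
        "ic_ir": ic_df.mean() / ic_df.std(),    # information ratio of the IC series
        "positive_ratio": (ic_df > 0).mean(),   # share of dates with a positive IC
    }
)
print(summary)
```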



**Example:**

```python
from jaqs.research.signaldigger.multi_factor import get_factors_ic_df

factor_ic_df = get_factors_ic_df(factors_dict,
                                 price=dv.get_ts("close_adj"),
                                 high=dv.get_ts("high_adj"),  # optional
                                 low=dv.get_ts("low_adj"),  # optional
                                 group=dv.get_ts("sw1"),  # optional
                                 n_quantiles=5,  # number of quantile groups
                                 period=5,  # holding period
                                 benchmark_price=dv.data_benchmark,  # benchmark price; optional. If omitted, holding-period returns are absolute
                                 commission=0.0008,
                                 ret_type='upside_ret'  # compute the IC against the potential maximum upside return
                                 )
factor_ic_df.dropna(how="all").head()
```

Output:

```
Nan Data Count (should be zero) : 0;  Percentage of effective data: 99%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 99%
```

|trade_date|group|pb|pe|
|:----|:----|:----|:----|
|20170503|110000|0.314286|-0.6|
|20170503|210000|0.214286|-0.261905|
|20170503|220000|0.021978|-0.032967|
|20170503|230000|0.1|0.1|
|20170503|240000|0.20979|0.51049|


## combine_factors
- ` jaqs.research.signaldigger.multi_factor.combine_factors(*args, **kwargs) `

**Brief description:**

- Combines multiple factors into a single composite factor.

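**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factors_dict|Yes|dict of pandas.DataFrame|A dict of factors of the form {"factor_name_1": factor_1, "factor_name_2": factor_2}. Each factor is a pd.DataFrame whose index is the date and whose columns are the assets.|
|standardize_type|No|string|Standardization method: "rank" (rank standardization) or "z_score" (z-score standardization). Defaults to "z_score".|
|winsorization|No|bool|Whether to winsorize the result (clip extreme values). Defaults to False.|
|index_member|No|pandas.DataFrame of bool|Index membership flags: a two-dimensional bool table with dates as the index and securities as the columns, where True means the security is an index constituent on that date. If provided, only the securities that are index constituents in each cross-section are included in the sample when the result is standardized or winsorized. Empty by default.|
|weighted_method|No|string|Combination method. Currently supported: 'equal_weight' (equal-weighted combination), 'ic_weight' (weights from the rolling mean IC over a time window), 'ir_weight' (weights from the rolling IC_IR over a time window), 'max_IR' (weights chosen to maximize the IC_IR of the previous holding period) and 'max_IC' (weights chosen to maximize the IC of the previous holding period). Defaults to 'equal_weight'. For any other value, the props parameter below must also be configured.|
|props|Required when weighted_method is not 'equal_weight'; otherwise optional|dict|Configuration needed to compute the weighted composite factor. See the props table below.|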
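
The difference between 'equal_weight' and the IC-based methods comes down to which per-date weights multiply each factor before summing. The following sketch shows that mechanic with pandas; `combine_with_weights`, the toy factor values and the weight table are hypothetical and only illustrate the idea, not the JAQS implementation.

```python
import pandas as pd

def combine_with_weights(factors_dict, weights=None):
    """Weighted sum of factors (hypothetical helper).

    weights: date x factor_name DataFrame; None means equal weights.
    """
    names = list(factors_dict)
    if weights is None:
        first = factors_dict[names[0]]
        weights = pd.DataFrame(1.0 / len(names), index=first.index, columns=names)
    combined = None
    for name in names:
        # Scale every row (date) of the factor by that date's weight
        term = factors_dict[name].mul(weights[name], axis=0)
        combined = term if combined is None else combined.add(term)
    return combined

dates = pd.Index([20170726, 20170727], name="trade_date")
pb = pd.DataFrame([[0.2, 0.8], [0.4, 0.6]], index=dates,
                  columns=["000001.SZ", "000002.SZ"])
pe = pd.DataFrame([[0.6, 0.4], [0.8, 0.2]], index=dates, columns=pb.columns)
factors = {"pb": pb, "pe": pe}

equal = combine_with_weights(factors)

# Made-up per-date weights standing in for normalized rolling ICs
ic_like = pd.DataFrame({"pb": [0.75, 0.25], "pe": [0.25, 0.75]}, index=dates)
weighted = combine_with_weights(factors, ic_like)
print(equal, weighted, sep="\n\n")
```

A missing value in any factor propagates to the composite in this sketch, which is one reason factors are usually standardized and cleaned before they are combined.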

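**props configuration parameters**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|price|Yes|pandas.DataFrame|Price data for the stocks covered by the factors, used as entry and exit prices when computing returns. Dates are the index and stocks are the columns.|
|high|No|pandas.DataFrame|High prices of the stocks covered by the factors, used to compute the potential maximum upside return over the holding period. Dates are the index and stocks are the columns. Empty by default.|
|low|No|pandas.DataFrame|Low prices of the stocks covered by the factors, used to compute the potential maximum downside return over the holding period. Dates are the index and stocks are the columns. Empty by default.|
|ret_type|No|string|Which return type the IC is computed against. Supported values are return, upside_ret and downside_ret, meaning the fixed-rebalance return, the potential maximum upside return and the potential maximum downside return respectively. Defaults to return.|
|benchmark_price|No|pandas.DataFrame or pandas.Series|Benchmark prices, indexed by date. Takes effect when price is provided, and is used to compute the holding-period **relative return** (relative to the benchmark) of the stocks covered by the factors. Empty by default, in which case **absolute returns** are computed.|
|period|No|int|Holding period. Defaults to 5, i.e. a 5-day holding period.|
|n_quantiles|No|int|Number of groups into which the stocks are split each day by factor value. Defaults to 5, i.e. five groups per day.|
|mask|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock should be filtered out on a given date during factor analysis. True means the stock is **filtered out** (excluded from the analysis). Empty by default, in which case no filtering is applied.|
|can_enter|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock can be bought (entered) on a given date. True means it can be bought. Empty by default, in which case any stock can be bought at any time.|
|can_exit|No|pandas.DataFrame|A table of bool values, with dates as the index and stocks as the columns, indicating whether a given stock can be sold (exited) on a given date. True means it can be sold. Empty by default, in which case any stock can be sold at any time.|
|forward|No|bool|Return alignment. With forward=True, the current factor value is aligned with the return realized over the next period; with forward=False, the currently realized return is aligned with the previous period's factor value. Defaults to True.|
|commission|No|float|Commission rate charged on each rebalance. Defaults to 0.0008 (8 basis points).|
|covariance_type|No|string|Method used to estimate the covariance matrix: 'simple' (ordinary sample covariance) or 'shrink' (shrinkage covariance estimate). Defaults to 'simple'.|
|rollback_period|No|int|Rolling window length in days. Defaults to 120.|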
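
To show where `rollback_period` and `covariance_type` could enter the weighting, the sketch below derives rolling-IC weights and a textbook IR-maximizing weight vector (proportional to the inverse IC covariance times the mean IC). Everything here is an assumption for illustration: the helper names, the lag by `period`, the normalization by the sum of absolute weights, and the fixed shrinkage intensity toward a scaled identity are not confirmed to match what JAQS does.

```python
import numpy as np
import pandas as pd

def rolling_ic_weights(ic_df, rollback_period=120, period=5):
    """Rolling mean IC, lagged by `period` so only realized ICs are used."""
    mean_ic = ic_df.rolling(rollback_period, min_periods=rollback_period).mean()
    mean_ic = mean_ic.shift(period)
    # Normalize so absolute weights sum to 1 on each date
    return mean_ic.div(mean_ic.abs().sum(axis=1), axis=0)

def max_ir_weights(ic_window, covariance_type="simple", shrinkage=0.2):
    """Weights proportional to inv(cov(IC)) @ mean(IC) over one window."""
    mu = ic_window.mean().to_numpy()
    cov = ic_window.cov().to_numpy()
    if covariance_type == "shrink":
        # Shrink toward a scaled identity with a fixed, hand-picked intensity
        target = np.eye(len(mu)) * np.trace(cov) / len(mu)
        cov = (1.0 - shrinkage) * cov + shrinkage * target
    elif covariance_type != "simple":
        raise ValueError("covariance_type must be 'simple' or 'shrink'")
    raw = np.linalg.solve(cov, mu)
    return pd.Series(raw / np.abs(raw).sum(), index=ic_window.columns)

# Synthetic IC series for two factors (made-up numbers, not real results)
rng = np.random.default_rng(1)
ic_df = pd.DataFrame(rng.normal(0.05, 0.1, size=(60, 2)), columns=["pb", "pe"])

w_roll = rolling_ic_weights(ic_df, rollback_period=30, period=5)
w_simple = max_ir_weights(ic_df.iloc[-30:], "simple")
w_shrink = max_ir_weights(ic_df.iloc[-30:], "shrink")
print(w_roll.dropna().head(), w_simple, w_shrink, sep="\n\n")
```

Shrinking the covariance matrix makes the solve step better conditioned when the window is short relative to the number of factors, at the cost of some bias.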

**Returns:**

The new composite factor.

**Example:**

```python
from jaqs.research.signaldigger.multi_factor import combine_factors

props = {
    'price': dv.get_ts("close_adj"),
    'high': dv.get_ts("high_adj"),  # optional
    'low': dv.get_ts("low_adj"),  # optional
    'ret_type': 'return',  # other options: upside_ret/downside_ret, in which case the composite factor targets potential upside/downside
    'benchmark_price': dv.data_benchmark,  # if empty, absolute returns are used; otherwise relative returns
    'period': 30,  # 30-day holding period
    'forward': True,
    'commission': 0.0008,
    "covariance_type": "shrink",  # covariance estimation method; "simple" is also supported
    "rollback_period": 30}  # rolling window length in days

comb_factor = combine_factors(factors_dict,
                              standardize_type="rank",
                              winsorization=False,
                              weighted_method="ic_weight",
                              props=props)

comb_factor.dropna(how="all").head()
```

Output:

```
Nan Data Count (should be zero) : 0;  Percentage of effective data: 99%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 99%
```


|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|20170726|0.957447|0.775076|0.243161|0.118541|0.860182|0.3769|0.197568|0.133739|0.732523|0.884498|...|0.987842|0.328267|0.753799|0.902736|0.975684|0.033435|0.051672|0.422492|0.556231|0.066869|
|20170727|0.969605|0.765957|0.264438|0.12462|0.863222|0.382979|0.206687|0.139818|0.796353|0.881459|...|0.987842|0.343465|0.74772|0.899696|0.975684|0.024316|0.048632|0.419453|0.534954|0.082067|
|20170728|0.969605|0.759878|0.273556|0.121581|0.87538|0.398176|0.209726|0.142857|0.775076|0.87234|...|0.987842|0.352584|0.762918|0.896657|0.975684|0.030395|0.048632|0.404255|0.525836|0.091185|
|20170731|0.966565|0.753799|0.285714|0.121581|0.887538|0.404255|0.203647|0.151976|0.775076|0.869301|...|0.987842|0.3769|0.759878|0.896657|0.975684|0.039514|0.036474|0.395137|0.522796|0.094225|
|20170801|0.960486|0.753799|0.282675|0.12462|0.887538|0.425532|0.237082|0.167173|0.765957|0.869301|...|0.990881|0.395137|0.738602|0.896657|0.975684|0.045593|0.033435|0.398176|0.510638|0.100304|
