# Introduction
This module provides common factor-processing operations, such as winsorization and neutralization.

## standardize
- ` jaqs.research.signaldigger.process.standardize(factor_df, index_member=None) `

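**Summary:**

- Cross-sectional z-score standardization

**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factor_df |Yes|pandas.DataFrame |Two-dimensional factor table with dates as the index and securities as columns|
|index_member |No|pandas.DataFrame of bool |Index membership flags. A two-dimensional boolean table with dates as the index and securities as columns; True means the security is an index constituent on that date. When provided, only the securities that are index constituents in each cross-section are included in the standardization sample. Defaults to None|

**Returns:**

The standardized factor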

**Example:**

```python
from jaqs.data import DataView
from jaqs.research.signaldigger.process import standardize

# Load the DataView dataset
dv = DataView()
dataview_folder = './data'
dv.load_dataview(dataview_folder)

# Cross-sectional z-score standardization
standardize(factor_df=dv.get_ts("pe"), index_member=dv.get_ts("index_member")).head()
```

```
Dataview loaded successfully.
```


|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|20170502|-0.36338|-0.340032|-0.106714|0.152518|-0.266414|0.216918|0.086421|0.857408|-0.411592|-0.343106|...|-0.366394|0.891601| | |-0.361782|0.677455| | |-0.24894|0.13124|
|20170503|-0.364271|-0.341856|-0.107757|0.15119|-0.268283|0.219121|0.083804|0.85245|-0.412694|-0.344699|...|-0.367529|0.879934| | |-0.363002|0.697502| | |-0.248411|0.128307|
|20170504|-0.364991|-0.340861|-0.10707|0.154148|-0.2671|0.213994|0.07818|0.849831|-0.412865|-0.344161|...|-0.367343|0.871015| | |-0.363119|0.674523| | |-0.248024|0.118993|
|20170505|-0.364277|-0.339788|-0.116436|0.142003|-0.266276|0.199128|0.080549|0.857999|-0.412033|-0.343666|...|-0.365914|0.858166| | |-0.362034|0.659895| | |-0.243558|0.114178|
|20170508|-0.360932|-0.337663|-0.121213|0.133428|-0.265375|0.197282|0.087274|0.87156|-0.408468|-0.340375|...|-0.361849|0.824399| | |-0.358094|0.662941| | |-0.242522|0.121454|
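
To make the operation concrete, the sketch below computes a cross-sectional z-score on a toy table with plain pandas. It is an illustration under stated assumptions, not the JAQS implementation: the helper name `zscore_cross_section` and the toy data are hypothetical, non-members are assumed to be masked to NaN (which matches the empty cells in the output above), and the sample standard deviation (`ddof=1`) is an assumed convention.

```python
import numpy as np
import pandas as pd

def zscore_cross_section(factor_df, index_member=None):
    """Hypothetical helper: per-date (row-wise) z-score with plain pandas."""
    df = factor_df.astype(float)
    if index_member is not None:
        # Assumption: securities outside the index are excluded by masking to NaN.
        df = df.where(index_member.astype(bool))
    mean = df.mean(axis=1)  # one mean per date, NaN ignored
    std = df.std(axis=1)    # sample std (ddof=1); an assumed convention
    return df.sub(mean, axis=0).div(std, axis=0)

toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 100.0], [4.0, 6.0, 8.0, 100.0]],
    index=[20170502, 20170503],
    columns=["A", "B", "C", "D"],
)
member = pd.DataFrame(True, index=toy.index, columns=toy.columns)
member["D"] = False  # D is never an index constituent in this toy example

z = zscore_cross_section(toy, member)
```

Each date is standardized independently, so every row of `z` has mean 0 over the member securities, and the non-member column `D` stays NaN even though its raw value is an outlier.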


## winsorize
- ` jaqs.research.signaldigger.process.winsorize(factor_df, alpha=0.05, index_member=None) `

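**Summary:**

- Cross-sectional winsorization (removal of extreme values)

**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factor_df |Yes|pandas.DataFrame |Two-dimensional factor table with dates as the index and securities as columns|
|alpha |No|float|Winsorization bound. For example, 0.05 means the extreme values beyond the 2.5% quantile on each side are removed (the central 95% of the distribution is kept). Default: 0.05|
|index_member |No|pandas.DataFrame of bool |Index membership flags. A two-dimensional boolean table with dates as the index and securities as columns; True means the security is an index constituent on that date. When provided, only the securities that are index constituents in each cross-section are included in the winsorization sample. Defaults to None|

**Returns:**

The winsorized factor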

**Example:**

```python
from jaqs.research.signaldigger.process import winsorize

# `dv` is the DataView loaded in the standardize example
winsorize(factor_df=dv.get_ts("pe"),
          alpha=0.05,
          index_member=dv.get_ts("index_member")).head()
```

|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|20170502|6.7925|10.0821|42.9544|79.4778|20.4542|88.5511|70.1653|178.7903|0.0|9.649|...|6.3679|183.6078| | |7.0177|153.4365| | |22.9161|76.48|
|20170503|6.7697|9.9035|42.6314|78.8332|20.1893|88.3302|69.4123|176.8719|0.0|9.506|...|6.3143|180.7143| | |6.9472|155.2097| | |22.9674|75.634|
|20170504|6.6405|9.9876|42.4161|78.649|20.2187|86.9501|68.1117|175.1454|0.0|9.5298|...|6.3143|178.0838| | |6.9002|150.8288| | |22.8647|73.7727|
|20170505|6.557|9.9193|40.586|76.0703|20.0127|83.9137|67.6325|174.3781|0.0|9.3869|...|6.3322|174.4011| | |6.8649|147.178| | |23.1319|72.2499|
|20170508|6.5114|9.6988|39.3479|74.2284|19.6007|82.9752|67.9063|175.3372|0.0|9.3273|...|6.3858|168.8771| | |6.9002|146.7608| | |22.7311|72.5883|
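
One way to read the `alpha` parameter is as symmetric quantile clipping applied to each date separately. The sketch below assumes that values beyond the `alpha / 2` and `1 - alpha / 2` quantiles are clipped to those quantiles, which is the usual meaning of winsorizing; whether JAQS clips or discards such values is not stated here. The helper name `winsorize_cross_section` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def winsorize_cross_section(factor_df, alpha=0.05):
    """Hypothetical helper: clip each date's values to its central (1 - alpha) range."""
    df = factor_df.astype(float)
    lower = df.quantile(alpha / 2, axis=1)      # per-date lower bound
    upper = df.quantile(1 - alpha / 2, axis=1)  # per-date upper bound
    # axis=0 aligns the per-date bounds with the row (date) index.
    return df.clip(lower=lower, upper=upper, axis=0)

toy = pd.DataFrame(
    [list(range(10)) + [1000], [-500] + list(range(1, 11))],
    index=[20170502, 20170503],
    columns=[f"S{i}" for i in range(11)],
)

# alpha=0.2 keeps the central 80% of each row: bounds are the 10% and 90% quantiles.
w = winsorize_cross_section(toy, alpha=0.2)
```

In the toy data the outliers `1000` and `-500` are pulled back to the row's 90% and 10% quantiles, while values inside the bounds are left unchanged.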


## rank_standardize
- ` jaqs.research.signaldigger.process.rank_standardize(factor_df, index_member=None) `

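**Summary:**

- Rank standardization. Converts the factor into its cross-sectional rank (ascending) and rescales it to the range 0-1, so that only the ordering of the original factor is kept and its distributional features are removed

**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factor_df |Yes|pandas.DataFrame |Two-dimensional factor table with dates as the index and securities as columns|
|index_member |No|pandas.DataFrame of bool |Index membership flags. A two-dimensional boolean table with dates as the index and securities as columns; True means the security is an index constituent on that date. When provided, only the securities that are index constituents in each cross-section are included in the rank-standardization sample. Defaults to None|

**Returns:**

The rank-standardized factor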

**Example:**

```python
from jaqs.research.signaldigger.process import rank_standardize

# `dv` is the DataView loaded in the standardize example
rank_standardize(factor_df=dv.get_ts("pe"),
                 index_member=dv.get_ts("index_member")).head()
```

|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|20170502|0.063545|0.117057|0.722408|0.886288|0.361204|0.90301|0.862876|0.966555|0.0|0.107023|...|0.053512|0.9699| | |0.070234|0.943144| | |0.408027|0.876254|
|20170503|0.063545|0.113712|0.722408|0.886288|0.354515|0.90301|0.859532|0.966555|0.0|0.100334|...|0.053512|0.9699| | |0.06689|0.939799| | |0.408027|0.87291|
|20170504|0.063545|0.113712|0.725753|0.886288|0.35786|0.90301|0.852843|0.963211|0.0|0.100334|...|0.053512|0.9699| | |0.06689|0.939799| | |0.408027|0.87291|
|20170505|0.063545|0.113712|0.712375|0.882943|0.351171|0.90301|0.859532|0.963211|0.0|0.100334|...|0.053512|0.966555| | |0.070234|0.943144| | |0.424749|0.87291|
|20170508|0.060201|0.103679|0.719064|0.882943|0.331104|0.90301|0.862876|0.9699|0.0|0.09699|...|0.053512|0.963211| | |0.070234|0.946488| | |0.421405|0.879599|
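
The idea of keeping only the ordering can be sketched with `DataFrame.rank`. The scaling used below, `(rank - 1) / (N - 1)` with `N` the number of non-missing values on a date, is an assumption: it is one common way to map ranks onto [0, 1], and the values in the output above look consistent with it (for example, 0.063545 is approximately 19/299), but that is an inference from the sample rather than a documented guarantee. The helper name `rank_to_unit_interval` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def rank_to_unit_interval(factor_df):
    """Hypothetical helper: ascending per-date rank rescaled to [0, 1]."""
    ranks = factor_df.rank(axis=1, method="average")  # NaN stays NaN
    n_valid = factor_df.notna().sum(axis=1)           # non-missing count per date
    # (rank - 1) / (N - 1): smallest value maps to 0, largest to 1.
    return ranks.sub(1).div(n_valid - 1, axis=0)

toy = pd.DataFrame(
    [[10.0, 30.0, 20.0, np.nan, 50.0]],
    index=[20170502],
    columns=["A", "B", "C", "D", "E"],
)

r = rank_to_unit_interval(toy)
```

The gap between 30 and 50 is twice the gap between 10 and 20 in the raw data, but the ranked values are evenly spaced, which is what discarding the distributional features means in practice. A date with a single non-missing value would divide by zero in this sketch and would need separate handling.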


## get_disturbed_factor
- ` jaqs.research.signaldigger.process.get_disturbed_factor(factor_df) `

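**Summary:**

- Adds a tiny perturbation to the factor values so that tied values can be told apart when grouping by quantile

**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factor_df |Yes|pandas.DataFrame |Two-dimensional factor table with dates as the index and securities as columns|

**Returns:**

The factor with the perturbation added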

**Example:**

```python
from jaqs.research.signaldigger.process import get_disturbed_factor

# `dv` is the DataView loaded in the standardize example
get_disturbed_factor(factor_df=dv.get_ts("pe")).head()
```

|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|20170502|6.7925|10.0821|42.9544|79.4778|20.4542|88.5511|70.1653|178.7903|4.251335e-10|9.649|...|6.3679|183.6078|32.7886|9.8565|7.0177|153.4365|50.8349|31.0157|22.9161|76.48|
|20170503|6.7697|9.9035|42.6314|78.8332|20.1893|88.3302|69.4123|176.8719|3.782655e-10|9.506|...|6.3143|180.7143|30.245|9.8817|6.9472|155.2097|50.7259|31.0311|22.9674|75.634|
|20170504|6.6405|9.9876|42.4161|78.649|20.2187|86.9501|68.1117|175.1454|4.255973e-10|9.5298|...|6.3143|178.0838|31.4771|9.8188|6.9002|150.8288|50.3727|30.6805|22.8647|73.7727|
|20170505|6.557|9.9193|40.586|76.0703|20.0127|83.9137|67.6325|174.3781|9.41172e-10|9.3869|...|6.3322|174.4011|30.8809|9.5609|6.8649|147.178|49.3963|30.2527|23.1319|72.2499|
|20170508|6.5114|9.6988|39.3479|74.2284|19.6007|82.9752|67.9063|175.3372|4.294182e-10|9.3273|...|6.3858|168.8771|27.9399|9.3282|6.9002|146.7608|50.3779|29.5167|22.7311|72.5883|
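
Why a perturbation helps with quantile grouping can be shown with `pandas.qcut`, which raises an error when tied values make two bin edges equal. The sketch below is an illustration only: the helper name `add_tiny_noise`, the uniform noise in `[0, 1e-9)`, and the fixed seed are assumptions, chosen because the perturbed zeros in the output above are of the order of 1e-10; the actual noise distribution used by JAQS is not stated here.

```python
import numpy as np
import pandas as pd

def add_tiny_noise(factor_df, scale=1e-9, seed=0):
    """Hypothetical helper: add uniform noise in [0, scale) to every value."""
    rng = np.random.default_rng(seed)
    noise = rng.random(factor_df.shape) * scale
    return factor_df + noise

toy = pd.DataFrame(
    [[0.0, 0.0, 0.0, 0.0, 1.0, 2.0]],
    index=[20170502],
    columns=list("ABCDEF"),
)

# Ties make the quantile bin edges collide, so qcut refuses to split the row.
try:
    pd.qcut(toy.loc[20170502], 3)
    tied_row_splits = True
except ValueError:
    tied_row_splits = False

# After the perturbation every value is distinct and three groups can be formed.
disturbed = add_tiny_noise(toy)
groups = pd.qcut(disturbed.loc[20170502], 3, labels=False)
```

The perturbation is far smaller than the precision of the factor itself, so it changes which group a tied value lands in without materially changing any value.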


## neutralize
- ` jaqs.research.signaldigger.process.neutralize(factor_df, group, float_mv=None, index_member=None) `

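**Summary:**

- Neutralizes the factor with respect to industry and market capitalization

**Parameters:**

|Field|Required|Type|Description|
|:----    |:---|:----- |-----   |
|factor_df |Yes|pandas.DataFrame |The factor. A two-dimensional table with dates as the index and securities as columns|
|group |Yes|pandas.DataFrame |Industry classification (another grouping scheme may also be used). A two-dimensional table with dates as the index and securities as columns, giving the group each security belongs to on each date|
|float_mv |No|pandas.DataFrame |Float market value. A two-dimensional table with dates as the index and securities as columns. Defaults to None, in which case no market-cap neutralization is performed|
|index_member |No|pandas.DataFrame of bool |Index membership flags. A two-dimensional boolean table with dates as the index and securities as columns; True means the security is an index constituent on that date. When provided, only the securities that are index constituents in each cross-section are included in the industry and market-cap neutralization sample. Defaults to None|

**Returns:**

The factor after industry and market-cap neutralization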

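**Example:**

```python
from jaqs.research.signaldigger.process import neutralize

# `dv` is the DataView loaded in the standardize example.
# float_mv is left at its default, so no market-cap neutralization is performed.
neutralize(factor_df=dv.get_ts("pe"),
           group=dv.get_ts("sw1")).head()
```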

|trade_date|000001.SZ|000002.SZ|000008.SZ|000009.SZ|000027.SZ|000039.SZ|000060.SZ|000061.SZ|000063.SZ|000069.SZ|...|601988.SH|601989.SH|601992.SH|601997.SH|601998.SH|603000.SH|603160.SH|603858.SH|603885.SH|603993.SH|
|:----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|20170502|-2.662629|-7.78223|-27.98201|-38.3335|-4.083109|17.61469|-83.013425|107.385838|-168.217857|-8.21533|...|-3.087229|26.9125|9.87725|0.401371|-2.437429|108.584833|3.357346|-55.150405|-6.266428|-76.698725|
|20170503|-2.682662|-7.82996|-28.76077|-39.5072|-3.544909|16.93803|-83.589442|105.819463|-168.313357|-8.22746|...|-3.138062|24.5886|8.60545|0.429338|-2.505162|110.440158|3.3304|-55.949523|-6.084489|-77.367742|
|20170504|-2.815043|-7.73389|-28.67189|-38.7479|-4.016945|15.86211|-82.4298|104.271488|-168.140586|-8.19169|...|-3.141243|26.910367|9.39235|0.363257|-2.555343|106.489883|3.025662|-55.572859|-6.116911|-76.7688|
|20170505|-2.762233|-7.653145|-28.89854|-37.39795|-3.835882|14.42916|-82.012883|103.912075|-167.959957|-8.185545|...|-2.987033|27.8539|9.18745|0.241667|-2.454333|103.30295|2.387592|-55.251945|-5.618411|-77.395483|
|20170508|-2.564538|-7.59114|-29.02696|-36.1053|-3.576855|14.60034|-80.613975|104.639975|-167.807429|-7.96264|...|-2.690138|30.356133|7.6859|0.252262|-2.175738|102.756587|4.286408|-54.397259|-5.803628|-75.931975|
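
Neutralization is commonly implemented as a cross-sectional regression whose residual is kept as the new factor. The sketch below follows that approach under stated assumptions and is not the JAQS implementation: the helper names `neutralize_one_date` and `neutralize_sketch` are hypothetical, one dummy column per group is used without an intercept, and market value enters as its natural logarithm, which is a common convention but an assumption here.

```python
import numpy as np
import pandas as pd

def neutralize_one_date(factor, group, float_mv=None):
    """Hypothetical helper: OLS residual of one date's factor on group dummies
    and, optionally, log float market value."""
    data = pd.DataFrame({"factor": factor.astype(float), "group": group})
    if float_mv is not None:
        data["log_mv"] = np.log(float_mv.astype(float))  # assumed transform
    data = data.dropna()
    X = pd.get_dummies(data["group"]).astype(float)  # one column per group
    if float_mv is not None:
        X["log_mv"] = data["log_mv"]
    beta, *_ = np.linalg.lstsq(X.to_numpy(), data["factor"].to_numpy(), rcond=None)
    resid = data["factor"] - X.to_numpy() @ beta
    return resid.reindex(factor.index)  # securities dropped above come back as NaN

def neutralize_sketch(factor_df, group_df, float_mv_df=None):
    """Hypothetical helper: apply the per-date regression to every row."""
    rows = {}
    for date in factor_df.index:
        mv = None if float_mv_df is None else float_mv_df.loc[date]
        rows[date] = neutralize_one_date(factor_df.loc[date], group_df.loc[date], mv)
    return pd.DataFrame(rows).T

factor_df = pd.DataFrame([[1.0, 3.0, 10.0, 20.0]], index=[20170502], columns=list("ABCD"))
group_df = pd.DataFrame([["bank", "bank", "tech", "tech"]], index=[20170502], columns=list("ABCD"))

out = neutralize_sketch(factor_df, group_df)
```

With only `group` supplied, the regression reduces to subtracting each group's mean on each date, so the toy result is the deviation of every security from its group average. The sample output above is consistent with that reading: for 20170502, the raw `pe` minus the neutralized value is 9.455129 for 000001.SZ, 601988.SH, and 601998.SH alike.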
