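# Introduction
DataView can be thought of as a pandas-based database built for factor research that simplifies factor design and implementation. With DataView you can quickly retrieve data and run computations on a dataset to produce new datasets. For a detailed description, see the [official user manual: Data View](https://www.quantos.org/jaqs/doc.html).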

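# What DataView does
DataView automates frequently used DataFrame operations so that users can focus on business requirements rather than technical implementation when working with data:

1. Fetches data automatically from the appropriate data APIs based on field names
2. Aligns data by time and symbol (financial statement data is aligned by announcement date)
3. Adds fields, custom data, or formula-derived data on top of existing data
4. Data queries
5. Local storage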
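
To make item 2 concrete, the following is a minimal pandas sketch of aligning quarterly data to trading days by announcement date. It is not the DataView implementation; the data and the helper name `align_to_daily` are made up for illustration, and it assumes at most one record per announcement date.

```python
import pandas as pd

# Hypothetical quarterly records for one symbol (values are made up).
quarterly = pd.DataFrame({
    "report_date": [20161231, 20170331],
    "ann_date": [20170310, 20170421],
    "total_oper_rev": [100.0, 30.0],
})

# Hypothetical trading days to align to.
trade_dates = pd.Index(
    [20170309, 20170310, 20170420, 20170421, 20170424], name="trade_date")


def align_to_daily(quarterly, trade_dates, field):
    """On each trade date, use the latest value announced on or before that date."""
    by_ann = quarterly.sort_values("ann_date").set_index("ann_date")[field]
    # Union the two indexes so announcement dates that are not trading days
    # still take part in the forward fill, then keep only the trade dates.
    full_index = by_ann.index.union(trade_dates)
    return by_ann.reindex(full_index).ffill().reindex(trade_dates)


daily = align_to_daily(quarterly, trade_dates, "total_oper_rev")
print(daily)
```

Before the first announcement the value is NaN, which avoids using a report before it was public.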

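# Data download
DataView can currently fetch market data and reference data over the network directly from the free data source provided by the JAQS project.

***The JAQS project has not yet published detailed documentation of the available data. 大鱼金融 will follow updates to the official data documentation and revise this page accordingly.***

***Steps***
1. Configure the TCP address for data download (data_config). Using the free official JAQS data source requires registering an account on the official website first.
2. Create the DataView and DataService
3. Configure the request parameters (props)
4. Download the data (prepare_data)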

In [1]:
from jaqs.data import DataView
from jaqs.data import RemoteDataService # remote data service class

# step 1: username and password are the account and the token (serial number) obtained by registering on the official website
data_config = {
"remote.data.address": "tcp://data.tushare.org:8910",
"remote.data.username": "18566262672",
"remote.data.password": "eyJhbGciOiJIUzI1NiJ9.eyJjcmVhdGVfdGltZSI6IjE1MTI3MDI3NTAyMTIiLCJpc3MiOiJhdXRoMCIsImlkIjoiMTg1NjYyNjI2NzIifQ.O_-yR0zYagrLRvPbggnru1Rapk4kiyAzcwYt2a3vlpM"
}

# step 2
ds = RemoteDataService()
ds.init_from_config(data_config)
dv = DataView()

# step 3
props = {'start_date': 20170501, 'end_date': 20171001, 'universe':'000300.SH',
         'fields': "pb,pe,total_oper_rev,oper_exp,sw1",
         'freq': 1}

dv.init_from_config(props, ds)
dv.prepare_data()


Begin: DataApi login 18566262672@tcp://data.tushare.org:8910
    login success 

Initialize config success.


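**props parameters**

|Field|Default|Type|Description|
|:----    |:---|:----- |-----   |
|symbol |Required; specify either symbol or universe |string |Symbol codes; separate multiple symbols with ',', e.g. '000001.SH,600300.SH'|
|universe |Required; specify either symbol or universe |string |Index codes (stock pool); data is requested for all constituents of the given indices. Separate multiple indices with ',', e.g. '000300.SH,000016.SH' for CSI 300 plus SSE 50 constituents|
|benchmark |The first index set in universe |string |Benchmark; an index code or a stock code, single symbol only|
|start_date |Required |int |Start date, e.g. 20170501|
|end_date |Required |int |End date, e.g. 20171001|
|fields |Required |string |Data fields; separate multiple fields with ',', e.g. 'open,close,high,low'|
|freq |1 |int |Data frequency; only 1 (daily bars) is currently supported|
|all_price |True |bool |Whether to download all daily price fields. They are downloaded by default|
|adjust_mode |'post' |string |Price adjustment mode; defaults to post-adjustment, the only mode currently supported|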

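The rules in the table can be checked before a request is sent. The sketch below is a hypothetical helper, `build_props`, that is not part of jaqs; it assumes the "either symbol or universe" wording means exactly one of the two, and it fills in the defaults listed in the table.

```python
def build_props(start_date, end_date, fields, symbol=None, universe=None, **overrides):
    """Hypothetical helper (not part of jaqs): build a props dict per the table above."""
    # Assumption: exactly one of symbol / universe must be given.
    if (symbol is None) == (universe is None):
        raise ValueError("provide exactly one of 'symbol' or 'universe'")

    # Defaults taken from the table: daily bars, all price fields, post-adjustment.
    props = {"freq": 1, "all_price": True, "adjust_mode": "post"}
    props.update(overrides)

    # Accept either a ready-made comma-separated string or a list of field names.
    field_str = fields if isinstance(fields, str) else ",".join(fields)
    props.update({"start_date": int(start_date),
                  "end_date": int(end_date),
                  "fields": field_str})

    if symbol is not None:
        props["symbol"] = symbol
    else:
        props["universe"] = universe
    return props


props = build_props(20170501, 20171001, ["pb", "pe"], universe="000300.SH")
print(props)
```

The resulting dict has the same shape as the `props` passed to `dv.init_from_config` in the example above.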
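# Data queries

## fields
- ` jaqs.data.DataView.fields `

**Brief description:**

- The data fields currently held in the DataView
- Notes:
  -  If universe is specified in the initial request, index_member (whether the symbol is an index constituent) and index_weight (constituent weight in the index) are added automatically; if universe contains several indices, the first one is used for index_member and index_weight
  -  If all_price=True is specified in the initial request, open, high, low, close, and vwap are requested together with their adjusted counterparts open_adj, high_adj, low_adj, close_adj, and vwap_adj
  -  If the initial request includes market data (fields contains open, high, etc., or all_price=True), trade_status (suspended or tradable) is added automatically
  -  If the requested fields include quarterly data, quarter (the reporting month the quarterly data corresponds to) and ann_date (the announcement date of the quarterly data) are added automatically
  -  If the requested fields include quarterly data, a copy aligned by time and symbol to the daily level is created automatically
  -  adjust_factor (price adjustment factor) is added automatically by the initial request

**Example:**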

In [2]:
dv.fields

['trade_status',
 'index_weight',
 'close_adj',
 'index_member',
 'oper_exp',
 'low_adj',
 'open',
 'pb',
 'low',
 'vwap',
 'close',
 'high_adj',
 'ann_date',
 'quarter',
 'vwap_adj',
 'sw1',
 'adjust_factor',
 'pe',
 'total_oper_rev',
 'high',
 'open_adj']

## _get_fields
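- ` jaqs.data.DataView._get_fields(field_type, fields) `

**Brief description:**

- Returns the subset of the given fields that belong to a given data type
  - field_type: one of {'market_daily', 'ref_daily', 'income', 'balance_sheet', 'cash_flow', 'fin_indicator', 'group', 'daily', 'quarterly'}
    (daily market data, daily reference data, income statement, balance sheet, cash flow statement, financial indicators, industry classification, daily-level data, and quarterly-level data, respectively)
  - fields: list of str

**Example:**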

In [3]:
dv._get_fields('quarterly',dv.fields) # which fields in the dataset are quarterly-level data

['quarter', 'ann_date', 'oper_exp', 'total_oper_rev']

## symbol
- ` jaqs.data.DataView.symbol `

**Brief description:**

- The symbols currently held in the DataView

**Example:**

In [4]:
dv.symbol[:2] # first two stocks

['000001.SZ', '000002.SZ']

## universe
- ` jaqs.data.DataView.universe `

**Brief description:**

- The index codes that define the stock pool of the current DataView

**Example:**

In [5]:
dv.universe

['000300.SH']

## benchmark
- ` jaqs.data.DataView.benchmark `

**Brief description:**

- The benchmark code of the current DataView

**Example:**

In [6]:
dv.benchmark

'000300.SH'

## data_benchmark
- ` jaqs.data.DataView.data_benchmark `

**Brief description:**

- Daily market data of the benchmark of the current DataView

**Example:**

In [7]:
dv.data_benchmark.head()

|trade_date|close|
|:----|:----|
|20170306|3446.484|
|20170307|3453.9565|
|20170308|3448.7313|
|20170309|3426.9438|
|20170310|3427.8916|


## data_inst
- ` jaqs.data.DataView.data_inst `

**Brief description:**

- Basic information on the securities in the dataset

**Example:**

In [8]:
dv.data_inst.head()

|symbol|buylot|delist_date|inst_type|list_date|multiplier|name|pricetick|product|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|600000.SH|100|99999999|1|19991110|1|浦发银行|0.01| |
|600016.SH|100|99999999|1|20001219|1|民生银行|0.01| |
|600030.SH|100|99999999|1|20030106|1|中信证券|0.01| |
|600050.SH|100|99999999|1|20021009|1|中国联通|0.01| |
|600109.SH|100|99999999|1|19970807|1|国金证券|0.01| |


|Field|Description|
|:----    |:---|
|inst_type|Security type|
|symbol|Security code|
|name|Security name|
|list_date|Listing date|
|delist_date|Delisting date|
|buylot|Minimum buy lot|
|pricetick|Minimum price tick|
|product|Contract product|
|multiplier|Contract multiplier|

## dates
- ` jaqs.data.DataView.dates `

**Brief description:**

- The date sequence of the data

**Example:**

In [9]:
dv.dates[:2] # first two dates of the sequence

array([20170306, 20170307])

## data_d
- ` jaqs.data.DataView.data_d `

**Brief description:**

- The daily-level dataset

**Example:**

In [10]:
dv.data_d.head()

symbol,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000001.SZ,...,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH,603993.SH
field,adjust_factor,ann_date,close,close_adj,high,high_adj,index_member,index_weight,low,low_adj,...,open_adj,oper_exp,pb,pe,quarter,sw1,total_oper_rev,trade_status,vwap,vwap_adj
trade_date
20170306,100.523,20161021.0,9.45,949.94235,9.46,950.94758,1.0,0.008449,9.39,943.91097,...,15.685408,0.0,4.7252,110.7088,9.0,240000,3496036000.0,交易,4.97,15.94
20170307,100.523,20161021.0,9.45,949.94235,9.46,950.94758,1.0,0.008449,9.4,944.9162,...,15.974097,0.0,4.5926,107.6027,9.0,240000,3496036000.0,交易,4.88,15.66
20170308,100.523,20161021.0,9.42,946.92666,9.45,949.94235,1.0,0.008449,9.4,944.9162,...,15.557102,0.0,4.6211,108.2683,9.0,240000,3496036000.0,交易,4.87,15.62
20170309,100.523,20161021.0,9.38,942.90574,9.43,947.93189,1.0,0.008449,9.36,940.89528,...,15.781638,0.0,4.6495,108.9339,9.0,240000,3496036000.0,交易,4.95,15.89
20170310,100.523,20161021.0,9.4,944.9162,9.41,945.92143,1.0,0.008449,9.36,940.89528,...,15.525026,0.0,4.6021,107.8246,9.0,240000,3496036000.0,交易,4.87,15.62


## data_q
- ` jaqs.data.DataView.data_q `

**Brief description:**

- The quarterly-level dataset

**Example:**

In [11]:
dv.data_q.head()

symbol,000001.SZ,000001.SZ,000001.SZ,000001.SZ,000002.SZ,000002.SZ,000002.SZ,000002.SZ,000008.SZ,000008.SZ,...,603858.SH,603858.SH,603885.SH,603885.SH,603885.SH,603885.SH,603993.SH,603993.SH,603993.SH,603993.SH
field,ann_date,oper_exp,quarter,total_oper_rev,ann_date,oper_exp,quarter,total_oper_rev,ann_date,oper_exp,...,quarter,total_oper_rev,ann_date,oper_exp,quarter,total_oper_rev,ann_date,oper_exp,quarter,total_oper_rev
report_date
20140930,,,9,,,,9,,,,...,9,,20151031.0,0.0,9,5092343000.0,,,9,
20150331,,,3,,,,3,,,,...,3,,20160421.0,0.0,3,1956667000.0,,,3,
20150630,,,6,,,,6,,,,...,6,,,,6,,,,6,
20150930,20151023.0,47847000000.0,9,71152000000.0,20151028.0,0.0,9,79596210000.0,20151031.0,0.0,...,9,7774995000.0,20151031.0,0.0,9,6262755000.0,20151030.0,0.0,9,3174664000.0
20151231,20160310.0,67268000000.0,12,96163000000.0,20160314.0,0.0,12,195549100000.0,20160427.0,0.0,...,12,11655630000.0,20160415.0,0.0,12,8158238000.0,20160325.0,0.0,12,4196840000.0


## get
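- ` jaqs.data.DataView.get(symbol="", start_date=0, end_date=0, fields="") `

**Brief description:**

- General query method: queries data by symbol, field, and date, and returns a DataFrame indexed by date with (symbol, field) MultiIndex columns

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|symbol|No |string|Symbol codes; separate multiple symbols with ',', e.g. '000001.SH,600300.SH'. Defaults to all symbols in the dataset|
|start_date |No |int |Start date; defaults to the start date of the dataset|
|end_date |No |int |End date; defaults to the end date of the dataset|
|fields |No |string |Data fields; separate multiple fields with ',', e.g. 'open,close,high,low'. Defaults to all fields in the dataset|

**Example:**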

In [12]:
dv.get("000001.SZ,000002.SZ",fields="open,high").head()

symbol,000001.SZ,000001.SZ,000002.SZ,000002.SZ
field,high,open,high,open
trade_date
20170502,8.96,8.96,19.37,19.3
20170503,8.93,8.92,19.25,19.2
20170504,8.89,8.89,19.19,18.9
20170505,8.76,8.74,19.07,18.95
20170508,8.62,8.6,18.96,18.89
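
Because the result has (symbol, field) MultiIndex columns, plain pandas selection is enough to reshape it. The sketch below builds a small stand-in frame by hand (first two rows copied from the output above) rather than calling jaqs, and shows how to obtain the per-symbol, per-field, and per-date views that correspond to the shapes described for `get`, `get_ts`, and `get_snapshot`.

```python
import pandas as pd

# Stand-in for the result of dv.get(...): dates as index,
# (symbol, field) MultiIndex as columns.
columns = pd.MultiIndex.from_product(
    [["000001.SZ", "000002.SZ"], ["high", "open"]], names=["symbol", "field"])
data = pd.DataFrame(
    [[8.96, 8.96, 19.37, 19.30],
     [8.93, 8.92, 19.25, 19.20]],
    index=pd.Index([20170502, 20170503], name="trade_date"),
    columns=columns)

# All fields of one symbol: select on the first column level.
one_symbol = data["000001.SZ"]

# One field across all symbols (dates x symbols).
open_ts = data.xs("open", axis=1, level="field")

# One date, with symbols as rows and fields as columns.
snapshot = data.loc[20170503].unstack("field")

print(one_symbol, open_ts, snapshot, sep="\n\n")
```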


## get_snapshot
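- ` jaqs.data.DataView.get_snapshot(snapshot_date, symbol="", fields="") `

**Brief description:**

- Cross-section query: for a given date, queries data by symbol and field, and returns a DataFrame indexed by symbol with fields as columns

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|snapshot_date |Yes |int |Date of the snapshot to query|
|symbol|No |string|Symbol codes; separate multiple symbols with ',', e.g. '000001.SH,600300.SH'. Defaults to all symbols in the dataset|
|fields |No |string |Data fields; separate multiple fields with ',', e.g. 'open,close,high,low'. Defaults to all fields in the dataset|

**Example:**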

In [13]:
dv.get_snapshot(20170504,fields="open,high").head()

|symbol|high|open|
|:----|:----|:----|
|000001.SZ|8.89|8.89|
|000002.SZ|19.19|18.9|
|000008.SZ|7.93|7.93|
|000009.SZ|8.65|8.51|
|000027.SZ|6.93|6.84|


## get_ts
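- ` jaqs.data.DataView.get_ts(field, symbol="", start_date=0, end_date=0) `

**Brief description:**

- Time-series query: for a given field, queries data by time and symbol, and returns a DataFrame indexed by time with symbols as columns

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|field |Yes |string |Data field, **single field only**|
|symbol|No |string|Symbol codes; separate multiple symbols with ',', e.g. '000001.SH,600300.SH'. Defaults to all symbols in the dataset|
|start_date |No |int |Start date; defaults to the start date of the dataset|
|end_date |No |int |End date; defaults to the end date of the dataset|

**Example:**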

In [14]:
dv.get_ts("open").head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,8.96,19.3,8.09,8.4,6.99,16.09,10.33,9.32,17.73,8.15,...,3.58,7.05,8.09,15.7,6.01,14.9,96.55,80.25,22.28,4.41
20170503,8.92,19.2,7.99,8.52,6.95,16.06,10.24,9.32,17.58,8.09,...,3.55,6.96,8.15,15.67,5.97,14.9,97.88,80.6,22.28,4.44
20170504,8.89,18.9,7.93,8.51,6.84,15.96,10.13,9.22,17.78,7.96,...,3.53,6.85,7.58,15.68,5.9,14.8,97.2,80.5,22.4,4.44
20170505,8.74,18.95,7.9,8.54,6.85,15.7,9.91,9.16,17.8,7.98,...,3.52,6.74,7.78,15.56,5.85,14.35,97.0,79.71,22.4,4.3
20170508,8.6,18.89,7.52,8.26,6.79,15.17,9.89,9.1,17.56,7.87,...,3.52,6.59,7.8,15.05,5.83,14.11,94.11,78.31,22.32,4.23


## get_ts_quarter
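- ` jaqs.data.DataView.get_ts_quarter(field, symbol="", start_date=0, end_date=0) `

**Brief description:**

- Time-series query for quarterly data: for a given field, queries data by time and symbol, and returns a DataFrame indexed by report date with symbols as columns
- Note: the field must be **quarterly data**. Use dv._get_fields('quarterly', dv.fields) to list the quarterly-level fields in the dataset

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|field |Yes |string |Quarterly data field, **single field only**|
|symbol|No |string|Symbol codes; separate multiple symbols with ',', e.g. '000001.SH,600300.SH'. Defaults to all symbols in the dataset|
|start_date |No |int |Start date; defaults to the start date of the dataset|
|end_date |No |int |End date; defaults to the end date of the dataset|

**Example:**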

In [15]:
dv.get_ts_quarter('total_oper_rev').head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
report_date
20140930,,,,,,,,,,,...,,,,,,,,,5092343000.0,
20150331,,,,,,,,,,,...,,,,1698335000.0,,,,,1956667000.0,
20150630,,,,,,,,,,,...,,,,3622176000.0,,,481583300.0,,,
20150930,71152000000.0,79596210000.0,763109200.0,3145235000.0,8676531000.0,45271150000.0,11499570000.0,1280032000.0,68523240000.0,,...,356773000000.0,39772080000.0,25902680000.0,5605255000.0,107453000000.0,1066451000.0,732243600.0,7774995000.0,6262755000.0,3174664000.0
20151231,96163000000.0,195549100000.0,1295076000.0,4895401000.0,11129980000.0,58685800000.0,16965710000.0,1736651000.0,100186400000.0,32236330000.0,...,474321000000.0,59810800000.0,40925340000.0,7705192000.0,145134000000.0,1604762000.0,1119601000.0,11655630000.0,8158238000.0,4196840000.0


# Adding data

## data_api
- ` jaqs.data.DataView.data_api `

**Brief description:**

- The data API (the remote data service, a DataService instance)

**Example:**

In [16]:
dv.data_api

<jaqs.data.dataservice.RemoteDataService at 0x7f607e4ad278>

## add_comp_info
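- ` jaqs.data.DataView.add_comp_info(index,data_api=None) `

**Brief description:**

- Adds two new fields to the dataset: whether each symbol is a constituent of a given index, and the symbol's weight in that index
- Unlike the index_member and index_weight fields downloaded by default when universe is set at initialization, fields added this way let you query membership and weight against any index
- The new fields are named [index]_member and [index]_weight

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|index |Yes  |string|Index code|
|data_api |No  |jaqs.data.dataservice.RemoteDataService|Remote data service (DataService) instance|

**Example:**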

In [17]:
dv.add_comp_info('000016.SH')

In [18]:
dv.get_ts("000016.SH_weight").head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,,,,,,,,,,,...,0.01881,0.01607,,,0.00459,,,,,
20170503,,,,,,,,,,,...,0.01881,0.01607,,,0.00459,,,,,
20170504,,,,,,,,,,,...,0.01881,0.01607,,,0.00459,,,,,
20170505,,,,,,,,,,,...,0.01881,0.01607,,,0.00459,,,,,
20170508,,,,,,,,,,,...,0.01881,0.01607,,,0.00459,,,,,


## add_field
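- ` jaqs.data.DataView.add_field(field_name, data_api=None) `

**Brief description:**

- Adds a new field to the dataset from the remote data source (the remote source must provide data for that field)

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|field_name|Yes  |string|Name of the field to add|
|data_api |No  |jaqs.data.dataservice.RemoteDataService|Remote data service (DataService) instance|

**Example:**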

In [19]:
dv.add_field("volume")

Query data - query...
NOTE: price adjust method is [post adjust]
Query data - daily fields prepared.
Query data - query...
NOTE: price adjust method is [post adjust]
Query data - daily fields prepared.


True

In [20]:
dv.get_ts("volume").head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,31102610.0,18442367.0,9684786.0,25497861.0,6432105.0,6327586.0,8965345.0,4378121.0,19238235.0,27143591.0,...,66093550.0,66530694.0,544839719.0,10925965.0,17520651.0,3119911.0,2091792.0,818200.0,926950.0,88555440.0
20170503,28031077.0,26926611.0,7763618.0,18416240.0,10746444.0,6107544.0,13106532.0,4834657.0,34074547.0,38599137.0,...,84611551.0,86231763.0,516934503.0,11688295.0,23738285.0,5159855.0,1346232.0,854976.0,1006828.0,58089924.0
20170504,69651707.0,17846705.0,8177165.0,15236008.0,7574795.0,7298552.0,25223485.0,5413535.0,33992154.0,17511452.0,...,74607306.0,105007184.0,606859878.0,15697719.0,38672243.0,6574210.0,823415.0,1148000.0,1139450.0,72505821.0
20170505,62370085.0,11655082.0,17977630.0,19831245.0,7462745.0,13769005.0,14733040.0,4027676.0,32366194.0,20326215.0,...,175567864.0,139296698.0,318657662.0,15302449.0,29776701.0,7457617.0,1135698.0,1159354.0,2677492.0,69479848.0
20170508,46008989.0,18399614.0,8826250.0,19415168.0,9090216.0,5051955.0,14364021.0,3932185.0,29396110.0,29515727.0,...,138086728.0,144402061.0,383459440.0,14624454.0,28293010.0,5304703.0,1563932.0,1425800.0,1799118.0,47210293.0


## add_formula

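- ` jaqs.data.DataView.add_formula(field_name,formula,is_quarterly,add_data=False,overwrite=True,formula_func_name_style='camel',data_api=None,register_funcs = None,within_index=True)  `


**Brief description:**

- Defines a factor through an expression

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|field_name|Yes  |string|Name of the custom factor|
|formula |Yes  |string|Factor expression|
|is_quarterly |Yes  |bool|Whether the final result is quarterly data|
|add_data |No  |bool|Whether to add the final result to the DataView dataset; not added by default|
|overwrite |No  |bool|Whether to overwrite when the factor name (field_name) conflicts with an existing field in the dataset. Only takes effect when add_data=True; overwrites by default|
|formula_func_name_style |No |string {'upper', 'lower', 'camel'}|Naming style of the function names used in the expression; defaults to 'camel'|
|data_api |No |jaqs.data.dataservice.RemoteDataService|Remote data service (DataService) instance. If the expression uses fields that are not in the current dataset, they are requested over the network through this API and added to the dataset automatically|
|register_funcs |No |dict of function|A dict of the custom functions used in the expression, e.g. {"name1":func1, "name2":func2}|
|within_index |No |bool|Whether to consider only index constituents when evaluating the expression. Only takes effect when the dataset contains the index_member field; defaults to True|

**Example 1:**
Return the result directly, without adding it to the dataset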

In [21]:
dv.add_formula("momentum", "Return(close_adj, 20)", is_quarterly=False).head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,-0.015419,-0.064783,-0.076389,-0.05786,0.007246,-0.011707,-0.07491,-0.154265,0.037647,0.120332,...,-0.027322,-0.034578,0.708075,-0.058859,-0.099548,-0.107944,0.032925,-0.084092,-0.032118,0.008929
20170503,-0.028353,-0.083576,-0.093822,-0.066521,-0.036517,-0.006211,-0.089767,-0.162579,0.050708,0.093151,...,-0.045946,-0.076613,0.633047,-0.101772,-0.119225,-0.105232,0.030166,-0.089233,-0.038296,-0.04355
20170504,-0.051031,-0.075802,-0.113611,-0.085653,-0.057613,-0.043135,-0.126427,-0.179695,0.028852,0.076716,...,-0.040761,-0.109211,0.54386,-0.087668,-0.131657,-0.146903,0.001239,-0.108997,-0.048739,-0.151733
20170505,-0.061957,-0.09139,-0.179543,-0.114684,-0.058172,-0.095238,-0.128748,-0.185484,0.017929,0.05914,...,-0.032787,-0.132199,0.37766,-0.109549,-0.132244,-0.162114,-0.038807,-0.127279,-0.039676,-0.162778
20170508,-0.068478,-0.112019,-0.194934,-0.13612,-0.075,-0.095668,-0.129061,-0.178796,-0.013326,0.04261,...,-0.019231,-0.16188,0.133871,-0.127647,-0.13037,-0.175264,-0.002776,-0.132699,-0.058723,-0.15226
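
For readers who want to see what such a formula computes, here is a pandas sketch of an n-period simple return. It assumes that `Return(x, n)` means `x / Delay(x, n) - 1`; that reading is an assumption of this sketch and should be checked against `func_doc` (described later on this page). The prices and the helper name `n_period_return` are made up.

```python
import pandas as pd

# Made-up adjusted close prices: dates as index, symbols as columns.
close_adj = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1, 13.31], "B": [20.0, 19.0, 19.0, 20.9]},
    index=pd.Index([20170502, 20170503, 20170504, 20170505], name="trade_date"))


def n_period_return(df, n):
    """Simple return over n rows: x_t / x_{t-n} - 1 (NaN for the first n rows)."""
    return df / df.shift(n) - 1


momentum = n_period_return(close_adj, 2)
print(momentum)
```

The first `n` rows are NaN because there is no earlier price to compare against.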


**Example 2:**
Add the result to the dataset so that it can be reused afterwards

In [22]:
dv.add_formula("momentum", "Return(close_adj, 20)", is_quarterly=False, add_data=True).head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,-0.015419,-0.064783,-0.076389,-0.05786,0.007246,-0.011707,-0.07491,-0.154265,0.037647,0.120332,...,-0.027322,-0.034578,0.708075,-0.058859,-0.099548,-0.107944,0.032925,-0.084092,-0.032118,0.008929
20170503,-0.028353,-0.083576,-0.093822,-0.066521,-0.036517,-0.006211,-0.089767,-0.162579,0.050708,0.093151,...,-0.045946,-0.076613,0.633047,-0.101772,-0.119225,-0.105232,0.030166,-0.089233,-0.038296,-0.04355
20170504,-0.051031,-0.075802,-0.113611,-0.085653,-0.057613,-0.043135,-0.126427,-0.179695,0.028852,0.076716,...,-0.040761,-0.109211,0.54386,-0.087668,-0.131657,-0.146903,0.001239,-0.108997,-0.048739,-0.151733
20170505,-0.061957,-0.09139,-0.179543,-0.114684,-0.058172,-0.095238,-0.128748,-0.185484,0.017929,0.05914,...,-0.032787,-0.132199,0.37766,-0.109549,-0.132244,-0.162114,-0.038807,-0.127279,-0.039676,-0.162778
20170508,-0.068478,-0.112019,-0.194934,-0.13612,-0.075,-0.095668,-0.129061,-0.178796,-0.013326,0.04261,...,-0.019231,-0.16188,0.133871,-0.127647,-0.13037,-0.175264,-0.002776,-0.132699,-0.058723,-0.15226


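**Example 3:**
Define and register custom functions in advance to compute factors with more flexibility

In [23]:
# Exponentially smoothed average: takes a DataFrame indexed by time with symbols as columns and returns its smoothed series
# SMA_today = m/n * Price_today + (n-m)/n * SMA_yesterday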
def sma(df, n, m):
    a = n / m - 1
    r = df.ewm(com=a, axis=0, adjust=False)
    return r.mean()

dv.add_formula("double_SMA","SMA(SMA(close_adj,3,1),3,1)",
               is_quarterly=False,
               add_data=True,
               register_funcs={"SMA":sma}).head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,902.76322,2486.337268,222.70223,86.694747,96.210407,384.0935,243.687558,213.240494,298.533382,321.03351,...,5.618304,12.228624,16.337667,15.853791,8.049842,61.962369,90.623181,81.38123,45.732059,14.250433
20170503,901.693012,2458.607759,220.011542,86.274281,95.764841,382.711958,242.81976,209.993794,299.424894,320.170244,...,5.610843,12.093762,16.388825,15.822053,7.996918,61.652794,92.037297,81.146152,45.66522,14.26287
20170504,898.648253,2434.327687,217.509487,85.953173,95.348827,380.921464,241.525309,207.048611,300.227209,319.209255,...,5.600827,11.952804,16.444759,15.784386,7.943142,61.251215,93.216093,80.8721,45.594402,14.249688
20170505,893.835609,2412.493048,214.370868,85.402007,94.885973,377.70429,240.001648,204.469036,300.466335,317.786651,...,5.592535,11.798688,16.461484,15.702713,7.890575,60.693946,93.950437,80.487844,45.588495,14.194258
20170508,888.101823,2388.403014,210.622674,84.588481,94.241335,373.764215,238.65022,202.458126,299.352026,316.098221,...,5.591171,11.616677,16.28568,15.569446,7.848774,60.111181,94.615652,79.88508,45.523878,14.133393
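
The `sma` function in Example 3 relies on `ewm(com=n/m-1, adjust=False)` being equivalent to the recursion in its comment, since `com = n/m - 1` gives a smoothing factor `alpha = 1 / (1 + com) = m/n`. The sketch below checks this on made-up data without NaNs; `sma_recursive` is an illustrative helper, not part of jaqs, and the `axis=0` argument is omitted because it is the default.

```python
import numpy as np
import pandas as pd


def sma(df, n, m):
    # Same idea as Example 3: com = n/m - 1  <=>  alpha = m/n.
    return df.ewm(com=n / m - 1, adjust=False).mean()


def sma_recursive(df, n, m):
    """Explicit form: SMA_t = m/n * x_t + (n-m)/n * SMA_{t-1}, seeded with the first row."""
    alpha = m / n
    values = df.to_numpy(dtype=float)
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return pd.DataFrame(out, index=df.index, columns=df.columns)


prices = pd.DataFrame({"A": [9.0, 12.0, 6.0], "B": [3.0, 3.0, 6.0]})
smoothed = sma(prices, 3, 1)
print(smoothed)
print(np.allclose(smoothed, sma_recursive(prices, 3, 1)))
```

With missing values the two versions can differ, because `ewm` applies its own NaN handling.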


## func_doc
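- ` jaqs.data.DataView.func_doc() `

**Brief description:**

- Looks up the built-in functions supported by add_formula
- Available lookups:
  - func_doc().doc # full documentation
  - func_doc().funcs # list of functions
  - func_doc().types # function categories
  - func_doc().descriptions # function descriptions
  - func_doc().search_by_type(type) # all functions in a category; type: function category (string)
  - func_doc().search_by_description(description) # all functions whose description may match; description: function description (string)
  - func_doc().search_by_func(func,precise) # look up a function by name; func: function name (string); precise: whether to match the name exactly (bool), False for fuzzy matching

**Example:**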

In [24]:
# full documentation, first 5 rows
dv.func_doc().doc.head()

| |分类|说明|公式|示例|
|:----|:----|:----|:----|:----|
|0|四则运算|加法运算|+|close + open|
|1|四则运算|减法运算|-|close - open|
|2|四则运算|乘法运算|*|vwap * volume|
|3|四则运算|除法运算|/|close / open|
|4|基本数学函数|符号函数，返回值为{-1, 0, 1}|Sign(x)|Sign(close-open)|


In [25]:
# list of functions, first two
dv.func_doc().funcs[:2]

array(['+', '-'], dtype=object)

In [26]:
# function categories, first two
dv.func_doc().types[:2]

array(['四则运算', '基本数学函数'], dtype=object)

In [27]:
# function descriptions, first two
dv.func_doc().descriptions[:2]

array(['加法运算', '减法运算'], dtype=object)

In [28]:
# all functions in the given category
dv.func_doc().search_by_type("数学函数")

| |分类|说明|公式|示例|
|:----|:----|:----|:----|:----|
|4|基本数学函数|符号函数，返回值为{-1, 0, 1}|Sign(x)|Sign(close-open)|
|5|基本数学函数|绝对值函数|Abs(x)|Abs(close-open)|
|6|基本数学函数|自然对数|Log(x)|Log(close/open)|
|7|基本数学函数|对x取负|-x|-close|
|8|基本数学函数|幂函数|^|close ^ 2|
|9|基本数学函数|幂函数x^y|Pow(x,y)|Pow(close,2)|
|10|基本数学函数|保持符号的幂函数，等价于Sign(x) * (Abs(x)^e)|SignedPower(x,e)|SignedPower(close-open, 0.5)|
|11|基本数学函数|取余函数|%|oi % 10|


In [29]:
# all functions whose description may match
dv.func_doc().search_by_description("绝对值")

| |分类|说明|公式|示例|
|:----|:----|:----|:----|:----|
|5|基本数学函数|绝对值函数|Abs(x)|Abs(close-open)|


In [30]:
# look up a function by name, exact match
dv.func_doc().search_by_func("Tan",precise=True)

| |分类|说明|公式|示例|
|:----|:----|:----|:----|:----|
|24|三角函数|正切函数|Tan(x)|Tan(close/open)|


In [31]:
# look up a function by name, fuzzy match
dv.func_doc().search_by_func("Tan",precise=False)

| |分类|说明|公式|示例|
|:----|:----|:----|:----|:----|
|24|三角函数|正切函数|Tan(x)|Tan(close/open)|
|56|横截面函数 - 数据处理|将指标标准化，即在横截面上减去平均值后再除以标准差|Standardize(x)|Standardize(close/Delay(close,1)-1) 表示日收益率的标准化|
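
The fuzzy result above includes `Standardize(x)` for the query "Tan", which is consistent with a case-insensitive substring match on the formula. The sketch below reproduces that behaviour on a three-row stand-in table; it is a guess at the matching rule based on the output shown, not the jaqs source, and this `search_by_func` is a hypothetical re-implementation.

```python
import pandas as pd

# Small stand-in for func_doc().doc, using the same column names as the output above.
doc = pd.DataFrame({
    "公式": ["Abs(x)", "Tan(x)", "Standardize(x)"],
    "说明": ["绝对值函数", "正切函数", "将指标标准化"],
})


def search_by_func(doc, func, precise=True):
    """Hypothetical re-implementation of the lookup, for illustration only."""
    if precise:
        # Exact match on the name before the opening parenthesis.
        names = doc["公式"].map(lambda s: s.split("(")[0])
        return doc[names == func]
    # Fuzzy: case-insensitive substring match anywhere in the formula.
    return doc[doc["公式"].str.contains(func, case=False, regex=False)]


exact = search_by_func(doc, "Tan", precise=True)
fuzzy = search_by_func(doc, "Tan", precise=False)
print(exact)
print(fuzzy)
```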


## append_df

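- ` jaqs.data.DataView.append_df(df, field_name, is_quarterly=False, overwrite=True) `

**Brief description:**

- Adds an externally constructed pandas.DataFrame to the dataset as a new field (a more flexible way to define factors)

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|df |Yes  |pandas.DataFrame, indexed by date with symbols as columns|Data to add|
|field_name|Yes  |string|Field name of the data to add|
|is_quarterly|No  |bool|Whether the data is quarterly; defaults to False|
|overwrite |No  |bool|Whether to overwrite when the field name (field_name) conflicts with an existing field in the dataset; overwrites by default|

**Example:**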

In [32]:
df = dv.get_ts('close') - dv.get_ts("high")
dv.append_df(df,"close-high",is_quarterly=False)
dv.get_ts("close-high").head()

symbol,000001.SZ,000002.SZ,000008.SZ,000009.SZ,000027.SZ,000039.SZ,000060.SZ,000061.SZ,000063.SZ,000069.SZ,...,601988.SH,601989.SH,601992.SH,601997.SH,601998.SH,603000.SH,603160.SH,603858.SH,603885.SH,603993.SH
trade_date
20170502,-0.02,-0.17,-0.15,-0.04,-0.04,-0.25,-0.1,-0.17,-0.15,-0.13,...,-0.02,-0.07,-0.09,-0.12,-0.06,-0.19,-2.0,-0.49,-0.1,-0.15
20170503,-0.02,-0.39,-0.12,-0.08,-0.09,-0.16,-0.21,-0.15,-0.23,-0.17,...,-0.03,-0.1,-0.54,-0.09,-0.06,-0.16,-0.58,-0.43,-0.03,-0.14
20170504,-0.15,-0.17,-0.05,-0.11,-0.06,-0.3,-0.18,-0.1,-0.25,-0.04,...,-0.01,-0.08,-0.42,-0.15,-0.04,-0.38,-1.68,-0.96,-0.18,-0.11
20170505,-0.13,-0.18,-0.36,-0.32,-0.09,-0.62,-0.12,-0.07,-0.29,-0.18,...,-0.02,-0.18,-0.21,-0.47,-0.02,-0.34,-3.63,-1.41,-0.15,-0.03
20170508,-0.05,-0.49,-0.21,-0.26,-0.14,-0.3,-0.17,-0.09,-0.64,-0.19,...,0.0,-0.17,-0.82,-0.37,-0.02,-0.38,-2.49,-1.71,-0.39,-0.02
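
Any computation that yields a frame with dates as the index and symbols as the columns can be passed to `append_df`. As an illustration, the sketch below builds a cross-sectional percentile rank with plain pandas on made-up prices; the field name `close_rank` is hypothetical, and the final `append_df` call is left as a comment because it needs a prepared DataView.

```python
import pandas as pd

# Made-up close prices in the shape get_ts returns: dates x symbols.
close = pd.DataFrame(
    {"000001.SZ": [9.0, 9.5], "000002.SZ": [19.0, 18.0], "000008.SZ": [8.0, 10.0]},
    index=pd.Index([20170502, 20170503], name="trade_date"))
close.columns.name = "symbol"

# Percentile rank of each symbol's close within each date (row-wise).
factor = close.rank(axis=1, pct=True)
print(factor)

# With a prepared DataView, the factor could then be added as a new field:
# dv.append_df(factor, "close_rank", is_quarterly=False)
```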


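## append_df_symbol

- ` jaqs.data.DataView.append_df_symbol(df, symbol_name, overwrite=False) `

**Brief description:**

- Adds an externally constructed pandas.DataFrame (holding every field of one new symbol) to the dataset as a new symbol
- This method currently supports daily data only; quarterly data cannot be added

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|df |Yes  |pandas.DataFrame, indexed by date with field names as columns|Data to add|
|symbol_name|Yes  |string|Symbol name of the data to add|
|overwrite |No  |bool|Whether to overwrite when the new symbol (symbol_name) conflicts with an existing symbol in the dataset; does not overwrite by default|

**Example:**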

In [13]:
df = dv.get("000001.SZ")
df.columns = df.columns.droplevel("symbol")
dv.append_df_symbol(df=df,symbol_name="000001.SZ",overwrite=True)

Symbol [000001.SZ] is overwritten.


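# Removing data

## remove_field

- ` jaqs.data.DataView.remove_field(field_names) `

**Brief description:**

- Removes the given fields from the DataView

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|field_names|Yes  |string|Fields to remove, separated by ","|


**Example:**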

In [15]:
print("open" in dv.fields)
dv.remove_field("open")
print("open" in dv.fields)

True
False


## remove_symbol

- ` jaqs.data.DataView.remove_symbol(symbols) `

**Brief description:**

- Removes the given symbols from the DataView

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|symbols|Yes  |string|Symbols to remove, separated by ","|


**Example:**

In [16]:
print("000001.SZ" in dv.symbol)
dv.remove_symbol("000001.SZ")
print("000001.SZ" in dv.symbol)

True
False


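# Saving data locally

## save_dataview

- ` jaqs.data.DataView.save_dataview(folder_path) `

**Brief description:**

- Saves the data in the DataView to the given local directory (folder_path)

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|folder_path |Yes  |string|Path to save to|


**Example:**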

In [2]:
import os
dataview_folder = './data'

if not (os.path.isdir(dataview_folder)):
    os.makedirs(dataview_folder)
    
dv.save_dataview(dataview_folder)

## load_dataview

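- ` jaqs.data.DataView.load_dataview(folder_path) `

**Brief description:**

- Loads data from the given local directory (folder_path) into the DataView
- Only loading the full dataset is currently supported

**Parameters:**

|Parameter|Required|Type|Description|
|:----    |:---|:----- |-----   |
|folder_path |Yes  |string|Path to load from|


**Example:**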

In [3]:
dv.load_dataview(dataview_folder)

Dataview loaded successfully.
