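DataSync Development Documentation

1. DataSync Features

DataSync is a lightweight ETL tool for securities data, written in Python. The project already integrates several data sources and databases and hides the technical details of fetching, cleaning, and writing data. Simple requirements can be met just by editing the configuration file; more complex ones can be handled with a small amount of custom development.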

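2. Installing the Service

2.1 Automatic Installation

In the DataSync directory, right-click datasync_install.bat and run it as administrator.

2.2 Manual Installation

If automatic installation fails, try installing manually.

2.2.1 Download the Project Code

Open a git command line locally and clone the project with the following command: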

git clone https://github.com/sicher123/DataSync.git

2.2.2 Install the Project

Open a command line in the DataSync directory and enter the following command:

python setup.py install

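2.2.3 Configuration

Configuration is stored in an Excel workbook. You can add new tables as needed, but they must follow the existing format exactly; otherwise the service will fail.

The default configuration file is located at:
DataSync\datasync\config\config.xlsx
The Daily_data sheet holds the configuration for daily data and the lb_data sheet holds the configuration for quarterly data. An example is shown below: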

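| Field | Description | Example |
| --- | --- | --- |
| Table name | | dbo.AINDEXEODPRICES |
| origin | Data source; SqlServer and Oracle are currently supported | MSSqlOrigin/OracleOrigin |
| db_config | Database connection settings: address, user name, and password | {'addr': '172.16.100.7', 'user': 'user1', 'password': '123456'} |
| fields | Data fields to request; if empty, all fields in the table are fetched | S_DQ_LOW,S_DQ_HIGH |
| S_INFO_WINDCODE | Security codes to request; if empty, all securities in the table are fetched | 000001.SZ,600000.SH |
| DATE_NAME | Name of the date index field | TRADE_DT |
| start_date | Default start date; if no local data exists, data is requested starting from this date | 20080101 |
| Other fields | …… | …… |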
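As a rough illustration of how one column of this sheet could become a Python dict, the sketch below parses the raw cell values of a single table. The function name `parse_config_row` and the assumption that cells arrive as strings are hypothetical; this is not the actual interface of the props module.

```python
import ast


def parse_config_row(raw):
    """Turn one table's raw config cells into typed Python values (hypothetical helper)."""

    def split_csv(cell):
        # An empty cell means "no filter", so return None instead of an empty list.
        cell = (cell or "").strip()
        return [item.strip() for item in cell.split(",") if item.strip()] or None

    return {
        "origin": raw["origin"],
        # db_config is written as a Python dict literal; literal_eval parses it
        # without executing arbitrary code.
        "db_config": ast.literal_eval(raw["db_config"]),
        "fields": split_csv(raw.get("fields")),
        "S_INFO_WINDCODE": split_csv(raw.get("S_INFO_WINDCODE")),
        "DATE_NAME": raw["DATE_NAME"],
        # Excel may deliver the date as a number, so normalise it to a string.
        "start_date": str(raw["start_date"]),
    }


# Values copied from the example column above; S_INFO_WINDCODE is left empty.
raw_row = {
    "origin": "MSSqlOrigin",
    "db_config": "{'addr': '172.16.100.7', 'user': 'user1', 'password': '123456'}",
    "fields": "S_DQ_LOW,S_DQ_HIGH",
    "S_INFO_WINDCODE": "",
    "DATE_NAME": "TRADE_DT",
    "start_date": 20080101,
}
props = parse_config_row(raw_row)
```

Returning `None` for an empty cell keeps the "fetch everything" case explicit for whatever code builds the query later.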

2.2.4 Install as a Windows Scheduled Task

1) Open the Windows Task Scheduler.

2) Create a task.

3) Locate run_sync.bat in the DataSync project and configure the scheduled task to run this script daily.

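2.3 Verifying the Service

After the scheduled task has executed run_sync.bat, or after you have run it manually, a log directory is created on the desktop. Check the logs for a message reporting that data synchronization succeeded; if there is none, check the code.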

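3. Project Structure

3.1 config & props

The config directory holds the fixed configuration files, and the props directory provides read/write interfaces for the different configuration file types; the final output is a Python dict. The service currently uses config.xlsx, with the following settings: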

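| Field | Description | Example |
| --- | --- | --- |
| Table name | | dbo.AINDEXEODPRICES |
| origin | Data source; SqlServer and Oracle are currently supported | MSSqlOrigin/OracleOrigin |
| db_config | Database connection settings: address, user name, and password | {'addr': '172.16.100.7', 'user': 'user1', 'password': '123456'} |
| fields | Data fields to request; if empty, all fields in the table are fetched | S_DQ_LOW,S_DQ_HIGH |
| S_INFO_WINDCODE | Security codes to request; if empty, all securities in the table are fetched | 000001.SZ,600000.SH |
| DATE_NAME | Name of the date index field | TRADE_DT |
| start_date | Default start date; if no local data exists, data is requested starting from this date | 20080101 |
| Other fields | …… | …… |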

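Within the range of data interfaces already supported, the data to be synchronized can be changed by editing or adding entries in the configuration file.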

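3.2 origin

3.2.1 Overview

The data source module. Currently supported data sources:

| Data source | Supported data | Notes |
| --- | --- | --- |
| jaqs | Stock minute bars, daily bars, and financial data | Requires the jaqs package; see http://qunatos.org/pro/ |
| oracle/sqlserver/mongodb | Local warehouse data | / |

New data sources can be added as custom implementations.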

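3.2.2 Function Reference

Every origin must implement a common set of base methods. Taking MSSqlOrigin as an example, the required methods are:

  • props_to_sql : converts the configuration into a query statement the data source understands
  • connect : opens the database connection
  • read : data reading interface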
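The three methods can be sketched as follows. This is a minimal illustration, not the project's MSSqlOrigin: it uses the standard-library `sqlite3` module as a stand-in for SQL Server or Oracle so that it runs anywhere, and the class name `SketchOrigin`, the method signatures, and the `prices` table are all assumptions.

```python
import sqlite3


class SketchOrigin:
    """Hypothetical origin; sqlite3 stands in for SQL Server/Oracle."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def connect(self):
        # Open the connection once and reuse it afterwards.
        if self.conn is None:
            self.conn = sqlite3.connect(self.db_path)
        return self.conn

    def props_to_sql(self, table, props, start, end):
        # Table and column names come from the trusted config file;
        # values (dates, security codes) are passed as bound parameters.
        columns = ", ".join(props.get("fields") or ["*"])
        date_col = props["DATE_NAME"]
        sql = f"SELECT {columns} FROM {table} WHERE {date_col} BETWEEN ? AND ?"
        params = [start, end]
        codes = props.get("S_INFO_WINDCODE")
        if codes:
            marks = ", ".join("?" for _ in codes)
            sql += f" AND S_INFO_WINDCODE IN ({marks})"
            params.extend(codes)
        return sql + f" ORDER BY {date_col}", params

    def read(self, table, props, start, end):
        sql, params = self.props_to_sql(table, props, start, end)
        return self.connect().execute(sql, params).fetchall()


# Small in-memory demo with made-up prices.
origin = SketchOrigin()
conn = origin.connect()
conn.execute("CREATE TABLE prices (S_INFO_WINDCODE TEXT, TRADE_DT TEXT, S_DQ_CLOSE REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("000001.SH", "20080102", 10.0),
        ("000001.SH", "20080103", 11.0),
        ("399001.SZ", "20080102", 20.0),
    ],
)
demo_props = {
    "fields": ["TRADE_DT", "S_DQ_CLOSE"],
    "S_INFO_WINDCODE": ["000001.SH"],
    "DATE_NAME": "TRADE_DT",
}
rows = origin.read("prices", demo_props, "20080101", "20080131")
```

Keeping `props_to_sql` separate from `read` makes the generated statement easy to inspect or log before it is sent to the database.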

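3.3 storage

3.3.1 Overview

The local data warehouse interface. Currently supported stores:

| Store | Description | Use case |
| --- | --- | --- |
| Memory | Data is held in memory | Small data volumes and low-frequency data only |
| excel | Data is stored as Excel files | Research that has to work across platforms |
| hdf5-pandas | pandas-based HDF5 files; convenient but resource-intensive | Medium data volumes with frequent full reads |
| hdf5 | Native HDF5; better performance but less flexible | Large data volumes with frequent full reads |
| mongodb | Document (key-value style) database; a better fit for securities data than SQL databases | Storage for all kinds of data |
| sqlite | Lightweight file-based SQL database | Small to medium data volumes with frequent queries |

New stores can be added as custom implementations.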

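3.3.2 Function Reference

Every storage must implement a common set of base methods. The required methods are:

  • get_update_info : returns the timestamp of the latest record in the time-series data
  • update_file/update_table : data writing interface

Other custom methods:

  • execute : simplifies running SQL statements
  • set_attr : interface for writing other information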
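A storage with these methods might look roughly like the sketch below, which uses `sqlite3` as the backing store. The class name `SketchStorage` and all signatures are assumptions made for illustration; the project's own storage classes may differ.

```python
import sqlite3


class SketchStorage:
    """Hypothetical sqlite-backed storage implementing the methods listed above."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def execute(self, sql, params=()):
        # Run one statement inside a transaction and return any fetched rows.
        with self.conn:
            return self.conn.execute(sql, params).fetchall()

    def get_update_info(self, table, date_col):
        # Latest stored date, or None when the table is missing or empty.
        exists = self.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        )
        if not exists:
            return None
        return self.execute(f"SELECT MAX({date_col}) FROM {table}")[0][0]

    def update_table(self, table, columns, rows):
        # Create the table on first use, then append the new rows.
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        self.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        with self.conn:
            self.conn.executemany(
                f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows
            )


# Demo with made-up rows.
storage = SketchStorage()
before = storage.get_update_info("prices", "TRADE_DT")
storage.update_table(
    "prices",
    ["S_INFO_WINDCODE", "TRADE_DT", "S_DQ_CLOSE"],
    [("000001.SH", "20080102", 10.0), ("000001.SH", "20080103", 11.0)],
)
latest = storage.get_update_info("prices", "TRADE_DT")
```

Because dates are stored as YYYYMMDD strings, `MAX` returns the most recent date without any conversion.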

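3.4 sync

3.4.1 Overview

This directory contains the scripts that implement the synchronization service. They can be written and extended to fit your own functional requirements; there is no fixed format.

3.4.2 Function Reference

Using guojin_sync as a template, the flow is as follows:

  • read_config : loads the configuration
  • get_props : splits the configuration into smaller parts, to avoid running out of memory by requesting too much data in a single call
  • spc_treatment : cleans non-standardized data
  • Updater : the updater; routes data of different formats to different update methods
  • run : reads data using the configuration
  • check_n_rollback : checks the correctness of the local data and backs it up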

Sample configuration for two tables:

| Table name | dbo.AINDEXEODPRICES | dbo.ASHAREEODDERIVATIVEINDICATOR |
| --- | --- | --- |
| DATE_NAME | TRADE_DT | TRADE_DT |
| S_INFO_WINDCODE | 000015.SH,399675.SZ,000095.SH,399635.SZ,399437.SZ,399015.SZ,399363.SZ,399989.SZ,000905.SH,000090.SH,000147.SH,399377.SZ,399396.SZ,000010.SH,000005.SH,399374.SZ,000941.SH,000841.SH,000153.SH,000801.SH,399995.SZ,000957.SH,000912.SH,000909.SH,399244.SZ,399429.SZ,000994.SH,000117.SH,399398.SZ,399685.SZ,399324.SZ,000091.SH,399002.SZ,000071.SH,000977.SH,000961.SH,399417.SZ,000019.SH,000978.SH,399004.SZ,399646.SZ,000985.SH,000098.SH,000917.SH,000910.SH,000938.SH,000828.SH,399012.SZ,399673.SZ,000099.SH,000037.SH,000937.SH,000965.SH,000094.SH,399409.SZ,399411.SZ,399624.SZ,399314.SZ,000943.SH,399393.SZ,399419.SZ,399814.SZ,000045.SH,399677.SZ,399385.SZ,399629.SZ,000944.SH,399439.SZ,399556.SZ,000819.SH,399337.SZ,000135.SH,000003.SH,399432.SZ,000073.SH,399806.SZ,399312.SZ,399380.SZ,399994.SZ,399384.SZ,000044.SH,399642.SZ,399965.SZ,399602.SZ,399664.SZ,000002.SH,000046.SH,000155.SH,000919.SH,399686.SZ,000815.SH,399681.SZ,399997.SZ,000068.SH,399627.SZ,399382.SZ,399403.SZ,399656.SZ,399356.SZ,399972.SZ,399554.SZ,000921.SH,399616.SZ,000054.SH,399339.SZ,399557.SZ,399553.SZ,000152.SH,399622.SZ,000030.SH,000018.SH,000952.SH,399604.SZ,399606.SZ,000138.SH,000097.SH,399618.SZ,399307.SZ,399695.SZ,000108.SH,399306.SZ,399555.SZ,399006.SZ,399240.SZ,399341.SZ,000121.SH,000983.SH,399697.SZ,399626.SZ,399007.SZ,000008.SH,399657.SZ,399418.SZ,000855.SH,399703.SZ,399322.SZ,399100.SZ,399010.SZ,399638.SZ,399433.SZ,000141.SH,399321.SZ,399362.SZ,000846.SH,399706.SZ,000984.SH,399550.SZ,000078.SH,399316.SZ,000025.SH,000145.SH,399392.SZ,000810.SH,399992.SZ,399348.SZ,399966.SZ,399993.SZ,000118.SH,000829.SH,399674.SZ,399687.SZ,000107.SH,399364.SZ,399610.SZ,399353.SZ,000038.SH,399813.SZ,399431.SZ,000040.SH,000131.SH,399310.SZ,000106.SH,399662.SZ,000066.SH,399420.SZ,000920.SH,399407.SZ,000120.SH,000092.SH,399361.SZ,000975.SH,000058.SH,399670.SZ,399639.SZ,399623.SZ,000161.SH,399351.SZ,399651.SZ,000968.SH,399315.SZ,399389.SZ,000915.SH,399669.SZ,000150.SH,399705.SZ,000129.SH,000840.SH,399249.SZ,399394.SZ,399434.SZ,399236.SZ,399011.SZ,000953.SH,000096.SH,399481.SZ,000918.SH,399428.SZ,000009.SH,000033.SH,399375.SZ,000922.SH,000047.SH,399103.SZ,000814.SH,000824.SH,000805.SH,000995.SH,000827.SH,000852.SH,000838.SH,000151.SH,000126.SH,000991.SH,399684.SZ,000130.SH,000069.SH,399611.SZ,000812.SH,399412.SZ,000914.SH,399676.SZ,000049.SH,399701.SZ,000048.SH,000960.SH,399406.SZ,399672.SZ,399613.SZ,000928.SH,399683.SZ,399998.SZ,399369.SZ,399647.SZ,000158.SH,399808.SZ,399303.SZ,399366.SZ,000804.SH,000811.SH,000806.SH,399619.SZ,000136.SH,000820.SH,000986.SH,000104.SH,000052.SH,399405.SZ,399630.SZ,399693.SZ,399017.SZ,399807.SZ,000972.SH,000992.SH,399682.SZ,399986.SZ,399388.SZ,399648.SZ,000955.SH,000950.SH,000939.SH,000825.SH,399378.SZ,399655.SZ,399238.SZ,000125.SH,000077.SH,399370.SZ,399391.SZ,000027.SH,399013.SZ,399653.SZ,000064.SH,399232.SZ,399235.SZ,399976.SZ,000945.SH,399643.SZ,399400.SZ,399698.SZ,000103.SH,000032.SH,000115.SH,000119.SH,399234.SZ,399241.SZ,399438.SZ,000031.SH,399365.SZ,000109.SH,000826.SH,399634.SZ,399628.SZ,000039.SH,399617.SZ,000056.SH,000079.SH,000041.SH,399248.SZ,000970.SH,000074.SH,000026.SH,399803.SZ,000940.SH,399102.SZ,000122.SH,000114.SH,000958.SH,399661.SZ,000839.SH,399413.SZ,000929.SH,000100.SH,000927.SH,399973.SZ,000822.SH,000816.SH,399372.SZ,000802.SH,000093.SH,000128.SH,399983.SZ,399395.SZ,399346.SZ,399423.SZ,000133.SH,000134.SH,399333.SZ,399422.SZ,000053.SH,399625.SZ,399243.SZ,399317.SZ,000947.SH,000146.SH,399313.SZ,399975.SZ,000063.SH,399390.SZ,399410.SZ,399680.SZ,399679.SZ,000933.SH,000982.SH,000160.SH,000809.SH,000966.SH,000007.SH,399399.SZ,399358.SZ,000110.SH,399016.SZ,399357.SZ,399699.SZ,399707.SZ,399386.SZ,000057.SH,000908.SH,399614.SZ,399678.SZ,000959.SH,000979.SH,000998.SH,000808.SH,000817.SH,000901.SH,000990.SH,000028.SH,399688.SZ,399691.SZ,000844.SH,000987.SH,399659.SZ,000139.SH,399671.SZ,399974.SZ,399644.SZ,000926.SH,399809.SZ,000946.SH,000902.SH,399326.SZ,399620.SZ,000981.SH,399959.SZ,399668.SZ,000062.SH,399970.SZ,000034.SH,000832.SH,399441.SZ,000076.SH,399381.SZ,399352.SZ,000132.SH,000913.SH,000936.SH,399330.SZ,399344.SZ,000951.SH,399612.SZ,399404.SZ,000807.SH,399637.SZ,399996.SZ,399237.SZ,000903.SH,399650.SZ,000803.SH,399704.SZ,399009.SZ,399636.SZ,399018.SZ,399379.SZ,000948.SH,399552.SZ,000020.SH,399702.SZ,399387.SZ,399551.SZ,000006.SH,000949.SH,399812.SZ,000907.SH,399320.SZ,399632.SZ,399101.SZ,399631.SZ,399408.SZ,000035.SH,000954.SH,399667.SZ,000159.SH,000123.SH,000967.SH,399242.SZ,399367.SZ,000102.SH,000989.SH,000818.SH,000906.SH,399649.SZ,399633.SZ,000065.SH,000070.SH,000830.SH,399350.SZ,000050.SH,000980.SH,399383.SZ,399335.SZ,000017.SH,399001.SZ,000969.SH,000932.SH,399689.SZ,399654.SZ,399231.SZ,399435.SZ,000930.SH,399660.SZ,399990.SZ,000971.SH,399107.SZ,399373.SZ,399427.SZ,000021.SH,000016.SH,000051.SH,000993.SH,399663.SZ,399376.SZ,399805.SZ,000956.SH,399401.SZ,000149.SH,399696.SZ,399694.SZ,000060.SH,000148.SH,000105.SH,000942.SH,000001.SH,399666.SZ,399355.SZ,000988.SH,399608.SZ,399359.SZ,399652.SZ,399233.SZ,399319.SZ,000113.SH,399368.SZ,000843.SH,000137.SH,000821.SH,399008.SZ,000142.SH,000963.SH,000904.SH,000072.SH,399690.SZ,000931.SH,000935.SH,399300.SZ,000075.SH,000112.SH,000925.SH,000055.SH,399645.SZ,000911.SH,000036.SH,399371.SZ,399991.SZ,000067.SH,000004.SH,399436.SZ,000962.SH,399311.SZ,000842.SH,000813.SH,399621.SZ,399640.SZ,399641.SZ,000059.SH,399665.SZ,000934.SH,399239.SZ,399360.SZ,000831.SH,000042.SH,000029.SH,399440.SZ,399005.SZ,399802.SZ,000162.SH,000043.SH,399397.SZ,000916.SH,399658.SZ,399804.SZ,399328.SZ,000964.SH,399967.SZ,000111.SH,399615.SZ,399692.SZ,399971.SZ,399402.SZ,000300.SH | |
| db_config | {'addr': '172.16.100.7', 'user': 'bigfish01', 'password': 'bigfish01@0514'} | {'addr': '172.16.100.7', 'user': 'bigfish01', 'password': 'bigfish01@0514'} |
| fields | S_DQ_VOLUME,S_DQ_AMOUNT,S_DQ_OPEN,S_DQ_HIGH,OBJECT_ID,S_DQ_LOW,TRADE_DT,S_DQ_CLOSE,S_INFO_WINDCODE | OPER_REV_TTM,UP_DOWN_LIMIT_STATUS,TOT_SHR_TODAY,S_DQ_MV,S_VAL_PS_TTM,S_VAL_PCF_NCF,NET_PROFIT_PARENT_COMP_LYR,S_PRICE_DIV_DPS,S_DQ_CLOSE_TODAY,S_VAL_PCF_OCF,NET_CASH_FLOWS_OPER_ACT_TTM,FLOAT_A_SHR_TODAY,S_VAL_PCF_NCFTTM,OBJECT_ID,OPER_REV_LYR,S_INFO_WINDCODE,S_VAL_PB_NEW,NET_ASSETS_TODAY,S_VAL_PCF_OCFTTM,S_VAL_PE,FREE_SHARES_TODAY,NET_CASH_FLOWS_OPER_ACT_LYR,S_DQ_TURN,S_VAL_PS,S_VAL_MV,NET_PROFIT_PARENT_COMP_TTM,S_VAL_PE_TTM,S_DQ_FREETURNOVER,TRADE_DT |
| folder_path | D:/hdf5_data | D:/hdf5_data |
| origin | MSSqlOrigin | MSSqlOrigin |
| start_date | 20000101 | 20000101 |
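
With a security list as long as the one in this configuration, splitting the request into batches keeps each call small. The sketch below shows one way the get_props and run steps of this section could cooperate. The function names follow the flow listed above, but their signatures, the `batch_size` parameter, and the `read`/`write` callables are assumptions, not the actual guojin_sync code.

```python
def get_props(props, batch_size=100):
    """Yield copies of props with S_INFO_WINDCODE split into smaller batches."""
    codes = props.get("S_INFO_WINDCODE")
    if not codes:
        # No code filter configured: pass the configuration through unchanged.
        yield dict(props)
        return
    for i in range(0, len(codes), batch_size):
        yield dict(props, S_INFO_WINDCODE=codes[i:i + batch_size])


def run(props, read, write, batch_size=100):
    """Read and write one batch at a time; return the number of rows handled."""
    total = 0
    for part in get_props(props, batch_size):
        rows = read(part)   # e.g. an origin's read method
        write(rows)         # e.g. a storage's update_table method
        total += len(rows)
    return total


# Demo with stand-in read/write callables instead of a real database.
demo_props = {
    "DATE_NAME": "TRADE_DT",
    "S_INFO_WINDCODE": ["000015.SH", "399675.SZ", "000095.SH", "399635.SZ", "399437.SZ"],
}
batches = [p["S_INFO_WINDCODE"] for p in get_props(demo_props, batch_size=2)]
stored = []
total = run(demo_props, lambda p: [(c,) for c in p["S_INFO_WINDCODE"]], stored.extend, batch_size=2)
```

Only one batch of rows is held in memory at a time, which is the point of splitting the configuration before reading.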

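3.4.3 Important Notes

  • Because the service updates data automatically at a fixed frequency, any rewrite of the service must standardize the start and end time of each day's data; otherwise data will be missing or duplicated.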
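
One way to keep the daily boundaries consistent is to derive each request window from the latest local record. The sketch below assumes dates are YYYYMMDD strings, as in the configuration; the function name `next_window` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

DATE_FMT = "%Y%m%d"


def next_window(last_local, start_date, today):
    """Return (start, end) as YYYYMMDD strings, or None when already up to date."""
    if last_local is None:
        # No local data yet: fall back to the configured start_date.
        start = start_date
    else:
        # Resume on the day after the latest stored record to avoid duplicates.
        next_day = datetime.strptime(last_local, DATE_FMT) + timedelta(days=1)
        start = next_day.strftime(DATE_FMT)
    # YYYYMMDD strings compare correctly as plain strings.
    if start > today:
        return None
    return start, today


first_run = next_window(None, "20080101", "20181009")
daily_run = next_window("20181008", "20080101", "20181009")
up_to_date = next_window("20181009", "20080101", "20181009")
```

Starting exactly one day after the last stored date means a skipped run is caught up on the next one, and a repeated run on the same day requests nothing.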