# lecture03: How to Obtain Financial Data
Bilibili link: https://www.bilibili.com/video/av20727707/?p=5

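Key points in this section:
- Getting financial data with tushare: www.tushare.org
- Getting financial data through quantos
- Getting financial data with a web crawler
- Basics of the pandas DataFrame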

## Getting financial data with tushare

In [2]:
import tushare as ts

df = ts.get_hist_data(
        '600030', 
        start='2018-01-01', 
        end='2018-01-31'
)

In [3]:
df.head()

| date | open | high | close | low | volume | price_change | p_change | ma5 | ma10 | ma20 | v_ma5 | v_ma10 | v_ma20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-01-31 | 21.45 | 21.7 | 21.31 | 20.94 | 2472534.0 | -0.32 | -1.48 | 21.954 | 21.791 | 20.668 | 2652434.9 | 3008657.2 | 2506280.92 |
| 2018-01-30 | 21.91 | 22.34 | 21.65 | 21.53 | 2451540.0 | -0.46 | -2.08 | 22.278 | 21.755 | 20.534 | 3172105.3 | 3259291.5 | 2458637.67 |
| 2018-01-29 | 22.47 | 22.73 | 22.12 | 22.01 | 2644808.5 | -0.22 | -0.98 | 22.204 | 21.616 | 20.374 | 3353126.3 | 3247210.83 | 2405724.9 |
| 2018-01-26 | 22.33 | 22.74 | 22.35 | 22.21 | 2519975.0 | 0.02 | 0.09 | 22.024 | 21.348 | 20.172 | 3186164.1 | 3206865.4 | 2305660.55 |
| 2018-01-25 | 22.5 | 22.85 | 22.34 | 22.24 | 3173317.0 | -0.58 | -2.53 | 21.812 | 21.046 | 19.961 | 3408204.45 | 3059471.71 | 2230712.51 |
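
The `df.head()` output above is indexed by date, newest first. As a small illustration of the pandas DataFrame operations this lecture lists as a key point, the sketch below rebuilds only the five closing prices shown above by hand (so it runs without tushare), sorts them oldest first, and recomputes a 5-day moving average. The names `df_demo` and `ma5_check` are made up for this example.

```python
import pandas as pd

# Closing prices copied from the df.head() output above (newest first).
df_demo = pd.DataFrame(
    {"close": [21.31, 21.65, 22.12, 22.35, 22.34]},
    index=pd.Index(
        ["2018-01-31", "2018-01-30", "2018-01-29", "2018-01-26", "2018-01-25"],
        name="date",
    ),
)

# Convert the string index to timestamps and sort oldest first,
# so that rolling windows look backwards in time.
df_demo.index = pd.to_datetime(df_demo.index)
df_demo = df_demo.sort_index()

# 5-day simple moving average of the close.
df_demo["ma5_check"] = df_demo["close"].rolling(window=5).mean()

latest_ma5 = df_demo["ma5_check"].iloc[-1]
print(round(latest_ma5, 3))
```

With only five rows, the first four values of `ma5_check` are NaN because the window is not yet full. The last value agrees with the `ma5` entry reported for 2018-01-31 in the output above.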


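## Getting financial data through quantos
- Basic data: mainly basic information such as security information, industry codes, index information, and the trading calendar.
- Market data: data generated by market quotes, including real-time quotes, real-time minute bars, historical ticks, historical daily bars, and historical minute bars.
- Reference data: adjustment factors, dividends, trading suspensions and resumptions, and industry classification for stocks; index constituents; net asset values of mutual funds; and so on.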


- Install `jaqs` via pip

In [None]:
# Import DataApi
from jaqs.data import DataApi

# Initialize the API client
api = DataApi('tcp://data.quantos.org:8910')

# Read the username and token from environment variables
import os
user = os.environ.get('QUANTOS_USER')
token = os.environ.get('QUANTOS_TOKEN')

# Log in
info, msg = api.login(user, token)
print(info, msg)

In [None]:
# Query daily bar data (open, high, low, close)
df, msg = api.daily(
        symbol='399001.SZ',
        start_date='2018-02-01',
        end_date='2018-02-28',
        fields='open,high,low,close'  # field names go in one comma-separated string
)
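
The query above returns a `(df, msg)` pair, so it is worth checking `msg` before using the data. The following is a minimal sketch, assuming (this lecture does not confirm it) that success is reported as a message starting with `0,` and that failures look like `error_code,error_text`. Both `check_msg` and `FakeApi` are hypothetical names; `FakeApi` is only an offline stand-in so the sketch runs without a quantos account.

```python
def check_msg(msg):
    """Raise if a (result, msg) style call did not report success.

    Assumption: success is signalled by a message starting with '0,'
    and anything else has the form 'error_code,error_text'.
    """
    code, _, text = str(msg).partition(',')
    if code != '0':
        raise RuntimeError('query failed: code={}, message={}'.format(code, text))


class FakeApi:
    """Hypothetical offline stand-in for DataApi, used only for this demo."""

    def daily(self, symbol, start_date, end_date, fields=''):
        # Mimic the (result, msg) return shape of the real call.
        if not isinstance(fields, str):
            return None, '-1,fields must be a comma-separated string'
        return {'symbol': symbol, 'fields': fields.split(',')}, '0,'


api_demo = FakeApi()
result, msg = api_demo.daily(
    symbol='399001.SZ',
    start_date='2018-02-01',
    end_date='2018-02-28',
    fields='open,high,low,close',
)
check_msg(msg)  # raises RuntimeError on a non-zero code
print(result['fields'])
```

With the real client, the same `check_msg(msg)` call could follow `api.login(...)` and `api.daily(...)`, provided the assumed message format holds.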

## Getting financial data with a web crawler
- Request parameters
- Request method: post/get
- Response body and character set
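
The three points above can be explored offline before sending anything. The sketch below uses `requests.Request(...).prepare()` to show how the same parameters are encoded for GET (in the URL) and for POST (in the body), and then shows why the character set matters when decoding bytes. The endpoint `http://example.com/api/rates` and the parameter names are made up for illustration.

```python
import requests

# Hypothetical endpoint and parameters, used only to show the encoding.
demo_url = 'http://example.com/api/rates'
demo_params = {'lang': 'CN', 'startDate': '13 Mar 2018'}

# GET: the parameters are appended to the URL as a query string.
get_req = requests.Request('GET', demo_url, params=demo_params).prepare()
print(get_req.method, get_req.url)

# POST: the same parameters travel in the request body as form data.
post_req = requests.Request('POST', demo_url, data=demo_params).prepare()
print(post_req.method, post_req.url)
print(post_req.headers['Content-Type'], post_req.body)

# Character set: a response arrives as bytes and has to be decoded with
# the right encoding. Two Chinese characters take six bytes in UTF-8.
raw_bytes = '全部'.encode('utf-8')
text = raw_bytes.decode('utf-8')
print(len(raw_bytes), len(text))
```

For a real response object, `res.encoding` holds the character set that requests will use for `res.text`, and it can be overridden before reading `res.text` if the server declares the wrong one.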

In [4]:
import requests

In [5]:
# Prepare the request data
URL = 'http://www.chinamoney.com.cn/dqs/rest/dqs-u-fx/RefRateHis'
data = {'lang' : 'CN',
       'startDateTool' : '13 Mar 2018',
       'endDateTool' : '13 Mar 2018',
       'currencyCide' : 'USD.CNY'
       }

In [8]:
# request header
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"

# simulate http request
session = requests.Session()
session.headers['User-Agent'] = USER_AGENT
res = session.get(URL, params=data)
if res.status_code != 200:
    print('query_error, status_code = ', res.status_code)
    
# display http response
rsp = res.text
rsp[:500]

'{"head":{"version":"2.0","provider":"CWAP","req_code":"","rep_code":"200","rep_message":"","ts":1545298988001,"producer":"","tstext":"2018-12-20 17:43:08"},"data":{"currencyList":[{"currValue":"ALL","currValueDesc":"全部"},{"currValue":"USD.CNY","currValueDesc":"USD/CNY"},{"currValue":"EUR.CNY","currValueDesc":"EUR/CNY"},{"currValue":"JPY.CNY","currValueDesc":"100JPY/CNY"},{"currValue":"GBP.CNY","currValueDesc":"GBP/CNY"}],"message":"","flag":"0","endDateTool":"13 Mar 2018","startDateTool":"13 Mar'

In [7]:
import json

# Parse the JSON response and extract the records
rsp_json = json.loads(rsp)
raw_record = rsp_json['records']

for record in raw_record:
    print(record['rateOf11hour'])

6.3244
7.8006
5.9425
8.7908
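
Once the records are parsed, they can be loaded into a pandas DataFrame for further analysis. This is a minimal sketch on a hand-written payload: only the `records` list, the `rateOf11hour` key, and the four values come from the cell above. The string-typed values and the names `sample_rsp` and `rates` are assumptions made for illustration.

```python
import json
import pandas as pd

# Illustrative payload: only the 'records' list and the 'rateOf11hour'
# key come from the cell above; string-typed values are an assumption.
sample_rsp = json.dumps({
    'records': [
        {'rateOf11hour': '6.3244'},
        {'rateOf11hour': '7.8006'},
        {'rateOf11hour': '5.9425'},
        {'rateOf11hour': '8.7908'},
    ]
})

# .get() with a default avoids a KeyError if 'records' is missing.
records = json.loads(sample_rsp).get('records', [])

# One row per record, one column per key.
rates = pd.DataFrame(records)

# Convert to floats so the column can be used in calculations.
rates['rateOf11hour'] = rates['rateOf11hour'].astype(float)
print(rates['rateOf11hour'].max())
```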


In [1]:
# Parse an HTML page with bs4

from bs4 import BeautifulSoup

html_doc = """
        <html><head><title>The Dormouse's story</title></head>
        <body>
        <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
"""

# html_doc is already a str, so no from_encoding argument is needed
soup = BeautifulSoup(html_doc, 'html.parser')



In [2]:
print('Test 1: get all links')
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

Test 1: get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie


In [3]:
print('\nTest 2: get the link for Lacie')
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())


Test 2: get the link for Lacie
a http://example.com/lacie Lacie
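
Beyond `find_all` and `find` with an `href` filter, the same nodes can be selected by CSS class or by `id`, and collected into plain dictionaries that are easy to pass on to pandas. This is a small sketch on a shortened copy of the HTML used above; the names `html_snippet`, `soup_demo`, and `rows` are introduced only for this example.

```python
from bs4 import BeautifulSoup

html_snippet = """
<html><body>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</body></html>
"""

soup_demo = BeautifulSoup(html_snippet, 'html.parser')

# CSS selector: every <a> tag that has the class "sister".
rows = [
    {'id': a['id'], 'text': a.get_text(), 'href': a['href']}
    for a in soup_demo.select('a.sister')
]
print(rows[1])

# find() also accepts attribute filters such as id.
tillie = soup_demo.find('a', id='link3')
print(tillie.get_text())
```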
