
wget, urllib, re

zixcon edited this page Dec 26, 2017 · 1 revision

urllib

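Compared with Python 2.x, Python 3.x reorganizes the old urllib and urllib2 modules into a single urllib package with submodules (urllib.request, urllib.parse, urllib.error, urllib.robotparser). urllib3 is a separate third-party library and is not part of this merge.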
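As a rough illustration of the reorganized layout, the sketch below shows where a few familiar Python 2 names live in Python 3. All calls are standard library; the query values are made up for the example.

```python
# Where some common Python 2 names live in Python 3:
#   urllib2.urlopen, urllib2.Request      -> urllib.request
#   urllib.urlencode, urllib.quote        -> urllib.parse
#   urlparse.urlparse (separate module)   -> urllib.parse
#   urllib2.HTTPError, urllib2.URLError   -> urllib.error
import urllib.error
import urllib.parse
import urllib.request

# Build a query string (Python 2: urllib.urlencode); spaces become '+'.
query = urllib.parse.urlencode({"q": "wget urllib", "page": 2})
print(query)  # q=wget+urllib&page=2

# Split a URL into its components (Python 2: urlparse.urlparse).
parts = urllib.parse.urlparse(
    "http://www.cbrc.gov.cn/chinese/files/2017/BF2D2E4669B1458CB1655D0762AD0F60.pdf")
print(parts.netloc)  # www.cbrc.gov.cn

# HTTPError is a subclass of URLError, so an except clause for
# HTTPError must come before one for URLError.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```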

To download a file, first import urllib.request:

    import urllib.request

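  • Direct download:

      url = 'http://www.cbrc.gov.cn/chinese/files/2017/BF2D2E4669B1458CB1655D0762AD0F60.pdf'
      # "with" closes both the response and the output file when done
      with urllib.request.urlopen(url) as data, open("去库存-urllib.pdf", "wb") as f:
          f.write(data.read())  # write the raw bytes of the PDF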

  • Download with a spoofed User-Agent (some servers reject Python's default "Python-urllib/3.x" agent):

      headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
      # attach the browser-like header to the request
      req = urllib.request.Request(url=url, headers=headers)
      with urllib.request.urlopen(req) as data, open("去库存-urllib.pdf", "wb") as f:
          f.write(data.read())
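The two download variants above can be folded into one reusable helper. This is only a sketch: the function names make_request and download, the timeout and the chunk size are hypothetical choices, not part of urllib. Instead of a single data.read(), it streams the body with shutil.copyfileobj so a large PDF is not held in memory all at once.

```python
import shutil
import urllib.request

# Browser-like User-Agent copied from the example above.
UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
      "Gecko/20100101 Firefox/23.0")


def make_request(url, user_agent=UA):
    """Build a Request carrying a custom User-Agent header (hypothetical helper).

    urlopen only adds its default "Python-urllib/3.x" agent when the request
    has no User-Agent of its own, so this header is sent instead.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent=UA, timeout=30, chunk_size=64 * 1024):
    """Save the resource at `url` to the local file `dest` (hypothetical helper).

    The body is copied in chunks of `chunk_size` bytes. Returns the number of
    bytes written. Raises urllib.error.HTTPError for error statuses such as
    403 or 404, and urllib.error.URLError when the server cannot be reached.
    """
    req = make_request(url, user_agent)
    # Both the response and the output file are closed when the block exits.
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as f:
        shutil.copyfileobj(resp, f, chunk_size)
        return f.tell()
```

Usage would be download(url, "去库存-urllib.pdf"). When adding error handling around it, catch urllib.error.HTTPError before urllib.error.URLError, since HTTPError is a subclass of URLError.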

wget

re

import re

reg = re.compile(r'

(.*?)')  # the r prefix makes a raw string, so backslashes in the pattern are not treated as escapes
item = re.findall(reg, html)
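A small self-contained illustration of what the r prefix and the non-greedy group (.*?) do. The HTML snippet and the <p>...</p> delimiters are assumptions chosen for the example, not the pattern used on this page.

```python
import re

# Illustrative HTML; the <p>...</p> delimiters are an assumed example.
html = "<p>first</p><p>second</p>"

# A raw string passes backslashes to the regex engine unchanged,
# so r"\d" and "\\d" are the same pattern text.
assert r"\d" == "\\d"

# (.*?) is non-greedy: it stops at the first </p>, one match per paragraph.
reg = re.compile(r"<p>(.*?)</p>")
lazy = re.findall(reg, html)  # same result as reg.findall(html)
print(lazy)    # ['first', 'second']

# (.*) is greedy: it runs to the last </p>, merging both paragraphs.
greedy = re.findall(r"<p>(.*)</p>", html)
print(greedy)  # ['first</p><p>second']

# '.' does not match a newline unless re.S (DOTALL) is passed.
multi = "<p>line1\nline2</p>"
print(re.findall(reg.pattern, multi))        # []
print(re.findall(reg.pattern, multi, re.S))  # ['line1\nline2']
```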
