# Using requests

## 2. A First Example

In [13]:
import requests

r = requests.get('https://www.baidu.com')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text[:100])
print(r.cookies)

<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>


In [13]:
import requests

r = requests.get('https://www.httpbin.org/get')
r = requests.post('https://www.httpbin.org/post')
r = requests.put('https://www.httpbin.org/put')
r = requests.delete('https://www.httpbin.org/delete')
r = requests.patch('https://www.httpbin.org/patch')

## 3. GET Requests

### Basic Example

In [14]:
import requests

r = requests.get('https://www.httpbin.org/get')
print(r.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee746-45e16af911a99cc3045cccd1"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}



To pass parameters with a GET request, you can append them to the URL as a query string:

In [15]:
import requests

r = requests.get('https://www.httpbin.org/get?name=germey&age=25')
print(r.text)

{
  "args": {
    "age": "25", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee74c-6eb7958d76f3a1053a01f408"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get?name=germey&age=25"
}



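Building the query string by hand is tedious. Pass a dict through the `params` argument instead, and requests constructs `https://www.httpbin.org/get?name=germey&age=25` automatically: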

In [16]:
import requests

param = {
    'name': 'germey',
    'age': 25
}
r = requests.get('https://www.httpbin.org/get', params=param)
print(r.text)

{
  "args": {
    "age": "25", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee751-5eb21ebe746af290052c2da3"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get?name=germey&age=25"
}
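
The encoding step behind `params` can be approximated with the standard library. This is a minimal sketch, assuming `urlencode` is a fair stand-in for what requests does internally; the second dict is a made-up example to show escaping.

```python
from urllib.parse import urlencode

base_url = 'https://www.httpbin.org/get'
params = {'name': 'germey', 'age': 25}

# urlencode converts each value to a string and joins the pairs with '&'
query = urlencode(params, doseq=True)
full_url = f'{base_url}?{query}'
print(full_url)

# Spaces and reserved characters are escaped, which is easy to get wrong by hand
print(urlencode({'q': 'a b&c'}))
```

Passing a dict therefore avoids manual escaping mistakes as well as string concatenation.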



In [17]:
import requests

r = requests.get('https://www.httpbin.org/get')
print(r.text)
print(r.json())
print(type(r.json()))

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee757-7fc48a0c05842de740f252be"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'www.httpbin.org', 'User-Agent': 'python-requests/2.27.1', 'X-Amzn-Trace-Id': 'Root=1-61dee757-7fc48a0c05842de740f252be'}, 'origin': '183.157.122.3', 'url': 'https://www.httpbin.org/get'}
<class 'dict'>
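
`r.text` is the raw JSON string, while `r.json()` parses it into a `dict`. The sketch below mimics that step offline with `json.loads`; the `body` string is a shortened, hypothetical copy of the httpbin response rather than a live result.

```python
import json

# Hypothetical response body, shaped like the httpbin output above
body = '{"args": {}, "origin": "183.157.122.3", "url": "https://www.httpbin.org/get"}'

data = json.loads(body)
print(type(data))
print(data['url'])

# A body that is not JSON (for example an HTML page) cannot be parsed
try:
    json.loads('<html>not json</html>')
except json.JSONDecodeError as exc:
    print('Not valid JSON:', exc)
```

`r.json()` likewise raises an exception when the response body is not valid JSON, so it is worth guarding the call when the content type is uncertain.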


### Scraping a Web Page

In [18]:
import requests
import re

r = requests.get('https://ssr1.scrape.center/')
# print(r.text)
pattern = re.compile('<h2.*?>(.*?)</h2>', re.S)
titles = re.findall(pattern, r.text)
print(titles)

['霸王别姬 - Farewell My Concubine', '这个杀手不太冷 - Léon', '肖申克的救赎 - The Shawshank Redemption', '泰坦尼克号 - Titanic', '罗马假日 - Roman Holiday', '唐伯虎点秋香 - Flirting Scholar', '乱世佳人 - Gone with the Wind', '喜剧之王 - The King of Comedy', '楚门的世界 - The Truman Show', '狮子王 - The Lion King']


### Fetching Binary Data

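Images, audio, video, and other multimedia files are ultimately binary data; a specific storage format and a matching decoder are what make them viewable or playable. To scrape them, you need their raw bytes.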

In [None]:
import requests

r = requests.get('https://scrape.center/favicon.ico')
print(r.text)
print(r.content)

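`r.text` comes out garbled, while the output of `r.content` starts with a `b`, which marks it as `bytes`. An image is binary data, so decoding it into a `str` naturally produces garbage.

In short, `text` is a `str` and `content` is `bytes`.

Audio and video files can be fetched and saved in the same way, using the approach below: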

In [21]:
import requests

r = requests.get('https://scrape.center/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
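
To see why binary data turns into garbage as text, the sketch below decodes a few raw bytes offline. The PNG file signature is used as stand-in data (an assumption for illustration, not the actual favicon), and decoding with `errors='replace'` is similar in spirit to how `r.text` is produced.

```python
# First 8 bytes of the PNG file signature, used here as stand-in binary data
raw = b'\x89PNG\r\n\x1a\n'
print(type(raw), raw)

# 0x89 is not valid UTF-8, so it is replaced with U+FFFD (the replacement character)
text = raw.decode('utf-8', errors='replace')
print(type(text), repr(text))

# The replacement is lossy: encoding the string again does not give back the original bytes
print(text.encode('utf-8') == raw)
```

This is why binary files should be written from `r.content` in `'wb'` mode and never from `r.text`.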

### Adding Request Headers

In [22]:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'
}
r = requests.get('https://www.httpbin.org/get', headers=headers)
print(r.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55", 
    "X-Amzn-Trace-Id": "Root=1-61dee9e0-249b8bb00dd2f8f61eca0c15"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}



## 4. POST Requests

In [None]:
import requests

data = {
    'name': 'germey',
    'age': '25'
}
r = requests.post('https://www.httpbin.org/post', data=data)
print(r.text)
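
The `data` argument sends the dict as a form-encoded body. As a sketch of the difference from the `json` argument, the requests below are only prepared locally with `Request.prepare()` and never sent, so the snippet needs no network access.

```python
import json
import requests

url = 'https://www.httpbin.org/post'
payload = {'name': 'germey', 'age': '25'}

# prepare() builds the request locally; nothing goes over the network
form_req = requests.Request('POST', url, data=payload).prepare()
json_req = requests.Request('POST', url, json=payload).prepare()

# data= produces a form-encoded body
print(form_req.headers['Content-Type'], form_req.body)

# json= produces a JSON body with a JSON content type
print(json_req.headers['Content-Type'], json.loads(json_req.body))
```

httpbin would echo the first variant under `form` and the second under `json` in its response.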

## 5. Responses

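Sending a request returns a response. The examples above read the body through `text` and `content`; the response also exposes the status code, headers, cookies, final URL, and redirect history: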

In [7]:
import requests

r = requests.get('https://ssr1.scrape.center')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Thu, 13 Jan 2022 12:23:43 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '41538', 'Connection': 'keep-alive', 'Server': 'Lego Server', 'Cache-Control': 'max-age=600', 'Expires': 'Thu, 13 Jan 2022 12:33:43 GMT', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'X-NWS-LOG-UUID': '43016158-74e7-4ca3-bf09-f6162d7f08c9', 'X-Cache-Lookup': 'Cache Miss', 'X-Daa-Tunnel': 'hop_count=1'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
<class 'str'> https://ssr1.scrape.center/
<class 'list'> []


In [9]:
import requests

r = requests.get('https://ssr1.scrape.center/')
# Exit unless the status code is 200; compare by value with !=, not by identity with `is not`
exit() if r.status_code != requests.codes.ok else print('Request succeeded.')

Request succeeded.
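
`requests.codes` maps readable names to numeric status codes. The following sketch looks up a few of them offline; `is_success` is a hypothetical helper added for illustration, not part of requests.

```python
import requests

# Named lookups return plain integers
print(requests.codes.ok)
print(requests.codes.not_found)


def is_success(status_code):
    """Return True for any 2xx status code (hypothetical helper)."""
    return 200 <= status_code < 300


print(is_success(requests.codes.created))
print(is_success(requests.codes.not_found))
```

For a response `r`, calling `r.raise_for_status()` is another common option: it raises an `HTTPError` for 4xx and 5xx status codes instead of requiring a manual comparison.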


## 6. Advanced Usage

This section covers file uploads, cookie handling, proxy settings, and related features.

### File Upload

In [None]:
import requests

# Open the file in binary mode; the with block closes it after the upload
with open('favicon.ico', 'rb') as f:
    r = requests.post('https://httpbin.org/post', files={'file': f})
print(r.text)

### Cookies

`items()` converts the cookie jar into a list of `(name, value)` tuples.

In [1]:
import requests

r = requests.get('https://www.baidu.com')
print(r.cookies)
print(r.cookies.items())
for key, value in r.cookies.items():
    print(key + '=' + value)

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
[('BDORZ', '27315')]
BDORZ=27315


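Cookies can be used to keep a login state. Taking GitHub as an example: log in to GitHub first, then copy the `Cookie` value from the request headers (the `'xx'` below is a placeholder for it).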

In [None]:
import requests

headers = {
    'Cookie': 'xx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
}
r = requests.get('https://github.com/', headers=headers)
print(r.text)

Cookies can also be supplied through the `cookies` parameter.

Create a `RequestsCookieJar` object, split the copied cookie string with `split`, and use `set` to add each cookie's name and value.

In [None]:
import requests

cookies = 'xx'
jar = requests.cookies.RequestsCookieJar()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
}
for cookie in cookies.split(';'):
    # Split each "name=value" pair once; strip() removes the space left after ';'
    key, value = cookie.strip().split('=', 1)
    jar.set(key, value)
r = requests.get('https://github.com/', cookies=jar, headers=headers)
print(r.text)
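
As an alternative to splitting the string by hand, the standard library's `http.cookies.SimpleCookie` can parse a cookie header. This is a sketch under the assumption that the copied string is a plain `name=value; name=value` list; `raw_cookie` is a made-up example, not a real GitHub cookie.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie string, in the format copied from a browser's request headers
raw_cookie = 'session_id=abc123; theme=dark'

parsed = SimpleCookie()
parsed.load(raw_cookie)

# Reduce the parsed morsels to a plain name -> value dict
cookies = {name: morsel.value for name, morsel in parsed.items()}
print(cookies)
```

The resulting dict can be passed directly as `requests.get(url, cookies=cookies)`, since the `cookies` parameter accepts a plain dict as well as a `RequestsCookieJar`.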

### Session Persistence

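Calling `get` and `post` directly does simulate web requests, but each call is independent, as if two different browsers had opened two different pages. If the second request should build on the first and you do not want to set cookies by hand each time, use a `Session` object.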

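A `Session` object makes it easy to maintain a session. It is widely used in practice, for example to simulate opening different pages of the same site in one browser. Compare the two examples below.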

In [6]:
import requests

requests.get('https://www.httpbin.org/cookies/set/number/123456789')
r = requests.get('https://www.httpbin.org/cookies')
print(r.text)

{
  "cookies": {}
}



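The example above did not get the cookie back, because the two `requests.get` calls share no state. The same two requests through a `Session`: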

In [7]:
import requests

s = requests.Session()
s.get('https://www.httpbin.org/cookies/set/number/123456789')
r = s.get('https://www.httpbin.org/cookies')
print(r.text)

{
  "cookies": {
    "number": "123456789"
  }
}
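
What the `Session` does here can be inspected without any network access: cookies stored in the session's jar are attached to later requests for a matching domain. The sketch below sets the cookie by hand (standing in for the `/cookies/set` response) and prepares, but does not send, the follow-up request.

```python
import requests

s = requests.Session()

# Stand-in for the cookie the server would set via /cookies/set/number/123456789
s.cookies.set('number', '123456789', domain='www.httpbin.org', path='/')

# Prepare the follow-up request through the session; nothing is sent
req = requests.Request('GET', 'https://www.httpbin.org/cookies')
prepared = s.prepare_request(req)
print(prepared.headers.get('Cookie'))

s.close()
```

A bare `requests.get` has no such jar, which is why the first example returned an empty `cookies` object.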



### SSL Certificate Verification

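Many sites now require HTTPS, but some have not configured their certificate properly, or use one that is not issued by a trusted CA. Requests to such sites fail with an SSL certificate error; `https://ssr2.scrape.center/` is one example. The `verify` parameter controls whether the certificate is verified.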

In [None]:
import requests

r = requests.get('https://ssr2.scrape.center/')
print(r.status_code)

In [9]:
import requests

r = requests.get('https://ssr2.scrape.center/', verify=False)
print(r.status_code)



200


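With `verify=False` the request succeeds, but urllib3 emits an `InsecureRequestWarning`. You can suppress it by disabling the warning: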

In [10]:
import requests
from requests.packages import urllib3

urllib3.disable_warnings()
r = requests.get('https://ssr2.scrape.center/', verify=False)
print(r.status_code)

200


Alternatively, capture warnings into the logging system so that they are not printed:

In [11]:
import requests
import logging

logging.captureWarnings(True)
r = requests.get('https://ssr2.scrape.center/', verify=False)
print(r.status_code)

200
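
Both approaches above silence warnings broadly. A narrower option, sketched here, is to disable only `InsecureRequestWarning`. The snippet simulates the warnings with `warnings.warn` instead of making a request, and wraps everything in `catch_warnings` so the filter changes are undone afterwards.

```python
import warnings

import urllib3

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Silence only InsecureRequestWarning; other warning categories still surface
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    # Simulated warnings standing in for what a verify=False request would emit
    warnings.warn('unverified HTTPS request', urllib3.exceptions.InsecureRequestWarning)
    warnings.warn('something unrelated', UserWarning)

print([w.category.__name__ for w in caught])
```

Only the unrelated warning is recorded, which shows that the certificate warning was filtered while everything else still gets through.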


### Timeouts

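A single `timeout` value, as below, is applied to both the connect phase and the read phase (each is limited separately; it is not a total for the whole request). To set the two limits individually, pass a `(connect, read)` tuple. To wait indefinitely, set `timeout` to `None` or omit the parameter.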

In [12]:
import requests

r = requests.get('https://www.httpbin.org/get', timeout=1)
# r = requests.get('https://www.httpbin.org/get', timeout=(5, 30))
# r = requests.get('https://www.httpbin.org/get', timeout=None)
print(r.status_code)

200
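
When a limit is exceeded, requests raises `ConnectTimeout` or `ReadTimeout`, both subclasses of `requests.exceptions.Timeout`. The helper below is a hypothetical sketch of handling that case; it is only defined here, since calling it requires network access.

```python
import requests


def fetch_status(url, timeout=(3.05, 10)):
    """Return the status code, or None if the request timed out (hypothetical helper)."""
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout as exc:
        # Covers both ConnectTimeout and ReadTimeout
        print(f'Timed out: {exc}')
        return None


# Both specific timeout errors can be caught through the shared base class
print(issubclass(requests.exceptions.ConnectTimeout, requests.exceptions.Timeout))
print(issubclass(requests.exceptions.ReadTimeout, requests.exceptions.Timeout))
```

For instance, `fetch_status('https://www.httpbin.org/delay/5', timeout=1)` should hit the read timeout and return `None`, assuming the endpoint delays its response as its path suggests.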


### Authentication

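The site `https://ssr3.scrape.center` is protected by HTTP Basic authentication. With a correct username and password the server returns status code 200; if authentication fails, it returns 401.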

In [15]:
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('https://ssr3.scrape.center', auth=HTTPBasicAuth('admin', 'admin'))
print(r.status_code)

200


Passing an `HTTPBasicAuth` instance every time is a little verbose, so requests also accepts a plain tuple, which it wraps in `HTTPBasicAuth` by default.

In [16]:
import requests

r = requests.get('https://ssr3.scrape.center', auth=('admin', 'admin'))
print(r.status_code)

200
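
Under the hood, Basic authentication just adds an `Authorization` header containing the Base64 encoding of `username:password`. The sketch below prepares the request locally without sending it and compares the header with a hand-built value.

```python
import base64

import requests

req = requests.Request('GET', 'https://ssr3.scrape.center', auth=('admin', 'admin'))
prepared = req.prepare()  # built locally, not sent
print(prepared.headers['Authorization'])

# Build the same header by hand: "Basic " + base64("username:password")
expected = 'Basic ' + base64.b64encode(b'admin:admin').decode('ascii')
print(prepared.headers['Authorization'] == expected)
```

Because Base64 is an encoding and not encryption, Basic authentication should only be used over HTTPS.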


Other authentication schemes, such as OAuth, are also available, through the separate `requests-oauthlib` package.

### Proxies

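Proxies are set through the `proxies` parameter, a dict that maps each URL scheme to a proxy URL. The addresses below assume a proxy listening on local port 7890.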

In [27]:
import requests

# proxies = {
#     'https': 'http://user:passwd@localhost:7890'
# }
proxies = {
    'http': 'http://localhost:7890',
    'https': 'http://localhost:7890'
}
r = requests.get('https://www.google.com', proxies=proxies)
print(r.status_code)

200


Besides plain HTTP proxies, requests also supports SOCKS proxies.

Install the extra dependency first: `pip install "requests[socks]"`

In [None]:
import requests

proxies = {
    'http': 'socks5://user:passwd@localhost:7891',
    'https': 'socks5://user:passwd@localhost:7891'
}
r = requests.get('https://www.google.com', proxies=proxies)
print(r.status_code)

### Prepared Requests

How does requests work internally?

When requests sends a request, it internally builds a `Request` object, populates it with the URL, headers, data, and other parameters, and then sends it. A successful request returns a `Response` object, which you then parse.
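
The same steps can be performed by hand. In this sketch a `Request` is built explicitly, turned into a `PreparedRequest` by a `Session`, and inspected before anything is sent. The `User-Agent` value and the `send` helper are hypothetical, and `send` is only defined here because calling it performs real network I/O.

```python
import requests

url = 'https://www.httpbin.org/post'
data = {'name': 'germey'}
headers = {'User-Agent': 'my-client/0.1'}  # hypothetical value for illustration

s = requests.Session()
req = requests.Request('POST', url, data=data, headers=headers)

# prepare_request merges session-level settings and returns a PreparedRequest
prepped = s.prepare_request(req)
print(type(prepped).__name__)
print(prepped.method, prepped.url)
print(prepped.headers['User-Agent'])
print(prepped.body)


def send(session, prepared_request):
    """Send an already prepared request (hypothetical helper; needs network access)."""
    return session.send(prepared_request, timeout=10)
```

Separating preparation from sending makes each request an independent object that can be inspected, modified, or queued before it goes out.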