# Using requests

## 2. Introductory example

In [13]:
import requests

r = requests.get('https://www.baidu.com')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text[:100])
print(r.cookies)

<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>


In [13]:
import requests

r = requests.get('https://www.httpbin.org/get')
r = requests.post('https://www.httpbin.org/post')
r = requests.put('https://www.httpbin.org/put')
r = requests.delete('https://www.httpbin.org/delete')
r = requests.patch('https://www.httpbin.org/patch')

## 3. GET requests

### Basic example

In [14]:
import requests

r = requests.get('https://www.httpbin.org/get')
print(r.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee746-45e16af911a99cc3045cccd1"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}



To add parameters to a GET request, you can write them directly into the URL as a query string:

In [15]:
import requests

r = requests.get('https://www.httpbin.org/get?name=germey&age=25')
print(r.text)

{
  "args": {
    "age": "25", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee74c-6eb7958d76f3a1053a01f408"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get?name=germey&age=25"
}



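Building the URL by hand is tedious. Instead, pass the parameters through the `params` argument as shown below; requests will construct `https://www.httpbin.org/get?name=germey&age=25` automatically.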

In [16]:
import requests

param = {
    'name': 'germey',
    'age': 25
}
r = requests.get('https://www.httpbin.org/get', params=param)
print(r.text)

{
  "args": {
    "age": "25", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee751-5eb21ebe746af290052c2da3"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get?name=germey&age=25"
}
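To make the URL construction concrete, here is a minimal offline sketch that uses the standard library's `urlencode` rather than requests itself. It only illustrates the equivalent result for simple values; requests performs its own encoding internally, and the variable names here are chosen for illustration.

```python
from urllib.parse import urlencode

base_url = 'https://www.httpbin.org/get'
params = {'name': 'germey', 'age': 25}

# urlencode turns the dict into "key=value" pairs joined by "&";
# non-string values such as the integer 25 are converted to text.
query = urlencode(params)
full_url = f'{base_url}?{query}'
print(full_url)
```

This is also why `age` comes back as the string `"25"` in `args`: a query string carries text only.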



In [17]:
import requests

r = requests.get('https://www.httpbin.org/get')
print(r.text)
print(r.json())
print(type(r.json()))

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61dee757-7fc48a0c05842de740f252be"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'www.httpbin.org', 'User-Agent': 'python-requests/2.27.1', 'X-Amzn-Trace-Id': 'Root=1-61dee757-7fc48a0c05842de740f252be'}, 'origin': '183.157.122.3', 'url': 'https://www.httpbin.org/get'}
<class 'dict'>
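The cell above shows that `r.json()` turns a JSON response body into a `dict`. The sketch below reproduces that step offline with the standard `json` module on a shortened, hand-written stand-in for the body. It also shows what happens when the body is not JSON: `r.json()` raises an exception in that case, so a guard like the hypothetical `parse_or_none` helper can be useful.

```python
import json

# A shortened stand-in for the response body printed above.
body = '{"args": {}, "url": "https://www.httpbin.org/get"}'

parsed = json.loads(body)
print(type(parsed))
print(parsed['url'])


def parse_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# An HTML page is not JSON, so parsing fails and the helper returns None.
print(parse_or_none('<html>not json</html>'))
```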


### Scraping a web page

In [18]:
import requests
import re

r = requests.get('https://ssr1.scrape.center/')
# print(r.text)
pattern = re.compile('<h2.*?>(.*?)</h2>', re.S)
titles = re.findall(pattern, r.text)
print(titles)

['霸王别姬 - Farewell My Concubine', '这个杀手不太冷 - Léon', '肖申克的救赎 - The Shawshank Redemption', '泰坦尼克号 - Titanic', '罗马假日 - Roman Holiday', '唐伯虎点秋香 - Flirting Scholar', '乱世佳人 - Gone with the Wind', '喜剧之王 - The King of Comedy', '楚门的世界 - The Truman Show', '狮子王 - The Lion King']
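The pattern used above relies on two details: `.*?` matches as little as possible, and `re.S` lets `.` match newlines. The following sketch runs the same pattern on a small made-up HTML snippet (the markup and titles are assumptions, not the real page) to show the effect of each.

```python
import re

# Made-up markup: the second <h2> tag is split across two lines.
html = '''
<h2 class="title">Movie A</h2>
<h2
   class="title">Movie B</h2>
'''

# Same pattern as in the cell above: non-greedy groups, with re.S.
with_dotall = re.compile('<h2.*?>(.*?)</h2>', re.S)
# Same pattern without re.S: "." stops at newlines.
without_dotall = re.compile('<h2.*?>(.*?)</h2>')

print(re.findall(with_dotall, html))
print(re.findall(without_dotall, html))
```

Without `re.S`, the tag that spans two lines is missed, which is why the flag matters on real pages whose markup is often wrapped.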


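### Scraping binary data

Images, audio, video and other multimedia files are ultimately binary data; we can view them only because each has a specific storage format and a matching decoder. To scrape them, you have to get their binary content.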

In [None]:
import requests

r = requests.get('https://scrape.center/favicon.ico')
print(r.text)
print(r.content)

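`r.text` comes out garbled, while `r.content` is printed with a leading `b`, which marks it as bytes. Because the image is binary data, decoding it into a str inevitably produces garbage.
In short, `text` is a str and `content` is bytes.
Audio and video files can be fetched and saved the same way, by writing `r.content` to a file opened in binary mode.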
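The difference between the two attributes can be seen without a network call. In this sketch, `raw` is a made-up byte string (not the real favicon), and decoding it as UTF-8 with `errors='replace'` is an assumption standing in for whatever decoding `r.text` applies.

```python
# A few made-up bytes standing in for binary file content.
raw = b'\x00\x00\x01\x00\x01\x00\x10\x10\xff\xfe'
print(type(raw))
print(raw)  # printed with a leading b, like r.content

# Forcing binary data into text: bytes that are not valid UTF-8
# become the replacement character U+FFFD, i.e. visible garbage.
as_text = raw.decode('utf-8', errors='replace')
print(type(as_text))
print(repr(as_text))
```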

In [21]:
import requests

r = requests.get('https://scrape.center/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
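For larger audio or video files, holding the whole body in `r.content` may be undesirable. One possible approach is to write the file chunk by chunk. The sketch below uses a hypothetical `save_chunks` helper and fake in-memory chunks so that it runs offline; the file name `demo.bin` is also just an example.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path; return bytes written."""
    written = 0
    with open(path, 'wb') as f:  # 'wb': binary mode, no text decoding
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written


# Fake chunks standing in for pieces of a downloaded file.
fake_chunks = [b'\x00\x01', b'', b'\x02\x03\x04']
target = os.path.join(tempfile.mkdtemp(), 'demo.bin')
size = save_chunks(fake_chunks, target)
print(size)
```

With requests, such chunks can come from `r.iter_content(chunk_size=8192)` on a response obtained with `requests.get(url, stream=True)`.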

### Adding request headers

In [22]:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'
}
r = requests.get('https://www.httpbin.org/get', headers=headers)
print(r.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55", 
    "X-Amzn-Trace-Id": "Root=1-61dee9e0-249b8bb00dd2f8f61eca0c15"
  }, 
  "origin": "183.157.122.3", 
  "url": "https://www.httpbin.org/get"
}



## 4. POST requests

In [23]:
import requests

data = {
    'name': 'germey',
    'age': '25'
}
r = requests.post('https://www.httpbin.org/post', data=data)
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "25", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.27.1", 
    "X-Amzn-Trace-Id": "Root=1-61deeba0-1637c966532f960430121c03"
  }, 
  "json": null, 
  "origin": "36.22.229.44", 
  "url": "https://www.httpbin.org/post"
}
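The output above reports `Content-Type: application/x-www-form-urlencoded` and `Content-Length: 18`. As a rough offline illustration, the standard library's `urlencode` produces a body of exactly that length for the same dict. This is a sketch of the encoding scheme, not of how requests builds the body internally.

```python
import json
from urllib.parse import urlencode

data = {'name': 'germey', 'age': '25'}

# Form encoding: the same key=value&key=value layout as a query string,
# but sent in the request body instead of the URL.
form_body = urlencode(data)
print(form_body)
print(len(form_body))

# For comparison, the same dict serialized as JSON text.
json_body = json.dumps(data)
print(json_body)
```

`requests.post` also accepts a `json=` argument for sending a JSON body instead of form data; in that case httpbin would be expected to report the payload under `"json"` rather than `"form"`.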

