# Using httpx

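The previous sections covered the `urllib` and `requests` libraries, which are enough to scrape data from most websites. Both only speak `HTTP/1.1`, however, so they cannot handle sites that require the `HTTP/2.0` protocol.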

Among the request libraries that currently support `HTTP/2.0`, `hyper` and `httpx` are the most representative. This section covers `httpx`.

## 1. Example

<https://spa16.scrape.center/> is a site that only accepts `HTTP/2.0`, so it cannot be scraped with `requests`:

In [1]:
import requests

url = 'https://spa16.scrape.center/'
response = requests.get(url)
print(response.text)

ProxyError: HTTPSConnectionPool(host='spa16.scrape.center', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))

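As the traceback shows, the request fails: the server closed the connection without responding (`RemoteDisconnected`), which `requests` reports here as a `ProxyError` because this run went through a proxy.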

## 2. Installation

```
pip3 install httpx
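# httpx installed this way does not support HTTP/2.0; to enable it, run the following instead
# (the quotes keep shells such as zsh from interpreting the square brackets)
pip3 install "httpx[http2]"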
```

In [9]:
!pip install httpx 'httpx[http2]'



## 3. Basic usage

Many of the `httpx` APIs resemble those of `requests`. Start with the most basic GET request:

In [3]:
import httpx

response = httpx.get('https://www.httpbin.org/get')
print(response.status_code)
print(response.headers)
print(response.text)

200
Headers({'date': 'Mon, 28 Mar 2022 06:30:17 GMT', 'content-type': 'application/json', 'content-length': '312', 'connection': 'keep-alive', 'server': 'gunicorn/19.9.0', 'access-control-allow-origin': '*', 'access-control-allow-credentials': 'true'})
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-httpx/0.22.0", 
    "X-Amzn-Trace-Id": "Root=1-624155f9-1f107a615f22b9510ff28835"
  }, 
  "origin": "43.251.24.145", 
  "url": "https://www.httpbin.org/get"
}



In [4]:
import httpx

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
}
response = httpx.get('https://www.httpbin.org/get', headers=headers)
print(response.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-62415738-3d63369d61a3d678207f64a7"
  }, 
  "origin": "43.251.24.145", 
  "url": "https://www.httpbin.org/get"
}



In [5]:
import httpx

response = httpx.get('https://spa16.scrape.center')
print(response.text)

RemoteProtocolError: Server disconnected without sending a response.

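As shown, this raises an error similar to the one from `requests`. By default `httpx` does not enable `HTTP/2.0` support and uses `HTTP/1.1`; `HTTP/2.0` has to be switched on explicitly. The code can be rewritten as follows: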

In [17]:
import httpx

client = httpx.Client(http2=True)
response = client.get('https://spa16.scrape.center/')
print(response.text)

<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><meta name=referrer content=no-referrer><link rel=icon href=/favicon.ico><title>Scrape | Book</title><link href=/css/chunk-50522e84.e4e1dae6.css rel=prefetch><link href=/css/chunk-f52d396c.4f574d24.css rel=prefetch><link href=/js/chunk-50522e84.6b3e24aa.js rel=prefetch><link href=/js/chunk-f52d396c.f8f41620.js rel=prefetch><link href=/css/app.ea9d802a.css rel=preload as=style><link href=/js/app.b93891e2.js rel=preload as=script><link href=/js/chunk-vendors.a02ff921.js rel=preload as=script><link href=/css/app.ea9d802a.css rel=stylesheet></head><body><noscript><strong>We're sorry but portal doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript><div id=app></div><script src=/js/chunk-vendors.a02ff921.js></script><script src=/js/app.b93891e2.js></script></body></html>


The example above is a GET request. POST, PUT, DELETE, and PATCH requests are made in the same way:

In [None]:
import httpx

r = httpx.get('https://www.httpbin.org/get', params={'name': 'germey'})
r = httpx.post('https://www.httpbin.org/post', data={'name': 'germey'})
r = httpx.put('https://www.httpbin.org/put')
r = httpx.delete('https://www.httpbin.org/delete')
r = httpx.patch('https://www.httpbin.org/patch')

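The resulting `Response` object exposes the following attributes and methods:
- status_code: the status code
- text: the response body as text
- content: the response body as raw bytes; use this when the target is binary data such as an image
- headers: the response headers, as a `Headers` object
- json: a method that parses the response body as JSON and returns the resulting Python object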

Beyond these, `httpx` offers many more features; see the [official documentation](https://www.python-httpx.org/quickstart/).

## 4. The Client object

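Some of the basic `httpx` APIs closely mirror those of `requests`, but others do not. One of these is the `Client` object, covered next. The officially recommended way to use it is the `with ... as` statement: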

In [18]:
import httpx

with httpx.Client() as client:
    r = client.get('https://www.httpbin.org/get')
    print(r)

<Response [200 OK]>


This is equivalent to:

In [20]:
import httpx

client = httpx.Client()
try:
    r = client.get('https://www.httpbin.org/get')
    print(r)
finally:
    client.close()

<Response [200 OK]>


Both approaches produce the same result; the only difference is that here `close` must be called explicitly at the end to shut down the `Client` object.

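In addition, a `Client` object can be given parameters when it is declared, such as `headers`. Every request made through that object then carries this configuration by default: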

In [21]:
import httpx

url = 'https://www.httpbin.org/headers'
headers = {'User-Agent': 'my-app/0.0.1'}

with httpx.Client(headers=headers) as client:
    r = client.get(url)
    print(r.json()['headers']['User-Agent'])

my-app/0.0.1


For more advanced uses of the `Client` object, see the official documentation: <https://www.python-httpx.org/advanced/>

## 5. HTTP/2.0 support

First declare a `Client` object with the `http2` parameter set to `True`. If it is not set, the client uses `HTTP/1.1` only and `HTTP/2.0` support stays off:

In [22]:
import httpx

client = httpx.Client(http2=True)
r = client.get('https://www.httpbin.org/get')
print(r.text)
print(r.http_version)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-httpx/0.22.0", 
    "X-Amzn-Trace-Id": "Root=1-62419aac-4a07e2ec540a03085d009c6b"
  }, 
  "origin": "43.251.24.145", 
  "url": "https://www.httpbin.org/get"
}

HTTP/2


## 6. Async requests

`httpx` also provides an asynchronous client, `AsyncClient`, which works with Python's `async`/`await` syntax. It is used as follows:

In [25]:
import httpx
import asyncio

async def fetch(url):
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get(url)
        print(response.text)

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(fetch('https://www.httpbin.org/get'))

RuntimeError: This event loop is already running

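The `RuntimeError` above appears because this cell was run inside a notebook, where an event loop is already running. In a notebook, `await fetch(...)` can be called directly in the cell, and in a standalone script `asyncio.run(fetch(...))` is the usual entry point.

This is only a first look at async requests; later chapters cover them in detail.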