# HTTP clients
*Foundations of Python Network Programming*, 3rd ed. (FPNP3e), ch. 9

# References
- [HTTP](https://en.wikipedia.org/wiki/HTTP)
- [HTTP Documentation](https://httpwg.org/specs/)

Objectives
---
- Learn how to use the HTTP protocol from the perspective of a client
  - fetch and cache documents
  - submit queries or data to the server
- Get familiar with HTTP/1.1, whose semantics, caching, and message syntax are defined in [RFCs 9110-9112](https://httpwg.org/specs/)
  - still the most widely supported version in use today

HTTP overview
---
- a request–response protocol in the client–server model
- intermediate HTTP nodes (proxy servers, web caches, etc.) may sit between client and server to improve performance
  - *end-to-end* headers are forwarded unchanged to the final recipient
  - *hop-by-hop* headers (e.g. `Connection`, `Keep-Alive`) apply to a single connection and are consumed by each intermediary
- a stateless protocol
  - the web server is not required to retain information or status about each user across multiple requests
- state can still be layered on top to manage user sessions
  - using cookies or hidden form fields
- HTTP/1.1 and HTTP/2 run on TCP
  - HTTP/3 runs on QUIC over UDP
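
The end-to-end versus hop-by-hop distinction can be sketched as the filtering step a proxy performs before forwarding a request. This is a minimal illustration, not a proxy implementation: the helper `strip_hop_by_hop` and the sample header values are hypothetical, and the list of always-removed fields follows RFC 9110, section 7.6.1. Headers are kept as a list of pairs because HTTP allows a field name to repeat.

```python
# Fields intermediaries are expected to remove before forwarding
# (RFC 9110, section 7.6.1), whether or not Connection lists them.
KNOWN_HOP_BY_HOP = {'connection', 'proxy-connection', 'keep-alive',
                    'te', 'transfer-encoding', 'upgrade'}


def strip_hop_by_hop(headers):
    """Hypothetical helper: return the (name, value) pairs a proxy would forward."""
    # The Connection header may name additional hop-by-hop fields.
    listed = set()
    for name, value in headers:
        if name.lower() == 'connection':
            listed.update(opt.strip().lower() for opt in value.split(','))
    drop = KNOWN_HOP_BY_HOP | listed
    # Field names are case-insensitive, so compare them in lowercase.
    return [(name, value) for name, value in headers if name.lower() not in drop]


incoming = [
    ('Host', 'example.com'),
    ('Connection', 'close, X-Debug'),
    ('X-Debug', '1'),
    ('Keep-Alive', 'timeout=5'),
    ('Accept', '*/*'),
]
print(strip_hop_by_hop(incoming))
# → [('Host', 'example.com'), ('Accept', '*/*')]
```

`Host` and `Accept` are end-to-end and survive; `X-Debug` is dropped only because this particular `Connection` header names it.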


Python Client Libraries
---
- [urllib](https://docs.python.org/3/library/urllib.html), part of the Python Standard Library (PSL)
- [Requests](https://requests.readthedocs.io/en/latest/), a full-featured third-party solution
- Their basic interfaces are quite similar
    - a callable opens an HTTP connection, makes a request, and waits for the response headers
    - it then returns a response object that presents those headers to the programmer
    - the response body is left queued on the incoming socket and is read only when the programmer asks for it
- Testbed website: [httpbin.org](http://httpbin.org/), a simple HTTP request and response service
  ```bash
  # install the client library used below
  pip install requests
  # host httpbin locally with docker, then use http://localhost/ instead of http://httpbin.org/
  docker run -p 80:80 kennethreitz/httpbin
  # alternative without docker: serve httpbin with gunicorn at http://127.0.0.1:8000/
  pip install gunicorn httpbin
  gunicorn httpbin:app
  ```
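
The "headers first, body on demand" behavior can be observed without depending on httpbin.org. The sketch below uses `http.client`, the standard-library module underneath urllib, against a throwaway local `http.server` instance; the `DemoHandler` class and its 10,000-byte body are hypothetical stand-ins for a real document.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: answers every GET with a 10,000-byte body."""

    def do_GET(self):
        body = b'x' * 10_000
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection('127.0.0.1', server.server_port)
conn.request('GET', '/')
response = conn.getresponse()   # returns once the status line and headers are parsed
status = response.status
length = int(response.headers['Content-Length'])
first = response.read(100)      # body bytes are consumed only when read() is called
rest = response.read()          # read whatever remains of the body
conn.close()
server.shutdown()
server.server_close()

print(status, length, len(first), len(rest))
```

Requests follows the same pattern; passing `stream=True` to `requests.get()` defers downloading the body until you access it.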

```python
# fetch httpbin with Requests
import requests
# r = requests.get('http://localhost/headers')  # use this line with the local docker testbed
r = requests.get('http://httpbin.org/headers')
print(r.text)
```

Output:

```json
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.31.0",
    "X-Amzn-Trace-Id": "Root=1-651d6983-46752f1a256fe87b361c1f73"
  }
}
```

```python
# fetch httpbin with urllib
from urllib.request import urlopen
# url = 'http://localhost/headers'  # use this line with the local docker testbed
url = 'http://httpbin.org/headers'
with urlopen(url) as r:
    # urllib returns raw bytes, so the caller must choose the decoding
    print(r.read().decode('utf-8'))
```

Output:

```json
{
  "headers": {
    "Accept-Encoding": "identity",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.11",
    "X-Amzn-Trace-Id": "Root=1-651d6988-4f2526060d73fdac5ac6b559"
  }
}
```

Differences between urllib and Requests
---
| Library | Decompresses gzip bodies automatically | Picks the text encoding automatically |
| --- | --- | --- |
| Requests | Y | Y |
| urllib | N | N |

- To go beyond the HTTP protocol and behave more like a browser
    - refer to related libraries such as *mechanize*
- Here we focus on the HTTP protocol itself
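
Both columns of the table can be illustrated with urllib alone. In this sketch a hypothetical local `GzipHandler` always sends a gzip-compressed UTF-8 body; urllib hands back the compressed bytes untouched, so decompressing and choosing a charset are left to the programmer (Requests handles both steps automatically).

```python
import gzip
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class GzipHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that always replies with a gzip-compressed UTF-8 body."""

    def do_GET(self):
        body = gzip.compress('héllo, wörld'.encode('utf-8'))
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Encoding', 'gzip')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = HTTPServer(('127.0.0.1', 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

# urllib sends "Accept-Encoding: identity" unless told otherwise
with urlopen(Request(url, headers={'Accept-Encoding': 'gzip'})) as resp:
    encoding = resp.headers['Content-Encoding']
    charset = resp.headers.get_content_charset()  # parsed from Content-Type
    raw = resp.read()                             # still gzip-compressed bytes

server.shutdown()
server.server_close()

# With urllib, decompression and decoding are the programmer's job.
text = (gzip.decompress(raw) if encoding == 'gzip' else raw).decode(charset or 'utf-8')
print(text)
```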

Ports, Encryption, and Framing
---
- 80: the standard port for plain-text HTTP conversations
- 443: the standard port for HTTP conversations wrapped in TLS (HTTPS)
- Non-standard ports can also be used
  - the client must then specify the port in the URL, e.g. `http://localhost:8000/`
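
A client decides which port to connect to from the URL: an explicit port wins, otherwise the scheme's default applies. The helper `effective_port` below is a hypothetical sketch of that rule using `urllib.parse.urlsplit`, whose `.port` attribute is `None` when the URL omits the port.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two HTTP URL schemes
DEFAULT_PORTS = {'http': 80, 'https': 443}


def effective_port(url):
    """Hypothetical helper: return the TCP port a client would connect to for `url`."""
    parts = urlsplit(url)
    # Fall back to the scheme default when the URL carries no explicit port
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


for url in ('http://httpbin.org/ip', 'https://httpbin.org/ip', 'http://localhost:8000/ip'):
    print(url, '->', effective_port(url))
```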

```mermaid
sequenceDiagram
  Client->>Server: send a request that names a document
  Note right of Client: wait for a complete response
  Server-->>Client: a response containing an error or the requested document
```
- the request and response use the same rules to establish formatting and framing
- In HTTP/1.1, a client normally does not transmit a second request over the same socket until the previous response is finished (the spec permits pipelining, but it is rarely used)
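
The exchange in the diagram can be reproduced over a raw socket. This sketch talks to a throwaway local `http.server` instance instead of httpbin; the `EchoPathHandler` class is hypothetical. The request is plain text whose head ends with a blank line, and because the client sends `Connection: close`, it can treat end-of-file as the end of the complete response.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that replies with the requested path."""
    protocol_version = 'HTTP/1.1'  # http.server answers with HTTP/1.0 by default

    def do_GET(self):
        body = f'you asked for {self.path}\n'.encode('ascii')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


server = HTTPServer(('127.0.0.1', 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The request names a document; the blank line (CRLF CRLF) ends the request head.
request = (b'GET /ip HTTP/1.1\r\n'
           b'Host: 127.0.0.1\r\n'
           b'Connection: close\r\n'
           b'\r\n')

with socket.create_connection(('127.0.0.1', server.server_port)) as sock:
    sock.sendall(request)
    chunks = []
    while True:              # wait for the complete response
        data = sock.recv(4096)
        if not data:         # server closed the socket: response finished
            break
        chunks.append(data)
response = b''.join(chunks)
server.shutdown()
server.server_close()

print(response.decode('ascii'))
```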


🔭 Practice
---
- Run httpbin with docker
  - Access http://localhost/ip with curl
    ```bash
    curl -v localhost/ip
    ```
- *Optional:* Explore HTTP requests and responses using
  - [httpie](https://httpie.io/)
    - [Install httpie](https://httpie.io/docs/cli/debian-and-ubuntu), then try the examples
  - or [http-prompt](https://http-prompt.com/)




HTTP message structure
---
- Both HTTP requests and responses are called HTTP messages
- Each message is composed of three parts
  - the first line and each header line end with a carriage return and linefeed (CRLF, ASCII codes 13 and 10)
  1. A first line that names
     - a method and document in the request
     - a status code and description in the response
  2. Zero or more lines representing header entries
     - each entry consists of a name, a colon, and a value
     - entry names are case-insensitive
     - a *mandatory* blank line terminates the list of entries, so the header section ends with CRLFCRLF
  3. An optional body
     - there are several options for framing the body
- HTTP gives no advance warning of how long the first line and headers might be
  - so implementations enforce commonsense maximum lengths to avoid denial-of-service (DoS) attacks
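
The three-part structure can be pulled apart with a few lines of Python. This is a minimal sketch assuming the whole message is already in memory; `split_message` and the `MAX_HEAD` limit are hypothetical (a real client enforces its limit while reading from the socket), and header bytes are decoded as ISO-8859-1, as `http.client` does.

```python
MAX_HEAD = 8192  # hypothetical limit on first line + headers, a DoS safeguard


def split_message(raw):
    """Hypothetical helper: split a complete HTTP message into (first_line, headers, body)."""
    end = raw.find(b'\r\n\r\n')  # the mandatory blank line ends the head
    if end == -1 or end > MAX_HEAD:
        raise ValueError('head missing its terminator or too long')
    head, body = raw[:end], raw[end + 4:]
    first_line, *lines = head.decode('iso-8859-1').split('\r\n')
    headers = {}
    for line in lines:
        name, _, value = line.partition(':')
        # Names are case-insensitive, so normalize them to lowercase.
        # (A dict keeps only the last copy of a repeated field; real parsers keep all.)
        headers[name.strip().lower()] = value.strip()
    return first_line, headers, body


raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/plain\r\n'
       b'Content-Length: 5\r\n'
       b'\r\n'
       b'hello')
first_line, headers, body = split_message(raw)
print(first_line)  # HTTP/1.1 200 OK
print(headers)     # {'content-type': 'text/plain', 'content-length': '5'}
print(body)        # b'hello'
```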



Three framing options for the message body
---
1. A *Content-Length header entry whose value is a decimal integer* giving the length of the body in bytes, similar to framing method **M5**
   - impractical when the body is generated dynamically and its length is not known in advance
2. A *Transfer-Encoding: chunked* header entry, similar to framing method **M6**
   - used to frame a body without knowing its length beforehand
   - the body is delivered as a series of smaller chunks, each consisting of, in order:
     - a *hexadecimal* length field
     - (optional $O_1$): a semicolon and a chunk extension
     - a line delimiter CRLF
     - a block of data of the stated length
     - another line delimiter CRLF
   - the last chunk has length 0 and no block of data
   - (optional $O_2$): a few final header entries (trailer fields) may follow the last chunk, independently of $O_1$
   - a final CRLF ends the chunked body
3. *Connection: close*: the server sends a body of arbitrary length and then closes the TCP socket, so the body ends at end-of-file
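
Chunked framing (option 2) can be sketched as a pair of functions, assuming the whole chunked body is already in memory; a real client would read the socket incrementally. `encode_chunked` and `decode_chunked` are hypothetical names. The decoder discards chunk extensions ($O_1$) and collects trailer fields ($O_2$) into a dict.

```python
def encode_chunked(pieces):
    """Hypothetical helper: frame an iterable of byte strings with chunked transfer coding."""
    out = bytearray()
    for piece in pieces:
        if piece:  # skip empty pieces: a zero-length chunk would end the body early
            out += b'%x\r\n' % len(piece) + piece + b'\r\n'
    out += b'0\r\n\r\n'  # last chunk, no trailer fields, final CRLF
    return bytes(out)


def decode_chunked(data):
    """Hypothetical helper: return (body, trailers) from a complete chunked body."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b'\r\n', pos)
        size_field = data[pos:eol].split(b';', 1)[0]  # drop any chunk extension (O1)
        size = int(size_field, 16)                   # the length is hexadecimal
        pos = eol + 2
        if size == 0:                                # last chunk: no data block
            break
        body += data[pos:pos + size]
        pos += size
        if data[pos:pos + 2] != b'\r\n':
            raise ValueError('chunk data not followed by CRLF')
        pos += 2
    trailers = {}                                    # optional trailer fields (O2)
    while True:
        eol = data.index(b'\r\n', pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:                                 # the final CRLF ends the body
            break
        name, _, value = line.decode('iso-8859-1').partition(':')
        trailers[name.strip().lower()] = value.strip()
    return bytes(body), trailers


wire = encode_chunked([b'Hello, ', b'chunked world!'])
print(wire)
print(decode_chunked(wire))
print(decode_chunked(b'4;ext=1\r\nWiki\r\n0\r\nExpires: never\r\n\r\n'))
```

The second piece is 14 bytes long, so its length field is written as `e`; the trailing `0\r\n\r\n` is the last chunk followed by the final CRLF.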

Methods
---


Paths and Hosts
---



Status Codes
---


Caching and Validation
---


Content Encoding
---



Content Negotiation
---


Content Type
---

HTTP Authentication
---

Cookies
---

Connections, Keep-Alive, and httplib
---