# 6.1 Introduction to Requests

_requests_ is a third-party Python library for sending HTTP requests, and it is one of the most popular packages outside the standard library. Its creator, Kenneth Reitz, also created [HTTP Bin](https://httpbin.org), a helpful tool for inspecting and debugging HTTP client behavior. The source code for requests is available [here](https://github.com/psf/requests) and is updated regularly; its documentation is [here](https://requests.readthedocs.io/en/latest/).

### 6.1.1 Sending Requests

To send an **HTTP _GET_ request**, we can write the following,

In [None]:
import requests

response = requests.get("https://httpbin.org/get")
print(response)

We see that we have an OK HTTP status. To get the full content of the response,

In [None]:
# Response as a string,
response_txt = response.text
print(response_txt)

# Response as a dictionary,
response_dict = response.json()

Some important things to consider when constructing our request are the **parameters** and the **user-agent**. Parameters for the GET method are useful when we want to query something on a website. Note that we can pass additional information (such as the user-agent) with the request by including it in our **headers**. In the context of our request,

"Request header is sent by the client i.e. internet browser in an HTTP transaction. These headers send many details about the source of the request, e.g. the type of browser (or application in general) being used and its version.
HTTP request headers are an important part of any HTTP communication. Websites tailor their layouts and design in accordance to the type of machine, operating system and application making the request. A collection of information on the software and hardware of the source is sometimes called “user agent”. Otherwise, content might be displayed incorrectly.
If the website in question does not recognize the user agent, it will often default to one of these two actions. Some websites will display a default HTML version they have prepared for cases like these while others will block the request entirely."  ~ [Source](https://oxylabs.io/blog/http-headers-explained)

To summarise, specifying the user-agent can be useful because websites respond differently based on it. Let us consider querying Google,

In [None]:
# Our parameters are contained in a dictionary,
parameters = {
    "q": "What is the weather today?"
}

# Setting the user-agent as that of a browser,
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"}

# Printing the URL of the response,
response = requests.get("https://www.google.co.uk/search", params = parameters, headers = headers)
print(response.url)

The parameter _q_ contains the query string, while the browser user-agent specified in the request headers allows us to recover the correct URL. Without it, requests sends its own default user-agent and Google treats our request differently,

In [None]:
# Google treats non-browser user-agents differently.
parameters = {
    "q": "What is the weather today?"
}

response = requests.get("https://www.google.co.uk/search", params = parameters)
print(response.url)
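
To see how the parameters and the user-agent shape a request without contacting the server, one option is to build the request and inspect it before sending. The sketch below assumes that `requests.Request(...).prepare()` is an acceptable stand-in for the live calls above; the variable name `prepared` is illustrative only.

```python
import requests
from requests.utils import default_user_agent

parameters = {"q": "What is the weather today?"}

# Build the request without sending it, so we can inspect what would be sent.
prepared = requests.Request(
    "GET", "https://www.google.co.uk/search", params=parameters
).prepare()

# The parameters are URL-encoded and appended to the URL as a query string.
print(prepared.url)

# When no User-Agent header is supplied, requests identifies itself with this value.
print(default_user_agent())
```

The second value starts with `python-requests/`, which is how a server can tell that the request did not come from a browser.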

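The other HTTP request types, such as POST and DELETE, have matching functions in requests (`requests.post` and `requests.delete`) that are called in the same way as `requests.get`.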
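
As a rough sketch of POST and DELETE, the block below prepares both requests against HTTP Bin and inspects them without sending anything. The payload `{"name": "example"}` and the helper `send_examples` are made up for illustration; `send_examples` is only defined here, not called, because it needs network access.

```python
import json
import requests

BASE_URL = "https://httpbin.org"

# Prepare (but do not send) a POST request carrying a JSON body.
post_request = requests.Request(
    "POST", f"{BASE_URL}/post", json={"name": "example"}
).prepare()
print(post_request.method, post_request.url)
print(post_request.headers["Content-Type"])
print(json.loads(post_request.body))

# A DELETE request usually needs no body at all.
delete_request = requests.Request("DELETE", f"{BASE_URL}/delete").prepare()
print(delete_request.method, delete_request.url)


def send_examples(timeout=10):
    """Send the same two requests over the network and return their status codes."""
    post_response = requests.post(
        f"{BASE_URL}/post", json={"name": "example"}, timeout=timeout
    )
    delete_response = requests.delete(f"{BASE_URL}/delete", timeout=timeout)
    return post_response.status_code, delete_response.status_code
```

Calling `send_examples()` sends both requests for real, using the one-line `requests.post` and `requests.delete` functions.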

### 6.1.2 Handling Requests

Knowing the [HTTP status code](https://umbraco.com/knowledge-base/http-status-codes/) allows us to handle the response accordingly,

In [None]:
import requests

response = requests.get("https://httpbin.org/status/404")

if response.status_code == 404:
    print("HTTP Status: 404 - Not Found")
elif response.status_code == 200:
    print("HTTP Status: 200 - OK")
elif response.status_code == 400:
    print("HTTP Status: 400 - Bad Request")
else:
    print("HTTP Status: {}".format(response.status_code))
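
The chain of comparisons above grows quickly as more status codes are handled. One possible alternative, sketched here, looks up the reason phrase with `http.HTTPStatus` from the standard library. The helper `describe_status` is a hypothetical name, not part of requests.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a readable label for an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Non-standard codes have no registered reason phrase.
        return "HTTP Status: {}".format(status_code)
    return "HTTP Status: {} - {}".format(status_code, phrase)


for code in (200, 400, 404, 799):
    print(describe_status(code))
```

With a real response this would be called as `describe_status(response.status_code)`. requests also offers `response.ok` and `response.raise_for_status()` when we only need to know whether the request succeeded.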

We can also handle latency issues by including a _timeout_ parameter in our request. More generally, we can catch any error raised by the request with the code below,

In [15]:
import requests

# If the response takes longer than 10 seconds to arrive, requests raises a timeout error.
try:
    response = requests.get("https://httpbin.org/delay/25", timeout = 10)

    if response.status_code == 404:
        print("HTTP Status: 404 - Not Found")
    elif response.status_code == 200:
        print("HTTP Status: 200 - OK")
    elif response.status_code == 400:
        print("HTTP Status: 400 - Bad Request")
    else:
        print("HTTP Status: {}".format(response.status_code))

except Exception as err:
    print(err)

HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=10)
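
Catching `Exception` handles every failure the same way. If different failures need different treatment, a possible refinement is to catch the more specific classes in `requests.exceptions`. The sketch below assumes a helper named `fetch_status`, which is hypothetical and returns a short message rather than printing.

```python
import requests


def fetch_status(url, timeout=10):
    """Return a short description of what happened when requesting `url`."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx and 5xx responses into an HTTPError exception.
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return "Timed out after {} seconds".format(timeout)
    except requests.exceptions.HTTPError as err:
        return "Bad status: {}".format(err.response.status_code)
    except requests.exceptions.ConnectionError as err:
        return "Connection failed: {}".format(err)
    except requests.exceptions.RequestException as err:
        # Base class for every other error raised by requests.
        return "Request failed: {}".format(err)
    return "HTTP Status: {}".format(response.status_code)


# A URL without a scheme is rejected before any network traffic is sent.
print(fetch_status("httpbin.org/get"))
```

Calling `fetch_status("https://httpbin.org/delay/25", timeout=10)` should take the `Timeout` branch if the server responds as slowly as it did in the run above.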


### 6.1.3 Using Proxies

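To route a request through a proxy, we pass a _proxies_ dictionary that maps each URL scheme to the address of the proxy server. (Needs to be finished)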

In [22]:
proxies = {
    "http": "139.99.237.62.80",
    "https": "139.99.237.62.80"
}

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"}

try:
    response = requests.get("https://httpbin.org/get", headers = headers, proxies = proxies, timeout = 25)

    if response.status_code == 404:
        print("HTTP Status: 404 - Not Found")
    elif response.status_code == 200:
        print("HTTP Status: 200 - OK")
    elif response.status_code == 400:
        print("HTTP Status: 400 - Bad Request")
    else:
        print("HTTP Status: {}".format(response.status_code))
        
    print(response.text)

except Exception as err:
    print(err)

HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /get (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000131463B4C10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')))
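
The `getaddrinfo failed` message above is consistent with the proxy address being treated as a hostname that cannot be resolved: `139.99.237.62.80` has a dot where a colon would normally separate the host from the port. The sketch below assumes the intended proxy was host `139.99.237.62` on port `80`, which is a guess from the address. `build_proxies` and `has_port` are hypothetical helpers, not part of requests.

```python
from urllib.parse import urlsplit


def build_proxies(host, port, scheme="http"):
    """Map both URL schemes to one proxy written as scheme://host:port."""
    proxy_url = "{}://{}:{}".format(scheme, host, port)
    return {"http": proxy_url, "https": proxy_url}


def has_port(proxy_url):
    """Check whether a proxy address includes an explicit port."""
    # urlsplit only recognises the host when the address has a "//" prefix.
    if "://" not in proxy_url:
        proxy_url = "//" + proxy_url
    return urlsplit(proxy_url).port is not None


# The address used above has no port separator.
print(has_port("139.99.237.62.80"))

# Assumed intended form: host and port joined by a colon.
proxies = build_proxies("139.99.237.62", 80)
print(proxies)
print(has_port(proxies["https"]))
```

The resulting dictionary can be passed as `proxies = proxies` in the request above. Public proxies go offline often, so the request may still fail with a different error.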
