<a href="https://colab.research.google.com/github/kbotnen/pythonkurs_v25/blob/main/kode/Pythonkurs%20-%20Del%202%20-%20A%20short%20REST%20API%20-%20Intro%20to%20requests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Requests
## HTTP Requests with Python

In [None]:
import requests

# Initiate the requests object.
response = requests.get("https://visualisere.no")


# The response object has a lot of information.

print(f"The URL you requested is {response.url}")

print(f"The response status is {response.status_code}")

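# elapsed is a timedelta; total_seconds() gives the full duration
# (the .microseconds attribute only holds the sub-second component).
print(f"This request took {response.elapsed.total_seconds()} seconds.")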

if response.is_redirect:
    print("Redirect")
else:
    print("Not redirect")

In [None]:
# We can retrieve more metadata.
print(f'Response headers:\n{response.headers}\n')

print(f'Response cookies:\n{response.cookies}\n')

print(f"Content type is {response.headers['content-type']}.")

In [None]:
# And of course also the actual content (or in this example, the first 1,000 characters).
response.text[:1000]

In [None]:
# We can add parameters to the URL. This is useful when searching a website, to name one example.
url = "https://api.github.com/search/repositories"
# requests URL-encodes the values, so write a plain space rather than "+".
params = {"q": "python pandas"}
response = requests.get(url, params=params)
print(f"The response status is {response.status_code}")

In [None]:
# And the content.
response.text[:1000]

# Exploring GET Requests
GET is one of the most frequently used request types, and it retrieves data from a server. The only required parameter is the URL, but several other optional parameters exist.

In [None]:
# Basic request that prints the status code (200 = OK), and optionally the first 1,000 characters of the response.
import requests

URL = "https://www.reddit.com"
response = requests.get(URL)
print(f"The response status is {response.status_code}")

#response.text[:1000]

In [None]:
# We can customize a URL's query string, e.g. if we want to do a search. In requests we can use the params keyword argument, which takes a dictionary of strings.
url = "https://www.google.com/maps/search/?api=1"

query = "Universitetet i Bergen/"
params = {"query": query}
# Make a GET request:
response = requests.get(url=url, params=params, allow_redirects=False)
# Print URL string used in the request above:
print(f"Requested URL: {response.url}")

In [None]:
# We can use multiple arguments with the params parameter.
url = "https://crawler-test.com/urls/parameter_1_2"
params = {"parameter_x": "x", "parameter_1": "foo"}
# The GET request:
response = requests.get(url=url, params=params)
print(f"The response status is {response.status_code}")

response.url

In [None]:
# We can disallow redirects. First, a request with redirects allowed (the default).

# URL that redirects three times:
url = "https://httpbin.org/redirect/3"
# The GET request:
response = requests.get(url=url)
print(f"The response status is {response.status_code}")

response.history

In [None]:
# We can disallow redirects. Now the same request with redirects disallowed.
# URL that redirects three times:
url = "http://httpbin.org/redirect/3"
# The GET request:
response = requests.get(url=url, allow_redirects=False)
print(f"The response status is {response.status_code}")

In [None]:
# With the requests library we can also send custom headers along with the request.
url = 'https://httpbin.org/get'
headers = {'Content-Type': 'text/html'}
# The GET request:
response = requests.get(url=url, headers=headers)
print(f"The response status is {response.status_code}\n")
print(response.content)

In [None]:
# Verify the content type by converting the response body to JSON.
try:
    print(response.json())
except ValueError:  # raised when the body is not valid JSON
    print('Response is not JSON')

In [None]:
# Basic authentication: When we need to access exclusive content. Deny.

url = "https://postman-echo.com/basic-auth"
# The GET request:
response = requests.get(url=url)
print(f"The response status is {response.status_code}")

In [None]:
# Basic authentication: When we need to access exclusive content. Allow.
url = "https://postman-echo.com/basic-auth"
# Authentication credentials:
username = "postman"
password = "password"
# The GET request:
response = requests.get(url=url, auth=(username, password))
print(f"The response status is {response.status_code}")

In [None]:
# Basic authentication: When we need to access exclusive content. Allow, and see the response.
try:
    print(response.json())
except ValueError:  # raised when the body is not valid JSON
    print('Response is not JSON')

In [None]:
# Authorization Header: When we don't have username+password, and want to use an access token instead.
url = "https://postman-echo.com/basic-auth"
headers = {"Authorization" : "Basic cG9zdG1hbjpwYXNzd29yZA=="}
# The GET request:
response = requests.get(url=url, headers=headers)
print(f"The response status is {response.status_code}\n")
try:
    print(f'Response JSON:\n{response.json()}')
except ValueError:  # raised when the body is not valid JSON
    print('Response is not JSON')

In [None]:
# Detour: Decoding a base64 string.
import base64

decoded_string = base64.b64decode("cG9zdG1hbjpwYXNzd29yZA==")
print(f"The decoded_string is: {decoded_string}")

encoded_string = base64.b64encode(b"postman:password")

print(f"The encoded_string is: {encoded_string}")

In [None]:
# Timeout 2 sec.
url = "https://www.reddit.com"
response = requests.get(url, timeout=2)
print(f"The response status is {response.status_code}")

In [None]:
# Timeout very short.
url = "https://www.reddit.com"
timeout = 0.01
try:
    response = requests.get(url, timeout=timeout)
    print(f"The response status is {response.status_code}")
except (requests.exceptions.ConnectTimeout, requests.exceptions.ReadTimeout) as e:
    print(f"No response has been received within {timeout} seconds:\n{e}")

# Diving into HTTP methods.
Let us get to know the other methods:
* POST
* PUT
* PATCH
* DELETE

An HTTP request is initiated by a client to exchange content and information with a server. The most common HTTP methods (POST, GET, PUT, PATCH and DELETE) correspond to the CRUD operations (CREATE, READ, UPDATE and DELETE) that are typically performed on databases.
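
As a quick reference, the correspondence can be written down as a small lookup table. This is only a sketch of the usual convention; the name `HTTP_TO_CRUD` is made up for illustration, and real APIs do not always follow the mapping strictly.

```python
# Conventional mapping from HTTP methods to CRUD operations.
# HTTP_TO_CRUD is a hypothetical name used only for this illustration.
HTTP_TO_CRUD = {
    "POST": "CREATE",
    "GET": "READ",
    "PUT": "UPDATE",    # replaces the whole resource
    "PATCH": "UPDATE",  # modifies part of the resource
    "DELETE": "DELETE",
}

for method, operation in HTTP_TO_CRUD.items():
    print(f"{method:<6} -> {operation}")
```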

In [None]:
# The POST method is used to post data, such as files or forms on the web. Essentially the POST method enables the creation of new resources. A successful POST request will typically return a status code of 201.

import requests

url = "https://jsonplaceholder.typicode.com/posts"
data = {"title" : "Sample POST request", "body" : "How to make a POST request"}
response = requests.post(url=url, data=data)
print(f"The response status is {response.status_code}\n")

In [None]:
try:
    print(response.json())
except ValueError:  # raised when the body is not valid JSON
    print('Response is not JSON')

In [None]:
'''
The PUT method resembles the POST method because both involve sending data to the server. When making a PUT request to a URL that points to an existing resource, the server updates that resource with the data sent in the request body. If the targeted resource does not exist, the server may or may not create it, depending on its configuration.
The status code of a successful PUT request is usually 200, but we might encounter a few more:
200: An existing resource has been modified.
201: A new resource has been created.
4xx or 5xx: The resource could not be created or modified.
'''
# When making a PUT request we need to remember a few key points: the response is never cached, the request does not remain in browser history, it cannot be bookmarked, and it has no restrictions on data length.

url = "http://httpbin.org/put"
data = {"title" : "Sample PUT request", "body" : "How to make a PUT request"}
response = requests.put(url=url, data=data)
print(f"The response status is {response.status_code}\n")

In [None]:
try:
    print(response.json())
except ValueError:  # raised when the body is not valid JSON
    print('Response is not JSON')

### PUT vs POST
Both PUT and POST methods are used for sending data to a server. However, there is a crucial difference between them:

* The PUT method is used to update or replace an existing resource on the host, while the POST method creates a new resource.
* Another distinction is their idempotency. An idempotent action means that performing the same request multiple times has the same effect as performing it once. Thus, if you make the same PUT request several times, the result will always be the same. This concept does not apply to the POST method, as repeating the same request could potentially create multiple instances of the same resource.
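
The idempotency difference can be illustrated without any network traffic. The sketch below simulates a server with a plain dictionary; `store`, `post` and `put` are hypothetical names, and a real API may behave differently.

```python
import itertools

store = {}                     # hypothetical in-memory "server" state
next_id = itertools.count(1)   # ids handed out by the pretend server


def post(resource):
    """Create a new resource under a fresh id, like a typical POST handler."""
    new_id = next(next_id)
    store[new_id] = dict(resource)
    return new_id


def put(resource_id, resource):
    """Create or replace the resource at a known id, like a typical PUT handler."""
    store[resource_id] = dict(resource)


article = {"title": "Sample"}

put(100, article)
put(100, article)              # repeating the PUT changes nothing
count_after_puts = len(store)

post(article)
post(article)                  # repeating the POST creates a second copy
count_after_posts = len(store)

print(f"Resources after two PUTs: {count_after_puts}")
print(f"Resources after two more POSTs: {count_after_posts}")
```

Two identical PUT calls leave a single resource behind, while each POST call adds another one.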

In [None]:
# The PATCH method is particularly useful for partially updating a resource. It comes in handy when making minor modifications to an otherwise large resource.

item = {
    "id": 1001,
    "category_level_1": "Electronics",
    "category_level_2": "TV Accessories",
    "category_level_3": "HDMI cables",
    "category_level_4": "3m",
    "price": 24.9
}

url = "http://httpbin.org/patch"
data = {"price": 19.9}  # send only the field that changes, not the whole item
response = requests.patch(url=url, data=data)
print(f"The response status is {response.status_code}\n")

### PATCH vs PUT
As discussed earlier, the PATCH method enables partial updating of a resource, making it more efficient and practical than the PUT method, particularly when dealing with minor changes to large resources. It is important to note that the PUT method entirely replaces the existing resource with the one sent to the server.

The main distinction between these two methods lies in how the server processes the enclosed resource. In the case of PUT, the enclosed resource is considered an updated version of the existing resource, and the server is requested to replace the original version. On the other hand, the PATCH method treats the enclosed resource as a set of instructions detailing the modifications to be made to the existing resource to obtain the updated version.
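
The two behaviors can be compared on a plain dictionary. This is a minimal sketch assuming the simplest PATCH semantics (a shallow merge); `apply_put` and `apply_patch` are hypothetical helpers, and real servers may use other patch formats.

```python
original = {"id": 1001, "category": "HDMI cables", "length": "3m", "price": 24.9}
change = {"price": 19.9}


def apply_put(current, body):
    """PUT semantics: the request body becomes the new resource."""
    return dict(body)


def apply_patch(current, body):
    """PATCH semantics (shallow merge): only the fields in the body change."""
    return {**current, **body}


after_put = apply_put(original, change)
after_patch = apply_patch(original, change)

print(f"After PUT:   {after_put}")
print(f"After PATCH: {after_patch}")
```

Sending only the changed field with PUT would drop every other field, which is why a PUT body normally carries the complete resource.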

In [None]:
'''The DELETE method is used to remove a resource from a server. A few relevant status codes associated with the DELETE method:
200: The action has been enacted, and the response includes a message body.
204: The action has been enacted, but the response does not include a message body.
202: The request has been accepted, but the delete operation has not yet been performed, so the action is pending.
405: The method is not allowed for the specified resource.
'''

url = "http://httpbin.org/delete"
response = requests.delete(url=url)
print(f"The response status is {response.status_code}\n")

### Less common HTTP methods
The most commonly used HTTP methods include POST, GET, PUT, PATCH, and DELETE. However, there are other less frequently used but important methods:
* CONNECT: This method is used to establish a tunnel to a server, typically through a proxy.
* OPTIONS: Clients can use the OPTIONS method to determine which HTTP methods and options are supported by a web server. A specific URL or an * (referring to the entire server) can be used.
* TRACE: Primarily utilized for debugging purposes, the TRACE method enables a server to echo the received HTTP request back to the requester.

# Interacting with Secured APIs

## Basic authentication
Basic authentication, true to its name, is a straightforward authentication technique. To access the web server, users must supply a username and password as credentials.

In [None]:
# Basic authentication, with the credentials.
url = "https://postman-echo.com/basic-auth"
# Authentication credentials:
username = "postman"
password = "password"
# The GET request:
response = requests.get(url, auth=(username, password))
print(f"The response status is {response.status_code}")

In [None]:
# Basic authentication, without the credentials.
url = "https://postman-echo.com/basic-auth"
# The GET request:
response = requests.get(url)
print(f"The response status is {response.status_code}")

### Token-Based authentication
In some cases, we may not have a username and password for basic authentication. Instead, we might be provided with a token to be used as credentials.

In [None]:
url = "https://postman-echo.com/basic-auth"
# Authentication credentials:
token = "Basic cG9zdG1hbjpwYXNzd29yZA=="
header = {"Authorization" : token}
# The GET request:
response = requests.get(url, headers=header)
print(f"The response status is {response.status_code}")

## Digest Authentication
Basic authentication is generally not considered secure because credentials are transmitted to the host in an unprotected manner. Digest authentication aims to address this issue by creating a hash of the password using a hashing algorithm. This prevents credentials from being transmitted in plain text.

Although digest authentication is more secure than basic authentication, an attacker could still try to capture the hash and replay it. To mitigate this, the server supplies a random value called a nonce (number used only once). The client hashes the username, realm and password, combines the result with the nonce and a hash of the request method and URI, and hashes again before sending it to the server.

The server performs the same computation with the credentials it has stored and compares the result with the hash it received.
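
To make the hashing steps concrete, here is a minimal sketch of the original MD5 digest calculation without the optional `qop` extension. The realm and nonce values are made-up placeholders; in a real exchange both come from the server's challenge, and `HTTPDigestAuth` performs this work for us.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest 'response' value (MD5 scheme, no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity hash
    ha2 = md5_hex(f"{method}:{uri}")                 # request hash
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # combined with the nonce


# Placeholder realm and nonces, chosen only for this illustration.
uri = "/digest-auth/auth/user/pass"
first = digest_response("user", "example-realm", "pass", "GET", uri, "nonce-1")
second = digest_response("user", "example-realm", "pass", "GET", uri, "nonce-2")

print(first)
print(second)
```

Because the nonce is part of the final hash, the same credentials produce a different value for every challenge, so a captured response cannot simply be reused.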

In [None]:
from requests.auth import HTTPDigestAuth

url = 'https://httpbin.org/digest-auth/auth/user/pass'
# The GET request:
response = requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
print(f"The response status is {response.status_code}")

### OAuth
OAuth (Open Authorization) is an authorization protocol that lets third-party services access your information securely, without requiring you to share your password.

Suppose you have a blog on Medium, and you want your articles to be automatically posted or shared on your Twitter account. To set this up, you grant permission for Medium to access your Twitter profile and post tweets on your timeline. During the initial setup, Twitter will ask for your approval, and the rest of the process will be facilitated using OAuth.

In [None]:
# pip install oauthlib
# pip install requests_oauthlib
from requests_oauthlib import OAuth1

# URL and OAuth credentials:
url = 'https://api.twitter.com/1.1/account/verify_credentials.json'
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET', 'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
# The GET request:
response = requests.get(url, auth=auth)
print(f"The response status is {response.status_code}")

## Optimizing HTTP Requests
### Measuring performance

In [None]:
%%time
# Single request
response = requests.get('https://www.google.com')
# Measure one request.

In [None]:
%%time
# Multiple requests
response1 = requests.get('https://www.google.com')
response2 = requests.get('https://www.google.com')
response3 = requests.get('https://www.google.com')
response4 = requests.get('https://www.google.com')
response5 = requests.get('https://www.google.com')
# Measure multiple requests.

### Connection Pooling
Connection pooling is an optimization technique that enables the reuse of existing connections rather than creating a new connection with each request, resulting in faster response times. The Session object of the Requests library can be used to implement connection pooling.

In [None]:
%%time
# Without connection pooling:
response1 = requests.get('https://www.google.com')
response2 = requests.get('https://www.google.com')
response3 = requests.get('https://www.google.com')
response4 = requests.get('https://www.google.com')
response5 = requests.get('https://www.google.com')

In [None]:
%%time
# With connection pooling:
session = requests.Session()
response1 = session.get('https://www.google.com')
response2 = session.get('https://www.google.com')
response3 = session.get('https://www.google.com')
response4 = session.get('https://www.google.com')
response5 = session.get('https://www.google.com')

### Compression
The performance of web requests is influenced by both the number of requests and the amount of data being transferred. Enabling compression can help reduce the data transfer size, which is particularly important when requesting large amounts of data from a server.

By default, the Requests library sends an Accept-Encoding header that advertises gzip and deflate support, and it decompresses such responses automatically.
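
To get a feel for how much compression can save, the sketch below gzips a made-up JSON payload with the standard library. The payload is invented for illustration, and actual savings depend on the data.

```python
import gzip
import json

# A repetitive JSON payload, similar in shape to many API responses.
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

print(f"Raw size:  {len(payload)} bytes")
print(f"Gzip size: {len(compressed)} bytes")
print(f"Round trip intact: {restored == payload}")
```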

In [None]:
# URL and headers:
url = "https://www.google.com/maps/search/?api=1"
headers = {'Accept-Encoding': 'gzip'}
# The GET request:
response = requests.get(url, headers=headers)
print(f"The response status is {response.status_code}")

### Keep-Alive
In HTTP 1.0, server connections are closed after sending the response. To reuse the same connection for multiple requests and responses, the Connection: keep-alive header can be employed.

However, in HTTP 1.1, all connections are treated as persistent by default, unless specified otherwise. When using the Requests library, we can create a Session object for multiple requests, which ensures that connections are automatically kept alive.

In [None]:
# Create a session object:
session = requests.session()
# Check the connection:
session.headers["Connection"]

In [None]:
# Create a session object:
session = requests.session()
url = "https://www.reddit.com"
status_codes = {}
# Send 10 get requests using the same session and save the status codes:
for i in range(10):
    response = session.get(url=url)
    status_codes[i] = response.status_code
print(status_codes)

### Persisting Session Parameters
One of the benefits of using Session objects is their ability to persist parameters across multiple requests. This can be achieved by supplying data to various properties of a Session object.

In [None]:
# Authenticate a Session object.
url = "https://postman-echo.com/basic-auth"
session = requests.Session()
session.auth = ('postman', 'password')
response1 = session.get(url=url)
response2 = session.get(url=url)
response3 = requests.get(url=url)
print(f"The response 1 status is {response1.status_code}")
print(f"The response 2 status is {response2.status_code}")
print(f"The response 3 status is {response3.status_code}")

# Advanced Requests Features

## Setting Request Timeout
When making an HTTP request using the Requests library, no timeout is set by default. As a result, a request can hang indefinitely if the server is unresponsive. It is therefore good practice to set a timeout, so that delayed responses or an unresponsive server surface as explicit errors.

In [None]:
# Set timeout, but no handling of exceptions.
import requests

url = "https://www.google.com/maps/search/?api=1"
# The GET request:
response = requests.get(url=url, params={"query": "Spotify+Camp+Nou/"}, timeout=5)
print(f"The response status is {response.status_code}")

In [None]:
# Set timeout, handle exceptions.
url = "https://www.google.com/maps/search/?api=1"
# Set a timeout value:
timeout = 0.001
# The GET request:
try:
    response = requests.get(url=url, params={"query": "Spotify+Camp+Nou/"}, timeout=timeout)
    print(f"The response status is {response.status_code}")
except requests.exceptions.ConnectTimeout:
    print("Connection could not be created within the given timeout value.")
except requests.exceptions.ReadTimeout:
    print("Reading could not be done within the given timeout value.")

## Distinct Timeout Values
A GET request comprises two main parts:
* Establishing a connection
* Reading the content

Timeouts can occur during either of these phases. To pinpoint the exact cause of a timeout, we can set distinct timeouts for the connection and content reading stages. The timeout parameter accepts a tuple containing the separate timeout values for these steps, in the order (connect, read).

In [None]:
# Connection Timeout is low.
url = "https://www.google.com/maps/search/?api=1"
# Set a timeout value:
timeout = (0.001, 2)
# The GET request:
try:
    response = requests.get(url=url, params={"query": "Spotify+Camp+Nou/"}, timeout=timeout)
    print(f"The response status is {response.status_code}")
except requests.exceptions.ConnectTimeout:
    print("Connection could not be created within the given timeout value.")
except requests.exceptions.ReadTimeout:
    print("Reading could not be done within the given timeout value.")

In [None]:
# Read Timeout is low.
url = "https://www.google.com/maps/search/?api=1"
# Set a timeout value:
timeout = (2, 0.001)
# The GET request:
try:
    response = requests.get(url=url, params={"query": "Spotify+Camp+Nou/"}, timeout=timeout)
    print(f"The response status is {response.status_code}")
except requests.exceptions.ConnectTimeout:
    print("Connection could not be created within the given timeout value.")
except requests.exceptions.ReadTimeout:
    print("Reading could not be done within the given timeout value.")

## Loading images
To download a file using the Requests library, you need to read the content of a response and save it to a local file on your device. There are three ways to read the response body: the content attribute (raw bytes), the json() method (parsed JSON), and the raw attribute (the underlying stream).

In [None]:
url = "https://upload.wikimedia.org/wikipedia/commons/a/af/Tux.png"
# The GET request:
response = requests.get(url=url, allow_redirects=True)
# Write the content to a local file:
with open("tux.png", "wb") as f:
    f.write(response.content)
print(f"The response status is {response.status_code}")

In [None]:
from IPython.display import Image

Image(filename="tux.png")

## Downloading Files
Downloading other file types, such as JSON or text files, follows the same process as downloading images.

In [None]:
# Download a json file.
url = "https://reqres.in/api/users"
# The GET request:
response = requests.get(url=url)
# Write the content to a local file:
with open("sample_file.json", "wb") as f:
    f.write(response.content)
print(f"The response status is {response.status_code}")

In [None]:
# One of many ways to display json content.
import json

with open("sample_file.json", "r") as f:
     data = json.load(f)
# Convert it to a Python dictionary:
data_dict = dict(data)
# Display the content:
data_dict

## Handling Large Files
Suppose we want to avoid downloading large files. The Requests library offers a flexible structure that allows us to check the file size before initiating the download. To achieve this, consider the following two points:

* By default, the response body is downloaded immediately upon making a request. We can change this behavior by setting the stream parameter to True.
* Before downloading the file, we can verify its size by checking the Content-Length header through the headers attribute of the response.
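
The two points above can be combined into a small helper. This is a sketch, not a library feature: `save_if_small` is a hypothetical function that expects a response obtained with `requests.get(url, stream=True)`, and it also guards against a missing or incorrect Content-Length header by counting bytes while reading.

```python
import os


def save_if_small(response, path, max_bytes):
    """Save a streamed response to `path` unless it exceeds `max_bytes`.

    Returns True if the file was written, False if it was skipped.
    """
    declared = response.headers.get("Content-Length")
    if declared is not None and int(declared) > max_bytes:
        response.close()  # release the connection without reading the body
        return False

    written = 0
    too_large = False
    with open(path, "wb") as f:
        # Read the body in chunks instead of loading it all into memory.
        for chunk in response.iter_content(chunk_size=8192):
            written += len(chunk)
            if written > max_bytes:  # header was missing or understated
                too_large = True
                break
            f.write(chunk)
    response.close()

    if too_large:
        os.remove(path)  # discard the partial file
        return False
    return True
```

A call could look like `save_if_small(requests.get(url, stream=True), "tux.png", 10_000)`.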

In [None]:
url = "https://upload.wikimedia.org/wikipedia/commons/a/af/Tux.png"
# The GET request:
response = requests.get(url=url, allow_redirects=True, stream=True)
# Get the content-length header:
content_length = response.headers.get('content-length')
print(f"The content length is {content_length}")

In [None]:
url = "https://upload.wikimedia.org/wikipedia/commons/a/af/Tux.png"
# The GET request (stream=True defers downloading the body):
response = requests.get(url=url, allow_redirects=True, stream=True)
# Get the content-length header (it may be missing, so fall back to 0):
content_length = int(response.headers.get('content-length', 0))
# Conditionally download the file:
if content_length < 10000:
    # Accessing response.content downloads the body; write it to a local file:
    with open("tux.png", "wb") as f:
        f.write(response.content)
else:
    print("The file size is too large!")
    response.close()

## Streaming APIs
Streaming APIs are designed for real-time data consumption and are commonly employed by social media platforms, such as the Twitter Streaming API.

The Requests library provides support for iterating over streaming APIs using the iter_lines method. It is essential to set the stream parameter to True when using this method.

In [None]:
url = "https://httpbin.org/stream/5"
# The GET request:
response = requests.get(url, stream=True)
for line in response.iter_lines():
    # Filter out keep-alive new lines:
    if line:
        decoded_line = json.loads(line.decode('utf-8'))
        print(f"\nLine number {decoded_line['id']}")
        print(decoded_line)

## Retrying Requests
In certain cases, a webpage may fail to load in the browser. Often, we may attempt to retry the connection immediately or after a short duration. This is a common experience when sending HTTP requests to a server.

There are several reasons why a request might not succeed. Some issues can be resolved by retrying the request. The Requests library offers a flexible approach to implementing retries in a request.

To achieve this, we can combine the HTTPAdapter class with a Retry object.

In [None]:
from requests.adapters import HTTPAdapter, Retry

url = "https://www.google.com/maps/search/?api=1"
# Create a session object:
session = requests.Session()
retries = Retry(total=5)
session.mount('https://', HTTPAdapter(max_retries=retries))
session.get(url=url)

## Custom Retry Strategies
We can create more customized retry strategies by using additional parameters:
* backoff_factor: This parameter determines the backoff factor to apply between attempts after the second try (as most errors are resolved immediately by a second attempt without any delay). The waiting period is calculated using the following formula:
{backoff_factor} * (2 ** ({number_of_total_retries} - 1))
* status_forcelist: This is a set of integer HTTP status codes on which we should enforce a retry. A retry is triggered if the request method is in allowed_methods and the response status code is in status_forcelist.

If the request continues to fail after reaching the specified maximum number of attempts, a RetryError exception will be raised.
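
To see what the backoff formula quoted above means in practice, the sketch below computes the waiting periods for a given backoff_factor. `backoff_delays` is a hypothetical helper that follows the description in this section (no delay before the first retry); the exact schedule depends on the installed urllib3 version, which also caps the delay at a maximum value.

```python
def backoff_delays(backoff_factor, retries):
    """Return the delay in seconds before each retry, per the formula above."""
    delays = []
    for retry_number in range(1, retries + 1):
        if retry_number == 1:
            # The first retry happens immediately, as described in the text.
            delays.append(0.0)
        else:
            delays.append(backoff_factor * (2 ** (retry_number - 1)))
    return delays


print(backoff_delays(backoff_factor=0.5, retries=3))
```

With backoff_factor=0.5 and three retries, this schedule waits 0, 1 and 2 seconds.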

In [None]:
url = "http://httpstat.us/503"
# Create a session object:
session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
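# Mount the adapter for both schemes so the retries also apply to the http:// URL above:
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))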
try:
    session.get(url=url)
except requests.exceptions.RetryError as e:
    print(str(e.args[0]))