# Unit 3: Exploring HTTP Response Headers with Python's requests Library


# Introduction to HTTP Response Headers

Hello and welcome to this new lesson! Today, we will build upon your existing knowledge of Python's `requests` library and learn more about the responses we get when making HTTP requests. Specifically, we'll be inspecting **response headers**.

Whenever you make an HTTP request, the server sends back not only the requested content but also metadata about that content. This metadata is conveyed through **HTTP headers**, which come as key-value pairs.

In simpler terms, if HTTP were a mailing system, headers would be like the information on the outside of an envelope: who it's from, where it's going, the date it was sent, and so on. HTTP headers describe things like the type of content being sent, how to decode it, and when it was last modified.

### Accessing Response Headers in Python using requests

Let's use Python's `requests` library to see some of these headers in action.

```python
import requests

url = 'http://quotes.toscrape.com'  # We'll scrape quotes from this page
response = requests.get(url)
```

First, we make an HTTP GET request to our target URL, which gives us a `Response` object. One of the properties of this object is `headers`, which is a dictionary-like object of all response headers.
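The dictionary-like object that `requests` uses for headers ignores case on lookup, which matches HTTP's case-insensitive header names. Here is a minimal sketch that builds a `requests.structures.CaseInsensitiveDict` directly. The sample values are hand-written for illustration (an assumption, not a real server response), so it runs without a network call:

```python
from requests.structures import CaseInsensitiveDict

# Sample header values, made up for illustration (not fetched from a server).
headers = CaseInsensitiveDict({
    'Content-Type': 'text/html; charset=utf-8',
    'Content-Length': '11054',
})

# Lookups ignore the case of the header name.
print(headers['content-type'])
print(headers['CONTENT-LENGTH'])

# Membership tests are case-insensitive as well.
print('content-type' in headers)
```

In practice this means `response.headers['content-type']` and `response.headers['Content-Type']` return the same value.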

We can print these headers like this:

```python
if response.ok:
    print("Response headers:")
    for header, value in response.headers.items():
        print(f'{header}: {value}')
```

This code prints each header along with its corresponding value. Let's run this and see what we get!

The output of the above code will be similar to this:

```
Response headers:
Date: Tue, 07 May 2024 18:28:19 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 11054
Connection: keep-alive
```

This output summarizes key information from the response headers, including when the response was sent (`Date`), what the content type is (`Content-Type`), how big the content is in bytes (`Content-Length`), and whether the connection stays open (`Connection`). The exact set of headers varies by server, so your output may include more. Such details are crucial for understanding how to handle the received data in web scraping or API calls.

### Understanding Key HTTP Response Headers

When you run the previous code, you'll probably come across many headers. Here are a few important ones which come up frequently:

  * **Server**: The software used by the originating server.
  * **Date**: The date and time when the message was sent.
  * **Content-Type**: The MIME type of the returned content. This could be `text/html`, `application/json`, `image/jpeg`, and so on. This tells the client what the content is and how to open it.
  * **Content-Length**: The size, in bytes, of the returned content.
  * **Connection**: Control options for the current connection, such as `keep-alive` (keep it open for further requests) or `close`.

These headers provide additional insights into the server and the response content, and they can be quite useful in some cases!

### Applying Response Headers in Web Scraping

Now, why is all this important for web scraping? Let's dig a bit deeper.

As a web scraper, your main goal is to extract useful data from web pages. However, scraping is not just about making requests and parsing HTML. You also need to ensure that your scraper behaves well and follows the rules set by the server. The server's responses, including headers, are a critical source of feedback for your scraper, containing valuable information about what the server allows or expects you to do.

For instance, an important header in web scraping is `Content-Type`, which helps you determine the format of the returned content. If the `Content-Type` is `application/json`, you can use `response.json()` to parse the body into Python objects such as dictionaries and lists. Note that the value often carries extra parameters, as in `text/html; charset=utf-8`. Knowing the format up front shapes how you structure your scraping code.
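Because a `Content-Type` value can include parameters after the media type, comparing the whole string to `'application/json'` can miss valid JSON responses. Here is a minimal sketch of one way to handle this. `media_type` and `is_json` are hypothetical helper names introduced for illustration, not part of `requests`:

```python
def media_type(content_type):
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=utf-8'."""
    if not content_type:
        return None
    # Drop any parameters after the first ';' and normalize case.
    return content_type.split(';', 1)[0].strip().lower()


def is_json(content_type):
    """Return True for 'application/json' and '+json' media types."""
    bare = media_type(content_type)
    return bare is not None and (bare == 'application/json' or bare.endswith('+json'))


print(media_type('text/html; charset=utf-8'))
print(is_json('application/json; charset=utf-8'))
print(is_json('text/html; charset=utf-8'))
```

With a live response, you could call `is_json(response.headers.get('Content-Type'))` and use `response.json()` only when it returns `True`, falling back to `response.text` otherwise.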

### Summary

Well, our journey for this lesson stops here. We learned about HTTP response headers and how to inspect them using Python's `requests` library. Keep practicing and experimenting with different websites to further strengthen your understanding of this important aspect of HTTP!

Happy coding!

## Inspecting Content-Type Header in Python Web Requests

Curious about what type of content a web server sends when you request a webpage? The given code demonstrates how to inspect the Content-Type header from the response of a web server, using the example of http://quotes.toscrape.com. This knowledge is valuable in web development for understanding how to process the retrieved data. Run the code to see the type of content provided by the server!

```python
import requests

# Checking the server's content type
url = 'http://quotes.toscrape.com'
response = requests.get(url)

if response.ok:
    content_type = response.headers.get('Content-Type', None)
    if content_type:
        print("Content-Type: " + content_type)
    else:
        print("Content-Type header missing!")
```

This is a great, practical example of using the `requests` library to inspect HTTP headers. The code specifically targets the **`Content-Type`** header, which is one of the most important headers for a client to know.

The use of `response.headers.get('Content-Type', None)` is a robust way to access the header. Using `.get()` with a default value of `None` prevents a `KeyError` if the header is not present in the response, making the code more resilient.
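The difference between the two access styles can be sketched without a network call. Here a plain dictionary with made-up values stands in for `response.headers` (an assumption for illustration), and the `Content-Type` key is deliberately left out:

```python
# Stand-in for response.headers; the Content-Type key is deliberately absent.
sample_headers = {'Date': 'Tue, 07 May 2024 18:28:19 GMT'}

# Square-bracket access raises KeyError when the header is missing.
try:
    sample_headers['Content-Type']
    outcome = 'found'
except KeyError:
    outcome = 'KeyError raised'

# .get() returns the default (None here) instead of raising.
fallback = sample_headers.get('Content-Type')

print(outcome)
print(fallback)
```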

The conditional check handles both cases: when the header exists and when it's missing. This is a fundamental step in ensuring your web scraping or API code knows how to handle the data it receives. For a typical website like `quotes.toscrape.com`, the printed value starts with `text/html` and usually includes a charset parameter, for example `text/html; charset=utf-8`.

## Inspect and Modify Response Headers in Python

Great progress, Space Explorer! Now, let's dive a little deeper into HTTP response headers. Your task is to modify the code to print the 'Content-Length' of the response instead of 'Content-Type'. Remember, this header tells us the size of the response body. Good luck, and let’s see how well you understand the server's feedback through headers!

```python
import requests

# Making an HTTP GET request to fetch quotes
response = requests.get('http://quotes.toscrape.com')

# Only execute the following if the request was successful
if response.ok:
    # Print the 'Content-Type' header value
    # TODO: Update the code to fetch the Content-Length header
    print('Content-Type:', response.headers['Content-Type'])
```

You're right on track! Modifying the code to inspect the `Content-Length` header is a straightforward change that demonstrates your understanding of how to work with HTTP headers.

All you need to do is change the key accessed in the `response.headers` dictionary from `'Content-Type'` to `'Content-Length'`.

Here's the completed code:

```python
import requests

# Making an HTTP GET request to fetch quotes
response = requests.get('http://quotes.toscrape.com')

# Only execute the following if the request was successful
if response.ok:
    # Update the code to fetch the Content-Length header
    print('Content-Length:', response.headers['Content-Length'])
```

This updated code will now output the size of the response body in bytes. This simple modification shows that you can easily access and utilize various pieces of metadata sent by the web server, which is a key skill for web development and scraping.
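Direct indexing assumes the header is always present. Servers may omit `Content-Length`, for example when the body is sent with chunked transfer encoding, in which case `response.headers['Content-Length']` raises a `KeyError`. Here is a minimal sketch of a more defensive lookup. `declared_length` is a hypothetical helper name, and the sample dictionaries stand in for `response.headers`:

```python
def declared_length(headers):
    """Return Content-Length as an int, or None if it is absent or not a number."""
    value = headers.get('Content-Length')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


# Made-up header sets for illustration (not fetched from a server).
print(declared_length({'Content-Length': '11054'}))
print(declared_length({'Transfer-Encoding': 'chunked'}))
```

With a live response, `declared_length(response.headers)` can be compared to `len(response.content)`. The two can differ when the server compresses the body, because `requests` decodes gzip content automatically while `Content-Length` reports the bytes sent over the wire.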

## Inspecting the Content-Type Header in Response

Great job understanding how to check response headers! Now, let's put that knowledge into practice. You'll need to make a request to a website and check a specific type of header. Can you figure out how to retrieve the 'Content-Type' from the response headers by yourself?

```python
import requests

response = requests.get('http://quotes.toscrape.com')
if response.ok:
    # TODO: Retrieve and print the 'Content-Type' header from the response
    pass  # Placeholder so the stub runs; replace it with your code
```

Excellent! This is a perfect way to test your ability to apply what you've learned. The `requests` library makes it simple to access any header you need.

To retrieve and print the **`Content-Type`** header, you'll access the `headers` attribute of the `response` object, which behaves like a dictionary.

Here is the completed code:

```python
import requests

response = requests.get('http://quotes.toscrape.com')
if response.ok:
    # Retrieve and print the 'Content-Type' header from the response
    print(response.headers['Content-Type'])
```

This code prints the `Content-Type` value, which for a standard webpage like this one starts with the MIME type `text/html` and usually includes a charset parameter, for example `text/html; charset=utf-8`. This is a fundamental skill for understanding and processing data you receive from web requests.

## Inspecting Web Response Headers in Python

Wrap up your journey on inspecting response headers by orchestrating a complete web scraping attempt. Use the skills you've honed to make a request to http://quotes.toscrape.com, check if the response is successful, and fetch a specific header value. Remember, headers offer valuable insights into the behavior and expectations of web servers.

```python
import requests

# TODO: Make a GET request to the website (http://quotes.toscrape.com) and store the response

# TODO: Check if the request was successful (use the 'ok' property)

# TODO: Print the 'Content-Type' header from the response
```

Here is the completed code:

```python
import requests

# Make a GET request to the website
response = requests.get('http://quotes.toscrape.com')

# Check if the request was successful
if response.ok:
    print("Request was successful!")

    # Print the 'Content-Type' header from the response
    print(f"Content-Type: {response.headers.get('Content-Type')}")
else:
    print("Request failed with status code:", response.status_code)
```