# Unit 2

# Handling HTTP Status Codes with Python's `requests` Library

HTTP status codes are fundamental to understanding the response from a web server, and they play an important role when we request data from the web. Whether performing an API call or implementing a web scraper, correct handling of status codes ensures the resilience and stability of your code. By the end of this lesson, you will have a firm grasp of what HTTP status codes are, and how to handle them using Python's requests module.

-----

## HTTP Status Codes: An Overview

HTTP status codes are three-digit numbers that the server sends back to the client (your script, in this case) to indicate the outcome of the request. Status codes fall into five classes, identified by their first digit:

  - **1xx:** Informational
  - **2xx:** Success, the most common being **200 OK**
  - **3xx:** Redirection, e.g., **301 Moved Permanently**
  - **4xx:** Client errors, e.g., **404 Not Found** and **403 Forbidden**
  - **5xx:** Server errors, e.g., **500 Internal Server Error**
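
Because the class is encoded in the first digit, a script can work it out with integer division. The following is a minimal sketch; the names `STATUS_CLASSES` and `status_class` are hypothetical helpers, not part of `requests`.

```python
# Map the leading digit of a status code to its class name.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(status_code: int) -> str:
    """Return the class name for a three-digit HTTP status code."""
    # Integer division by 100 keeps only the first digit: 404 // 100 == 4.
    return STATUS_CLASSES.get(status_code // 100, "Unknown")


print(status_class(200))  # Success
print(status_class(404))  # Client error
print(status_class(503))  # Server error
```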

Though there are many HTTP status codes, here are some common ones that you might come across when scraping the web:

  - **200 OK:** The request was successful, and the server returned the requested resource.
  - **301 Moved Permanently:** The requested URL has moved permanently, and the new URL is provided in the response's `Location` header.
  - **403 Forbidden:** The client doesn't have permission to access the requested URL.
  - **404 Not Found:** The server could not find the requested URL.
  - **500 Internal Server Error:** The server encountered an internal error and was unable to complete the request.
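
Python's standard library also knows these codes: the `http.HTTPStatus` enum maps each registered code to its reason phrase. The sketch below looks up the five codes listed above; `reason_phrase` is a hypothetical helper name introduced for illustration.

```python
from http import HTTPStatus


def reason_phrase(status_code: int) -> str:
    """Return the standard reason phrase, or 'Unknown' for unregistered codes."""
    try:
        return HTTPStatus(status_code).phrase
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return "Unknown"


for code in (200, 301, 403, 404, 500):
    print(code, reason_phrase(code))
```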

Understanding and handling these status codes when we write our scraping code will allow us to create more robust and effective web scraping solutions.

-----

## Python's `requests` and Status Codes

In Python, we can use the popular `requests` library to send HTTP requests. Upon receiving a response from the server, `requests` provides us with a `Response` object, which contains the server's response to our request.

One attribute of the `Response` object is `status_code`, which holds the HTTP status code that the server returned, as an integer. If the server successfully processed our request, the `status_code` will usually be `200`. If the resource we requested wasn't found on the server, the `status_code` will be `404`.

Let's look at how we can make a `GET` request to a server and print the status code of the response:

```python
import requests
response = requests.get('http://example.com')
print(response.status_code)
```

If the request succeeds, this prints `200`.

The next example checks whether the status code of the HTTP response is 404 and prints a message based on the result:

```python
import requests
# Attempt to fetch webpage content
url = 'http://quotes.toscrape.com/invalid'
response = requests.get(url)
if response.status_code == 404:
    print("The requested page was not found.")
else:
    print("Content fetched successfully!")
```

The output of the above code will be:

```
The requested page was not found.
```

This output demonstrates how we can handle different HTTP status codes to interpret the server's response more effectively. It allows us to execute conditional code based on the outcome of our HTTP request, making our applications more robust and user-friendly.

Now, let's break down the code and understand it in detail. The `requests.get(url)` function sends a `GET` request to the specified URL. The server will then send back a response, which is stored in the `response` variable.

The `if response.status_code == 404:` line checks to see if the status code in the HTTP response is 404, which signifies that the requested resource was not found on the server.

If the status code is indeed 404, then the code block under the `if` statement will be executed, and the message "The requested page was not found." will be printed.

However, if the status code is anything other than 404, the code block under the `else` statement will be executed, and the message "Content fetched successfully!" will be printed.
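
One caveat: the `else` branch reports success for every status other than 404, including server errors such as 500. A stricter check might look like the sketch below, where `describe_status` is a hypothetical helper that takes the integer from `response.status_code` and treats only `2xx` codes as success.

```python
def describe_status(status_code: int) -> str:
    """Turn a status code into a message, treating only 2xx as success."""
    if 200 <= status_code < 300:
        return "Content fetched successfully!"
    if status_code == 404:
        return "The requested page was not found."
    return f"Request failed with status code {status_code}."


# With a real response this would be: describe_status(response.status_code)
print(describe_status(200))
print(describe_status(404))
print(describe_status(500))
```

For a coarser check, `requests` also provides `response.ok`, which is `True` when the status code is below 400.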

-----

## Setting Timeouts for Requests

When making HTTP requests, it's crucial to set timeouts so that your application does not hang indefinitely if the server takes too long to respond; `requests` applies no timeout unless you specify one. The `timeout` argument sets the limit in seconds. If the server does not respond within that period, a `requests.exceptions.Timeout` exception is raised.

Here's how you can set a timeout for a `GET` request:

```python
import requests
try:
    response = requests.get('http://www.google.com:81/', timeout=4)  # Timeout set to 4 seconds
    print(response.status_code)
except requests.exceptions.Timeout:
    print("The request timed out.")
```

In this example, we set a timeout of 4 seconds. If the server does not respond within this time, the exception block is executed, and "The request timed out." is printed.

Setting timeouts is particularly useful for web scraping and API requests, where server responsiveness can vary. It ensures that your application remains responsive and can handle situations where the server takes too long to reply.
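
The `timeout` argument also accepts a `(connect, read)` tuple, which sets separate limits for establishing the connection and for waiting on the server's data. The sketch below is an illustration under assumptions: `fetch_status` is a hypothetical helper, the timeout values are arbitrary, and the `get` parameter exists only so the function can be exercised without a network connection.

```python
import requests


def fetch_status(url, timeout=(3, 10), get=requests.get):
    """Return the status code for url, or None if the request timed out.

    timeout is (connect seconds, read seconds); the values are illustrative.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Raised for both connect timeouts and read timeouts.
        return None
    return response.status_code


# Example call (performs a real request, so it is left commented out):
# print(fetch_status('http://quotes.toscrape.com'))
```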

-----

## Lesson Summary and Practice

Fantastic job! We have covered the basics of HTTP status codes and how to handle them using Python's `requests` library. Properly handling these status codes is crucial for ensuring the stability and efficiency of your web scraping code.

Remember, knowledge is perfected through continuous practice. It's now time to put our newly gained knowledge into practice! In the next exercise, you will write your own Python code to fetch HTTP status codes from different web addresses. This will not only put your understanding to the test but also make you comfortable with handling HTTP status codes in real-world applications. Happy coding!

## Navigating the Web: Mastering HTTP Status Codes with Python Requests

Have you ever wondered whether your favorite online bookstore is up and running, without checking the website manually? The code below does precisely that: it sends a web request to http://quotes.toscrape.com, which stands in for the bookstore, and checks whether the page is available. Run the code to see web requests in action!

```python
import requests

# Simulate checking the availability of the online bookstore main page
response = requests.get('http://quotes.toscrape.com')

if response.status_code == 200:
    print("The Online Bookstore is available!")
elif response.status_code == 404:
    print("The Online Bookstore page was not found!")
else:
    print("Something went wrong with the Online Bookstore page.")

```


This code demonstrates how to use the `requests` library in Python to check the availability of a webpage by inspecting the HTTP status code returned by the server.

Let's break down how the code works:

1.  **`import requests`**: This line imports the `requests` library, a very popular and user-friendly tool for making HTTP requests in Python. This library is not part of the standard Python installation, so you'll need to install it first using the command `pip install requests`.

2.  **`response = requests.get('http://quotes.toscrape.com')`**:
    * The `requests.get()` function sends a **`GET` request** to the specified URL (`http://quotes.toscrape.com`).
    * A `GET` request is used to retrieve data from a server (in this case, a webpage).
    * The server responds to this request. The `requests` library captures the server's response and stores it in the `response` object.

3.  **`if response.status_code == 200:`**:
    * Every HTTP response from a server includes an **HTTP status code**, a three-digit number that provides information about the outcome of the request.
    * `response.status_code` accesses this code.
    * A **`200` status code** means "OK" or "Success." This is the most common code and indicates that the request was successful and the server returned the requested data. In this context, it means the webpage is available.

4.  **`elif response.status_code == 404:`**:
    * A **`404` status code** means "Not Found." This is a common error code that indicates the server was reached, but the specific URL requested does not exist.

5.  **`else:`**:
    * This `else` block handles any other possible status codes, such as:
        * `500 Internal Server Error`: A problem occurred on the server.
        * `403 Forbidden`: The server denied access to the page.
        * `301 Moved Permanently`: The page has been moved to a different URL (seen only when redirects are not followed automatically).
    * This is a good practice for handling unexpected errors or responses that may occur.
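
Since `requests.get()` follows redirects by default, intermediate `3xx` responses end up in `response.history` rather than in `response.status_code`. A minimal sketch of inspecting them, with `redirect_chain` as a hypothetical helper name:

```python
import requests


def redirect_chain(response):
    """List (status_code, url) for each redirect followed, then the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# With a live response:
# response = requests.get('http://quotes.toscrape.com')
# print(redirect_chain(response))
#
# To see the 3xx status itself, redirects can be turned off:
# response = requests.get('http://quotes.toscrape.com', allow_redirects=False)
```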

In short, this code is a simple yet powerful example of how a programmer can interact with web services programmatically, enabling the automation of tasks like website monitoring.

## Check Online Bookstore Availability Using Web Requests

In this exercise, you will tweak our HTTP status check for the online bookstore's quotes page. Instead of confirming that the page is up with a 200 OK, change the code to check if the page cannot be found, using a 404 Not Found status code. Let's ensure our code can handle different types of HTTP responses effectively!

```python
import requests

# Let's pretend we want to check if the online bookstore's quotes page is working
response = requests.get('http://quotes.toscrape.com')
# TODO: Update the code to check for 404 status instead, and remember to change messages accordingly
if response.status_code == 200:
    print("The bookstore's quotes page is up and running!")
else:
    print("Something went wrong. Status code:", response.status_code)

```

```python
import requests

# Let's pretend we want to check if the online bookstore's quotes page is working
response = requests.get('http://quotes.toscrape.com/non-existent-page')

# The code has been updated to check for a 404 status code
if response.status_code == 404:
    print("The requested page was not found!")
else:
    print("The page is available, or something else went wrong. Status code:", response.status_code)
```

### Explanation of the Changes

To check for a **`404 Not Found`** status code, I've made three key changes to the code:

1.  **URL Change**: I modified the URL in `requests.get()` from `'http://quotes.toscrape.com'` to `'http://quotes.toscrape.com/non-existent-page'`. The new path is deliberately one that should not exist, so the request is expected to return a `404` and the code reaches the desired conditional block.
2.  **Conditional Check**: I updated the `if` statement to check for `response.status_code == 404`. This ensures the code now specifically looks for a "Not Found" response from the server.
3.  **Output Messages**: I changed the `print` statements to accurately reflect the new logic. The first message now confirms that the page was not found, while the `else` block handles all other status codes, including a successful `200 OK` or any other error.

## Check for Missing Page with HTTP Status Code 404

In this exercise, you will tweak our HTTP status check for the online bookstore's quotes page. Instead of confirming that the page is up with a 200 OK, change the code to check if the page cannot be found, using a 404 Not Found status code. Let's ensure our code can handle different types of HTTP responses effectively!

```python
import requests

# Let's pretend we want to check if the online bookstore's quotes page is working
response = requests.get('http://quotes.toscrape.com/non-existent-page')

# The code has been updated to check for a 404 status code
if response.status_code == 404:
    print("The requested page was not found!")
else:
    print("The page is available, or something else went wrong. Status code:", response.status_code)

```

This Python code snippet effectively checks for a `404 Not Found` HTTP status code, which is a common way to handle missing web pages. The use of `requests.get('http://quotes.toscrape.com/non-existent-page')` correctly targets a URL that is expected to return a `404` status.

The `if/else` block is a clear and simple way to evaluate the response. The message "The requested page was not found!" is appropriate for the `404` case. The `else` block provides a useful catch-all for other status codes, including a `200 OK` (if the page were to exist), or other errors like a `500 Server Error`, while also printing the actual status code for debugging.

This is a good example of how to use the `requests` library to perform basic web checks and handle different outcomes based on the HTTP status code.

## Handling HTTP Status Codes in Online Bookstore Check

Fantastic job exploring HTTP status codes! Now, let's practice handling a different outcome from our online bookstore. Imagine you want to check if a specific resource exists and, depending on the response, take appropriate action. Create the conditional logic based on the HTTP status code returned when making a GET request.

```python
import requests

# Check the status code of the online bookstore home page
response = requests.get('http://quotes.toscrape.com')
if response.status_code == 200:
    print("Online Bookstore is up and running!")
elif # TODO: Determine if the resource doesn't exist
    print("Online Bookstore page not found.")
else:
    print("There was a problem accessing the Online Bookstore.")

```

This exercise is an excellent continuation of the previous one, as it requires you to apply your knowledge of HTTP status codes in a slightly different context.

To complete the conditional logic, you should add an `elif` condition to check for the `404 Not Found` status code. This is the correct way to determine if the requested resource, in this case, a specific page, does not exist on the server.

Here is the completed code:

```python
import requests

# Check the status code of the online bookstore home page
response = requests.get('http://quotes.toscrape.com')
if response.status_code == 200:
    print("Online Bookstore is up and running!")
elif response.status_code == 404:
    print("Online Bookstore page not found.")
else:
    print("There was a problem accessing the Online Bookstore.")
```

This updated code provides a more robust check by specifically handling the two most common HTTP responses you'd expect to see: a successful `200 OK` and a `404 Not Found`. The `else` block serves as a fallback for any other status, such as a `403 Forbidden` or a server error.
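
When more codes need their own message, the `if`/`elif` chain can be replaced by a dictionary lookup. This is only a sketch of that alternative; `MESSAGES` and `message_for` are hypothetical names, and the messages are taken from the exercise above.

```python
# Hypothetical lookup table: status code -> message to print.
MESSAGES = {
    200: "Online Bookstore is up and running!",
    404: "Online Bookstore page not found.",
}

FALLBACK = "There was a problem accessing the Online Bookstore."


def message_for(status_code: int) -> str:
    """Return the message for a known code, or the generic fallback."""
    return MESSAGES.get(status_code, FALLBACK)


# With a real response this would be: print(message_for(response.status_code))
print(message_for(200))
print(message_for(404))
print(message_for(503))
```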

## Check the Online Bookstore Status

Your last mission, Space Explorer, is to check if an online bookstore is open by making a web request to its site. Use Python's requests library to send a GET request to 'http://quotes.toscrape.com'. Based on the HTTP status code returned, you will print messages to indicate whether the bookstore is open or if there was an issue. Remember, a status code of 200 means success, 404 indicates "not found," and any other status means an error occurred. This is your chance to showcase what you've learned about handling HTTP status codes!


```python
import requests

# TODO: Make a GET request to the online bookstore at http://quotes.toscrape.com

# TODO: Check the HTTP status code of the response
# If it's 200, print a message indicating success
# If it's 404, indicate the page was not found
# Otherwise, indicate an error occurred

```

To complete this task, you'll need to use the `requests.get()` function to make the web request and then use conditional logic to check the `status_code` attribute of the response.

Here is the completed code that accomplishes the task:

```python
import requests

# Make a GET request to the online bookstore at http://quotes.toscrape.com
response = requests.get('http://quotes.toscrape.com')

# Check the HTTP status code of the response
if response.status_code == 200:
    print("Success! The online bookstore is open and running.")
elif response.status_code == 404:
    print("Page not found. The online bookstore may be closed or the URL is incorrect.")
else:
    print(f"An error occurred. Status code: {response.status_code}")
```

This solution demonstrates a robust approach to handling different outcomes of a web request. It first attempts to connect to the website, then uses an `if/elif/else` block to provide clear feedback based on the status code, which is a fundamental skill for any developer working with web data.
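
An alternative to comparing `status_code` by hand is `Response.raise_for_status()`, which raises `requests.exceptions.HTTPError` for any `4xx` or `5xx` response and does nothing otherwise. The sketch below wraps it in a hypothetical helper, `error_for`, to show the behavior.

```python
import requests


def error_for(response):
    """Return the HTTPError message for a 4xx/5xx response, or None otherwise."""
    try:
        # raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx codes.
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return None


# With a live response:
# response = requests.get('http://quotes.toscrape.com', timeout=5)
# print(error_for(response))
```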

## Add Error Handling to Web Request Scenario

You're doing great! Now, let's sharpen your skills with another common web request scenario. Modify the given code to add error handling for the web request using a try-except block. Specifically, determine if the request is successful or if it times out, and respond appropriately.

```python
import requests

try:
    # TODO: Send request to http://www.google.com:81/ with timeout of 5 seconds  
    # TODO: Check if the request was successful (status_code 200) and print a message
    # TODO: Handle the case where the status code is not 200 with an appropriate message
except requests.exceptions.Timeout:
    # TODO: Handle the possibility of a timeout with an appropriate message

```

This is a great exercise that introduces a critical concept in web development: handling potential failures gracefully. Network requests can be unpredictable, and using `try-except` blocks is the standard way to prevent your program from crashing due to connection issues or timeouts.

Here is the completed code that correctly handles the web request and the potential `Timeout` exception:

```python
import requests

try:
    # Send request to http://www.google.com:81/ with a timeout of 5 seconds
    # The URL port 81 is intentionally used to simulate a connection error or a slow response
    response = requests.get('http://www.google.com:81/', timeout=5)

    # Check if the request was successful (status_code 200) and print a message
    if response.status_code == 200:
        print("Success! The request was successful.")
    # Handle the case where the status code is not 200 with an appropriate message
    else:
        print(f"The request was not successful. Status code: {response.status_code}")

except requests.exceptions.Timeout:
    # Handle the possibility of a timeout with an appropriate message
    print("The request timed out after 5 seconds.")
except requests.exceptions.RequestException as e:
    # This is a general exception that catches other potential errors like a bad URL or connection issue
    print(f"An error occurred: {e}")

```

### Explanation of the Changes

1.  **Request with Timeout:** The `requests.get()` function now includes the `timeout` parameter, set to `5` seconds. This tells the library to raise a `requests.exceptions.Timeout` if the server doesn't respond within that time frame.
2.  **General Exception Handling:** An additional `except` block catches the more general `requests.exceptions.RequestException`. This handles other failures, such as a `ConnectionError` (e.g., a DNS failure or a refused connection), which might occur with the given URL. Because `Timeout` is a subclass of `RequestException`, the more specific `except` clause has to come first.
3.  **Specific Messages:** Each branch of the `if`/`else` and each `except` block has its own `print()` statement that states what happened, which helps with debugging and gives clear feedback to the user.
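
The ordering of the `except` clauses follows from the exception hierarchy in `requests`: `ConnectTimeout` and `ReadTimeout` derive from `Timeout`, and `Timeout`, `ConnectionError`, and `HTTPError` all derive from `RequestException`. The sketch below mirrors that ordering with `isinstance` checks; `classify_failure` is a hypothetical helper used only for illustration.

```python
import requests


def classify_failure(error: Exception) -> str:
    """Label an exception raised by requests; the most specific checks come first."""
    if isinstance(error, requests.exceptions.Timeout):
        # Covers ConnectTimeout and ReadTimeout.
        return "timeout"
    if isinstance(error, requests.exceptions.ConnectionError):
        return "connection"
    if isinstance(error, requests.exceptions.RequestException):
        return "other request error"
    return "not a requests error"


print(classify_failure(requests.exceptions.ReadTimeout()))      # timeout
print(classify_failure(requests.exceptions.ConnectionError()))  # connection
```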