# Unit 3: Scraping HTML Lists with Beautiful Soup

# Web Scraping HTML Lists

Welcome! In this lesson, we dive into web scraping with a focus on **HTML lists**. Let's start with a brief introduction to HTML lists and why they matter for web scraping.

-----

## HTML Lists Overview

HTML lists are used to display a series of items in a structured manner. Broadly, there are two types of lists:

  * **Ordered Lists (`<ol>`)**: These lists are numbered (e.g., 1, 2, 3).
  * **Unordered Lists (`<ul>`)**: These lists are bulleted (e.g., •, •, •).

Each item in these lists is enclosed in `<li>` tags. Lists appear on web pages in many forms, such as navigation menus and product listings, and because every item follows the same structure, they are ideal targets for web scraping.

Here is an example of an ordered list:

```html
<ol>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
</ol>
```
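To see how Beautiful Soup exposes these tags, here is a minimal sketch that parses the ordered list above directly from a string, so no network access is needed. The unordered list is a hypothetical addition for comparison.

```python
from bs4 import BeautifulSoup

# The ordered list from the example above, plus a small unordered list
# (hypothetical, added only for comparison).
html = """
<ol>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
</ol>
<ul>
    <li>Apple</li>
    <li>Banana</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "ol li" matches only <li> elements that sit inside an <ol>.
ordered_items = [li.get_text(strip=True) for li in soup.select("ol li")]
# "ul li" matches only <li> elements that sit inside a <ul>.
unordered_items = [li.get_text(strip=True) for li in soup.select("ul li")]

print(ordered_items)    # ['Item 1', 'Item 2', 'Item 3']
print(unordered_items)  # ['Apple', 'Banana']
```

Both list types use the same `<li>` tag for their items, so the parent tag in the selector is what tells them apart.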

-----

## Loading Libraries and Fetching the Webpage

We start by importing the required libraries and fetching the HTML content of the webpage.

```python
from bs4 import BeautifulSoup
import requests

url = "https://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
```

Next, we use a CSS selector to identify the specific list containing the books: `soup.select(".page_inner section ol li")`. This selects all `<li>` elements that are descendants of `.page_inner section ol`.
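To make the descendant relationship concrete, here is a minimal offline sketch. The markup is hypothetical and heavily simplified (it is not the real books.toscrape.com page), but it includes other lists besides the book list, so you can see which `<li>` elements the selector keeps.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup loosely modeled on the nesting the
# selector expects; it is NOT the real books.toscrape.com page.
html = """
<div class="page_inner">
  <ul class="breadcrumb"><li>Home</li></ul>
  <section>
    <ol class="row">
      <li><article><h3><a title="Book A">Book A</a></h3></article></li>
      <li><article><h3><a title="Book B">Book B</a></h3></article></li>
    </ol>
    <ul class="pager"><li>next</li></ul>
  </section>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Spaces in a CSS selector are descendant combinators: each <li> must sit
# (at any depth) inside an <ol>, inside a <section>, inside .page_inner.
items = soup.select(".page_inner section ol li")

print(len(soup.select("li")))  # 4 <li> elements in total
print(len(items))              # 2: the breadcrumb and pager items are excluded
```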

With that, we can loop through the selected items and extract the book titles:

```python
books_ordered_list = soup.select(".page_inner section ol li")

for book in books_ordered_list:
    title = book.select("article h3 a")[0]["title"]
    print(title)
```

**Explanation of the code:**

  * `book.select("article h3 a")[0]`: Selects the `<a>` tag inside the `<h3>` of the `<article>` tag. Note that `select` returns a list, so we use `[0]` to access the first element.
  * `book.select("article h3 a")[0]["title"]`: Extracts the `title` attribute of the `<a>` tag.
  * `print(title)`: Prints the extracted book title.
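Beautiful Soup also provides `select_one`, which returns the first match directly (or `None` when nothing matches) instead of a list. The sketch below uses a hypothetical single-item snippet to show that both forms yield the same title, and why `select_one` makes it easier to guard against a missing element: indexing `[0]` into an empty result raises `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one list item; the real page has more attributes.
html = '<li><article><h3><a href="#" title="A Light in the Attic">A Light in the Attic</a></h3></article></li>'
book = BeautifulSoup(html, "html.parser").li

# select() returns a list of matches, so we index into it with [0].
title_from_select = book.select("article h3 a")[0]["title"]

# select_one() returns the first match itself, or None if there is none.
link = book.select_one("article h3 a")
title_from_select_one = link["title"] if link is not None else None

# A selector that matches nothing: select_one() simply returns None...
missing = book.select_one("article h4 a")

# ...while indexing the empty list returned by select() raises IndexError.
raised = False
try:
    book.select("article h4 a")[0]
except IndexError:
    raised = True

print(title_from_select)      # A Light in the Attic
print(title_from_select_one)  # A Light in the Attic
print(missing)                # None
print(raised)                 # True
```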

The output will display the titles of the books listed on the webpage:

```
A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
...
```

-----

## Summary

In this lesson, we explored the basics of HTML lists and why they matter for web scraping. We also learned how to fetch a webpage, identify a specific list with a CSS selector, and extract information from each selected list item. These skills form the foundation for the more advanced web scraping techniques ahead.

Now, let's put this knowledge into practice with some hands-on exercises!

## Run Web Scraping Code

Nice work on understanding the basics of HTML lists!

Let's run the code from the lesson and see how it works in practice.

We'll fetch a webpage with the `requests` library and parse its HTML with Beautiful Soup to extract book titles. This task shows how CSS selectors let you navigate an HTML structure and extract specific information.

Here's the code that fetches and prints the titles of books listed on the "Books to Scrape" website:

```python
from bs4 import BeautifulSoup
import requests

url = "https://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

books_ordered_list = soup.select(".page_inner section ol li")

for book in books_ordered_list:
    title = book.select("article h3 a")[0]["title"]
    print(title)
```

## Enhance Web Scraping Skills

Great job so far! You've learned how to extract book titles from an HTML list.

Now, let's enhance our scraping script to include the price of each book.

```python
from bs4 import BeautifulSoup
import requests

url = "https://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

books_ordered_list = soup.select(".page_inner section ol li")

for book in books_ordered_list:
    title = book.select("article h3 a")[0]["title"]
    print(title)

    # TODO: Extract and print the price of each book alongside its title.
    # Hint: The price is inside the element matched by the CSS selector
    # ".product_price .price_color" within each book's <article>.
```

Here is one possible solution, which prints each title together with its price:

```python
from bs4 import BeautifulSoup
import requests

url = "https://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

books_ordered_list = soup.select(".page_inner section ol li")

for book in books_ordered_list:
    title = book.select("article h3 a")[0]["title"]
    # The price sits in the element matched by ".product_price .price_color"
    # inside each book's <article>; select_one() returns that first match.
    price = book.select_one(".product_price .price_color").text
    print(f"Title: {title}, Price: {price}")
```
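If the prices print with a stray character (for example `Â£51.77` instead of `£51.77`), a likely cause is character decoding: `response.text` decodes the bytes with the encoding `requests` infers from the HTTP headers, and for `text/*` responses that declare no charset it falls back to ISO-8859-1, which may not match the page's actual UTF-8 encoding. One common workaround is to pass the raw bytes to Beautiful Soup so it can use the `<meta charset>` declared in the page. The sketch below reproduces both cases offline with a hypothetical byte string.

```python
from bs4 import BeautifulSoup

# Hypothetical UTF-8 page fragment; "£" is the two bytes C2 A3 in UTF-8.
raw = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><p class="price_color">£51.77</p></body></html>'
).encode("utf-8")

# Decoding the bytes as ISO-8859-1 turns C2 A3 into "Â£" (mojibake).
wrong_text = raw.decode("iso-8859-1")
wrong_price = BeautifulSoup(wrong_text, "html.parser").select_one(".price_color").text

# Passing bytes lets Beautiful Soup detect the encoding from <meta charset>.
right_price = BeautifulSoup(raw, "html.parser").select_one(".price_color").text

print(wrong_price)  # Â£51.77
print(right_price)  # £51.77
```

In the scraping script, this would mean writing `soup = BeautifulSoup(response.content, "html.parser")`; setting `response.encoding = "utf-8"` before reading `response.text` is an alternative.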

## Add Book Availability Status

## Complete the Webpage Parsing Task

## Scrape Book Titles Efficiently

## Scrape Book Titles with Robustness