# Beginner Level: Basics of Web Scraping
## 1. Introduction to Web Scraping
**What is Web Scraping?** The automated extraction of data from websites.

**Use Cases:** Data collection for research, price monitoring, lead generation, etc.

**Legal and Ethical Considerations:** Before scraping, check the website's robots.txt file, which lists the paths that crawlers are allowed or disallowed to fetch, and review the site's terms of service.
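
As a rough illustration of how a robots.txt check can be automated, the sketch below uses the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the helper name `is_allowed` are hypothetical and exist only so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from any real site)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed = is_allowed(SAMPLE_ROBOTS_TXT, "http://example.com/catalogue/page-1.html")
blocked = is_allowed(SAMPLE_ROBOTS_TXT, "http://example.com/private/data.html")
print(allowed, blocked)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` downloads the file instead of parsing a string.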

## 2. Tools and Libraries
**Python:** A popular choice for web scraping because of its mature libraries for sending HTTP requests and parsing HTML.

**Libraries:**

**requests:** To send HTTP requests.

**BeautifulSoup:** For parsing HTML and extracting information.

In [None]:
!pip install requests beautifulsoup4





In [None]:
import requests
from bs4 import BeautifulSoup


In [None]:
url = 'http://books.toscrape.com'
response = requests.get(url)
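
A request can fail or return an error page, so it may help to check the response before parsing it. The following is a minimal sketch; the helper name `fetch_page` and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.content
```

It could be called as `fetch_page('http://books.toscrape.com')`, wrapped in a `try`/`except requests.RequestException` block to handle network failures.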


In [None]:
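# Pass the raw bytes so BeautifulSoup can detect the page encoding itself;
# response.text may decode the page with the wrong charset and garble symbols such as £
soup = BeautifulSoup(response.content, 'html.parser')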
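
If scraped text shows a stray `Â` before a pound sign, a likely cause is that UTF-8 bytes were decoded as Latin-1, which can happen when the server does not declare a charset. The sketch below reproduces the effect with plain strings; it illustrates the mechanism and does not diagnose any particular site.

```python
# The pound sign is two bytes in UTF-8
pound_utf8 = "£".encode("utf-8")

# Decoding those bytes with the wrong codec yields two characters
garbled = pound_utf8.decode("latin-1")

# Decoding with the right codec yields the original symbol
correct = pound_utf8.decode("utf-8")

# Already-garbled text can be repaired by reversing the wrong decode
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled), repr(correct), repr(repaired))
```

With `requests`, two common remedies are setting `response.encoding = 'utf-8'` before reading `response.text`, or passing `response.content` to BeautifulSoup.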


In [None]:
# Find all book titles
titles = soup.find_all('h3')

for title in titles:
    print(title.get_text())


A Light in the ...
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History ...
The Requiem Red
The Dirty Little Secrets ...
The Coming Woman: A ...
The Boys in the ...
The Black Maria
Starving Hearts (Triangular Trade ...
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little ...
Rip it Up and ...
Our Band Could Be ...
Olio
Mesaerion: The Best Science ...
Libertarianism for Beginners
It's Only the Himalayas
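
Several titles above end in `...` because the visible link text is shortened. On pages like this one the full title is often stored in the link's `title` attribute. The sketch below assumes that markup; the sample HTML and its book titles are hypothetical stand-ins.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on a book listing (not copied from the site)
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="book-1.html" title="An Example Book With a Long Title">An Example Book ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="book-2.html" title="Short Title">Short Title</a></h3>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Read the title attribute of the link inside each <h3>, skipping any
# heading that has no link or no title attribute
full_titles = [
    h3.a["title"]
    for h3 in soup.find_all("h3")
    if h3.a is not None and h3.a.has_attr("title")
]
print(full_titles)
```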


In [None]:
# Find all prices
prices = soup.find_all('p', class_='price_color')

for price in prices:
    print(price.get_text())


£51.77
£53.74
£50.10
£47.82
£54.23
£22.65
£33.34
£17.93
£22.60
£52.15
£13.99
£20.66
£17.46
£52.29
£35.02
£57.25
£23.88
£37.59
£51.33
£45.17
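
The prices above are strings that include a currency symbol, so they cannot be sorted or averaged numerically as they are. One possible way to convert them, assuming each string contains a single decimal number, is sketched below; `parse_price` is a hypothetical helper name.

```python
import re


def parse_price(text: str) -> float:
    """Extract the first decimal number from a price string."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Works regardless of what precedes the digits
values = [parse_price(p) for p in ["£51.77", "£53.74", "£50.10"]]
print(values)
```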


In [None]:
base_url = 'http://books.toscrape.com/catalogue/page-{}.html'

for page_num in range(1, 3):  # Scraping the first 2 pages
    url = base_url.format(page_num)
    response = requests.get(url)
    # Use the raw bytes so the parser detects the encoding (avoids garbled £ signs)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract titles and prices
    titles = soup.find_all('h3')
    prices = soup.find_all('p', class_='price_color')

    print(f"Page {page_num} Titles:")
    for title in titles:
        print(title.get_text())

    print(f"Page {page_num} Prices:")
    for price in prices:
        print(price.get_text())


Page 1 Titles:
A Light in the ...
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History ...
The Requiem Red
The Dirty Little Secrets ...
The Coming Woman: A ...
The Boys in the ...
The Black Maria
Starving Hearts (Triangular Trade ...
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little ...
Rip it Up and ...
Our Band Could Be ...
Olio
Mesaerion: The Best Science ...
Libertarianism for Beginners
It's Only the Himalayas
Page 1 Prices:
£51.77
£53.74
£50.10
£47.82
£54.23
£22.65
£33.34
£17.93
£22.60
£52.15
£13.99
£20.66
£17.46
£52.29
£35.02
£57.25
£23.88
£37.59
£51.33
£45.17
Page 2 Titles:
In Her Wake
How Music Works
Foolproof Preserving: A Guide ...
Chase Me (Paris Nights ...
Black Dust
Birdsong: A Story in ...
America's Cradle of Quarterbacks: ...
Aladdin and His Wonderful ...
Worlds Elsewhere: Journeys Around ...
Wall and Piece
The Four Agreements: A ...
The Five Love Languages: ...
The Elephant Tree
The Bear and the ...
Sophie's World
Penn

In [None]:
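import pandas as pd

# Note: titles and prices hold only the results of the last page scraped above,
# because each loop iteration overwrites them
data = {
    'Title': [title.get_text() for title in titles],
    'Price': [price.get_text() for price in prices]
}

df = pd.DataFrame(data)
df.to_csv('books.csv', index=False)  # index=False omits the row-number column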
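
To save results from every page rather than only the last one, rows can be accumulated in a list before the DataFrame is built. The sketch below runs offline: `PAGES` holds hypothetical HTML snippets standing in for downloaded pages, and `extract_rows` is an illustrative helper, not part of any library.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the HTML of two catalogue pages
PAGES = [
    '<h3><a title="Book One">Book One</a></h3><p class="price_color">£10.00</p>'
    '<h3><a title="Book Two">Book Two</a></h3><p class="price_color">£12.50</p>',
    '<h3><a title="Book Three">Book Three</a></h3><p class="price_color">£8.99</p>',
]


def extract_rows(html: str) -> list:
    """Return one {'Title', 'Price'} dict per book found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3")]
    prices = [p.get_text(strip=True) for p in soup.find_all("p", class_="price_color")]
    return [{"Title": t, "Price": p} for t, p in zip(titles, prices)]


# Collect rows from every page before building the DataFrame
rows = []
for html in PAGES:
    rows.extend(extract_rows(html))

df = pd.DataFrame(rows)
csv_text = df.to_csv(index=False)  # returns a string when no path is given
print(csv_text)
```

In a real run, each entry of `PAGES` would be replaced by the content downloaded for one page URL.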
