# **Assignment - 4**

Ques 1. Write a Python program to scrape all available books from the website 
(https://books.toscrape.com/) Books to Scrape – a live site built for practicing scraping (safe,  
legal, no anti-bot). For each book, extract the following details:  
1. Title  
2. Price  
3. Availability (In stock / Out of stock)  
4. Star Rating (One, Two, Three, Four, Five)  
Store the scraped results into a Pandas DataFrame and export them to a CSV file named books.csv.  
  
(Note: Use the requests library to fetch the HTML page. Use BeautifulSoup to parse and extract
book details and handle pagination so that books from all pages are scraped)


In [4]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

books = []
page = 1

while True:
    url = f"https://books.toscrape.com/catalogue/page-{page}.html"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    articles = soup.find_all("article", class_="product_pod")
    if len(articles) == 0:
        break

    for article in articles:
        title = article.h3.a["title"]
        price = article.find("p", class_="price_color").text.replace("Â", "")
        availability = article.find("p", class_="instock availability").text.strip()
        rating = article.p["class"][1]  
        books.append([title, price, availability, rating])

    page += 1

df = pd.DataFrame(books, columns=["Title", "Price", "Availability", "Rating"])
df.to_csv("books.csv", index=False)
print(df)


                                                 Title   Price Availability  \
0                                 A Light in the Attic  £51.77     In stock   
1                                   Tipping the Velvet  £53.74     In stock   
2                                           Soumission  £50.10     In stock   
3                                        Sharp Objects  £47.82     In stock   
4                Sapiens: A Brief History of Humankind  £54.23     In stock   
..                                                 ...     ...          ...   
995  Alice in Wonderland (Alice's Adventures in Won...  £55.53     In stock   
996   Ajin: Demi-Human, Volume 1 (Ajin: Demi-Human #1)  £57.06     In stock   
997  A Spy's Devotion (The Regency Spies of London #1)  £16.97     In stock   
998                1st to Die (Women's Murder Club #1)  £53.98     In stock   
999                 1,000 Places to See Before You Die  £26.08     In stock   

    Rating  
0    Three  
1      One  
2      One  

Ques 2. Write a Python program to scrape the IMDB Top 250 Movies list (https://www.imdb.com/chart/top/) .  
For each movie, extract the following details:  
1. Rank (1–250)  
2. Movie Title  
3. Year of Release  
4. IMDB Rating  
Store the results in a Pandas DataFrame and export it to a CSV file named imdb_top250.csv.  
(Note: Use Selenium/Playwright to scrape the required details from this website)  

In [14]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("https://www.imdb.com/chart/top/")
time.sleep(3)

movies = []
rows = driver.find_elements(By.XPATH, '//li[@class="ipc-metadata-list-summary-item"]')

for rank, row in enumerate(rows, start=1):
    try:
        title = row.find_element(By.XPATH, './/h3').text.split('. ')[1]
        year = row.find_element(By.XPATH, './/span[contains(@class,"title-metadata-item")]').text
        rating = row.find_element(By.XPATH, './/span[contains(@class,"ipc-rating-star--rating")]').text
        movies.append([rank, title, year, rating])
    except:
        continue

driver.quit()

df = pd.DataFrame(movies, columns=["Rank", "Movie Title", "Year", "IMDB Rating"])
df.to_csv("imdb_top250.csv", index=False)
print(df)


     Rank               Movie Title  Year IMDB Rating
0       1  The Shawshank Redemption  1994         9.3
1       2             The Godfather  1972         9.2
2       3           The Dark Knight  2008         9.1
3       4     The Godfather Part II  1974         9.0
4       5              12 Angry Men  1957         9.0
..    ...                       ...   ...         ...
245   246        Gangs of Wasseypur  2012         8.2
246   247             Into the Wild  2007         8.0
247   248                  The Help  2011         8.1
248   249                  Drishyam  2015         8.2
249   250               Lost Ladies  2023         8.3

[250 rows x 4 columns]


Q3. Write a Python program to scrape the weather information for top world cities from the given website (https://www.timeanddate.com/weather/) .   
For each city, extract the following details:  
1. City Name  
2. Temperature  
3. Weather Condition (e.g., Clear, Cloudy, Rainy, etc.)  
Store the results in a Pandas DataFrame and export it to a CSV file named weather.csv.  

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

response = requests.get("https://www.timeanddate.com/weather/")
soup = BeautifulSoup(response.content, 'html.parser')

table = soup.find('table', {'class': 'zebra fw tb-theme'})
print("Table found:", table is not None)

weather = []
if table:
    rows = table.find_all('tr')  
    print("Number of rows found:", len(rows))
    for row in rows:
        cols = row.find_all('td')
        if len(cols) >= 3:  
            city = cols[0].text.strip()
            temp = cols[1].text.strip()
            
            condition_td = cols[2]
            img = condition_td.find('img')
            if img and img.has_attr('alt'):
                condition = img['alt'].strip()
            else:
                condition = condition_td.text.strip()
            
            weather.append([city, temp, condition])

df= pd.DataFrame(weather, columns=['City', 'Temperature', 'Condition'])
print(df)
df.to_csv('weather.csv', index=False)


Table found: True
Number of rows found: 72
           City Temperature                           Condition
0         Accra   रवि 16.49                Broken clouds. Warm.
1   Addis Ababa   रवि 19.49               Passing clouds. Mild.
2      Adelaide   सोम 02.19                  Refreshingly cool.
3       Algiers   रवि 17.49             Scattered clouds. Warm.
4        Almaty   रवि 21.49  Passing clouds. Refreshingly cool.
..          ...         ...                                 ...
66    Kathmandu   रवि 22.34                Partly cloudy. Mild.
67     Kingston   रवि 11.49                                Hot.
68     Kinshasa   रवि 17.49                         Clear. Hot.
69   Kiritimati   सोम 06.49               Passing clouds. Warm.
70      Kolkata   रवि 22.19    Light rain. Partly cloudy. Warm.

[71 rows x 3 columns]
