 <h1 style="color: #fd7e14; margin-top: 20px;"><i class="fas fa-shopping-cart"></i> Amazon Bestsellers Web Scraper</h1>

<h2 style="color: #0d6efd; margin-top: 20px;"><i class="fas fa-info-circle"></i> Project Summary</h2>
    <p>This project is an Amazon Bestsellers Web Scraper that automates logging in to Amazon, navigating to several bestseller categories, and extracting key information about the top products. The extracted data includes product name, price, discount, ratings, shipping information, and more, and is saved in both CSV and JSON formats for later analysis.</p>

<h2 style="color: #0d6efd; margin-top: 20px;"><i class="fas fa-tasks"></i> My Approach</h2>
    <div style="background-color: #f8f9fa; padding: 15px; border-left: 4px solid #0d6efd; border-radius: 4px;">
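        <p><strong>Step 1:</strong> Set up Selenium WebDriver to control a Chrome browser. The code opens a visible browser window, which lets you type a 2FA code by hand. On platforms without a display, such as Google Colab, Chrome would have to run in headless mode, which this code does not enable.</p>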
        <p><strong>Step 2:</strong> Automate Amazon login to access account-restricted data.</p>
        <p><strong>Step 3:</strong> Navigate to each category page under "Amazon Bestsellers" and scrape details of top products by locating and extracting data elements such as price, discount, ratings, and seller information.</p>
        <p><strong>Step 4:</strong> Save the collected data in CSV and JSON formats for later analysis and reporting.</p>
    </div>

<div style="background-color: #f8f9fa; border-left: 5px solid #0d6efd; padding: 10px; margin: 20px 0;">
    <strong style="color: #0d6efd;">Note:</strong>
    <p>Before running the project:</p>
    <ul>
        <li>No separate IDE is needed; you can run the code directly in this notebook.</li>
        <li>Replace the placeholder email and password in Step 4 with your own Amazon credentials.</li>
        <li>Amazon may ask for two-factor authentication (2FA). If the login check fails, the code waits 15 seconds so you can enter the code in the browser; increase the <code>time.sleep(15)</code> value in Step 4 if you need more time.</li>
    </ul>
</div>
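Step 4 types the credentials directly into the notebook, so they end up saved in the file. One alternative is to read them from environment variables. This is a minimal sketch; the variable names `AMAZON_EMAIL` and `AMAZON_PASSWORD` and the helper `load_credentials` are hypothetical and not part of the original project.

```python
import os


def load_credentials(email_var="AMAZON_EMAIL", password_var="AMAZON_PASSWORD"):
    """Read login credentials from environment variables.

    The default variable names are hypothetical choices for this sketch.
    Raises RuntimeError if either variable is missing or empty.
    """
    email = os.environ.get(email_var, "")
    password = os.environ.get(password_var, "")
    if not email or not password:
        raise RuntimeError(f"Set {email_var} and {password_var} before running the notebook")
    return email, password
```

In Step 4 you would then call `email, password = load_credentials()` and pass `email` and `password` to `send_keys`. For interactive use, the standard library's `getpass.getpass()` is another way to avoid storing the password in the notebook.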

<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 1: Install Selenium and WebDriver Manager</h2>

In [None]:
!pip install selenium
!pip install webdriver-manager

<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 2: Import Required Libraries</h2>

In [4]:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import csv
import json

<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 3: Set up the service and initialize the Chrome browser</h2>

In [5]:
# Set up the service and initialize the Chrome browser
service = Service(ChromeDriverManager().install())
web = webdriver.Chrome(service=service)

# Navigate to Amazon's homepage
web.get('https://www.amazon.in/')

# Maximize browser window
web.maximize_window()

# Allow the page to load
time.sleep(2)

<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 4: Account authentication</h2>

In [6]:
# Click the account list to log in
web.find_element(By.ID,'nav-link-accountList-nav-line-1').click()

# Locate the email input field and enter email/phone
email_field = web.find_element(By.ID, 'ap_email')
email_field.send_keys('youremail@gmail.com')  # Replace with your email/phone number

# Click the 'Continue' button
web.find_element(By.ID, 'continue').click()

# Allow some time for next page to load
time.sleep(2)

# Locate the password input field and enter the password
password_field = web.find_element(By.ID, 'ap_password')
password_field.send_keys('Password')  # Replace with your Amazon password

# Click the 'Sign-In' button
web.find_element(By.ID, 'signInSubmit').click()

# Wait for the login process to complete
time.sleep(5)

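# Verify if logged in successfully
if "Your Account" in web.page_source:
    print("Login successful!")
else:
    print("Login not confirmed: check your credentials, or enter the 2FA code in the browser now.")
    # Pause so you can complete 2FA manually; increase the delay if you need more time
    time.sleep(15)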


Login successful!
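The fixed `time.sleep` calls in Steps 3 and 4 either waste time when a page loads quickly or fail when it loads slowly, and the 15-second 2FA window is all-or-nothing. Selenium provides `WebDriverWait` (in `selenium.webdriver.support.ui`) for condition-based waits. The sketch below shows the same polling idea using only the standard library, so it can be tried without a browser. `wait_until` is a hypothetical helper, not a Selenium API.

```python
import time


def wait_until(condition, timeout=15.0, interval=0.5):
    """Call condition() every `interval` seconds until it returns a truthy value.

    Returns that value, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: "Your Account" in web.page_source, timeout=60)` would give you up to 60 seconds to complete 2FA, and would continue as soon as the login is detected.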


<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 5: Web Scraping</h2>

In [7]:
# List of category URLs
category_urls = [
    "https://www.amazon.in/gp/bestsellers/kitchen/ref=zg_bs_nav_kitchen_0",
    "https://www.amazon.in/gp/bestsellers/shoes/ref=zg_bs_nav_shoes_0",
    "https://www.amazon.in/gp/bestsellers/computers/ref=zg_bs_nav_computers_0",
    "https://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_electronics_0",
    "https://www.amazon.in/gp/bestsellers/gift-cards/ref=zg_bs_nav_gift-cards_0",
    "https://www.amazon.in/gp/bestsellers/jewelry/ref=zg_bs_nav_jewelry_0",
    "https://www.amazon.in/gp/bestsellers/toys/ref=zg_bs_nav_toys_0"
]

# List to store product data
product_data = []


def text_at(elements, i):
    """Return the text of the i-th element, or 'N/A' if the list is shorter."""
    return elements[i].text if i < len(elements) else 'N/A'


# Loop through each category URL
for category_url in category_urls:
    web.get(category_url)
    time.sleep(2)  # Wait for the page to load

    # Extract the category name from the URL path (e.g. 'kitchen' -> 'Kitchen')
    category_name = category_url.split('/')[-2].capitalize()

    try:
        products = web.find_elements(By.CLASS_NAME, '_cDEzb_p13n-sc-css-line-clamp-3_g3dy1')  # Product titles
        prices = web.find_elements(By.CLASS_NAME, '_cDEzb_p13n-sc-price_3mJ9Z')  # Product prices
        discounts = web.find_elements(By.XPATH, "//span[@class='_cDEzb_p13n-sc-discount_2m8mP']")  # Sale discounts
        # contains() matches the small star icon for any rating value
        ratings = web.find_elements(By.XPATH, "//i[contains(@class, 'a-icon-star-small')]/span")  # Product ratings
        ships_from = web.find_elements(By.XPATH, "//span[contains(text(),'Ships from')]")  # Ships from
        sold_by = web.find_elements(By.XPATH, "//span[contains(text(),'Sold by')]")  # Sold by
        # Same class as the titles, so 'description' repeats 'name'
        descriptions = web.find_elements(By.CLASS_NAME, '_cDEzb_p13n-sc-css-line-clamp-3_g3dy1')
        past_month_bought = web.find_elements(By.XPATH, "//span[contains(text(),'bought in the past month')]")  # Number bought in the past month
        images = web.find_elements(By.XPATH, "//img[@class='s-image']")  # Product images

        # Build one record per title. Fields are matched by position, so a value can
        # land on the wrong product if an earlier product is missing that field.
        for i in range(len(products)):
            product_data.append({
                'name': text_at(products, i),
                'price': text_at(prices, i),
                'discount': text_at(discounts, i),
                'best_seller_rating': text_at(ratings, i),
                'ship_from': text_at(ships_from, i),
                'sold_by': text_at(sold_by, i),
                'rating': 'N/A',  # No selector above populates this field
                'description': text_at(descriptions, i),
                'number_bought': text_at(past_month_bought, i),
                'category': category_name,
                # This product's image only (by position), not every image on the page
                'images': [images[i].get_attribute('src')] if i < len(images) else [],
            })

    except Exception as e:
        print(f"An error occurred while processing category {category_name}: {e}")
        continue  # Move on to the next category
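The loop pairs the i-th title with the i-th price, discount, and so on, and fills in 'N/A' when a list runs short. The sketch below reproduces that position-based pairing on plain strings, using hypothetical sample values, so you can see its limitation without a browser. If the first product has no discount badge, the second product's discount shifts onto the first. Reading each field from inside its own product's container element would avoid this, but this project does not identify a container selector. `pair_by_position` is a hypothetical helper.

```python
def pair_by_position(names, fields, fill="N/A"):
    """Build one record per name, taking the i-th entry of every field list.

    `fields` maps a field name to the list scraped for it; short lists are
    padded with `fill`, mirroring the scraping loop above.
    """
    records = []
    for i, name in enumerate(names):
        record = {"name": name}
        for key, values in fields.items():
            record[key] = values[i] if i < len(values) else fill
        records.append(record)
    return records


# Hypothetical page: product "A" has no discount badge, "B" shows "-20%".
rows = pair_by_position(["A", "B"], {"price": ["₹100", "₹200"], "discount": ["-20%"]})
print(rows[0]["discount"])  # -20%  (belongs to B, but lands on A)
print(rows[1]["discount"])  # N/A
```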


<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 6: Save data to CSV and JSON</h2>

In [8]:
# Save the product data to CSV
csv_file_path = 'product_data.csv'
with open(csv_file_path, mode='w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['name', 'price', 'discount', 'best_seller_rating', 'ship_from', 'sold_by', 'rating', 'description', 'number_bought', 'category', 'images']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for item in product_data:
        writer.writerow(item)

print(f"Product data saved to {csv_file_path}")

# Save the product data to JSON
json_file_path = 'product_data.json'
with open(json_file_path, mode='w', encoding='utf-8') as jsonfile:
    json.dump(product_data, jsonfile, ensure_ascii=False, indent=4)

print(f"Product data saved to {json_file_path}")


Product data saved to product_data.csv
Product data saved to product_data.json
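`csv.DictWriter` converts non-string values with `str()`, so the `images` list is written as a Python list literal (for example `['https://...']`), which is awkward to parse back. This minimal sketch joins list values with a separator before writing. The `|` separator and the helper name `flatten_for_csv` are assumptions made for illustration.

```python
import csv
import io


def flatten_for_csv(record, list_sep="|"):
    """Return a copy of `record` with list values joined into one string."""
    return {
        key: list_sep.join(value) if isinstance(value, list) else value
        for key, value in record.items()
    }


# Write one sample row to an in-memory buffer to show the result
rows = [{"name": "Kettle", "images": ["https://example.com/a.jpg", "https://example.com/b.jpg"]}]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "images"])
writer.writeheader()
writer.writerows(flatten_for_csv(row) for row in rows)
print(buffer.getvalue())
```

In Step 6, `writer.writerow(flatten_for_csv(item))` would take the place of `writer.writerow(item)`. The JSON file can keep the list as-is.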


<h2 style="color: #198754; margin-top: 20px;"><i class="fas fa-tasks"></i> Step 7: Close the browser</h2>

In [None]:
web.quit()
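Before you can analyse the saved data, the scraped strings usually need converting to numbers. This minimal sketch loads `product_data.json`, counts products per category, and averages the prices it can parse. It assumes prices look like `₹1,299.00` (rupee sign, comma thousands separators); that is an assumption about the page text, not something the scraper guarantees. `load_records`, `parse_price`, and `summarize` are hypothetical helpers.

```python
import json
import re
from collections import Counter


def load_records(path="product_data.json"):
    """Load the list of product dictionaries written in Step 6."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_price(text):
    """Convert a price string such as '₹1,299.00' to a float; None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def summarize(records):
    """Count products per category and average the prices that could be parsed."""
    counts = Counter(r["category"] for r in records)
    prices = [p for p in (parse_price(r["price"]) for r in records) if p is not None]
    average = sum(prices) / len(prices) if prices else None
    return counts, average
```

Usage: `counts, average = summarize(load_records())`. Entries stored as 'N/A' are skipped rather than counted as zero.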