A comprehensive Python web scraper that extracts detailed product information from Amazon product pages using Beautiful Soup and requests. This tool mimics browser behavior to avoid being blocked and provides an intuitive command-line interface for scraping single or multiple products.
- Features
- Supported Domains
- Requirements
- Installation
- Usage
- Sample Output
- JSON Export
- Technical Details
- Single Product Scraping: Extract detailed information from a single Amazon product
- Bulk Product Scraping: Process multiple Amazon URLs in one session
- Comprehensive Data Extraction: Scrapes multiple data points, including:
  - Product title
  - Price information
  - Customer ratings and review counts
  - Product images (high-resolution URLs)
  - Product categories (breadcrumb navigation)
  - "About this item" bullet points
  - Product URLs
- Smart URL Validation: Validates Amazon URLs across multiple international domains
- Loading Animations: Beautiful spinner animations during scraping operations
- Progress Tracking: Real-time progress indicators for bulk operations
- Error Handling: Robust error handling with detailed feedback
- Data Export: JSON export functionality with timestamps
- Rate Limiting: Built-in delays between requests to respect server resources
- Multi-domain Support: Works with Amazon domains worldwide (US, UK, CA, DE, FR, IT, ES, IN, JP, AU)
Country | Domain | Status |
---|---|---|
United States | amazon.com | ✅ |
United Kingdom | amazon.co.uk | ✅ |
Canada | amazon.ca | ✅ |
Germany | amazon.de | ✅ |
France | amazon.fr | ✅ |
Italy | amazon.it | ✅ |
Spain | amazon.es | ✅ |
India | amazon.in | ✅ |
Japan | amazon.co.jp | ✅ |
Australia | amazon.com.au | ✅ |
- Python: 3.7 or higher
- Internet Connection: Required for web scraping
- Dependencies: Listed in requirements.txt
# Clone the repository
git clone https://github.com/KhaledSaeed18/amazon-product-scraper.git
cd amazon-product-scraper
# Create and activate virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
- Download the Project
  # Download as ZIP or clone
  git clone https://github.com/KhaledSaeed18/amazon-product-scraper.git
- Create Virtual Environment
  cd amazon-product-scraper
  python -m venv venv
- Activate Virtual Environment
  # Windows Command Prompt
  venv\Scripts\activate.bat
  # Windows PowerShell
  venv\Scripts\Activate.ps1
  # Mac/Linux
  source venv/bin/activate
- Install Dependencies
  pip install -r requirements.txt
python scraper.py
The scraper presents an interactive menu with two options:
Amazon Product Scraper - Configuration
=============================================
Choose scraping mode:
1. Single Product URL
2. Multiple Product URLs
Select option (1 or 2):
- Select option 1 for single product scraping
- Enter a valid Amazon product URL
- Wait for the scraping process to complete
- View the detailed product information
- Optionally save results to JSON file
Supported URL formats:
https://www.amazon.com/dp/B08N5WRWNW
https://amazon.co.uk/gp/product/B08N5WRWNW
https://www.amazon.de/dp/B08N5WRWNW
https://amazon.com/Some-Product-Name/dp/B08N5WRWNW
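The URL formats above can be checked with a small validator. The following is a minimal sketch, not the scraper's actual implementation: the function name `extract_asin`, the `AMAZON_DOMAINS` set, and the assumption that a product ID is 10 uppercase letters or digits after `/dp/` or `/gp/product/` are all illustrative.

```python
import re
from urllib.parse import urlparse

# Domains taken from the supported-domains table in this README.
AMAZON_DOMAINS = {
    "amazon.com", "amazon.co.uk", "amazon.ca", "amazon.de", "amazon.fr",
    "amazon.it", "amazon.es", "amazon.in", "amazon.co.jp", "amazon.com.au",
}

# Assumption: the product ID is 10 uppercase letters or digits that follow
# /dp/ or /gp/product/ somewhere in the URL path.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url):
    """Return the product ID if url looks like a supported product URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.amazon.com and amazon.com alike
    if host not in AMAZON_DOMAINS:
        return None
    match = ASIN_PATTERN.search(parsed.path)
    return match.group(1) if match else None
```

A URL is treated as valid when `extract_asin(url)` returns a value other than `None`. Matching on the parsed hostname rather than on the raw string avoids accepting look-alike URLs such as `https://example.com/amazon.com/dp/...`.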
- Select option 2 for bulk scraping
- Enter multiple Amazon product URLs (one per line)
- Press Enter twice when finished entering URLs
- Monitor progress as each product is scraped
- View comprehensive summary of all products
- Optionally save all results to JSON file
Tips for Multiple Products:
- Add 2-3 second delays between requests (built-in)
- Maximum recommended: 50 products per session
- URLs are validated before processing
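The built-in delay between requests could look roughly like the loop below. This is a sketch under stated assumptions: `scrape_all`, the `scrape_one` callback, the `sleep` parameter, and the `error` key are hypothetical names introduced for illustration, and the 2-3 second range is taken from the tip above.

```python
import random
import time


def scrape_all(urls, scrape_one, min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Call scrape_one(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results.append({
                "url": url,
                "scraped_successfully": True,
                "product_data": scrape_one(url),
            })
        except Exception as exc:
            # One failed product should not abort the whole session.
            results.append({
                "url": url,
                "scraped_successfully": False,
                "error": str(exc),  # hypothetical field, not in the README's JSON
            })
        print(f"Processed {index + 1}/{len(urls)}")
    return results
```

Passing `sleep` as a parameter makes the loop easy to exercise without actually waiting, for example `scrape_all(urls, fake_scraper, sleep=lambda seconds: None)`.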
✅ Product details fetched successfully!
========================================
Title: Echo Dot (4th Gen) | Smart speaker with Alexa | Charcoal
Price: $49.99
Category: Electronics › Smart Home › Smart Speakers
Rating: 4.7/5 (125,432 ratings)
Image: https://m.media-amazon.com/images/I/714Rd3c42AL._AC_SL1500_.jpg
URL: https://www.amazon.com/dp/B07FZ8S74R
About this item:
1. Meet Echo Dot - Our most popular smart speaker with a fabric
   design. It is our most compact smart speaker that fits
   perfectly into small spaces.
2. Improved speaker quality - Better speaker quality than Echo Dot
   Gen 2 for richer and louder sound. Pair with a second Echo Dot
   for stereo sound.
==================================================
SCRAPING SUMMARY
==================================================
✅ Successfully scraped: 3/3 products
PRODUCT 1
--------------------
Title: Echo Dot (4th Gen) | Smart speaker with Alexa | Charcoal
Price: $49.99
Rating: 4.7/5 (125,432 ratings)
Category: Electronics › Smart Home › Smart Speakers
URL: https://www.amazon.com/dp/B07FZ8S74R
When saving data to JSON, the file includes comprehensive metadata:
{
  "scraping_info": {
    "timestamp": "2025-07-04T10:30:00.000000",
    "mode": "single_product",
    "total_products": 1,
    "successful_scrapes": 1
  },
  "products": [
    {
      "url": "https://www.amazon.com/dp/B07FZ8S74R",
      "scraped_successfully": true,
      "product_data": {
        "title": "Echo Dot (4th Gen) | Smart speaker with Alexa | Charcoal",
        "price": "$49.99",
        "rating": "4.7",
        "num_ratings": "125,432",
        "image_url": "https://m.media-amazon.com/images/I/714Rd3c42AL._AC_SL1500_.jpg",
        "about_item": [
          "Meet Echo Dot - Our most popular smart speaker...",
          "Improved speaker quality - Better speaker quality..."
        ],
        "breadcrumbs": [
          "Electronics",
          "Smart Home",
          "Smart Speakers"
        ]
      }
    }
  ]
}
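A structure like the one above can be assembled with the standard `json` and `datetime` modules. The sketch below is illustrative only: `build_export` and `save_export` are hypothetical helper names, and it assumes each entry in `results` already carries the `url`, `scraped_successfully`, and `product_data` keys shown in the sample.

```python
import json
from datetime import datetime


def build_export(results, mode):
    """Wrap per-product results in the metadata envelope shown in the sample."""
    successful = sum(1 for item in results if item.get("scraped_successfully"))
    return {
        "scraping_info": {
            "timestamp": datetime.now().isoformat(),
            "mode": mode,
            "total_products": len(results),
            "successful_scrapes": successful,
        },
        "products": results,
    }


def save_export(export, path):
    """Write the export as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps characters such as currency symbols readable.
        json.dump(export, handle, indent=2, ensure_ascii=False)
```

For example, `save_export(build_export(results, "single_product"), "products.json")` would write one file per session; the file name here is only a placeholder.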
- User-Agent Spoofing: Mimics Chrome browser to avoid detection
- Request Headers: Includes proper Accept-Language headers
- HTML Parsing: Uses lxml parser for optimal performance
- Element Targeting: Uses specific CSS selectors and IDs for reliable data extraction
- URL Validation: Comprehensive validation for Amazon URLs
- Network Errors: Graceful handling of connection issues
- Missing Elements: Safe extraction with fallback values
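The browser-like headers listed above might be set up as follows. This sketch uses the standard library's `urllib.request` so it runs without extra packages, whereas the scraper itself uses `requests`; the header values and the `build_request` helper are assumptions, since the README does not list the exact strings sent.

```python
from urllib.request import Request

# Illustrative values only; the exact strings the scraper sends are not
# documented in this README.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Return an unsent GET request that carries browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)
```

The same dictionary can be passed to `requests.get(url, headers=BROWSER_HEADERS, timeout=10)` when using the `requests` library.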
The scraper extracts the following information:
Data Point | CSS Selector/Method | Fallback |
---|---|---|
Product Title | span#productTitle | N/A |
Price | span.a-price | "Not available" |
Rating | i.a-icon-star elements | "Not available" |
Review Count | span#acrCustomerReviewText | "Not available" |
Product Image | img#landingImage | "Not available" |
Categories | Breadcrumb navigation | Empty array |
Features | "About this item" bullets | Empty array |
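The selector-plus-fallback pattern in the table can be sketched with Beautiful Soup as shown below. This is not the scraper's own code: `text_or_default` and `parse_product` are hypothetical helpers, reading the image URL from the `src` attribute is an assumption, and `html.parser` is used so the example runs without lxml (the scraper itself is described as using the lxml parser). Price and rating are left out because their markup is more involved.

```python
from bs4 import BeautifulSoup


def text_or_default(soup, selector, default="Not available"):
    """Return the stripped text of the first match, or default if nothing matches."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else default


def parse_product(html):
    """Extract a few fields from product-page HTML, falling back when absent."""
    soup = BeautifulSoup(html, "html.parser")
    image = soup.select_one("img#landingImage")
    return {
        "title": text_or_default(soup, "span#productTitle", default="N/A"),
        "num_ratings": text_or_default(soup, "span#acrCustomerReviewText"),
        # Assumption: the image URL is read from the tag's src attribute.
        "image_url": image.get("src", "Not available") if image is not None else "Not available",
    }
```

Because every lookup goes through a helper that handles the missing case, a page with an unexpected layout yields fallback values instead of raising `AttributeError`.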
Happy Scraping!