## Amazon Product Price Tracking Using Python

This notebook is a small, practical data collection project built as part of my data analyst portfolio. The goal is simple: extract product information from a live Amazon product page and understand what can go wrong along the way.

Amazon pages are not designed for scraping. The HTML structure is complex, prices are often split across multiple elements, and the page content can change depending on product variations or location. Because of this, even basic fields like price cannot always be extracted using a single fixed selector.

In this notebook, I use Python with `requests` and `BeautifulSoup` to inspect the raw HTML, extract product details such as title and price, and handle cases where the expected elements are missing or structured differently. The focus is on understanding the page structure and writing clear, defensive code rather than forcing a solution that may break easily.

This project reflects real-world data collection, where data sources are often messy and assumptions do not always hold.

In [1]:
# import libraries 

from bs4 import BeautifulSoup
import requests
import time
import datetime
import smtplib

In [2]:
# connecting to the website url and pulling in data

URL = 'https://www.amazon.com/realpeoplegoods-Data-Analyst-Gift-Scientist/dp/B0934L7B2C/ref=sr_1_17?dib=eyJ2IjoiMSJ9.MPm9HIcpsXjGqAHcXiIxey6mJiozpt19oT821OKPkKah-JeIkA6i48IoDpFRM7I53YfuLQIk3Z2CjgA4FETZo6UTcQ-SHvBcAAuk4fYgFIQpMD6wXHTLAsYv6ikm2cs1SFLTP4dd3NEIEYRcg4qA2_XI7ABIOR2RkUTtvClIFb4mhTfrjoSEAhmYzB7rncmcId_kBfAqpc1Ts0c84iSCsuJGMeVqZYRuu8bZmByVdj6SuqdZ8U06LplE620Bd9Y8_od-hs1XE15Nh1UZEkjG8hEnw22bzd81RuALf1_BJm0.K-oz1teUq8dHFd5xEFGd7-M8PSM7fA3GOKS-omuDMb8&dib_tag=se&keywords=data%2Banalyst%2Bshirt&qid=1765881696&sr=8-17&th=1&psc=1'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
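page = requests.get(URL, headers=headers, timeout=10)
page.raise_for_status()  # stop early on an HTTP error response

soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

# find() returns None when the element is missing, so check before calling get_text()
title_tag = soup2.find(id='productTitle')
title = title_tag.get_text() if title_tag is not None else None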

# Old approach (no longer works on this page):
# price = soup2.find(id='priceblock_ourprice').get_text()
# The visible price is now composed of multiple nested spans, so it is
# extracted from a parent container rather than a single fixed ID.

price_container = soup2.find("span", class_="a-price")

price = None
if price_container is not None:
    symbol = price_container.select_one("span.a-price-symbol")
    whole = price_container.select_one("span.a-price-whole")
    fraction = price_container.select_one("span.a-price-fraction")
    parts = (symbol, whole, fraction)
    # build the price only when all three parts are present
    if all(part is not None for part in parts):
        price = "".join(part.get_text(strip=True) for part in parts)

print(title)
print(price)


                 Data Analyst Gift - Data Analysis Shirt - Data Scientist Present - Funny Data Science - Data Whisperer - Unisex Tee
                
$28.50


### Why `priceblock_ourprice` does not work anymore

Earlier Amazon pages exposed the product price as a single HTML element, which allowed simple extraction like:

```python
price = soup2.find(id='priceblock_ourprice').get_text()
```

On this product page, that ID no longer exists. Amazon now splits the visible price into multiple nested elements (currency symbol, whole part, and fraction) that are displayed side by side. Because of this structural change, looking up a single fixed ID no longer returns the price.

### Why the new approach works

The price is wrapped inside a parent container (`span.a-price`) and divided into separate elements for the symbol, whole value, and fractional value. By extracting these parts individually and reconstructing them, the code aligns with how the price is actually stored in the page structure.

This approach is more reliable for modern Amazon pages and reflects a real-world data extraction challenge where values are fragmented rather than stored in a single field.
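
The reconstruction described above can be sketched as a small helper that returns `None` instead of raising when any piece is missing. The HTML snippet is a simplified, hypothetical stand-in for the real page markup, and `extract_price` is an illustrative name rather than part of the notebook.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup that mimics the nested price structure.
SAMPLE_HTML = """
<span class="a-price">
  <span class="a-price-symbol">$</span>
  <span class="a-price-whole">28<span class="a-price-decimal">.</span></span>
  <span class="a-price-fraction">50</span>
</span>
"""


def extract_price(soup):
    """Return the price text (symbol + whole + fraction), or None if any part is missing."""
    container = soup.find("span", class_="a-price")
    if container is None:
        return None
    parts = [
        container.select_one(f"span.a-price-{name}")
        for name in ("symbol", "whole", "fraction")
    ]
    if any(part is None for part in parts):
        return None
    return "".join(part.get_text(strip=True) for part in parts)


print(extract_price(BeautifulSoup(SAMPLE_HTML, "html.parser")))
print(extract_price(BeautifulSoup("<p>No price here</p>", "html.parser")))
```

Returning `None` keeps the "element not found" case explicit, so later cells can decide whether to skip the row or retry.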

In [4]:
# cleaning up the data a little bit

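# drop the leading currency symbol; leave the value as None if nothing was found
price = price.strip()[1:] if price else None
title = title.strip() if title else None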

print(title)
print(price)

Data Analyst Gift - Data Analysis Shirt - Data Scientist Present - Funny Data Science - Data Whisperer - Unisex Tee
28.50
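
Slicing off the first character assumes a one-character currency symbol and leaves the price as text. As an alternative sketch, the helper below (a hypothetical `parse_price`, not part of the notebook) keeps only digits and the decimal point and returns a `Decimal`, or `None` when nothing usable was scraped. It assumes a US-style format in which `,` is a thousands separator.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert text like '$1,299.00' to a Decimal; return None if that is not possible."""
    if not text:
        return None
    # Drop everything except digits and the decimal point (assumes US-style formatting).
    cleaned = re.sub(r"[^0-9.]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_price("$28.50"))
print(parse_price("$1,299.00"))
print(parse_price(None))
```

A numeric type also makes later comparisons, such as checking the price against a target value, straightforward.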


In [5]:
# creating a Timestamp for our output to track when data was collected

today = datetime.date.today()
today

datetime.date(2025, 12, 16)

In [37]:
# creating CSV and writing headers and data into the file

import csv 

header = ['Title', 'Price', 'Date']
data = [title, price, today]

# this block is meant to run only once
# it creates the CSV file and writes the header row
# if the kernel or system restarts, the full notebook should be run again,
# but this block should remain commented to avoid overwriting the file.

#with open('AmazonWebScraperDataset.csv', 'w', newline='', encoding='UTF8') as f:
    #writer = csv.writer(f)
    #writer.writerow(header)
    #writer.writerow(data) 
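
Instead of commenting the header block in and out by hand, one option is to check whether the file already exists and write the header only on first use. The sketch below uses a hypothetical `append_row` helper, and the usage example writes to a throwaway temp file rather than the real dataset.

```python
import csv
import os
import tempfile

HEADER = ["Title", "Price", "Date"]


def append_row(path, row):
    """Append one row to the CSV at `path`, writing the header first if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="UTF8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(HEADER)
        writer.writerow(row)


# Usage with a throwaway file so the real dataset is not touched.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_row(demo_path, ["Example Tee", "28.50", "2025-12-16"])
append_row(demo_path, ["Example Tee", "28.50", "2025-12-17"])

with open(demo_path, newline="", encoding="UTF8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Because the file is always opened in append mode, re-running the cell adds rows but never overwrites earlier data.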


In [29]:
import pandas as pd

# read the same file the notebook writes to (relative to the working directory)
df = pd.read_csv('AmazonWebScraperDataset.csv')
df

| | Title | Price | Date |
|---|---|---|---|
| 0 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 1 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |


In [8]:
# now we are appending data to the csv

with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(data)

In [9]:
#combining all of the above code into one function

def check_price():
    URL = 'https://www.amazon.com/realpeoplegoods-Data-Analyst-Gift-Scientist/dp/B0934L7B2C/ref=sr_1_17?dib=eyJ2IjoiMSJ9.MPm9HIcpsXjGqAHcXiIxey6mJiozpt19oT821OKPkKah-JeIkA6i48IoDpFRM7I53YfuLQIk3Z2CjgA4FETZo6UTcQ-SHvBcAAuk4fYgFIQpMD6wXHTLAsYv6ikm2cs1SFLTP4dd3NEIEYRcg4qA2_XI7ABIOR2RkUTtvClIFb4mhTfrjoSEAhmYzB7rncmcId_kBfAqpc1Ts0c84iSCsuJGMeVqZYRuu8bZmByVdj6SuqdZ8U06LplE620Bd9Y8_od-hs1XE15Nh1UZEkjG8hEnw22bzd81RuALf1_BJm0.K-oz1teUq8dHFd5xEFGd7-M8PSM7fA3GOKS-omuDMb8&dib_tag=se&keywords=data%2Banalyst%2Bshirt&qid=1765881696&sr=8-17&th=1&psc=1'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
    page = requests.get(URL, headers=headers, timeout=10)
    page.raise_for_status()
    soup1 = BeautifulSoup(page.content, "html.parser")
    soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

    title_tag = soup2.find(id='productTitle')
    price_container = soup2.find("span", class_="a-price")
    if title_tag is None or price_container is None:
        # the layout changed or the request was blocked; skip this run
        print("Title or price not found; nothing written.")
        return

    parts = [price_container.select_one(f"span.a-price-{name}")
             for name in ("symbol", "whole", "fraction")]
    if any(part is None for part in parts):
        print("Price parts missing; nothing written.")
        return

    title = title_tag.get_text().strip()
    # join the parts, then drop the leading currency symbol
    price = "".join(part.get_text(strip=True) for part in parts)[1:]

    # csv and datetime are already imported in earlier cells
    today = datetime.date.today()
    data = [title, price, today]

    with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

In [31]:
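# runs 'check_price' at a fixed interval and appends each result to the CSV
# the loop never ends on its own; stop it by interrupting the kernel

while True:
    check_price()
    time.sleep(5)  # interval in seconds; 5 is only for testing, 86400 would be once a day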

KeyboardInterrupt: 
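
An endless loop has to be interrupted by hand, which is what produced the `KeyboardInterrupt` above. A bounded variant can be sketched as follows. `run_tracker`, `max_runs`, and the injected `sleep` parameter are hypothetical names added for illustration, and an error in a single run is printed rather than allowed to stop the loop.

```python
import time


def run_tracker(check, interval_seconds, max_runs, sleep=time.sleep):
    """Call `check` up to `max_runs` times, pausing between calls.

    Returns the number of runs that finished without raising.
    """
    succeeded = 0
    for run in range(max_runs):
        try:
            check()
            succeeded += 1
        except Exception as exc:  # keep going if one scrape fails
            print(f"run {run} failed: {exc}")
        if run < max_runs - 1:
            sleep(interval_seconds)
    return succeeded


# Usage with a stand-in check function and no real waiting.
calls = []
result = run_tracker(
    lambda: calls.append("checked"),
    interval_seconds=86400,
    max_runs=3,
    sleep=lambda seconds: None,
)
print(result, calls)
```

In the notebook, `check` would be `check_price` and the default `time.sleep` would be left in place.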

In [35]:
import pandas as pd

# read the same file that check_price() appends to
df = pd.read_csv('AmazonWebScraperDataset.csv')
df

| | Title | Price | Date |
|---|---|---|---|
| 0 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 1 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 2 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 3 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 4 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |
| 5 | Data Analyst Gift - Data Analysis Shirt - Data... | 28.5 | 2025-12-16 |


In [39]:
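# optional feature:
# send yourself an email alert when the product price drops below a target value
# note: this function only sends the mail; compare the scraped price with
# your target before calling it
# this is just for experimentation and learning purposes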

def send_price_alert_mail():
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login("towhid9987@gmail.com", "xxxxxxxxxxxxxx")

        subject = "Price Alert: The shirt is now under $15!"
        body = (
            "Towhid,\n\n"
            "The price has dropped below your target.\n"
            "This might be a good time to buy.\n\n"
            "Product link:\n"
            "https://www.amazon.com/realpeoplegoods-Data-Analyst-Gift-Scientist/dp/B0934L7B2C"
        )

        message = f"Subject: {subject}\n\n{body}"

        server.sendmail(
            "towhid9987@gmail.com",
            "towhid9987@gmail.com",
            message
        )
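
The function above sends the mail unconditionally, so the price comparison has to happen before it is called. A minimal sketch of that check, together with building the message through the standard library's `email.message.EmailMessage`, might look like this. `TARGET_PRICE`, `should_alert`, `build_alert`, and the example address are assumptions for illustration. Sending the message would still go through `smtplib` as in the cell above, ideally with the password read from an environment variable rather than typed into the notebook.

```python
from decimal import Decimal
from email.message import EmailMessage

TARGET_PRICE = Decimal("15.00")  # assumed threshold, matching the "$15" in the subject line
PRODUCT_URL = "https://www.amazon.com/realpeoplegoods-Data-Analyst-Gift-Scientist/dp/B0934L7B2C"


def should_alert(price, target=TARGET_PRICE):
    """Return True only when a price was scraped and it is below the target."""
    return price is not None and Decimal(str(price)) < target


def build_alert(price, address, product_url=PRODUCT_URL):
    """Build the alert e-mail without sending it."""
    msg = EmailMessage()
    msg["Subject"] = f"Price Alert: The shirt is now ${price}"
    msg["From"] = address
    msg["To"] = address
    msg.set_content(
        "The price has dropped below your target.\n"
        "This might be a good time to buy.\n\n"
        f"Product link:\n{product_url}"
    )
    return msg


price = "14.99"  # stand-in for the cleaned price string from the scraper
if should_alert(price):
    alert = build_alert(price, "me@example.com")  # placeholder address
    print(alert["Subject"])
```

An `EmailMessage` built this way can be passed to `server.send_message(alert)` on the `smtplib.SMTP_SSL` connection, which avoids assembling the header text by hand.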