# Task 1: Data Collection

## Step 1: Import Required Libraries
We use `requests` to fetch webpages, `BeautifulSoup` to parse HTML, and `pandas` to store data.

In [12]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Step 2: Define URLs and Loop Over Pages
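Each year's listings are split across a fixed number of pages, named `<year>-page<NN>.html` with a two-digit page number. We loop over every year and page, find the sold date and sale table in each `<li>` element, and collect one dictionary per property.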
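
The page-naming scheme can be checked in isolation before any request is sent. The following is a minimal sketch; `build_page_urls` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
# Hypothetical helper (an assumption, not part of the notebook):
# builds the full list of page URLs without fetching anything.
def build_page_urls(base_url, pages_per_year):
    """Return every listing-page URL, one per (year, page) pair."""
    urls = []
    for year, num_pages in pages_per_year.items():
        for i in range(1, num_pages + 1):
            # :02d zero-pads the page number to two digits (1 -> "01")
            urls.append(f"{base_url}{year}-page{i:02d}.html")
    return urls


base_url = "http://mlg.ucd.ie/modules/python/assign1/property/"
pages_per_year = {2021: 17, 2022: 16, 2023: 18, 2024: 23}

urls = build_page_urls(base_url, pages_per_year)
print(len(urls))   # total number of pages to scrape
print(urls[0])     # first page of 2021
print(urls[-1])    # last page of 2024
```

Building the list up front makes it easy to confirm the page count and spot a malformed URL before the scraping loop runs.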

In [13]:
base_url = "http://mlg.ucd.ie/modules/python/assign1/property/"
all_data = []

pages_per_year = {
    2021: 17,
    2022: 16,
    2023: 18,
    2024: 23
}

for year, num_pages in pages_per_year.items():
    for i in range(1, num_pages + 1):
        page_num = f"{i:02d}"  # zero-pad to two digits, e.g. 1 -> "01"
        page = f"{year}-page{page_num}.html"
        url = base_url + page
        print(f"Scraping {url}")
        
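        # A timeout stops a stalled connection from hanging the whole loop
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException as exc:
            print(f"Failed to access {url}: {exc}")
            continue
        if response.status_code != 200:
            print(f"Failed to access {url} (status {response.status_code})")
            continue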

        soup = BeautifulSoup(response.text, "html.parser")
        properties = soup.find_all("li")

        for prop in properties:
            sold_span = prop.find("span", class_="sold")
            table = prop.find("table", class_="sale")
            if not sold_span or not table:
                continue

            sold_date = sold_span.text.strip()

            rows = table.find_all("tr")
            data = {"Sold Date": sold_date, "Year": year}

            for row in rows:
                tds = row.find_all("td")
                # Skip rows without a label/value pair (avoids an IndexError)
                if len(tds) < 2:
                    continue
                label = tds[0].text.strip().replace(":", "")
                value = tds[1].text.strip()
                data[label] = value

            all_data.append(data)

Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page01.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page02.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page03.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page04.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page05.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page06.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page07.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page08.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page09.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page10.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page11.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page12.html
Scraping http://mlg.ucd.ie/modules/python/assign1/property/2021-page13.html
Scraping htt
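
To see what the parsing step produces for a single listing, it helps to run it on a small inline HTML fragment. The sketch below assumes markup shaped like the selectors used in the loop (`li`, `span.sold`, `table.sale`); the fragment, its field names, and the `parse_listing` helper are hypothetical and not copied from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: modeled on the selectors used in the scraping loop,
# not taken from the real pages.
sample_html = """
<ul>
  <li>
    <span class="sold">12 March 2021</span>
    <table class="sale">
      <tr><td>Location:</td><td>Dublin 4</td></tr>
      <tr><td>Price:</td><td>450,000</td></tr>
    </table>
  </li>
  <li>List item without any sale data</li>
</ul>
"""


def parse_listing(prop, year):
    """Turn one <li> element into a dict, or None if it is not a listing."""
    sold_span = prop.find("span", class_="sold")
    table = prop.find("table", class_="sale")
    if sold_span is None or table is None:
        return None

    data = {"Sold Date": sold_span.text.strip(), "Year": year}
    for row in table.find_all("tr"):
        tds = row.find_all("td")
        if len(tds) < 2:
            continue  # no label/value pair in this row
        label = tds[0].text.strip().replace(":", "")
        data[label] = tds[1].text.strip()
    return data


soup = BeautifulSoup(sample_html, "html.parser")
records = []
for li in soup.find_all("li"):
    record = parse_listing(li, 2021)
    if record is not None:
        records.append(record)

print(records)
```

Only the first `<li>` yields a record; the second has neither a `sold` span nor a `sale` table and is skipped, which mirrors the `continue` in the main loop.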

## Step 3: Save Raw Data
We save the collected raw dataset to a CSV file. Cleaning and preprocessing will be done in Task 2.

In [14]:
df = pd.DataFrame(all_data)
df.to_csv("properties.csv", index=False)
print("Saved raw data to properties.csv")

Saved raw data to properties.csv
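
Because each listing is stored as a dictionary, `pd.DataFrame` builds its columns from the union of all keys, and any field missing from a listing becomes `NaN`. The sketch below illustrates this with two hypothetical records (the field names and values are made up for illustration) and round-trips them through an in-memory buffer rather than a file on disk.

```python
import io

import pandas as pd

# Hypothetical records: field names and values are illustrative only.
records = [
    {"Sold Date": "12 March 2021", "Year": 2021, "Location": "Dublin 4", "Price": "450,000"},
    {"Sold Date": "3 May 2022", "Year": 2022, "Location": "Cork"},  # no "Price" field
]

df = pd.DataFrame(records)  # columns are the union of all dictionary keys

# Write to and read back from an in-memory buffer instead of properties.csv
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded.shape)         # (rows, columns)
print(reloaded.isna().sum())  # missing values per column
```

Running the same `shape` and `isna().sum()` checks on the real `properties.csv` gives a quick picture of how much cleaning Task 2 will need.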
