**Web crawling from www.timeanddate.com to extract holidays in Singapore**

This section of the script scrapes public holiday data for Singapore from timeanddate.com for each year from 2000 to 2025 and saves the results to an Excel file.


1.   **Imports Libraries**: It uses requests for making HTTP requests, BeautifulSoup for parsing HTML, pandas for organizing data, datetime for parsing dates, and tqdm for showing a progress bar.
2.   **Prepares for Data Collection**: It sets up a list to hold the holiday records (date and name) for all years.
3.   **Scrapes Data**: For each year, it sends a request, extracts the holiday dates and names from the calendar page, and appends them to the list.
4.   **Stores and Saves Data**: The collected data is stored in a pandas DataFrame and saved to an Excel file.
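Step 3 can be tried offline on a hand-written HTML fragment before hitting the live site. The markup below is an assumption modelled only on the selectors the scraping code uses (a `table` with class `cl1h`, a `span` with class `co1` holding the date, and an `a` tag holding the holiday name); the real page layout may differ, and `extract_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the selectors used by the scraper;
# the real timeanddate.com markup may differ.
SAMPLE_HTML = """
<table class="cl1h">
  <tr><td><span class="co1">1 Jan</span></td><td><a href="#">New Year's Day</a></td></tr>
  <tr><td><span class="co1">1 May</span></td><td><a href="#">Labour Day</a></td></tr>
  <tr><td>Row without a holiday</td></tr>
</table>
"""

def extract_rows(html):
    """Return (raw_date, name) pairs from the holiday table, skipping incomplete rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "cl1h"})
    rows = []
    if table is None:
        return rows  # page has no holiday table
    for tr in table.find_all("tr"):
        date_span = tr.find("span", {"class": "co1"})
        link = tr.find("a")
        if date_span and link:  # keep only rows that have both a date and a name
            rows.append((date_span.text.strip(), link.text.strip()))
    return rows

print(extract_rows(SAMPLE_HTML))
```

Checking the selectors this way makes it easier to tell a markup change on the site apart from a network or parsing bug.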

In [None]:
%pip install requests beautifulsoup4 pandas openpyxl tqdm



Extract Holidays in Singapore

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
from datetime import datetime
import itertools

# Base URL template
BASE_URL = "https://www.timeanddate.com/calendar/custom.html?year={year}&country={country}&cols=3&df=1&hol=1&lang=en"

# timeanddate.com country code for Singapore
country_code = "63"

# Years to extract: 2000-2025 (range() excludes the end value)
years = list(range(2000, 2026))

# List to store extracted holiday data
holiday_data_sg = []

def format_date(dd_mmm, year):
    """Convert a day-month string such as "1 Jan" plus a year to "YYYY-MM-DD"."""
    try:
        # Parse "1 Jan" + year into a datetime object
        date_obj = datetime.strptime(f"{dd_mmm} {year}", "%d %b %Y")

        # Format as YYYY-MM-DD
        return date_obj.strftime("%Y-%m-%d")
    except ValueError:
        # Unexpected format: return a single None so the caller's `if full_date:` check skips it
        return None

# Iterate over each year
for year in tqdm(years, desc="Scraping holidays"):
    url = BASE_URL.format(year=year, country=country_code)
    resp = requests.get(url, timeout=30)  # avoid hanging on a stalled connection

    if resp.status_code == 200:
        soup = BeautifulSoup(resp.text, "html.parser")
        holiday_table = soup.find("table", {"class": "cl1h"})

        if holiday_table:
            for holiday_row in holiday_table.find_all("tr"):
                date_span = holiday_row.find("span", {"class": "co1"})  # e.g. "1 Jan"
                name_link = holiday_row.find("a")  # holiday name

                if date_span and name_link:
                    raw_date = date_span.text.strip()
                    full_date = format_date(raw_date, year)  # "YYYY-MM-DD" or None
                    holiday_name = name_link.text.strip()

                    if full_date:
                        # Append row to list
                        holiday_data_sg.append([full_date, holiday_name])
    else:
        # Report failed years without breaking the progress bar
        tqdm.write(f"Skipping {year}: HTTP {resp.status_code}")

# Convert to DataFrame
df = pd.DataFrame(holiday_data_sg, columns=["Date", "Event"])

# Save to Excel
df.to_excel("singapore_holidays_00_25.xlsx", index=False)

print("Data successfully saved to singapore_holidays_00_25.xlsx")

Scraping holidays: 100%|██████████| 26/26 [00:15<00:00,  1.67it/s]


Data successfully saved to singapore_holidays_00_25.xlsx
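The date conversion relies on `datetime.strptime` with the `%d %b %Y` pattern, which expects English month abbreviations such as "Jan". The standalone check below exercises it and also shows why the failure branch must return a single `None`: a tuple such as `(None, None)` is truthy, so an `if full_date:` guard would let it through and write a tuple into the Date column.

```python
from datetime import datetime

def format_date(dd_mmm, year):
    """Convert a day-month string such as '1 Jan' plus a year to 'YYYY-MM-DD'.

    Returns None when the text does not form a valid date.
    """
    try:
        return datetime.strptime(f"{dd_mmm} {year}", "%d %b %Y").strftime("%Y-%m-%d")
    except ValueError:
        return None

print(format_date("1 Jan", 2024))   # 2024-01-01
print(format_date("29 Feb", 2023))  # None: 2023 is not a leap year
print(format_date("TBC", 2024))     # None: not a date at all

# A tuple of Nones is truthy, which is why the guard must not receive one.
print(bool((None, None)))  # True
```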


Removing duplicate values

In [2]:
# Load the Excel file
file_path = "singapore_holidays_00_25.xlsx"
df = pd.read_excel(file_path)

# Remove rows that are exact duplicates (same Date and Event)
df_cleaned = df.drop_duplicates()

# Overwrite the original Excel file with the cleaned data
cleaned_file_path = "singapore_holidays_00_25.xlsx"
df_cleaned.to_excel(cleaned_file_path, index=False)

print("Duplicates removed. Cleaned data saved to:", cleaned_file_path)

Duplicates removed. Cleaned data saved to: singapore_holidays_00_25.xlsx
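`drop_duplicates()` with no arguments only removes rows in which every column (here Date and Event) matches an earlier row, so two different holidays on the same date are both kept. The small in-memory example below uses hypothetical rows, not scraped data, and also counts how many rows were dropped, which is a cheap sanity check to print alongside the save message.

```python
import pandas as pd

# Hypothetical rows: the repeated New Year's Day row mimics a holiday scraped twice,
# and "Example Observance" shares a date with Labour Day but is a different event.
df = pd.DataFrame(
    {
        "Date": ["2024-01-01", "2024-01-01", "2024-05-01", "2024-05-01"],
        "Event": ["New Year's Day", "New Year's Day", "Labour Day", "Example Observance"],
    }
)

df_cleaned = df.drop_duplicates()  # compares all columns by default
removed = len(df) - len(df_cleaned)
print(f"Removed {removed} duplicate row(s); {len(df_cleaned)} remain.")
```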


Calculate the number of holidays in each month

In [3]:
# Load the Excel file
file_path = "singapore_holidays_00_25.xlsx"
df = pd.read_excel(file_path)

# Treat 'Date' as text so it can be sliced (values are "YYYY-MM-DD")
df["Date"] = df["Date"].astype(str)

# Extract Year and Month from Date
df["Year"] = df["Date"].str[:4]  # First 4 characters (YYYY)
df["Month"] = df["Date"].str[5:7]  # 6th and 7th characters (MM)

# Count number of holidays per (Year, Month)
holiday_counts = df.groupby(["Year", "Month"]).size().reset_index(name="Total Holidays")

# Build a complete Year x Month grid so months without holidays still appear
all_years = sorted(df["Year"].unique())
all_months = [f"{m:02d}" for m in range(1, 13)]

full_index = pd.DataFrame(list(itertools.product(all_years, all_months)), columns=["Year", "Month"])

# Left-join the counts onto the grid; months with no holidays get 0
final_df = full_index.merge(holiday_counts, on=["Year", "Month"], how="left").fillna(0)

# Cast 'Total Holidays' back to integer (the merge and fillna leave it as float)
final_df["Total Holidays"] = final_df["Total Holidays"].astype(int)

# Save the transformed data to a new Excel file
output_path = "singapore_holidays_00_25_month.xlsx"
final_df.to_excel(output_path, index=False)

print(f"Modified file saved as: {output_path}")

Modified file saved as: singapore_holidays_00_25_month.xlsx
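The same Year-Month table can also be built by parsing the dates with `pd.to_datetime` and reindexing the counts over every (year, month) pair, instead of slicing strings and merging with a grid. This is an alternative sketch, not the method used above: it assumes the Date column holds ISO `YYYY-MM-DD` values, `monthly_counts` is a hypothetical helper name, the sample rows are illustrative, and Year and Month come out as integers rather than the zero-padded strings the script writes.

```python
import pandas as pd

def monthly_counts(df):
    """Count holidays per (Year, Month); months with no holidays get 0.

    Assumes df["Date"] holds ISO "YYYY-MM-DD" strings or datetimes.
    """
    dates = pd.to_datetime(df["Date"])
    year = dates.dt.year.astype(int).rename("Year")
    month = dates.dt.month.astype(int).rename("Month")

    # Holidays actually present in each (Year, Month)
    counts = dates.groupby([year, month]).size()

    # Every (Year, Month) pair for the years present, so empty months are kept
    full_index = pd.MultiIndex.from_product(
        [sorted(int(y) for y in year.unique()), range(1, 13)],
        names=["Year", "Month"],
    )
    return counts.reindex(full_index, fill_value=0).reset_index(name="Total Holidays")

# Illustrative rows only, not scraped data
sample = pd.DataFrame(
    {
        "Date": ["2024-01-01", "2024-05-01", "2024-05-20"],
        "Event": ["Holiday A", "Holiday B", "Holiday C"],
    }
)
result = monthly_counts(sample)
print(result.head(6))
```

Reindexing with `fill_value=0` keeps the counts as integers, so no separate cast is needed after filling the empty months.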
