> Part of a series on auto-updating websites using GitHub Actions and GitHub Pages

# Air Quality Updater: Complete dataset copier

In this section, we are going to download the [AQI data of major cities from IQAir](https://www.iqair.com/us/world-air-quality-ranking) and save it as a CSV file.

The page scraped below is the IQAir listing for a single city, Chiang Mai: `https://www.iqair.com/us/thailand/chiang-mai`.

This approach is useful if you want to **directly copy a full dataset from the web** and use it to update a page or graphic. The alternative is to save historical data over time, which I'll cover in another video.


In [1]:
import os
from datetime import datetime
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
# Get the current date
current_date = datetime.now().strftime('%Y-%m-%d')
print(f"Current Date: {current_date}")

Current Date: 2024-06-05
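
Since this series runs the script from GitHub Actions, keep in mind that `datetime.now()` reports the machine's local time, and GitHub-hosted runners normally use UTC. A minimal sketch of stamping the date in Thailand time instead, assuming a fixed UTC+7 offset with no daylight saving; the helper name `local_date` and the constant `ICT` are hypothetical and not part of the original notebook:

```python
from datetime import datetime, timedelta, timezone

# Assumption: Chiang Mai follows Indochina Time, a fixed UTC+7 offset.
ICT = timezone(timedelta(hours=7), name="ICT")

def local_date(now=None, tz=ICT):
    """Return the calendar date in `tz` as a YYYY-MM-DD string."""
    # `now` can be passed in so the function is checkable with a fixed instant.
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(tz).strftime('%Y-%m-%d')

current_date = local_date()
print(f"Current Date: {current_date}")
```

With this, a run at 18:30 UTC on June 4 is recorded as June 5, which matches the calendar day in Chiang Mai.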


In [3]:

# Fetch AQI data from the website
url = 'https://www.iqair.com/us/thailand/chiang-mai'

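try:
    response = requests.get(url, timeout=30)  # Don't hang forever on a stalled connection
    response.raise_for_status()  # Raise on 4xx/5xx responses
except requests.exceptions.RequestException as e:
    # Exit with a non-zero status so a scheduled run is reported as failed
    raise SystemExit(f"Error fetching data: {e}")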
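
A single failed request ends the run, which can be wasteful when the failure is only a brief network hiccup. The following is a rough sketch of a retry wrapper, not something the original notebook does; `fetch_with_retries` and its parameters are hypothetical names, and the linear backoff is just one reasonable choice:

```python
import time

def fetch_with_retries(fetch, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Call `fetch()` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as e:
            last_error = e
            print(f"Attempt {attempt}/{attempts} failed: {e}")
            if attempt < attempts:
                time.sleep(delay * attempt)  # Wait a little longer each time
    # Every attempt failed: re-raise the last error for the caller to handle
    raise last_error

# Possible usage with the request above (commented out: it needs network access):
# import requests
#
# def get_page():
#     r = requests.get(url, timeout=30)
#     r.raise_for_status()
#     return r
#
# response = fetch_with_retries(
#     get_page, retry_on=(requests.exceptions.RequestException,)
# )
```

Passing the request in as a function keeps the retry logic independent of `requests`, so the same wrapper can be reused for other flaky steps.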


In [4]:
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

In [5]:
# Extract tables using StringIO to avoid the FutureWarning
tables = pd.read_html(StringIO(str(soup)))

In [6]:
# Check if the desired table is in the response
if len(tables) > 3:
    df = tables[3]  # Assuming the required table is at index 3
else:
    # Exit with a non-zero status so a scheduled run is reported as failed
    raise SystemExit("Error: Expected table not found.")
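
Picking the table by position (`tables[3]`) breaks silently if the page layout changes and a different table lands at that index. One possible alternative, sketched here under the assumption that the wanted table can be recognised by its column names, is to search for it; `find_table` is a hypothetical helper and the stand-in tables contain made-up values:

```python
import pandas as pd

def find_table(tables, required_columns):
    """Return the first DataFrame containing all `required_columns`, else None."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(map(str, table.columns)):
            return table
    return None

# Stand-in tables for illustration only; real ones come from pd.read_html().
example_tables = [
    pd.DataFrame({'Rank': [1], 'City': ['Example']}),
    pd.DataFrame({'Air pollution level': ['Moderate'],
                  'Air quality index': ['70 US AQI'],
                  'Main pollutant': ['PM2.5']}),
]
match = find_table(example_tables, ['Air quality index', 'Main pollutant'])
print(match.columns.tolist())
```

In the notebook this would be called as `find_table(tables, [...])`, with a `None` result treated the same way as the missing-table error above.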

In [7]:
# Add the current date to the dataframe
df['date_pulled'] = current_date


In [8]:
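# Clean the AQI column to retain only the number
if 'Air quality index' in df.columns:
    df['Air quality index'] = (
        df['Air quality index']
        .astype(str)                          # .str only works on text, so normalise first
        .str.extract(r'(\d+)', expand=False)  # expand=False returns a Series, not a DataFrame
        .astype(int)
    )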
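
The `.astype(int)` step raises an error if any cell contains no digits at all. If you would rather keep the row and mark the value as missing, a tolerant variant might look like the sketch below. `clean_aqi` is a hypothetical helper, and the sample values are invented to show the different cases:

```python
import pandas as pd

def clean_aqi(series):
    """Extract the first run of digits from each value; anything else becomes <NA>."""
    digits = series.astype(str).str.extract(r'(\d+)', expand=False)
    # errors='coerce' turns unparseable entries into NaN instead of raising;
    # the nullable 'Int64' dtype keeps integers alongside missing values.
    return pd.to_numeric(digits, errors='coerce').astype('Int64')

# Made-up sample covering text with a suffix, plain text, junk, and a real int
sample = pd.Series(['70 US AQI', '53', 'n/a', 98])
cleaned = clean_aqi(sample)
print(cleaned.tolist())
```

The trade-off is that bad values no longer stop the run, so it is worth checking `cleaned.isna().sum()` before saving.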

In [9]:
# Reorder the columns to make 'date_pulled' the first column
first_column = df.pop('date_pulled')
df.insert(0, 'date_pulled', first_column)

In [10]:
# File path for the CSV file
file_path = 'aqi_data.csv'

In [11]:
# Check if the file exists
if os.path.exists(file_path):
    # Read the existing data from the file
    existing_df = pd.read_csv(file_path)
    # Append the new data to the existing data
    updated_df = pd.concat([existing_df, df])
else:
    # If the file does not exist, use the new data as the initial dataframe
    updated_df = df

In [12]:
# Save the updated dataframe to the CSV file
updated_df.to_csv(file_path, index=False)

In [13]:
# Display the dataframe
print(updated_df)

  date_pulled Air pollution level  Air quality index Main pollutant
0  2024-05-31            Moderate                 98          PM2.5
1  2024-05-30            Moderate                 98          PM2.5
2  2024-05-31            Moderate                 94          PM2.5
3  2024-06-01            Moderate                 85          PM2.5
4  2024-06-02                Good                 50          PM2.5
5  2024-06-03                Good                 10          PM2.5
6  2024-06-04            Moderate                 53          PM2.5
0  2024-06-05            Moderate                 70          PM2.5
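
Two things stand out in this output: `2024-05-31` appears twice, and the newest row is numbered `0` again because `pd.concat` keeps each frame's original index. Neither affects the saved CSV's columns (it is written with `index=False`), but running the script twice in one day does add a duplicate row. A minimal sketch of one way to handle both, assuming one row per date is wanted and the most recent pull should win; `merge_daily` is a hypothetical helper and the `55` reading is made up:

```python
import pandas as pd

def merge_daily(existing_df, new_df, key='date_pulled'):
    """Append new rows, keep the latest row per `key`, and renumber the index."""
    combined = pd.concat([existing_df, new_df], ignore_index=True)
    combined = combined.drop_duplicates(subset=key, keep='last')
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    return combined.sort_values(key).reset_index(drop=True)

existing = pd.DataFrame({'date_pulled': ['2024-06-03', '2024-06-04'],
                         'Air quality index': [10, 53]})
rerun = pd.DataFrame({'date_pulled': ['2024-06-04', '2024-06-05'],
                      'Air quality index': [55, 70]})
result = merge_daily(existing, rerun)
print(result)
```

If the table ever holds several rows per day (for example one per city), `key` would need to be a list of columns that together identify a row.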
