## Air Quality Simulation Validation Analysis

This Jupyter notebook presents a step-by-step analysis to validate an air quality simulation against empirical data collected from field readings. The simulation predicts different air quality levels (AQI) based on distances from a control burn source, while the field readings provide actual AQI values at various distances.


In [2]:
%pip install -r requirements.txt

Collecting openpyxl (from -r requirements.txt (line 2))
  Using cached openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
Collecting et-xmlfile (from openpyxl->-r requirements.txt (line 2))
  Using cached et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2
Note: you may need to restart the kernel to use updated packages.


### Importing the Field Data

In [3]:
import pandas as pd

# Load the data from the provided Excel file
file_path = 'Control Burn AQM Data 27Jan23.xlsx'
data = pd.read_excel(file_path, skiprows=2)

# Splitting and cleaning the data for Site 1
site_1_data = data.iloc[:, 1:9].copy()  # Selecting columns for Site 1 (copy so later column assignments do not touch a slice of `data`)
site_1_columns = ['Time', 'Sensor A', 'Sensor B', 'AVG A', 'AVG B', 'Distance', 'Wind Direction', 'Wind Speed']
site_1_data.columns = site_1_columns
site_1_data = site_1_data[site_1_data['Time'] != 'Time']
site_1_data['AVG Reading'] = site_1_data[['AVG A', 'AVG B']].mean(axis=1)
site_1_data['Distance'] = pd.to_numeric(site_1_data['Distance'], errors='coerce')

# Displaying the cleaned data for Site 1
site_1_data.head()

| | Time | Sensor A | Sensor B | AVG A | AVG B | Distance | Wind Direction | Wind Speed | AVG Reading |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 01:00:00 | 766.0 | 821.0 | 848.0 | 899.0 | 100.0 | S | 5.0 | 873.5 |
| 1 | 01:07:00 | 416.0 | 476.0 | 482.0 | 477.0 | 150.0 | S | 5.0 | 479.5 |
| 2 | 01:11:00 | 251.0 | 252.0 | 275.0 | 265.0 | 200.0 | S | 5.0 | 270.0 |
| 3 | 01:20:00 | 338.0 | 302.0 | 235.0 | 224.0 | 200.0 | | | 229.5 |
| 4 | | | | | | | | | |


In [4]:
# Calculating the correlation coefficient for Site 1
correlation_site_1 = site_1_data['AVG Reading'].corr(site_1_data['Distance'])
correlation_site_1

-0.4747061412905016

We note a **negative** correlation (about -0.47) between distance and AQI. This is expected: smoke disperses with distance, so air quality generally improves farther from the fire source.
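
As a minimal sketch of the same calculation, the snippet below recomputes the coefficient on only the four complete rows shown in the Site 1 preview, so its value will differ from the full-data figure. `Series.corr` uses the Pearson method by default; the variable names here are illustrative and not part of the notebook.

```python
import pandas as pd

# Only the four complete rows from the Site 1 preview, not the full dataset.
preview = pd.DataFrame({
    'Distance': [100.0, 150.0, 200.0, 200.0],
    'AVG Reading': [873.5, 479.5, 270.0, 229.5],
})

# Series.corr defaults to the Pearson correlation coefficient.
pearson_r = preview['AVG Reading'].corr(preview['Distance'])
print(f"Pearson r on the preview rows: {pearson_r:.3f}")
```

A negative value here only confirms the direction of the relationship; four rows are too few to say anything about its strength.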

### Comparing Simulation to Actual

The simulation can be retrieved from:

https://weather.gfc.state.ga.us/googlevsmoke/cgi-bin/runvsmoke.py?lat=30.176749&lon=-97.870600&acres=0.3&erate=0.6818181818181818&hrate=0.739949494949495&mix=2000&wspd=5&wdir=0.1&stclass=2&frise=-0.50

The simulation parameters are defined in the URL's query string.
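
To make those parameters easier to inspect, the query string can be split into key/value pairs with the standard library. This is only a sketch: the names `SIM_URL` and `sim_params` are illustrative, and the meaning of each key is defined by the simulation service and is not interpreted here.

```python
from urllib.parse import urlsplit, parse_qsl

SIM_URL = (
    "https://weather.gfc.state.ga.us/googlevsmoke/cgi-bin/runvsmoke.py"
    "?lat=30.176749&lon=-97.870600&acres=0.3&erate=0.6818181818181818"
    "&hrate=0.739949494949495&mix=2000&wspd=5&wdir=0.1&stclass=2&frise=-0.50"
)

# Split off the query string and convert every value to float
# (all values in this particular URL are numeric).
sim_params = {
    key: float(value)
    for key, value in parse_qsl(urlsplit(SIM_URL).query)
}

for key, value in sim_params.items():
    print(f"{key:>8} = {value}")
```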

Measuring the simulation output linearly to correspond to our air sensor positions puts the hazardous region at approximately 0-120 ft, very unhealthy at 120-210 ft, unhealthy at 470-700 ft, and unhealthy for sensitive groups at 700-1200 ft.
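
A distance-to-category lookup for the simulation can be sketched as follows. It assumes the bands are contiguous and uses the upper bounds from the notebook's `distance_categories` dictionary (120, 210, 470, 700, 1200 ft), which does not line up exactly with the ranges quoted above; treat the band edges as an assumption. The function name and the out-of-range label are hypothetical.

```python
from bisect import bisect_left

# Upper bound (ft) of each simulated band, taken from `distance_categories`.
# Assumption: bands are contiguous and start at 0 ft.
SIM_BAND_UPPER_FT = [120, 210, 470, 700, 1200]
SIM_BAND_NAMES = [
    'Hazardous',
    'Very Unhealthy',
    'Unhealthy',
    'Unhealthy for Sensitive Groups',
    'Moderate',
]

def simulated_category(distance_ft):
    """Return the simulated AQI band for a distance from the burn source."""
    if distance_ft < 0:
        raise ValueError("distance must be non-negative")
    # bisect_left finds the first band whose upper bound is >= distance_ft.
    idx = bisect_left(SIM_BAND_UPPER_FT, distance_ft)
    if idx == len(SIM_BAND_UPPER_FT):
        return 'Beyond simulated range'  # hypothetical label
    return SIM_BAND_NAMES[idx]

for d in (100, 150, 200):
    print(d, simulated_category(d))
```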

In [7]:
# Defining AQI categories and assigning them to the readings
aqi_categories = {
    'Hazardous': (301, 500, 526),
    'Very Unhealthy': (201, 300, 351),
    'Unhealthy': (151, 200, 138),
    'Unhealthy for Sensitive Groups': (101, 150, 88),
    'Moderate': (51, 100, 38),
    'Good': (0, 50, 12)
}

def assign_aqi_category(pm_value):
    for category, (aqi_low, aqi_high, pm_high) in aqi_categories.items():
        if pm_value <= pm_high:
            return category
    return 'Beyond Index'

site_1_data['AQI Category'] = site_1_data['AVG Reading'].apply(assign_aqi_category)

# Displaying the data with assigned AQI categories
site_1_data[['Distance', 'AVG Reading', 'AQI Category']].head()
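
Note that Python dictionaries iterate in insertion order, so the loop above tests the `'Hazardous'` bound (526) first, and any reading at or below 526 is labeled `'Hazardous'`. If the intent is to pick the lowest band whose upper bound covers the reading, a variant that checks thresholds in ascending order might look like the sketch below. It reuses the same threshold values from `aqi_categories`; the function name and the `'Unknown'` label for missing readings are assumptions, and the results shown in this notebook were produced with the original function.

```python
import math

# Same upper bounds as in `aqi_categories`, ordered from lowest to highest.
PM_UPPER_BOUNDS = [
    (12, 'Good'),
    (38, 'Moderate'),
    (88, 'Unhealthy for Sensitive Groups'),
    (138, 'Unhealthy'),
    (351, 'Very Unhealthy'),
    (526, 'Hazardous'),
]

def assign_aqi_category_ascending(pm_value):
    """Return the first (lowest) category whose upper bound covers pm_value."""
    if pm_value is None or math.isnan(pm_value):
        return 'Unknown'  # assumption: missing readings get their own label
    for upper_bound, category in PM_UPPER_BOUNDS:
        if pm_value <= upper_bound:
            return category
    return 'Beyond Index'

for reading in (873.5, 479.5, 270.0, 229.5):
    print(reading, assign_aqi_category_ascending(reading))
```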

### Validating Simulation Accuracy

# Comparing the AQI categories with the simulation distances
distance_categories = {
    'Hazardous': 120,
    'Very Unhealthy': 210,
    'Unhealthy': 470,
    'Unhealthy for Sensitive Groups': 700,
    'Moderate': 1200,
}

def does_reading_match_distance(aqi_category, distance):
    if pd.isnull(distance):
        return 'Unknown'
    sim_distance = distance_categories.get(aqi_category, float('inf'))
    return distance <= sim_distance

site_1_data['Matches Simulation'] = site_1_data.apply(
    lambda row: does_reading_match_distance(row['AQI Category'], row['Distance']), axis=1)

match_counts = site_1_data['Matches Simulation'].value_counts()
accuracy_percentage = (match_counts[True] / (match_counts[True] + match_counts[False])) * 100

accuracy_percentage


72.72727272727273
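
The accuracy lookup above raises a `KeyError` if `value_counts()` contains no `True` or no `False` entries. A slightly more defensive version, sketched here on made-up illustrative values rather than the field data, drops the `'Unknown'` rows and takes the mean of the remaining booleans. The function name is hypothetical.

```python
import pandas as pd

def accuracy_percentage_from(matches):
    """Percentage of True values among rows that are not 'Unknown'."""
    known = matches[matches != 'Unknown']
    if known.empty:
        return float('nan')
    # The mean of a boolean Series is the fraction of True values.
    return known.astype(bool).mean() * 100

# Illustrative values only, not the notebook's data.
example_matches = pd.Series([True, False, 'Unknown', True, False, True], dtype=object)
example_accuracy = accuracy_percentage_from(example_matches)
print(example_accuracy)
```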

In [6]:
# Calculating how far off the simulation is for mismatched cases
def calculate_distance_difference(aqi_category, actual_distance):
    # If the actual distance or AQI category is unknown, return None
    if pd.isnull(actual_distance) or aqi_category == 'Beyond Index' or aqi_category == 'Unknown':
        return None
    # Get the simulation boundary distance for the AQI category
    sim_distance = distance_categories.get(aqi_category, 0)
    # Calculate the difference
    return actual_distance - sim_distance

# Copy the filtered rows so the new column is added to an independent DataFrame
# (this avoids pandas' SettingWithCopyWarning)
mismatched_cases = site_1_data[site_1_data['Matches Simulation'] == False].copy()
mismatched_cases['Distance Difference'] = mismatched_cases.apply(
    lambda row: calculate_distance_difference(row['AQI Category'], row['Distance']), axis=1)

mismatched_cases[['Distance', 'AVG Reading', 'AQI Category', 'Matches Simulation', 'Distance Difference']]


| | Distance | AVG Reading | AQI Category | Matches Simulation | Distance Difference |
|---|---|---|---|---|---|
| 1 | 150.0 | 479.5 | Hazardous | False | 30.0 |
| 2 | 200.0 | 270.0 | Hazardous | False | 80.0 |
| 3 | 200.0 | 229.5 | Hazardous | False | 80.0 |


Our simulation matches only about 73% of the air quality readings. Among the mismatched readings, the maximum deviation is 80 ft, which is smaller than the width of any simulated distance band, so our predictions are off by at most one band.
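
The band-width comparison can be checked with a few lines of arithmetic. This sketch assumes the bands are contiguous, start at 0 ft, and end at the upper bounds listed in `distance_categories`; the variable names are illustrative.

```python
# Upper bounds (ft) from `distance_categories`; bands assumed contiguous from 0 ft.
upper_bounds_ft = [120, 210, 470, 700, 1200]
lower_bounds_ft = [0] + upper_bounds_ft[:-1]

# Width of each simulated band.
band_widths_ft = [hi - lo for lo, hi in zip(lower_bounds_ft, upper_bounds_ft)]

max_deviation_ft = 80.0  # largest 'Distance Difference' among the mismatched rows
narrowest_band_ft = min(band_widths_ft)

print(band_widths_ft)
print(max_deviation_ft < narrowest_band_ft)
```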

#### Request for Contribution

Any other contributions or validations are welcome! 
Please commit them to this same validation folder.