# NYC Air Quality Analysis (Part 2)

This notebook analyzes air quality trends from the NYC pm2.5 dataset  
using the functions implemented in Part 1.  
We answer:
a) Highest and lowest pm2.5 in zip code 10027  
b) UHF with the worst pollution in 2019  
c) Average pm2.5 in Manhattan in 2008 vs 2019  
d) Two additional analytical questions of our choice

Data is loaded with `read_pollution()` and `read_uhf()` from `project1.py`, which return the lookup dictionaries used throughout this notebook.

In [52]:
from importlib import reload
import project1
reload(project1)

from project1 import read_pollution, read_uhf
from statistics import mean
from collections import defaultdict

by_uhf, by_date = read_pollution()
zip_to_uhfs, borough_to_uhfs = read_uhf()

print("Loaded for analysis.")
print("Records:", sum(len(v) for v in by_uhf.values()))
print("UHF ids:", len(by_uhf), "Dates:", len(by_date))
print("Zip codes:", len(zip_to_uhfs), "Boroughs:", len(borough_to_uhfs))


Loaded for analysis.
Records: 1824
UHF ids: 49 Dates: 24
Zip codes: 184 Boroughs: 5
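
The later cells depend on the shape of these dictionaries. As a rough sketch, inferred from how those cells index the data rather than from `project1.py` itself, each measurement is assumed to be a `(date, uhf_id, uhf_name, value)` tuple and each dictionary maps a key to a list. The `sample_*` names are hypothetical, and the real key types (for example int versus str UHF ids) may differ.

```python
# Hypothetical miniature of the structures returned by read_pollution() and
# read_uhf(). The layout is an assumption based on how the notebook uses them.
m_high = ("12/1/08", 302, "Central Harlem - Morningside Heights", 14.56)
m_low = ("6/1/20", 302, "Central Harlem - Morningside Heights", 7.36)

sample_by_uhf = {302: [m_high, m_low]}                      # UHF id -> measurements
sample_by_date = {"12/1/08": [m_high], "6/1/20": [m_low]}   # date -> measurements
sample_zip_to_uhfs = {"10027": [302]}                       # ZIP code -> UHF ids
sample_borough_to_uhfs = {"Manhattan": [302]}               # borough -> UHF ids

# The "Records" figure is the sum of the per-UHF list lengths.
record_count = sum(len(rows) for rows in sample_by_uhf.values())
print("Records:", record_count)
```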


In [58]:
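def year_of(date_str: str) -> int:
    """Return the four-digit year from an M/D/Y date string such as "12/1/08"."""
    parts = date_str.split("/")
    year = int(parts[2])
    # Convert 2-digit years to 2000s
    if year < 100:
        year += 2000
    return year


def fmt(m):
    """Format a (date, uhf_id, uhf_name, value) measurement for printing."""
    date, uhf_id, uhf_name, val = m
    return f"{date} UHF {uhf_id} {uhf_name} {val:.2f} mcg/m^3"


def measurements_for_zip(zip_code: str):
    """Collect the measurements of every UHF area that covers zip_code."""
    out = []
    for u in zip_to_uhfs.get(zip_code, []):
        out.extend(by_uhf.get(u, []))
    return out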


In [59]:
zip_code = "10027"
data_10027 = measurements_for_zip(zip_code)

if data_10027:
    highest = max(data_10027, key=lambda x: x[3])
    lowest = min(data_10027, key=lambda x: x[3])
    print("Highest in 10027:", fmt(highest))
    print("Lowest in 10027:", fmt(lowest))
else:
    print("No data for ZIP", zip_code)


Highest in 10027: 12/1/08 UHF 302 Central Harlem - Morningside Heights 14.56 mcg/m^3
Lowest in 10027: 6/1/20 UHF 302 Central Harlem - Morningside Heights 7.36 mcg/m^3


In [60]:
all_data = [m for rows in by_uhf.values() for m in rows]
data_2019 = [m for m in all_data if year_of(m[0]) == 2019]

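# "Worst" here means the single highest reading recorded in 2019
if data_2019:
    worst_2019 = max(data_2019, key=lambda x: x[3])
    print("Worst UHF in 2019 (peak):", fmt(worst_2019))
else:
    print("No data for 2019")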


Worst UHF in 2019 (peak): 12/1/19 UHF 306 Chelsea - Clinton 11.38 mcg/m^3
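
The cell above ranks by the single highest reading. Another reasonable reading of "worst pollution" is the highest average over the year. The following is a minimal sketch of that alternative, assuming the same `(date, uhf_id, uhf_name, value)` tuples. `worst_uhf_by_mean` and the sample rows are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict
from statistics import mean


def year_of(date_str):
    """Return the four-digit year from an M/D/Y date string."""
    year = int(date_str.split("/")[2])
    return year + 2000 if year < 100 else year


def worst_uhf_by_mean(rows, year):
    """Return (uhf_id, uhf_name, average) for the UHF with the highest mean."""
    values = defaultdict(list)
    for date, uhf_id, uhf_name, val in rows:
        if year_of(date) == year:
            values[(uhf_id, uhf_name)].append(val)
    if not values:
        return None  # no readings for that year
    (uhf_id, uhf_name), vals = max(values.items(), key=lambda kv: mean(kv[1]))
    return uhf_id, uhf_name, mean(vals)


# Made-up readings, not taken from the dataset.
sample_rows = [
    ("6/1/19", 1, "Area A", 8.0),
    ("12/1/19", 1, "Area A", 12.0),  # highest single reading
    ("6/1/19", 2, "Area B", 10.5),
    ("12/1/19", 2, "Area B", 11.0),  # highest yearly average
    ("12/1/08", 2, "Area B", 20.0),  # ignored: different year
]
print(worst_uhf_by_mean(sample_rows, 2019))
```

In this notebook the function would be called with `all_data` in place of `sample_rows`. The peak and the average can point to different areas, as the sample rows show.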


In [62]:
from statistics import mean

improvements = []
for bor, uhfs in borough_to_uhfs.items():
    # .get() skips UHF ids without pollution records instead of raising KeyError
    rows = [m for u in uhfs for m in by_uhf.get(u, [])]
    vals_2008 = [m[3] for m in rows if year_of(m[0]) == 2008]
    vals_2019 = [m[3] for m in rows if year_of(m[0]) == 2019]

    if vals_2008 and vals_2019:
        avg_2008 = mean(vals_2008)
        avg_2019 = mean(vals_2019)
        improvements.append((bor, avg_2008, avg_2019, avg_2008 - avg_2019))

improvements.sort(key=lambda x: x[3], reverse=True)

print("Borough improvement 2008 → 2019:\n")
for bor, a08, a19, change in improvements:
    print(f"{bor}: {a08:.2f} → {a19:.2f}, improvement {change:.2f} mcg/m^3")


Borough improvement 2008 → 2019:

Bronx: 14.04 → 7.42, improvement 6.62 mcg/m^3
Manhattan: 15.09 → 8.99, improvement 6.11 mcg/m^3
Brooklyn: 13.09 → 8.04, improvement 5.05 mcg/m^3
Queens: 12.56 → 7.66, improvement 4.90 mcg/m^3
Statenisland: 11.93 → 7.19, improvement 4.74 mcg/m^3
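
Absolute drops tend to favor boroughs that started from higher levels. As a rough complement, the relative change can be computed from the averages printed above. This sketch uses the rounded figures from that output, so the percentages are approximate. `pct_drop` is a hypothetical helper name.

```python
# (avg_2008, avg_2019) in mcg/m^3, copied from the rounded output above.
borough_avgs = {
    "Bronx": (14.04, 7.42),
    "Manhattan": (15.09, 8.99),
    "Brooklyn": (13.09, 8.04),
    "Queens": (12.56, 7.66),
    "Statenisland": (11.93, 7.19),
}


def pct_drop(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


pct = {bor: pct_drop(a08, a19) for bor, (a08, a19) in borough_avgs.items()}

# Rank boroughs by relative rather than absolute improvement.
ranked = sorted(pct, key=pct.get, reverse=True)
for bor in ranked:
    print(f"{bor}: {pct[bor]:.1f}% lower in 2019 than in 2008")
```

The relative ranking does not have to match the absolute one, so it is worth reporting both.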


In [63]:
from statistics import mean
from collections import defaultdict

month_values = defaultdict(list)

for rows in by_uhf.values():
    for date, _, _, val in rows:
        # Dates are M/D/Y (e.g. "12/1/08"), so the month is field 0;
        # field 1 is the day of the month (values such as 1 and 31)
        month = int(date.split("/")[0])
        month_values[month].append(val)

month_avgs = {m: mean(v) for m, v in month_values.items()}

print("Average pm2.5 by month:")
for m in sorted(month_avgs.keys()):
    print(f"Month {m}: {month_avgs[m]:.2f} mcg/m^3")

worst_month = max(month_avgs.items(), key=lambda x: x[1])
print(f"\nHighest average month: {worst_month[0]} with {worst_month[1]:.2f} mcg/m^3")
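
The dates in this dataset appear as month/day/year strings such as `12/1/08`. One way to avoid depending on field positions is to parse them with `datetime.strptime`. The sketch below assumes that both two-digit and four-digit years may occur; `parse_date` is a hypothetical helper, not part of `project1.py`.

```python
from datetime import datetime


def parse_date(date_str):
    """Parse an M/D/YY or M/D/YYYY string into a datetime."""
    for pattern in ("%m/%d/%y", "%m/%d/%Y"):
        try:
            return datetime.strptime(date_str, pattern)
        except ValueError:
            continue  # try the next pattern
    raise ValueError(f"Unrecognized date: {date_str!r}")


d = parse_date("12/1/08")
print(d.month, d.day, d.year)
```

Grouping on `parse_date(date).month` then makes it explicit which field is being used.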


## Conclusion

This analysis shows:
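- In ZIP code 10027, pm2.5 peaked at 14.56 mcg/m^3 (12/1/08) and bottomed out at 7.36 mcg/m^3 (6/1/20)
- Average pm2.5 fell in all five boroughs between 2008 and 2019; in Manhattan it dropped from 15.09 to 8.99 mcg/m^3
- The highest single reading in 2019 was 11.38 mcg/m^3, in UHF 306 (Chelsea - Clinton)
- The peak readings above fall on December dates and the 10027 low on a June date, which suggests a seasonal pattern, possibly linked to heating and weather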

New York has reduced particulate air pollution, but further interventions remain important.
