🌬️ Aetheris — Intelligent Air Quality Prediction & Advisory System

Aetheris is a production-grade, end-to-end machine learning platform for predicting, analyzing, and visualizing Air Quality Index (AQI) dynamics across 291 Indian cities. Using tree-based ensemble models and leakage-aware time-series preprocessing, it produces AQI forecasts and actionable health advisories.

🚀 Key Features

- Zero-Leakage ML Pipeline: Cross-validation uses TimeSeriesSplit, so every validation fold lies strictly after its training data. This prevents temporal leakage and gives a realistic estimate of real-world performance.
- 54-Feature Engineering Engine: Generates 7-day lags, 30-day rolling averages, cyclical time encodings, and location-level target encodings.
- Synthetic Rebalancing (SMOTE): Addresses the roughly 160:1 class imbalance between "Satisfactory" and "Severe" days so the classifier does not overlook rare, life-threatening pollution spikes.
- Live Inference Backend: A FastAPI server that rebuilds the full feature vector on the fly for real-time LightGBM predictions.
- Modern Analytics Dashboard: A dark-themed React interface with interactive Recharts visualizations, city-to-city comparisons, and automated health recommendations.
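
To make the feature-engineering bullet concrete, here is a minimal pandas sketch of per-city lag, rolling-mean, and cyclical features, plus a city target encoding fitted on the training fold only. It is an illustration, not the project's preprocessing code: the column names `city`, `date`, and `aqi` and the helpers `add_time_series_features` and `target_encode_city` are hypothetical, and the sketch covers only a small subset of the 54 features.

```python
import numpy as np
import pandas as pd


def add_time_series_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag, rolling, and cyclical features to a per-city daily AQI frame.

    Hypothetical schema: columns 'city', 'date' (datetime64) and 'aqi'.
    """
    df = df.sort_values(["city", "date"]).copy()
    grouped = df.groupby("city")["aqi"]

    # Lags 1..7: the AQI observed 1 to 7 days earlier in the same city.
    for lag in range(1, 8):
        df[f"aqi_lag_{lag}"] = grouped.shift(lag)

    # 30-day rolling mean over past days only: shift(1) first so the current
    # day's AQI (the prediction target) never leaks into its own feature.
    df["aqi_roll30_mean"] = grouped.transform(
        lambda s: s.shift(1).rolling(30, min_periods=1).mean()
    )

    # Cyclical encodings place December next to January and Sunday next to Monday.
    month = df["date"].dt.month
    dow = df["date"].dt.dayofweek
    df["month_sin"] = np.sin(2 * np.pi * month / 12)
    df["month_cos"] = np.cos(2 * np.pi * month / 12)
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return df


def target_encode_city(train: pd.DataFrame, other: pd.DataFrame) -> pd.Series:
    """Map each city in `other` to its mean AQI computed on `train` only."""
    city_means = train.groupby("city")["aqi"].mean()
    # Cities unseen during training fall back to the global training mean.
    return other["city"].map(city_means).fillna(train["aqi"].mean())
```

Fitting the encoding on the training fold alone matters: computing city means over the full dataset would let validation-period AQI values leak into the features.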

📊 Performance Leaderboard

Of the nine-plus algorithms evaluated, LightGBM (gradient boosting) performed best on both the regression and the classification tracks:

📈 Regression (Exact AQI Value)

| Metric | Result | Interpretation |
|---|---|---|
| R² | 0.904 | Explains ~90% of the variance in AQI |
| RMSE | 20.29 | About 4% of the 0–500 AQI scale |
| MAE | 13.55 | About 13.6 AQI points on average, small next to the 50–100-point width of each AQI category |
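
For reference, the three regression metrics above follow directly from their definitions. The helper `regression_metrics` below is hypothetical (in practice `sklearn.metrics` provides `r2_score`, `mean_squared_error`, and `mean_absolute_error`); the last line reproduces the ~4% reading of RMSE, assuming the 0–500 Indian AQI scale.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², RMSE and MAE from their textbook definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Assumption: the Indian AQI scale runs from 0 to 500.
AQI_SCALE_MAX = 500
print(f"RMSE as a share of the scale: {20.29 / AQI_SCALE_MAX:.1%}")  # 4.1%
```

Because RMSE squares each residual, it sits above MAE whenever errors vary in size; the gap between 20.29 and 13.55 indicates that some days carry noticeably larger errors than the average.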

🎯 Classification (Severity Categories)

| Metric | Result | Status |
|---|---|---|
| Weighted F1 | 0.875 | Balances precision and recall across categories |
| Recall (Severe) | High | Hazardous days are reliably identified after SMOTE rebalancing |
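
The README does not spell out the order of splitting and oversampling; the sketch below assumes the standard leakage-safe order, where synthetic rows are generated inside each training fold only and validation folds stay untouched. It uses NumPy stand-ins rather than the real libraries: `expanding_window_splits` mirrors the default behaviour of scikit-learn's `TimeSeriesSplit`, and `smote_minority` is a simplified version of the SMOTE interpolation idea, not imbalanced-learn's implementation. All function names are hypothetical.

```python
import numpy as np


def expanding_window_splits(n_samples: int, n_splits: int):
    """Yield (train_idx, test_idx) like TimeSeriesSplit's defaults: each test
    block comes after all of its training rows in time."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    indices = np.arange(n_samples)
    for start in range(first_test_start, n_samples, test_size):
        yield indices[:start], indices[start:start + test_size]


def smote_minority(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic rows, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Column 0 of the sorted distances is normally the point itself (distance 0).
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def oversample_fold(X_train, y_train, minority_label, target_count, k=5, seed=0):
    """Top up the minority class of ONE training fold to target_count rows."""
    mask = y_train == minority_label
    n_min = int(mask.sum())
    n_new = target_count - n_min
    if n_new <= 0 or n_min < 2:
        return X_train, y_train
    synthetic = smote_minority(X_train[mask], n_new, k, seed)
    return (np.vstack([X_train, synthetic]),
            np.concatenate([y_train, np.full(n_new, minority_label)]))


rng = np.random.default_rng(42)
X = rng.normal(size=(120, 3))
y = np.zeros(120, dtype=int)
y[::15] = 1  # a rare "Severe"-like class

for train_idx, test_idx in expanding_window_splits(len(X), n_splits=3):
    X_tr, y_tr = oversample_fold(X[train_idx], y[train_idx], minority_label=1, target_count=20)
    # Fit on (X_tr, y_tr); evaluate on the untouched X[test_idx], y[test_idx].
    print(len(train_idx), "->", len(X_tr), "training rows; test rows:", len(test_idx))
```

Oversampling before splitting would let synthetic points interpolated from future "Severe" days appear in earlier training folds, which is exactly the temporal leakage the pipeline is designed to avoid.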

🏗️ System Architecture

```mermaid
graph TD
    A[Kaggle Dataset: 235k Records] --> B[Pipeline: run_02_preprocessing.py]
    B --> C{Feature Engine}
    C --> C1[7d/30d Lags & Rolling]
    C --> C2[Cyclical Time Encodings]
    C --> C3[Location Target Encoding]

    B --> D[TimeSeriesSplit & SMOTE]
    D --> E[Model: LightGBM / XGBoost]

    E --> F[Serialized best_model.pkl]
    F --> G[FastAPI Production Server]

    H[React Frontend] <--> G
    H --> I[Dynamic Dashboard]
    H --> J[City Comparison Tool]
    H --> K[Health Advisory Engine]
```
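
The Health Advisory Engine node can be pictured as a lookup from a predicted AQI value to a category and a recommendation. The sketch below is an assumption about how such a lookup could work, not the project's code: the band boundaries follow the Indian National AQI categories (Good to Severe, 0–500), while the function name `advise` and the advisory strings are illustrative paraphrases.

```python
from bisect import bisect_left
from typing import Tuple

# Inclusive upper bound of each Indian National AQI band. The advisory text is
# an illustrative paraphrase, not the project's actual wording.
_BANDS = [
    (50, "Good", "Minimal impact."),
    (100, "Satisfactory", "Minor breathing discomfort for sensitive people."),
    (200, "Moderate", "Discomfort for people with asthma, lung or heart disease, children and older adults."),
    (300, "Poor", "Breathing discomfort for most people on prolonged exposure."),
    (400, "Very Poor", "Respiratory illness likely on prolonged exposure."),
    (500, "Severe", "Affects healthy people; serious risk for those with existing conditions."),
]
_UPPER_BOUNDS = [upper for upper, _, _ in _BANDS]


def advise(aqi: float) -> Tuple[str, str]:
    """Return (category, advisory) for a (possibly fractional) predicted AQI."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left finds the first band whose upper bound is >= aqi.
    idx = min(bisect_left(_UPPER_BOUNDS, aqi), len(_BANDS) - 1)  # >500 stays Severe
    _, category, advisory = _BANDS[idx]
    return category, advisory


print(advise(42.0))   # ('Good', 'Minimal impact.')
print(advise(455.3))  # Severe band
```

Using a sorted list of upper bounds keeps the mapping in one table, so adjusting a band boundary does not require touching the lookup logic.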

🛠️ Installation & Setup

Backend (Python 3.9+)

```bash
# 1. Set up the environment
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run EDA, preprocessing, and training
python run_01_eda.py
python run_02_preprocessing.py
python run_03_modeling.py   # produces best_model.pkl

# 4. Start the API
uvicorn src.api:app --reload
```

Frontend (Node 18+)

```bash
cd frontend
npm install
npm run dev
```

📁 Project Structure

```text
aetheris/
├── src/
│   ├── api/             # FastAPI Inference Engine
│   ├── models/          # Ensemble Definitions & Logic
│   ├── preprocessing/   # Zero-leakage Engineering
│   └── evaluation/      # Time-Series Validation Suites
├── frontend/            # React + Tailwind + Vite
├── data/                # Raw & Processed data stores
├── reports/figures/     # High-res performance plots
├── models/              # Serialized Production Binaries
└── run_*.py             # Orchestration scripts 01-06
```

📜 Research Reports

For a deep dive into the methodology, exploratory analysis, and mathematical rationale, see the following documents in the repository root:

- Ultimate_Thesis_Aetheris_Report.md: complete, detailed breakdown
- report.md: quick summary

Author: Project Aetheris (2025)

