
vansh007/aetheris


🌬️ Aetheris — Intelligent Air Quality Prediction & Advisory System

Built with FastAPI, React, Scikit-Learn, and TailwindCSS.

Aetheris is a production-grade, end-to-end machine learning platform for predicting, analyzing, and visualizing Air Quality Index (AQI) dynamics across 291 Indian cities. It combines tree-based ensemble models with time-series preprocessing designed to avoid data leakage, and turns the resulting pollution forecasts into actionable health advisories.


🚀 Key Features

  • Zero-Leakage ML Pipeline: Cross-validation uses TimeSeriesSplit, so each model is validated only on data that comes after its training window. This prevents temporal data leakage and gives a realistic estimate of real-world performance.
  • 54-Feature Intelligence Engine: Generates 7-day lags, 30-day rolling averages, cyclical time encodings, and geographic (location) target encodings.
  • Synthetic Rebalancing (SMOTE): Addresses the 160:1 class imbalance between "Satisfactory" and "Severe" days so that the model does not overlook life-threatening pollution spikes.
  • Live Inference Backend: A FastAPI server rebuilds the full feature vector on the fly for each request and serves real-time LightGBM predictions.
  • Modern Analytics Dashboard: A dark-themed React interface with interactive Recharts visualizations, city-to-city comparisons, and automated health recommendations.
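The zero-leakage idea behind TimeSeriesSplit can be illustrated with a small, dependency-free sketch. It assumes the rows are already sorted chronologically and reproduces the default expanding-window behavior of scikit-learn's `TimeSeriesSplit`, where every test window comes after its training window. The function name `time_series_splits` is hypothetical and not part of the Aetheris codebase; a real pipeline would normally use `sklearn.model_selection.TimeSeriesSplit` directly.

```python
from typing import Iterator, List, Tuple


def time_series_splits(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over chronologically sorted rows.

    Each fold trains only on rows that come before its test window, so no
    future observation can leak into training (expanding-window scheme).
    """
    n_folds = n_splits + 1
    if n_folds > n_samples:
        raise ValueError("n_splits is too large for the number of samples")
    test_size = n_samples // n_folds
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(test_start))                       # all earlier rows
        test_idx = list(range(test_start, test_start + test_size))  # next window
        yield train_idx, test_idx


# Usage: 12 daily rows split into 5 folds of 2 test days each.
for train_idx, test_idx in time_series_splits(12, n_splits=5):
    # Every training row precedes every test row: no look-ahead.
    assert max(train_idx) < min(test_idx)
```

Because the training window only grows forward in time, later folds see more history. This mirrors how the model is used in production: it is trained on the past and queried about the future.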

📊 Performance Leaderboard

After evaluating nine or more algorithms, LightGBM (gradient boosting) performed best on both the regression and the classification tracks:

📈 Regression (Exact AQI Value)

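| Metric | Result | Interpretation |
|---|---|---|
| R² Score | 0.904 | Explains about 90% of the variance in AQI |
| RMSE | 20.29 | About 4% of the full 0 to 500 AQI scale |
| MAE | 13.55 | Small relative to the 50 to 100 point width of each AQI category band, so predictions map reliably to categories |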
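For reference, the three regression metrics above can be computed as follows. This is a from-scratch sketch using the standard definitions (the same quantities scikit-learn reports via `r2_score`, `mean_squared_error`, and `mean_absolute_error`); `regression_report` is a hypothetical helper, not project code. The last line only reproduces the table's "about 4%" arithmetic, assuming the 0 to 500 AQI scale.

```python
import math
from typing import Dict, Sequence


def regression_report(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE and MAE using their standard definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# The table's "about 4% of the full scale": reported RMSE over the 0-500 AQI range.
AQI_SCALE_MAX = 500
print(f"{20.29 / AQI_SCALE_MAX:.1%}")  # prints 4.1%
```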

🎯 Classification (Severity Categories)

| Metric | Result | Status |
|---|---|---|
| Weighted F1 | 0.875 | Robust balance between precision and recall |
| Recall (Severe) | High | Successfully identifies hazardous days via SMOTE |
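The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbors. The sketch below implements that step in plain Python for illustration only; it is not the project's code, and `smote` is a hypothetical name. In practice, imbalanced-learn's `imblearn.over_sampling.SMOTE` provides a production implementation. To stay consistent with the zero-leakage pipeline, oversampling should be applied only to the training rows of each fold, never to the validation window.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(
    minority: Sequence[Vector], n_synthetic: int, k: int = 5, seed: int = 0
) -> List[Vector]:
    """Generate synthetic minority-class rows by SMOTE-style interpolation.

    For each new row: pick a random minority sample, choose one of its k
    nearest minority-class neighbors (Euclidean distance), and place a point
    at a random position on the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to `base`, keep the k closest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

Because each synthetic row lies between two real "Severe" rows, the classifier sees more examples of the rare class without simply duplicating existing ones.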

🏗️ System Architecture

graph TD
    A[Kaggle Dataset: 235k Records] --> B[Pipeline: run_02_preprocessing.py]
    B --> C{Feature Engine}
    C --> C1[7d/30d Lags & Rolling]
    C --> C2[Cyclical Time Encodings]
    C --> C3[Location Target Encoding]
    
    B --> D[TimeSeriesSplit & SMOTE]
    D --> E[Model: LightGBM / XGBoost]
    
    E --> F[Serialized best_model.pkl]
    F --> G[FastAPI Production Server]
    
    H[React Frontend] <--> G
    H --> I[Dynamic Dashboard]
    H --> J[City Comparison Tool]
    H --> K[Health Advisory Engine]
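The Feature Engine stage of the diagram can be sketched for a single city as follows. The sketch assumes one contiguous, date-sorted daily series per city with no gaps. The function and feature names (`build_features`, `aqi_lag_1`, `fit_target_encoding`, and so on) are hypothetical and do not reproduce the project's exact 54 features. Every lag and rolling value for day t is computed only from earlier days, which is what allows an API to rebuild the same vector at inference time. The location encoding uses a smoothed per-city mean fitted on training rows only; the `smoothing` strength is an assumed parameter.

```python
import math
from datetime import date, timedelta
from typing import Dict, List, Optional, Sequence, Tuple


def cyclical(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value (e.g. month 1-12) onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_features(
    dates: Sequence[date],
    aqi: Sequence[float],
    max_lag: int = 7,
    window: int = 30,
) -> List[Dict[str, Optional[float]]]:
    """Lag, rolling-mean and cyclical features for ONE city's daily AQI series.

    Features for day t only use AQI values from days strictly before t.
    Missing history (early rows) is represented as None.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for t, day in enumerate(dates):
        row: Dict[str, Optional[float]] = {}
        for lag in range(1, max_lag + 1):
            row[f"aqi_lag_{lag}"] = aqi[t - lag] if t >= lag else None
        past = aqi[max(0, t - window):t]  # the `window` days before t
        row[f"aqi_roll_mean_{window}"] = (
            sum(past) / window if len(past) == window else None
        )
        row["month_sin"], row["month_cos"] = cyclical(day.month, 12)
        row["doy_sin"], row["doy_cos"] = cyclical(day.timetuple().tm_yday, 365.25)
        rows.append(row)
    return rows


def fit_target_encoding(
    cities: Sequence[str], targets: Sequence[float], smoothing: float = 10.0
) -> Tuple[Dict[str, float], float]:
    """Smoothed mean AQI per city, fitted on TRAINING rows only.

    Cities with few rows are pulled towards the global mean; unseen cities
    should fall back to the returned global mean at inference time.
    """
    global_mean = sum(targets) / len(targets)
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for city, y in zip(cities, targets):
        sums[city] = sums.get(city, 0.0) + y
        counts[city] = counts.get(city, 0) + 1
    encoding = {
        city: (sums[city] + smoothing * global_mean) / (counts[city] + smoothing)
        for city in sums
    }
    return encoding, global_mean


# Usage: 40 consecutive days of toy AQI values (0, 1, 2, ...) for one city.
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(40)]
features = build_features(days, [float(i) for i in range(40)])
```

Encoding month and day of year as sine/cosine pairs keeps December and January close together in feature space, which a raw 1 to 12 month number would not.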

🛠️ Installation & Setup

Backend (Python 3.9+)

# 1. Set up the environment
python -m venv venv
source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Train models
python run_01_eda.py
python run_02_preprocessing.py
python run_03_modeling.py   # Generates best_model.pkl

# 4. Start API
uvicorn src.api:app --reload

Frontend (Node 18+)

cd frontend
npm install
npm run dev

📁 Project Structure

aetheris/
├── src/
│   ├── api/             # FastAPI Inference Engine
│   ├── models/          # Ensemble Definitions & Logic
│   ├── preprocessing/   # Zero-leakage Engineering
│   └── evaluation/      # Time-Series Validation Suites
├── frontend/            # React + Tailwind + Vite
├── data/                # Raw & Processed data stores
├── reports/figures/     # High-res performance plots
├── models/              # Serialized Production Binaries
└── run_*.py             # Orchestration scripts 01-06

📜 Research Reports

For a deep dive into the methodology, exploratory analysis, and mathematical rationale, please refer to the following documents in the root:


Author: Project Aetheris (2025)
