
⚙️ MLForge

End-to-end MLOps platform for model lifecycle management

100% FREE · No Paid APIs · Python 3.11 · MLflow · License: MIT

Upload a CSV → get a deployed, monitored ML model. Everything runs locally or on free hosting tiers. No credit card required, ever.


Architecture

┌────────────────────────────────────────────────────────────────────┐
│                          MLForge Platform                          │
│                                                                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐  │
│  │  Upload  │───▶│ Validate │───▶│ Profile  │───▶│  Feature Eng │  │
│  │  CSV     │    │ (Rules)  │    │ (Stats)  │    │ (Scale/OHE)  │  │
│  └──────────┘    └──────────┘    └──────────┘    └──────┬───────┘  │
│                                                         │          │
│  ┌──────────────────────────────────────────────────────▼───────┐  │
│  │                        Train 4 Models                        │  │
│  │   LogisticRegression  │  RandomForest  │  XGBoost  │  LGBM   │  │
│  └──────────────────────────────┬───────────────────────────────┘  │
│                                 │                                  │
│  ┌──────────────────────────────▼───────────────────────────────┐  │
│  │           Evaluate & Compare (Acc / F1 / AUC-ROC)            │  │
│  │                       Pick Best Model                        │  │
│  └──────────────────────────────┬───────────────────────────────┘  │
│                                 │                                  │
│         ┌───────────────────────┼──────────────────────┐           │
│         ▼                       ▼                      ▼           │
│  ┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │   MLflow    │    │ FastAPI Serving  │    │   Drift Monitor  │   │
│  │  Registry   │    │  /predict        │    │  KS-test + PSI   │   │
│  │  (local)    │    │  /health         │    │  → Alert/Retrain │   │
│  └─────────────┘    └──────────────────┘    └──────────────────┘   │
│                                                                    │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │           Streamlit Dashboard (upload → monitor)            │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  Storage: SQLite (metadata) + Local filesystem (models + MLruns)   │
└────────────────────────────────────────────────────────────────────┘

Quick Start (3 commands)

git clone https://github.com/yprashanna/MLForge.git && cd MLForge
pip install -r requirements.txt && python data/generate_sample.py
streamlit run ui/app.py

Open http://localhost:8501 → upload CSV → train → deploy → monitor.


Detailed Setup

1. Clone & Install

git clone https://github.com/yprashanna/MLForge.git
cd MLForge
pip install -r requirements.txt

2. Generate Sample Data

python data/generate_sample.py
# → creates data/sample.csv (1000 rows, 15 columns, credit scoring)

Or use make data.

3. Start MLflow Server (free, local)

In a separate terminal:

pip install mlflow  # already in requirements.txt
mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri sqlite:///mlforge_meta.db \
  --default-artifact-root ./mlruns

Or use make mlflow.

MLflow UI → http://localhost:5000 (free, runs entirely on your machine)

4. Start the Dashboard

make ui
# or
streamlit run ui/app.py

5. Start the API Server

make serve
# or
uvicorn serving.app:app --host 0.0.0.0 --port 8000 --reload

6. Configure Environment (optional)

cp .env.example .env
# Edit .env for Slack alerts, email alerts, etc.
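
A hypothetical sketch of what .env might contain. The variable names below are illustrative assumptions, not confirmed keys; check .env.example for the real ones:

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX   # hypothetical key
SMTP_HOST=smtp.example.com                               # hypothetical key
SMTP_PORT=587                                            # hypothetical key
ALERT_EMAIL=alerts@example.com                           # hypothetical key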

How To Use

Full Workflow

Step 1 — Upload & Profile

  1. Open http://localhost:8501
  2. Navigate to Data Profiling
  3. Upload your CSV or check "Use built-in sample dataset"
  4. Set your target column name (defaults to default, the label column in the sample dataset)
  5. Click Run Full Profile to see stats, missing values, correlations

Step 2 — Train Models

  1. Navigate to Train Models
  2. Adjust test set size and CV folds
  3. Click Start Training — trains all 4 models with cross-validation (5-fold by default)
  4. Models are automatically saved to models/ and logged to MLflow
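
To make the training-and-logging step concrete, here is a minimal, self-contained sketch of the generic scikit-learn + MLflow pattern described above. It uses synthetic data and assumed parameters; the project's actual logic lives in pipeline/trainer.py:

# Generic sklearn + MLflow sketch, not MLForge's trainer itself
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data/sample.csv (1000 rows, 15 features)
X, y = make_classification(n_samples=1000, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

with mlflow.start_run(run_name="RandomForest"):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mlflow.log_metric("f1", f1_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # stores the model as a run artifact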

Step 3 — Evaluate

  1. Navigate to Evaluate
  2. See side-by-side comparison: Accuracy, F1, ROC-AUC, Avg Precision
  3. View confusion matrix for the best model
  4. Check cross-validation scores for reliability

Step 4 — Predict

  1. Navigate to Deploy & Predict
  2. Use the API (if make serve is running) or in-memory prediction
  3. Send JSON payload, get back predictions + probabilities + latency

Step 5 — Monitor

  1. Navigate to Monitor Drift
  2. Upload a sample of recent production data
  3. Run drift check → see per-feature KS statistics and PSI values
  4. If drift detected → alerts fire + retrain recommendation shown

API Documentation

GET /health

Health check. Returns model load status.

{
  "status": "ok",
  "model_loaded": true,
  "model_loaded_at": 1714156800.0
}
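
For example:

curl http://localhost:8000/health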

POST /predict

Run inference on one or more rows.

Request:

{
  "data": [
    {
      "age": 35,
      "annual_income": 65000,
      "loan_amount": 12000,
      "loan_term_months": 36,
      "credit_score": 720,
      "num_credit_lines": 5,
      "debt_to_income_ratio": 0.25,
      "employment_years": 7.5,
      "num_late_payments": 0,
      "num_inquiries": 2,
      "home_ownership": "MORTGAGE",
      "employment_status": "EMPLOYED",
      "loan_purpose": "DEBT_CONSOLIDATION",
      "has_cosigner": 0
    }
  ]
}

Response:

{
  "predictions": [0],
  "probabilities": [0.0823],
  "model_name": "RandomForestClassifier",
  "latency_ms": 2.4
}
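
A quick way to exercise the endpoint, assuming the API is running locally on port 8000 and the request body above is saved as payload.json:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @payload.json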

GET /model/info

Returns info about the currently loaded model: type, parameters, selected features.


Drift Detection

MLForge uses two complementary statistical tests to detect when your production data has drifted away from the training distribution.

KS-Test (Kolmogorov-Smirnov)

Compares the empirical cumulative distribution functions of two samples.

  • p-value < 0.05 → distributions are significantly different → drift detected
  • Fast, non-parametric, no assumptions about distribution shape
  • Implemented via scipy.stats.ks_2samp
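
A minimal, self-contained sketch of this per-feature check on synthetic data (the project's own logic lives in pipeline/drift_detector.py):

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)   # training-time sample
production = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted production sample

result = ks_2samp(reference, production)
print(f"KS={result.statistic:.3f}  p={result.pvalue:.4f}  drift={result.pvalue < 0.05}")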

PSI (Population Stability Index)

Industry standard metric from banking/insurance. Measures the magnitude of distribution shift.

PSI = Σ (Prod% - Ref%) × ln(Prod% / Ref%)
PSI Value   Interpretation
< 0.1       No significant change
0.1–0.2     Moderate change — monitor
> 0.2       Significant drift — retrain
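
A minimal PSI sketch matching the formula above. The 10-bin default and the clipping floor are assumptions, not necessarily what pipeline/drift_detector.py does:

import numpy as np

def psi(reference: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    # Bin edges come from the reference (training) distribution
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Floor the proportions to avoid log(0) and division by zero
    ref_pct = np.clip(ref_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))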

Overall Drift Decision

MLForge flags overall drift if >20% of features show drift in either test. This avoids false alarms from a single noisy feature.
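
Expressed as code, that decision rule is just a fraction check (a sketch; the names are illustrative):

def overall_drift(per_feature_drift: dict[str, bool], threshold: float = 0.20) -> bool:
    # per_feature_drift maps feature name -> True if either test flagged it
    return sum(per_feature_drift.values()) / len(per_feature_drift) > threshold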

Retraining Trigger

When overall drift is detected:

  1. Streamlit dashboard shows 🚨 DRIFT DETECTED banner
  2. Console alert is logged
  3. Slack webhook fires (if configured)
  4. Email alert sent (if SMTP configured)
  5. "Retrain" recommendation shown with link to Train Models tab

Running Tests

# all tests
make test

# specific test file
pytest tests/test_drift.py -v

# with coverage
make coverage
# → opens htmlcov/index.html

# fast tests only (skips CSV I/O)
make test-fast

Test coverage:

  • tests/test_pipeline.py — ingestion, profiling, feature engineering, training, evaluation
  • tests/test_serving.py — FastAPI endpoints (with mocked model)
  • tests/test_drift.py — KS-test and PSI correctness
  • tests/test_validation.py — data validation rules

Docker

Using docker-compose (recommended)

# start everything: MLflow + API + Streamlit
make docker-up

# check logs
make docker-logs

# stop everything
make docker-down

Services: MLflow UI at http://localhost:5000, FastAPI at http://localhost:8000, Streamlit dashboard at http://localhost:8501 (see docker-compose.yml for the authoritative port mapping).

Manual Docker

docker build --target runtime -t mlforge:latest .
docker run -p 8000:8000 -v $(pwd)/models:/app/models mlforge:latest

Deployment Guide (Free Hosting)

FastAPI → Render Free Tier

  1. Push to GitHub
  2. Create account at https://render.com (free tier available)
  3. New → Web Service → connect your GitHub repo
  4. Settings:
    • Build command: pip install -r requirements.txt
    • Start command: uvicorn serving.app:app --host 0.0.0.0 --port $PORT
    • Environment: Add MLFLOW_TRACKING_URI=file:///app/mlruns
  5. Deploy → get a public HTTPS URL

Note: Render's free tier sleeps after 15 minutes of inactivity. For always-on hosting, use Railway's free tier (500 hrs/month).

Streamlit UI → Streamlit Cloud (Free)

  1. Push to GitHub (include data/sample.csv — untrack models in .gitignore)
  2. Go to https://share.streamlit.io
  3. Click New app → connect repo
  4. Main file path: ui/app.py
  5. Set environment variables in the Secrets section (same as .env)
  6. Deploy → free subdomain at yourapp.streamlit.app

MLflow → Local Only

The MLflow tracking server runs locally in this setup. For a team, you can host it on any free VPS (Oracle Cloud Free Tier has always-free VMs) and point MLFLOW_TRACKING_URI at that server.
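
Pointing clients at a remote tracking server (the hostname is a placeholder):

export MLFLOW_TRACKING_URI=http://your-vps-host:5000

Or from Python:

import mlflow
mlflow.set_tracking_uri("http://your-vps-host:5000")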


Tech Stack

Component            Library                          Version   Cost
ML Models            scikit-learn, XGBoost, LightGBM  Latest    Free
Experiment Tracking  MLflow                           ≥2.12     Free
API Serving          FastAPI + Uvicorn                Latest    Free
Dashboard            Streamlit                        ≥1.34     Free
Data Processing      Pandas, NumPy, SciPy             Latest    Free
Storage              SQLite                           Built-in  Free
Containerization     Docker                           Latest    Free
CI/CD                GitHub Actions                   Latest    Free
Testing              pytest                           ≥8.1      Free

Total infrastructure cost: $0.00


Makefile Commands

make install       # install dependencies
make data          # generate sample.csv
make mlflow        # start MLflow tracking server
make train         # train models on sample.csv
make serve         # start FastAPI server
make ui            # start Streamlit dashboard
make monitor       # run drift check (simulates drift on sample data)
make test          # run all tests
make test-fast     # run tests (skip I/O-heavy ones)
make coverage      # tests with HTML coverage report
make lint          # flake8 linting
make format        # black + isort formatting
make docker-up     # start all services via docker-compose
make docker-down   # stop all docker services
make clean         # remove __pycache__ and build artifacts

Project Structure

mlforge/
├── pipeline/
│   ├── ingestion.py          # CSV loading, column type inference
│   ├── profiler.py           # data statistics, missing values, correlations
│   ├── feature_engineering.py # scaling, OHE, label encoding, feature selection
│   ├── trainer.py            # trains LR, RF, XGBoost, LightGBM
│   ├── evaluator.py          # accuracy, F1, AUC-ROC, confusion matrix
│   └── drift_detector.py     # KS-test + PSI drift detection
├── serving/
│   ├── app.py                # FastAPI /predict and /health endpoints
│   └── model_loader.py       # loads model from MLflow or local fallback
├── registry/
│   └── mlflow_manager.py     # MLflow experiment logging and registry
├── monitoring/
│   ├── drift_monitor.py      # SQLite-backed drift check history
│   └── alerts.py             # console / Slack / email alerts
├── validation/
│   └── data_validator.py     # data quality rules engine
├── ui/
│   └── app.py                # Streamlit dashboard (all 5 pages)
├── config/
│   └── settings.py           # all configuration in one place
├── tests/                    # pytest test suite
├── data/
│   ├── sample.csv            # 1000-row credit scoring dataset
│   └── generate_sample.py    # script to regenerate sample data
├── Dockerfile                # multi-stage Docker build
├── docker-compose.yml        # MLflow + API + UI
├── .github/workflows/ci.yml  # GitHub Actions: test on push
├── Makefile                  # make install/train/serve/test/...
└── requirements.txt          # all free, open-source dependencies

License

MIT License — see LICENSE

Copyright (c) 2024 MLForge Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...

Contributing

  1. Fork the repo
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Run tests: make test
  4. Submit a PR

Ideas welcome: Optuna hyperparameter tuning, multi-class support, categorical drift detection, SHAP explainability, model A/B testing.
