End-to-end MLOps platform for model lifecycle management
Upload a CSV → get a deployed, monitored ML model. Everything runs locally or on free hosting tiers. No credit card required, ever.
┌─────────────────────────────────────────────────────────────────────┐
│ MLForge Platform │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Upload │───▶│ Validate │───▶│ Profile │───▶│ Feature Eng │ │
│ │ CSV │ │ (Rules) │ │ (Stats) │ │ (Scale/OHE) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────┬───────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────────▼───────┐ │
│ │ Train 4 Models │ │
│ │ LogisticRegression │ RandomForest │ XGBoost │ LGBM │ │
│ └──────────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────────▼────────────────────────────────┐ │
│ │ Evaluate & Compare (Acc / F1 / AUC-ROC) │ │
│ │ Pick Best Model │ │
│ └──────────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────┼──────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ MLflow │ │ FastAPI Serving │ │ Drift Monitor │ │
│ │ Registry │ │ /predict │ │ KS-test + PSI │ │
│ │ (local) │ │ /health │ │ → Alert/Retrain │ │
│ └─────────────┘ └──────────────────┘ └──────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Streamlit Dashboard (upload → monitor) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Storage: SQLite (metadata) + Local filesystem (models + MLruns) │
└─────────────────────────────────────────────────────────────────────┘
git clone https://github.com/yourusername/mlforge.git && cd mlforge
pip install -r requirements.txt && python data/generate_sample.py
streamlit run ui/app.py

Open http://localhost:8501 → upload CSV → train → deploy → monitor.
git clone https://github.com/yourusername/mlforge.git
cd mlforge
pip install -r requirements.txt

python data/generate_sample.py
# → creates data/sample.csv (1000 rows, 15 columns, credit scoring)

Or use `make data`.
In a separate terminal:
pip install mlflow # already in requirements.txt
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--backend-store-uri sqlite:///mlforge_meta.db \
  --default-artifact-root ./mlruns

Or use `make mlflow`.
MLflow UI → http://localhost:5000 (free, runs entirely on your machine)
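If you train from your own scripts rather than the UI, point them at the same tracking server. A minimal sketch (the experiment name and logged values here are illustrative, not what MLForge uses internally):

```python
# Minimal MLflow logging sketch — experiment name and values are
# illustrative, not necessarily what MLForge logs internally.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # the server started above
mlflow.set_experiment("mlforge-demo")             # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("model", "RandomForest")
    mlflow.log_metric("roc_auc", 0.91)
```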
make ui
# or
streamlit run ui/app.py

make serve
# or
uvicorn serving.app:app --host 0.0.0.0 --port 8000 --reload

cp .env.example .env
# Edit .env for Slack alerts, email alerts, etc.

Step 1 — Upload & Profile
- Open http://localhost:8501
- Navigate to Data Profiling
- Upload your CSV or check "Use built-in sample dataset"
- Set your target column name (default: `default`)
- Click Run Full Profile to see stats, missing values, correlations (see the sketch below)
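The profile itself boils down to a few pandas calls. A minimal stand-alone sketch, assuming the generated `data/sample.csv` with its `default` target (this is not the actual code in `pipeline/profiler.py`):

```python
# Stand-alone profiling sketch — the kind of stats the Data Profiling
# page reports; not MLForge's actual profiler code.
import pandas as pd

df = pd.read_csv("data/sample.csv")
print(df.describe(include="all"))             # per-column summary stats
print(df.isna().mean().sort_values())         # missing-value ratio per column
print(df.corr(numeric_only=True)["default"])  # correlations vs. the target
```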
Step 2 — Train Models
- Navigate to Train Models
- Adjust test set size and CV folds
- Click Start Training — trains 4 models with 5-fold cross-validation
- Models are automatically saved to `models/` and logged to MLflow (a simplified training sketch follows below)
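Conceptually, the training step is equivalent to this stand-alone sketch (synthetic data; the real trainer in `pipeline/trainer.py` also handles feature engineering and MLflow logging):

```python
# Simplified Step 2 sketch — train the four model families with 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic stand-in for your uploaded CSV
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
    "LightGBM": LGBMClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```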
Step 3 — Evaluate
- Navigate to Evaluate
- See side-by-side comparison: Accuracy, F1, ROC-AUC, Avg Precision
- View confusion matrix for the best model
- Check cross-validation scores for reliability
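The comparison numbers come straight from scikit-learn's metrics. A self-contained sketch on synthetic data (not MLForge's actual evaluator):

```python
# Step 3 metric sketch — the same scikit-learn metrics the Evaluate
# page reports, computed on a held-out test split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             average_precision_score, confusion_matrix)

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier().fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # P(class = 1)

print("Accuracy     :", accuracy_score(y_test, y_pred))
print("F1           :", f1_score(y_test, y_pred))
print("ROC-AUC      :", roc_auc_score(y_test, y_prob))
print("Avg Precision:", average_precision_score(y_test, y_prob))
print(confusion_matrix(y_test, y_pred))
```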
Step 4 — Predict
- Navigate to Deploy & Predict
- Use the API (if `make serve` is running) or in-memory prediction
- Send JSON payload, get back predictions + probabilities + latency
Step 5 — Monitor
- Navigate to Monitor Drift
- Upload a sample of recent production data
- Run drift check → see per-feature KS statistics and PSI values
- If drift detected → alerts fire + retrain recommendation shown
GET /health

Health check. Returns model load status.
{
"status": "ok",
"model_loaded": true,
"model_loaded_at": 1714156800.0
}

POST /predict

Run inference on one or more rows.
Request:
{
"data": [
{
"age": 35,
"annual_income": 65000,
"loan_amount": 12000,
"loan_term_months": 36,
"credit_score": 720,
"num_credit_lines": 5,
"debt_to_income_ratio": 0.25,
"employment_years": 7.5,
"num_late_payments": 0,
"num_inquiries": 2,
"home_ownership": "MORTGAGE",
"employment_status": "EMPLOYED",
"loan_purpose": "DEBT_CONSOLIDATION",
"has_cosigner": 0
}
]
}

Response:
{
"predictions": [0],
"probabilities": [0.0823],
"model_name": "RandomForestClassifier",
"latency_ms": 2.4
}

Returns info about the currently loaded model: type, parameters, selected features.
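A minimal Python client for the endpoints above (assumes the API from `make serve` is listening on port 8000):

```python
# Minimal API client sketch — assumes `make serve` is running on port 8000.
import requests

row = {
    "age": 35, "annual_income": 65000, "loan_amount": 12000,
    "loan_term_months": 36, "credit_score": 720, "num_credit_lines": 5,
    "debt_to_income_ratio": 0.25, "employment_years": 7.5,
    "num_late_payments": 0, "num_inquiries": 2,
    "home_ownership": "MORTGAGE", "employment_status": "EMPLOYED",
    "loan_purpose": "DEBT_CONSOLIDATION", "has_cosigner": 0,
}

print(requests.get("http://localhost:8000/health").json())
resp = requests.post("http://localhost:8000/predict", json={"data": [row]})
print(resp.json())  # → predictions, probabilities, model_name, latency_ms
```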
MLForge uses two complementary statistical tests to detect when your production data has drifted away from the training distribution.
Kolmogorov–Smirnov (KS) test

Compares the empirical cumulative distribution functions of two samples.
- p-value < 0.05 → distributions are significantly different → drift detected
- Fast, non-parametric, no assumptions about distribution shape
- Implemented via `scipy.stats.ks_2samp`
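A self-contained sketch of the KS check on a single numeric feature:

```python
# KS-test sketch — drift check on one numeric feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # training distribution
production = rng.normal(loc=0.5, scale=1.0, size=5000)  # mean shifted by +0.5

stat, p_value = ks_2samp(reference, production)
print(f"KS statistic={stat:.3f}, p={p_value:.2e}")
print("drift detected" if p_value < 0.05 else "no drift")
```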
Population Stability Index (PSI)

An industry-standard metric from banking and insurance. Measures the magnitude of distribution shift.
PSI = Σ (Prod% - Ref%) × ln(Prod% / Ref%)
| PSI Value | Interpretation |
|---|---|
| < 0.1 | No significant change |
| 0.1–0.2 | Moderate change — monitor |
| > 0.2 | Significant drift — retrain |
MLForge flags overall drift if >20% of features show drift in either test. This avoids false alarms from a single noisy feature.
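A minimal sketch of PSI with decile binning, plus that >20% aggregation rule (illustrative only, not the actual `pipeline/drift_detector.py`):

```python
# PSI sketch — implements the formula above with decile binning, plus the
# ">20% of features drifted" aggregation rule. Not MLForge's actual code.
import numpy as np

def psi(reference, production, bins=10):
    # Bin edges come from the reference distribution (deciles)
    edges = np.percentile(reference, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    prod_pct = np.histogram(production, edges)[0] / len(production)
    # Clip to avoid log(0) / division by zero in empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

def overall_drift(per_feature_psi, threshold=0.2, share=0.2):
    # Flag overall drift only if more than `share` of features exceed the
    # per-feature PSI threshold — one noisy feature won't trip the alarm.
    drifted = [f for f, v in per_feature_psi.items() if v > threshold]
    return len(drifted) / len(per_feature_psi) > share, drifted

rng = np.random.default_rng(0)
ref = rng.normal(size=5000)
prod = rng.normal(loc=0.3, size=5000)  # small mean shift → modest PSI
print(f"PSI = {psi(ref, prod):.3f}")
```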
When overall drift is detected:
- Streamlit dashboard shows a 🚨 DRIFT DETECTED banner
- Console alert is logged
- Slack webhook fires (if configured)
- Email alert sent (if SMTP configured)
- "Retrain" recommendation shown with link to Train Models tab
# all tests
make test
# specific test file
pytest tests/test_drift.py -v
# with coverage
make coverage
# → opens htmlcov/index.html
# fast tests only (skips CSV I/O)
make test-fast

Test coverage:

- `tests/test_pipeline.py` — ingestion, profiling, feature engineering, training, evaluation
- `tests/test_serving.py` — FastAPI endpoints (with mocked model)
- `tests/test_drift.py` — KS-test and PSI correctness
- `tests/test_validation.py` — data validation rules
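To illustrate the kind of property `tests/test_drift.py` checks, here is a self-contained pytest sketch (not the actual test file):

```python
# Illustrative pytest sketch of drift-test assertions — not the actual
# contents of tests/test_drift.py.
import numpy as np
from scipy.stats import ks_2samp

def test_identical_samples_show_no_drift():
    rng = np.random.default_rng(0)
    sample = rng.normal(size=2000)
    _, p = ks_2samp(sample, sample)
    assert p > 0.05  # identical data must not be flagged

def test_shifted_samples_show_drift():
    rng = np.random.default_rng(0)
    ref, prod = rng.normal(size=2000), rng.normal(loc=1.0, size=2000)
    _, p = ks_2samp(ref, prod)
    assert p < 0.05  # a full standard-deviation shift must be flagged
```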
# start everything: MLflow + API + Streamlit
make docker-up
# check logs
make docker-logs
# stop everything
make docker-down

Services:
- MLflow: http://localhost:5000
- FastAPI: http://localhost:8000
- Streamlit: http://localhost:8501
docker build --target runtime -t mlforge:latest .
docker run -p 8000:8000 -v $(pwd)/models:/app/models mlforge:latest

- Push to GitHub
- Create account at https://render.com (free tier available)
- New → Web Service → connect your GitHub repo
- Settings:
  - Build command: `pip install -r requirements.txt`
  - Start command: `uvicorn serving.app:app --host 0.0.0.0 --port $PORT`
  - Environment: Add `MLFLOW_TRACKING_URI=file:///app/mlruns`
- Deploy → get a public HTTPS URL
Note: Render free tier sleeps after 15min inactivity. For always-on, use Railway free tier (500 hrs/month).
- Push to GitHub (include `data/sample.csv` — untrack models in `.gitignore`)
- Go to https://share.streamlit.io
- Click New app → connect repo
- Main file path: `ui/app.py`
- Set environment variables in the Secrets section (same as `.env`)
- Deploy → free subdomain at `yourapp.streamlit.app`
The MLflow tracking server is meant to run locally. For a team setup, run it on any free VPS (Oracle Cloud Free Tier has always-free VMs) and point `MLFLOW_TRACKING_URI` at that server.
| Component | Library | Version | Cost |
|---|---|---|---|
| ML Models | scikit-learn, XGBoost, LightGBM | Latest | Free |
| Experiment Tracking | MLflow | ≥2.12 | Free |
| API Serving | FastAPI + Uvicorn | Latest | Free |
| Dashboard | Streamlit | ≥1.34 | Free |
| Data Processing | Pandas, NumPy, SciPy | Latest | Free |
| Storage | SQLite | Built-in | Free |
| Containerization | Docker | Latest | Free |
| CI/CD | GitHub Actions | Latest | Free |
| Testing | pytest | ≥8.1 | Free |
Total infrastructure cost: $0.00
make install # install dependencies
make data # generate sample.csv
make mlflow # start MLflow tracking server
make train # train models on sample.csv
make serve # start FastAPI server
make ui # start Streamlit dashboard
make monitor # run drift check (simulates drift on sample data)
make test # run all tests
make test-fast # run tests (skip I/O-heavy ones)
make coverage # tests with HTML coverage report
make lint # flake8 linting
make format # black + isort formatting
make docker-up # start all services via docker-compose
make docker-down # stop all docker services
make clean        # remove __pycache__ and build artifacts

mlforge/
├── pipeline/
│ ├── ingestion.py # CSV loading, column type inference
│ ├── profiler.py # data statistics, missing values, correlations
│ ├── feature_engineering.py # scaling, OHE, label encoding, feature selection
│ ├── trainer.py # trains LR, RF, XGBoost, LightGBM
│ ├── evaluator.py # accuracy, F1, AUC-ROC, confusion matrix
│ └── drift_detector.py # KS-test + PSI drift detection
├── serving/
│ ├── app.py # FastAPI /predict and /health endpoints
│ └── model_loader.py # loads model from MLflow or local fallback
├── registry/
│ └── mlflow_manager.py # MLflow experiment logging and registry
├── monitoring/
│ ├── drift_monitor.py # SQLite-backed drift check history
│ └── alerts.py # console / Slack / email alerts
├── validation/
│ └── data_validator.py # data quality rules engine
├── ui/
│ └── app.py # Streamlit dashboard (all 5 pages)
├── config/
│ └── settings.py # all configuration in one place
├── tests/ # pytest test suite
├── data/
│ ├── sample.csv # 1000-row credit scoring dataset
│ └── generate_sample.py # script to regenerate sample data
├── Dockerfile # multi-stage Docker build
├── docker-compose.yml # MLflow + API + UI
├── .github/workflows/ci.yml # GitHub Actions: test on push
├── Makefile # make install/train/serve/test/...
└── requirements.txt # all free, open-source dependencies
MIT License — see LICENSE
Copyright (c) 2024 MLForge Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
- Fork the repo
- Create a feature branch: `git checkout -b feature/my-feature`
- Run tests: `make test`
- Submit a PR
Ideas welcome: Optuna hyperparameter tuning, multi-class support, categorical drift detection, SHAP explainability, model A/B testing.