This project is a Streamlit-based machine learning application that clusters the world's countries based on live demographic and economic statistics. It uses the World Bank API (wbgapi) to fetch the last 10 years of data for various indicators.
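The fetch step might look roughly like the following sketch. It assumes the wbgapi data.DataFrame call with its mrv ("most recent values") and skipAggs parameters, and it assumes that regional and income-group aggregates are excluded. The indicator codes are real World Bank series IDs chosen only as examples, not the app's actual list. The hypothetical latest_values helper, which keeps each country's most recent non-missing value, is just one way to reduce 10 years of data to a single row per country. The app may aggregate the years differently.

```python
import pandas as pd

# Example indicator codes (real World Bank series IDs, NOT necessarily the app's list).
EXAMPLE_INDICATORS = ["SP.POP.TOTL", "NY.GDP.PCAP.CD", "SP.DYN.LE00.IN"]


def fetch_indicators(indicators=EXAMPLE_INDICATORS, years=10):
    """Download the most recent `years` values for each indicator (requires network access)."""
    import wbgapi as wb  # imported lazily so the offline helper below works without it

    # mrv = most recent values; skipAggs=True drops aggregates such as regions (assumption).
    return wb.data.DataFrame(
        indicators,
        economy="all",
        mrv=years,
        skipAggs=True,
        index=["economy", "series"],
        columns="time",
    )


def latest_values(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the most recent non-missing value per (economy, series)."""
    ordered = raw.sort_index(axis=1)             # YR2015, YR2016, ... in chronological order
    latest = ordered.ffill(axis=1).iloc[:, -1]   # last observed value in each row
    return latest.unstack("series")              # one row per economy, one column per indicator
```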
The application applies:
- K-Nearest Neighbors (KNN) imputation to fill in missing values.
- Log transformations to reduce the skew of heavily skewed features.
- Standard scaling to give every feature zero mean and unit variance.
- K-Means or a custom Gaussian Mixture Model (GMM) with uniform priors to cluster the countries.
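The steps above can be sketched with scikit-learn as follows. This sketch makes three assumptions: the steps run in the order listed, the skewed columns are non-negative, and np.log1p is used so that zeros stay finite. The function name cluster_countries and its parameters are hypothetical, and only the K-Means branch is shown.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def cluster_countries(features: pd.DataFrame, skewed: list, n_neighbors: int = 5,
                      n_clusters: int = 4, seed: int = 0) -> pd.Series:
    """Hypothetical pipeline: impute -> log-transform -> scale -> K-Means."""
    # 1. Fill each gap with the mean of the n_neighbors most similar rows
    #    (KNNImputer's default nan-aware Euclidean distance).
    imputed = pd.DataFrame(
        KNNImputer(n_neighbors=n_neighbors).fit_transform(features),
        index=features.index,
        columns=features.columns,
    )
    # 2. Compress right-skewed, non-negative columns; log1p maps 0 to 0.
    imputed[skewed] = np.log1p(imputed[skewed])
    # 3. Standardize every feature to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(imputed)
    # 4. Partition the countries into n_clusters groups.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")
```

KNNImputer measures distances on the unscaled columns, so large-magnitude indicators such as population dominate the neighbor search. Scaling before imputation is an alternative worth comparing.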
Key features:
- Live Data Fetching: Pulls data directly from the World Bank API instead of relying on a static dataset.
- Custom Machine Learning: Includes a custom UniformGMM class that strictly enforces uniform priors, keeping every component's mixing weight fixed at 1/K.
- Interactive UI: A Streamlit and Plotly Express dashboard for exploring global demographic clusters through interactive maps and data tables.
- Dynamic Configurations: Adjust the selected features, the number of neighbors (k) used for KNN imputation, the number of clusters (K), and the clustering algorithm directly from the UI.
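The source does not show how UniformGMM is implemented, so the following is only a NumPy sketch of the idea. It runs ordinary expectation-maximization (EM) for a full-covariance Gaussian mixture, but the M-step never re-estimates the mixing weights, which stay at 1/K. Several details are assumptions: the farthest-first initialization, the reg_covar regularization term, the stopping rule, and all parameter names.

```python
import numpy as np


class UniformGMM:
    """Sketch: full-covariance Gaussian mixture fitted by EM, mixing weights frozen at 1/K."""

    def __init__(self, n_components=3, max_iter=100, tol=1e-6, reg_covar=1e-6, seed=0):
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol
        self.reg_covar = reg_covar  # added to covariance diagonals to keep them invertible
        self.seed = seed

    def _log_gauss(self, X):
        """Return an (n, K) matrix of log N(x_n | mu_k, Sigma_k)."""
        n, d = X.shape
        out = np.empty((n, self.n_components))
        for k in range(self.n_components):
            L = np.linalg.cholesky(self.covariances_[k])    # Sigma_k = L @ L.T
            z = np.linalg.solve(L, (X - self.means_[k]).T)  # whitened residuals, shape (d, n)
            log_det = 2.0 * np.log(np.diag(L)).sum()
            out[:, k] = -0.5 * (d * np.log(2 * np.pi) + log_det + (z ** 2).sum(axis=0))
        return out

    def _log_resp(self, X):
        """E-step: log responsibilities and per-sample log-likelihood."""
        log_joint = self._log_gauss(X) + np.log(self.weights_)  # same log(1/K) for every k
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        return log_joint - log_norm[:, None], log_norm

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        n, d = X.shape
        K = self.n_components
        rng = np.random.default_rng(self.seed)
        # Farthest-first initialization: a random point, then the point farthest from those chosen.
        idx = [int(rng.integers(n))]
        for _ in range(1, K):
            d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
            idx.append(int(d2.argmax()))
        self.means_ = X[idx].copy()
        base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + self.reg_covar * np.eye(d)
        self.covariances_ = np.repeat(base_cov[None], K, axis=0)
        self.weights_ = np.full(K, 1.0 / K)  # uniform priors: never updated below
        prev = -np.inf
        for _ in range(self.max_iter):
            log_r, log_norm = self._log_resp(X)
            r = np.exp(log_r)
            nk = r.sum(axis=0) + 1e-12
            # M-step: update means and covariances only; weights_ stay at 1/K.
            self.means_ = (r.T @ X) / nk[:, None]
            for k in range(K):
                diff = X - self.means_[k]
                self.covariances_[k] = (r[:, k, None] * diff).T @ diff / nk[k] + self.reg_covar * np.eye(d)
            ll = log_norm.mean()
            if abs(ll - prev) < self.tol:
                break
            prev = ll
        return self

    def predict(self, X):
        """Assign each sample to the component with the highest responsibility."""
        log_r, _ = self._log_resp(np.asarray(X, dtype=float))
        return log_r.argmax(axis=1)
```

With uniform priors, a point's responsibilities depend only on the component densities. A component therefore cannot claim more points simply by learning a larger weight.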
This project uses uv for package and environment management.
- Create the virtual environment with uv:
  uv venv
- Activate the environment:
  source .venv/bin/activate
- Install the dependencies:
  uv pip install -r requirements.txt
To run the Streamlit dashboard, use the following command while the virtual environment is activated:
  streamlit run app.py
Then open your browser and navigate to the local URL that Streamlit prints in the terminal (typically http://localhost:8501).