wl3223/Clustering-Countries

Global Demographics & Economic Clustering

This project is a Streamlit-based machine learning application that clusters the world's countries based on live demographic and economic statistics. It uses the World Bank API (wbgapi) to fetch the last 10 years of data for various indicators.
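As a rough illustration of the fetching step, the sketch below pulls a 10-year window with `wbgapi` and collapses it to one value per country and indicator. The helper names (`last_n_years`, `average_over_years`, `fetch_indicators`), the choice to average over years, and the use of `skipAggs=True` to drop regional aggregates are assumptions made for this example, not necessarily what `app.py` does. It also assumes at least two indicators are requested, so that the returned table is indexed by both economy and series.

```python
import pandas as pd


def last_n_years(end_year: int, n: int = 10) -> range:
    """Return the n consecutive years ending at end_year (inclusive)."""
    return range(end_year - n + 1, end_year + 1)


def average_over_years(raw: pd.DataFrame) -> pd.DataFrame:
    """Collapse an (economy, series) x year table into economy x series means.

    Missing years are skipped; a pair with no data at all stays NaN and is
    left for the imputation step.
    """
    return raw.mean(axis=1, skipna=True).unstack("series")


def fetch_indicators(indicators, end_year: int, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: download the indicators and average them per country."""
    import wbgapi as wb  # imported lazily because the call needs network access

    raw = wb.data.DataFrame(
        indicators,
        time=last_n_years(end_year, n),
        skipAggs=True,  # assumption: keep countries, drop aggregate regions
    )
    return average_over_years(raw)
```

A call such as `fetch_indicators(["SP.POP.TOTL", "NY.GDP.PCAP.CD"], end_year=2023)` would then give one row per country and one column per indicator.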

The application applies:

  1. K-Nearest Neighbors (KNN) to impute missing values.
  2. Log-transformations on skewed features.
  3. Standard Scaling to normalize data.
  4. K-Means or a Custom Gaussian Mixture Model (GMM) with uniform priors for clustering.
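The four steps above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code from `app.py`: the function name `cluster_countries`, its parameters, and the default values are hypothetical, and it assumes the skewed columns are non-negative so that `log1p` is defined. Only the K-Means branch is shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def cluster_countries(X, skewed_cols, n_neighbors=5, n_clusters=4, seed=0):
    """Impute, log-transform, scale, then cluster (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # 1. Fill each missing value from the n_neighbors most similar rows.
    X_imp = KNNImputer(n_neighbors=n_neighbors).fit_transform(X)
    # 2. Compress the long right tail of the skewed (non-negative) features.
    X_imp[:, skewed_cols] = np.log1p(X_imp[:, skewed_cols])
    # 3. Give every feature zero mean and unit variance.
    X_std = StandardScaler().fit_transform(X_imp)
    # 4. Assign each row to one of n_clusters groups.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X_std)
    return X_std, labels


# Synthetic stand-in for a countries x indicators table with some gaps.
rng = np.random.default_rng(0)
demo = rng.lognormal(mean=3.0, sigma=1.0, size=(40, 3))
demo[::7, 0] = np.nan
demo[3::9, 2] = np.nan
X_std, labels = cluster_countries(demo, skewed_cols=[0, 1, 2], n_clusters=3)
```

Scaling comes after the log transform so that the standardized features reflect the transformed distribution that the clustering step actually sees.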

Features

  • Live Data Fetching: Retrieves real-time data from the World Bank API.
  • Custom Machine Learning: Includes a custom UniformGMM which strictly enforces uniform priors.
  • Interactive UI: A dashboard built with Streamlit and Plotly Express for exploring global demographic clusters through interactive maps and data tables.
  • Dynamic Configurations: Adjust features, k-Neighbors for imputation, the number of clusters (K), and the clustering algorithm straight from the UI.
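To make the "uniform priors" idea concrete, here is a minimal EM sketch in which the mixing weights are fixed at 1/K and never updated. It is not the project's `UniformGMM`: the class name `UniformPriorGMMSketch`, the diagonal covariances, the farthest-point initialization, and all parameter names are assumptions chosen to keep the example short.

```python
import numpy as np


class UniformPriorGMMSketch:
    """Diagonal-covariance Gaussian mixture with weights held at 1/K."""

    def __init__(self, n_components=3, n_iter=100, reg=1e-6, seed=0):
        self.n_components = n_components
        self.n_iter = n_iter
        self.reg = reg  # small variance floor for numerical stability
        self.seed = seed

    def _log_prob(self, X):
        # Log density of every sample under every component, shape (n, K).
        diff = X[:, None, :] - self.means_[None, :, :]
        mahal = np.sum(diff ** 2 / self.vars_[None, :, :], axis=2)
        log_det = np.sum(np.log(2.0 * np.pi * self.vars_), axis=1)
        return -0.5 * (mahal + log_det[None, :])

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        n, _ = X.shape
        K = self.n_components
        rng = np.random.default_rng(self.seed)
        # Simple farthest-point initialization of the means (a heuristic).
        means = [X[rng.integers(n)]]
        for _ in range(1, K):
            d2 = ((X[:, None, :] - np.array(means)[None]) ** 2).sum(axis=2)
            means.append(X[np.argmax(d2.min(axis=1))])
        self.means_ = np.array(means)
        self.vars_ = np.tile(X.var(axis=0) + self.reg, (K, 1))
        for _ in range(self.n_iter):
            # E-step: the constant log(1/K) prior cancels when normalizing.
            log_p = self._log_prob(X)
            log_p -= log_p.max(axis=1, keepdims=True)
            resp = np.exp(log_p)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: update means and variances only; weights stay 1/K.
            nk = resp.sum(axis=0) + 1e-12
            self.means_ = (resp.T @ X) / nk[:, None]
            diff2 = (X[:, None, :] - self.means_[None]) ** 2
            self.vars_ = np.einsum("nk,nkd->kd", resp, diff2) / nk[:, None] + self.reg
        self.weights_ = np.full(K, 1.0 / K)
        return self

    def predict(self, X):
        # With equal priors, the most likely component maximizes the density.
        return np.argmax(self._log_prob(np.asarray(X, dtype=float)), axis=1)
```

Holding the weights fixed means a component cannot win points simply by being large, so assignments depend only on how well each Gaussian fits the point.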

Requirements & Setup

This project uses uv for package and environment management.

  1. Create the virtual environment using uv:
    uv venv
  2. Activate the environment (on Windows, run .venv\Scripts\activate instead):
    source .venv/bin/activate
  3. Install dependencies:
    uv pip install -r requirements.txt

Running the Application

To run the Streamlit dashboard, use the following command while the virtual environment is activated:

streamlit run app.py

Then, open your browser and navigate to the provided local URL (typically http://localhost:8501).

About

Project for the Fundamentals of Machine Learning course.
