
Topology-Aware Knowledge Propagation in Decentralized Learning

Paper: https://arxiv.org/abs/2505.11760

Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model. Instead, devices are organized in arbitrary communication topologies, in which they can only communicate with neighboring devices. Each device maintains its own local model by training on its local data and integrating new knowledge via model aggregation with neighbors. Therefore, knowledge is propagated across the topology via successive aggregation rounds. We study, in particular, the propagation of out-of-distribution (OOD) knowledge. We find that popular decentralized learning algorithms struggle to propagate OOD knowledge effectively to all devices. Further, we find that both the location of OOD data within a topology, and the topology itself, significantly impact OOD knowledge propagation. We then propose topology-aware aggregation strategies to accelerate (OOD) knowledge propagation across devices. These strategies improve OOD data accuracy, compared to topology-unaware baselines, by 123% on average across models in a topology.

overview figure
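
For intuition, decentralized aggregation can be pictured as each device periodically averaging its model parameters with those of its neighbors in the communication graph. The sketch below is a minimal illustration of one such round using NetworkX and NumPy; the function and variable names are hypothetical, and this is not the aggregation code used in this repo.

# Minimal illustration of one neighbor-averaging aggregation round
# (hypothetical names; not this repo's implementation).
import networkx as nx
import numpy as np

def aggregation_round(topology: nx.Graph, params: dict[int, np.ndarray]) -> dict[int, np.ndarray]:
    """Each node uniformly averages its parameters with its neighbors' parameters."""
    new_params = {}
    for node in topology.nodes:
        group = [node, *topology.neighbors(node)]
        new_params[node] = np.mean([params[n] for n in group], axis=0)
    return new_params

# Example: a ring of 4 devices, each holding a 1-parameter "model"
ring = nx.cycle_graph(4)
params = {n: np.array([float(n)]) for n in ring.nodes}
params = aggregation_round(ring, params)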

This repo is a test bed for prototyping fully-decentralized ML experiments. The provided experimental scripts accompany the paper.

Note:

All scripts have been configured/parallelized to run on the Aurora supercomputer. Aurora has Intel GPUs, and this code has only been tested on Intel GPUs. We have also built in support for running on nodes with NVIDIA GPUs (tested on the Polaris supercomputer). We use the Parsl Python parallelization framework; therefore, to run on your machine, you must first set up a Parsl config in parsl_setup.py.
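
If you are running on a laptop or a single workstation, a minimal local Parsl configuration could look like the sketch below. This is only an assumption-based example: the executor label is hypothetical, and the actual Aurora/Polaris configs in parsl_setup.py will differ.

# Minimal single-machine Parsl config sketch (hypothetical; adapt parsl_setup.py to your system).
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_experiment",  # hypothetical label; pass the matching name via --parsl_executor
            provider=LocalProvider(init_blocks=1, max_blocks=1),
        )
    ]
)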

Generate Topologies

Simple Decentralized Learning Demo

# set up and activate python environment
# first configure your parsl config in parsl_setup.py
python ../create_topo/create_topologies.py # create and save some topologies
python decentralized_main.py --help # to see all argument options
python decentralized_main.py # to run w/ default args on Aurora (for Polaris, set the following flag: --parsl_executor polaris_experiment_per_node)
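
To get a feel for what a topology is before running the demo, the sketch below builds and saves a small communication graph with NetworkX. The graph type, size, and file path are placeholders; create_topologies.py may use different ones.

# Illustrative topology sketch (placeholder graph type and file path; create_topologies.py may differ).
import networkx as nx

topo = nx.random_regular_graph(d=3, n=8, seed=0)    # 8 devices, each with 3 neighbors
nx.write_adjlist(topo, "example_topology.adjlist")  # placeholder output path

loaded = nx.read_adjlist("example_topology.adjlist")
print(loaded.number_of_nodes(), loaded.number_of_edges())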

Run All Paper Experiments

# set up and activate python environment
# first configure your parsl config in parsl_setup.py
python bd_scheduler.py --rounds 40 

Installation

Requirements:

  • python >=3.7,<3.11
git clone https://github.com/msakarvadia/topology_aware_learning.git
cd topology_aware_learning
conda create -p env python==3.10
conda activate ./env
pip install -r requirements.txt
pip install -e .
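
After installation, a quick sanity check (Parsl is the key dependency referenced above):

python -c "import parsl; print(parsl.__version__)"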

Setting Up Pre-Commit Hooks (for nice code formatting)

Black

To maintain consistent formatting, we use black via pre-commit hooks. Some one-time user-side configuration is required; namely, the following steps:

  1. Install black via pip install black (included in requirements.txt).
  2. Install pre-commit via pip install pre-commit (included in requirements.txt).
  3. Run pre-commit install to setup the pre-commit hooks.

Once these steps are done, you can stage and commit files as usual. The hook will reformat any Python file that does not meet Black's expectations and remove it from the commit; simply re-stage the reformatted files and commit again before pushing.
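
For example, a typical interaction with the hook looks like this (file name hypothetical):

git add train_utils.py                 # hypothetical file
git commit -m "update training loop"   # black runs; if it reformats the file, the commit is aborted
git add train_utils.py                 # re-stage the reformatted file
git commit -m "update training loop"   # commits cleanly now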

A Simple Example of Decentralized Learning

2_recall

The above animation is an example of a fully-connected topology. Nodes are models and edges are communication links between models. Each model is given a subset of the MNIST dataset to train over. We visualize the accuracy on the "2" label over training time.
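
As an illustration of how each node could be given its own MNIST subset, the sketch below splits the dataset indices evenly across eight nodes. It assumes torchvision is available and uses hypothetical names; the repo's actual partitioning (and placement of OOD data) may differ.

# Hypothetical even split of MNIST across nodes (assumes torchvision;
# the repo's partitioning and OOD placement may differ).
import numpy as np
from torchvision.datasets import MNIST

dataset = MNIST(root="data", download=True)
num_nodes = 8
indices = np.random.default_rng(0).permutation(len(dataset))
node_splits = np.array_split(indices, num_nodes)  # node i trains on dataset[j] for j in node_splits[i]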

Please cite this work as:

@article{sakarvadia2025topology,
      title={Topology-Aware Knowledge Propagation in Decentralized Learning}, 
      author={Mansi Sakarvadia and Nathaniel Hudson and Tian Li and Ian Foster and Kyle Chard},
      year={2025},
      eprint={2505.11760},
      url={https://arxiv.org/abs/2505.11760}, 
}
