Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model. Instead, devices are organized in arbitrary communication topologies, in which they can only communicate with neighboring devices. Each device maintains its own local model by training on its local data and integrating new knowledge via model aggregation with neighbors. Therefore, knowledge is propagated across the topology via successive aggregation rounds. We study, in particular, the propagation of out-of-distribution (OOD) knowledge. We find that popular decentralized learning algorithms struggle to propagate OOD knowledge effectively to all devices. Further, we find that both the location of OOD data within a topology, and the topology itself, significantly impact OOD knowledge propagation. We then propose topology-aware aggregation strategies to accelerate (OOD) knowledge propagation across devices. These strategies improve OOD data accuracy, compared to topology-unaware baselines, by 123% on average across models in a topology.
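To make the aggregation step concrete, here is a minimal sketch of one round of plain neighbor averaging over an arbitrary communication graph. This is an illustration of decentralized aggregation in general, not the topology-aware strategies proposed in the paper, and the function name `average_with_neighbors` is ours.

```python
import networkx as nx
import torch


def average_with_neighbors(topology, models):
    """One round of plain neighbor averaging (illustrative sketch only).

    topology: networkx graph whose nodes are device ids and whose edges are
              communication links between devices
    models:   dict mapping each device id to its local torch model
    """
    new_states = {}
    for node in topology.nodes:
        # Each device averages its own parameters with those of its neighbors.
        group = [node] + list(topology.neighbors(node))
        states = [models[n].state_dict() for n in group]
        new_states[node] = {
            key: torch.stack([s[key].float() for s in states]).mean(dim=0)
            for key in states[0]
        }
    # Apply the averaged parameters only after every device has computed them.
    for node, state in new_states.items():
        models[node].load_state_dict(state)
```

Successive rounds of a routine like this are what propagate knowledge (including OOD knowledge) hop by hop through the topology.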

This repo is a testbed for prototyping fully-distributed ML experiments. The provided experimental scripts accompany the paper.
All scripts have been configured/parallelized to run on the Aurora supercomputer, which uses Intel GPUs; this code has only been tested on Intel GPUs. We have also built in support for running on nodes with Nvidia GPUs (e.g., the Polaris supercomputer), although those configurations are untested. We use the Parsl Python parallelization framework. Therefore, to run on your machine, you must first set up a Parsl config in `parsl_setup.py`.
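If you are not running on Aurora or Polaris, a minimal local Parsl config can stand in while prototyping. The sketch below is only an assumption about what a simple entry in `parsl_setup.py` could look like; the executor label `local_experiment` is ours, and the repo's actual configs target Aurora/Polaris nodes.

```python
import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# Hypothetical single-machine config; parsl_setup.py in this repo defines
# Aurora/Polaris-specific configs instead.
local_config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_experiment",  # placeholder label
            provider=LocalProvider(),  # run workers on the local machine
        )
    ]
)

parsl.load(local_config)
```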
- `src/create_topo`: directory for all topology creation scripts (several scripts for generating different types of topologies)
- `src/create_topo/create_topologies.py`: example topologies
- `src/create_topo/backdoor_topo.py`: all topologies that we used in official paper experiments
`src/experiments/decentralized_main.py`: script to run a single configurable decentralized training experiment for a single topology.

How to configure Parsl:
- We provide a tested example Parsl config for parallelizing your workflow across a single Aurora node.
- We provide an untested example Parsl config for parallelizing your workflow across a single Polaris node.

How to run:
```bash
# set up and activate python environment
# first configure your parsl config in parsl_setup.py
python ../create_topo/create_topologies.py  # create and save some topologies
python decentralized_main.py --help         # see all argument options
python decentralized_main.py                # run w/ default args on Aurora (for Polaris, set: --parsl_executor polaris_experiment_per_node)
```
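For illustration, the kind of topology graph these scripts produce can be built with networkx. The sketch below is a guess at the general shape (a fully-connected graph saved to disk), not the actual contents of `create_topologies.py`, and the output filename is made up.

```python
import networkx as nx

# Hypothetical example: a fully-connected topology over 8 devices.
num_devices = 8
topology = nx.complete_graph(num_devices)

# Another common shape studied in decentralized learning, e.g. a ring:
ring = nx.cycle_graph(num_devices)

# Persist the graph; adjacency-list text format is one simple option.
nx.write_adjlist(topology, "fully_connected_8.adjlist")  # made-up filename
```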
`src/experiments/bd_scheduler.py`: script that runs every experiment in the paper.

This code relies on 2 levels of parallelization (see the sketch after the run commands below):
- Parallelization within experiments:
  - We provide a tested example Parsl config for parallelizing individual experiments across a single Aurora node.
  - We provide an untested example Parsl config for parallelizing individual experiments across a single Polaris node.
- Parallelization across experiments:
  - We provide a tested example Parsl config for parallelizing your workflow across multiple Aurora nodes.
  - We provide an untested example Parsl config for parallelizing your workflow across multiple Polaris nodes.

How to run:
```bash
# set up and activate python environment
# first configure your parsl config in parsl_setup.py
python bd_scheduler.py --rounds 40
```
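To make the two levels of parallelization concrete, the sketch below shows the generic Parsl pattern of defining two executors in one config and routing apps to them by label. The labels and function bodies are placeholders, not the ones defined in `parsl_setup.py` or `bd_scheduler.py`.

```python
import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# Two executors in one config: one intended for scheduling whole experiments
# (across nodes), one for work inside an experiment (within a node).
config = Config(
    executors=[
        HighThroughputExecutor(label="across_experiments", provider=LocalProvider()),
        HighThroughputExecutor(label="within_experiment", provider=LocalProvider()),
    ]
)
parsl.load(config)


@python_app(executors=["across_experiments"])
def run_experiment(experiment_id):
    # Placeholder: launch one full decentralized-training experiment.
    return experiment_id


@python_app(executors=["within_experiment"])
def train_one_device(device_id):
    # Placeholder: train a single device's model within an experiment.
    return device_id


futures = [run_experiment(i) for i in range(4)]
print([f.result() for f in futures])
```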
Requirements:
- python >=3.7,<3.11

```bash
git clone https://github.com/msakarvadia/topology_aware_learning.git
cd topology_aware_learning
conda create -p env python==3.10
conda activate env
pip install -r requirements.txt
pip install -e .
```
To maintain consistent formatting, we take advantage of `black` via pre-commit hooks. Some user-side configuration is needed:
- Install `black` via `pip install black` (included in `requirements.txt`).
- Install `pre-commit` via `pip install pre-commit` (included in `requirements.txt`).
- Run `pre-commit install` to set up the pre-commit hooks.

Once these steps are done, simply add files to be committed and pushed; the hook will reformat any Python file that does not meet Black's expectations and remove it from the commit. Re-commit the reformatted changes and they will be included before you push.
The above animation is an example of a fully-connected topology. Nodes are models and edges are communication links between models. Each model is given a subset of the MNIST dataset to train over. We visualize the accuracies of the "2" label over training time.
Please cite this work as:
```bibtex
@article{sakarvadia2025topology,
      title={Topology-Aware Knowledge Propagation in Decentralized Learning},
      author={Mansi Sakarvadia and Nathaniel Hudson and Tian Li and Ian Foster and Kyle Chard},
      year={2025},
      eprint={2505.11760},
      url={https://arxiv.org/abs/2505.11760},
}
```