Venn is a federated learning (FL) resource manager that efficiently schedules ephemeral, heterogeneous edge devices among multiple FL jobs, reducing average job completion time.
To set up the conda environment, follow these steps:
- Download and install Miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
conda init
source ~/.bashrc
- Clone the repository:
git clone https://github.com/SymbioticLab/Venn
cd Venn
- Create the conda environment:
conda env create -f environment.yml
conda activate venn_env
- Download the FL job and client traces:
sudo apt update
sudo apt install git-lfs
git lfs install
git lfs pull
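After the setup steps above, a quick sanity check can confirm that the environment is active and that the traces were actually downloaded rather than left as Git LFS pointer stubs. The following is a minimal sketch, not part of the Venn code base: the helper names are hypothetical, and it assumes the traces live under `trace/` as described in the project layout. It relies on two facts: `conda activate` sets `CONDA_DEFAULT_ENV`, and an un-pulled LFS file starts with the line `version https://git-lfs.github.com/spec/v1`.

```python
import os
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is still an un-pulled Git LFS pointer stub."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unpulled_files(root):
    """List files under `root` that are still LFS pointers (empty if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


if __name__ == "__main__":
    # Expected to print "venn_env" after `conda activate venn_env`.
    print("active conda env:", os.environ.get("CONDA_DEFAULT_ENV"))
    # Assumption: traces are stored under trace/ in the repository root.
    for p in unpulled_files("trace"):
        print("still an LFS pointer, rerun `git lfs pull`:", p)
```

If the script lists any files, `git lfs pull` most likely did not complete.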
Run simulations for scheduling FL jobs using the following command:
python src/venn_event.py <Scheduler> <JobType> <ClientType> <NumJobs> <ClientAndJobTrace>
Commands to reproduce the first row of Table 1:
python src/venn_event.py SmallReqScheduler MixedJob VennClient 50 config/even_workload.yml
python src/venn_event.py FIFOScheduler MixedJob VennClient 50 config/even_workload.yml
python src/venn_event.py VennReqScheduler MixedJob VennClient 50 config/even_workload.yml
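The three commands above differ only in the scheduler name, so they can be driven from a small script. This is a hedged sketch rather than a tool shipped with Venn: `build_command` and `run_all` are hypothetical helpers, and the positional arguments simply mirror the usage line shown earlier. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Scheduler names taken from the commands listed above.
SCHEDULERS = ["SmallReqScheduler", "FIFOScheduler", "VennReqScheduler"]


def build_command(scheduler, job_type="MixedJob", client_type="VennClient",
                  num_jobs=50, trace="config/even_workload.yml"):
    """Build the argument list for one simulation run."""
    return [sys.executable, "src/venn_event.py",
            scheduler, job_type, client_type, str(num_jobs), trace]


def run_all(dry_run=True):
    """Print each command; execute it sequentially when dry_run is False."""
    for scheduler in SCHEDULERS:
        cmd = build_command(scheduler)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep if a simulation exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the repository root with the `venn_env` environment active, and switch to `run_all(dry_run=False)` to launch the simulations.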
- Expected results (check out fig/):
  - FIFOScheduler: Average queuing delay: 246.436; Average job completion time (JCT): 4233961.563; Total makespan: 9069573.109
  - SmallReqScheduler: Average queuing delay: 345.579; Average job completion time (JCT): 3445766.367; Total makespan: 10739668.501
  - VennReqScheduler: Average queuing delay: 321.743; Average job completion time (JCT): 3133770.735; Total makespan: 8742415.026
- Each experiment is expected to take approximately 1 hour to complete, depending on your machine's specifications.
- To reproduce additional results from the paper, replace config/even_workload.yml in the commands above with the corresponding configuration file from the config directory.
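To compare your own runs against the expected results listed above, the summary lines can be parsed and the relative JCT reduction computed. The sketch below is illustrative only: `parse_summary` and `jct_reduction` are hypothetical helpers, and it assumes the simulator reports its summary in the same `Average queuing delay: ...; Average job completion time (JCT): ...; Total makespan: ...` form used in this README. The numbers in `EXPECTED` are copied from the list above.

```python
import re

# Matches one summary line in the format shown in the expected results.
SUMMARY_RE = re.compile(
    r"Average queuing delay:\s*([\d.]+);\s*"
    r"Average job completion time \(JCT\):\s*([\d.]+);\s*"
    r"Total makespan:\s*([\d.]+)"
)

EXPECTED = {
    "FIFOScheduler": "Average queuing delay: 246.436; Average job completion time (JCT): 4233961.563; Total makespan: 9069573.109",
    "SmallReqScheduler": "Average queuing delay: 345.579; Average job completion time (JCT): 3445766.367; Total makespan: 10739668.501",
    "VennReqScheduler": "Average queuing delay: 321.743; Average job completion time (JCT): 3133770.735; Total makespan: 8742415.026",
}


def parse_summary(line):
    """Extract the three metrics from a summary line as floats."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    delay, jct, makespan = map(float, match.groups())
    return {"queuing_delay": delay, "jct": jct, "makespan": makespan}


def jct_reduction(baseline_jct, candidate_jct):
    """Fraction by which the candidate lowers JCT relative to the baseline."""
    return 1.0 - candidate_jct / baseline_jct


if __name__ == "__main__":
    results = {name: parse_summary(line) for name, line in EXPECTED.items()}
    venn_jct = results["VennReqScheduler"]["jct"]
    for name in ("FIFOScheduler", "SmallReqScheduler"):
        reduction = jct_reduction(results[name]["jct"], venn_jct)
        print(f"VennReqScheduler vs {name}: JCT reduced by {reduction:.1%}")
```

Replacing the strings in `EXPECTED` with the output of your own runs gives a quick check on whether the relative ordering of the schedulers is reproduced.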
The Venn project is organized as follows:
- venn/: Contains the core logic for the Venn resource management tool.
  - venn_event.py: Main script to simulate the scheduling for multiple federated learning jobs.
  - scheduler.py: Implements different scheduling algorithms.
  - client.py: Defines client behavior and interactions.
  - job.py: Manages job definitions and lifecycle.
- config/: Contains configuration files for FL job traces (resource demands and FL job type distribution). Check out config/test.yml for more detailed explanations.
- trace/: Contains configuration files for FL client traces (availability and eligibility traces).
- testbed/: Contains the code to run the testbed FL job scheduling experiments. See the separate instructions to set up Propius.
This project is licensed under the MIT License - see the LICENSE file for details.