ZipLine: an optimized algorithm for ElasticBSP

Dataset generators and source code (ElasticBSP is implemented in `C++11`)

We implement the dataset generators described in the paper "ZipLine: an optimized algorithm for the elastic bulk synchronous parallel model". They simulate a real distributed environment in which a parallel computing service runs on multiple workers.

This implementation is explained in detail in Section 6.1, "ZipLine performance", of the paper. It generates the datasets listed in Table 2, "Synthetic datasets with varying number of n and R".

NOTE: Due to NDA, ZipLine source code is not available at the moment.

Paper

ZipLine: an optimized algorithm for the elastic bulk synchronous parallel model

Talk in IEEE DSAA 2021

Slides

Extended Abstract

The flow of prediction and synchronization of ElasticBSP

Use case

Prerequisite: `gcc 4.8+`

The data generators simulate the pull and push requests from multiple workers and output the timestamps of the workers' push requests as an n x R matrix (n: the number of workers; R: the number of future iterations predicted for each worker). For example, given 10 workers whose next 20 iterations we want to estimate at some point in time, the generators output a 10 x 20 matrix: 10 entries, each holding the 20 push-request timestamps of one worker.
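The n x R layout described above can be loaded and sanity-checked with a few lines of C++11. This is only a sketch: the README does not specify the on-disk format, so the code assumes one worker per line with whitespace-separated numeric timestamps, and the names `TimestampMatrix`, `parse_timestamp_matrix`, and `is_valid_matrix` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row per worker, one column per iteration (n x R).
using TimestampMatrix = std::vector<std::vector<double>>;

// Hypothetical helper: parses whitespace-separated timestamps, one worker per line.
// ASSUMPTION: the generators' actual file layout may differ from this.
TimestampMatrix parse_timestamp_matrix(std::istream& in) {
    TimestampMatrix matrix;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row_stream(line);
        std::vector<double> row;
        double ts;
        while (row_stream >> ts) row.push_back(ts);
        if (!row.empty()) matrix.push_back(row);  // skip blank lines
    }
    return matrix;
}

// Returns true if every worker has exactly R timestamps in non-decreasing order.
bool is_valid_matrix(const TimestampMatrix& m, std::size_t R) {
    assert(R > 0);
    for (const auto& row : m) {
        if (row.size() != R) return false;
        for (std::size_t i = 1; i < row.size(); ++i)
            if (row[i] < row[i - 1]) return false;
    }
    return true;
}
```

To use it, open a generated file with `std::ifstream`, pass the stream to `parse_timestamp_matrix`, and check the result with `is_valid_matrix(m, R)`; the number of workers n is then `m.size()`.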

To use these data generators to produce the datasets that are mentioned in Section 6.1 of the paper, please follow the steps below:

1. In the downloaded/cloned directory (i.e., ElasticBSP/), compile the two generators, workers_iterations_gen.exe and future_iterations_gen.exe.

```
g++ -std=c++11 -o workers_iterations_gen.exe workers_iteration_intervals_gen.cc
g++ -std=c++11 -o future_iterations_gen.exe future_iteration_intervals_gen.cc
```

2. Use workers_iterations_gen.exe to output a matrix of worker timestamps. It simulates n workers that start a distributed service at nearly the same time and run up to a user-specified time point (timestamp).
```
workers_iterations_gen.exe [number of workers] [number of timestamps] [output filename]
```
For example, to generate the first 10 timestamps (from the start of the service up to and including the 10th push request) for 20 workers and save the resulting matrix to the output file init_data_gen_matrix_worker20_t10.txt:
```
workers_iterations_gen.exe 20 10 init_data_gen_matrix_worker20_t10.txt
```
3. Use future_iterations_gen.exe to output a matrix of the workers' future timestamps, given the initial timestamps produced in step 2 and the number of future iterations per worker (the number of workers is detected by reading the initial output file). The final output is an n x R matrix, where n is the number of workers and R is the number of future iterations for every worker. The output file name is generated automatically from n and R.
```
future_iterations_gen.exe [init timestamps filename from step 2] [the range R]
```
Continuing the example from step 2, to generate 30 future iterations for all 20 workers:
```
future_iterations_gen.exe init_data_gen_matrix_worker20_t10.txt 30
```
This writes the file data_gen_future_iteration_workers_n20_R30.txt, which contains 20 entries of 30 estimated future timestamps each (a 20 x 30 matrix).
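To make the two-stage workflow above concrete, here is a minimal C++11 sketch of the same idea: first simulate the initial push timestamps of n workers, then extrapolate R future timestamps per worker. It is not the repository's generator code. The uniform jitter, the interval range of 1 to 3 time units, and the mean-interval extrapolation are all assumptions chosen for illustration, and `simulate_initial_timestamps` and `extrapolate_future` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using TimestampMatrix = std::vector<std::vector<double>>;

// Stand-in for step 2: n workers each issue t push requests.
// ASSUMPTION: each worker has its own mean interval with +/-10% uniform jitter.
TimestampMatrix simulate_initial_timestamps(std::size_t n, std::size_t t,
                                            unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> start_offset(0.0, 0.1);   // near-same start
    std::uniform_real_distribution<double> mean_interval(1.0, 3.0);  // assumed range
    TimestampMatrix init(n);
    for (auto& row : init) {
        double now = start_offset(rng);
        const double mean = mean_interval(rng);
        std::uniform_real_distribution<double> jitter(0.9 * mean, 1.1 * mean);
        for (std::size_t i = 0; i < t; ++i) {
            now += jitter(rng);  // time of the next push request
            row.push_back(now);
        }
    }
    return init;
}

// Stand-in for step 3: extend each worker's row by R future timestamps,
// spaced by that worker's average observed interval (an assumed predictor).
TimestampMatrix extrapolate_future(const TimestampMatrix& init, std::size_t R) {
    TimestampMatrix future;
    for (const auto& row : init) {
        assert(row.size() >= 2);  // need at least one observed interval
        const double interval = (row.back() - row.front()) / (row.size() - 1);
        std::vector<double> predicted;
        for (std::size_t k = 1; k <= R; ++k)
            predicted.push_back(row.back() + k * interval);
        future.push_back(predicted);
    }
    return future;
}
```

Calling `extrapolate_future(simulate_initial_timestamps(20, 10, 42u), 30)` mirrors the 20-worker example and yields a 20 x 30 matrix. The exact values depend on the standard library's distribution implementation, so they will not match the files produced by the real generators.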

Reference

@article{10.1007/s10994-021-06064-w, 
year = {2021}, 
title = {ZipLine: an optimized algorithm for the elastic bulk synchronous parallel model}, 
author = {Zhao, Xing and Papagelis, Manos and An, Aijun and Chen, Bao Xin and Liu, Junfeng and Hu, Yonggang}, 
journal = {Machine Learning}, 
issn = {0885-6125}, 
doi = {10.1007/s10994-021-06064-w}, 
pages = {1--37}
}

@inproceedings{zhao2019elastic,
  title={Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning},
  author={Zhao, Xing and Papagelis, Manos and An, Aijun and Chen, Bao Xin and Liu, Junfeng and Hu, Yonggang},
  booktitle={19th IEEE International Conference on Data Mining (ICDM)},
  pages={1504--1509},
  year={2019},
}

