This repository contains the source code implementation for the paper "Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training"
WPipe's runtime implements model parallelism, input pipelining, and communication in PyTorch. It can be fused with data parallelism to give hybrid model and data parallelism with input pipelining.
Image classification task entry point, as well as model splits
NLP task entry point, as well as model splits
Experiment configurations
Scripts for running experiments
Some helper scripts
To run WPipe, you will need an NVIDIA GPU with CUDA 10.1, GPU driver version 418.67, nvidia-docker2, and Python 3, on a Linux server running Ubuntu 16.04.
All dependencies are in the pytorch/pytorch:1.4-cuda10.1-cudnn7-runtime
container, which can be downloaded using:
docker pull pytorch/pytorch:1.4-cuda10.1-cudnn7-runtime
The PyTorch Docker container can then be run using:
nvidia-docker run -it -v /mnt:/mnt --ipc=host --net=host pytorch/pytorch:1.4-cuda10.1-cudnn7-runtime /bin/bash
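The two docker commands above can be wrapped in a small launcher that fails early when nvidia-docker is missing. This is a hypothetical convenience sketch, not part of the repository; the image tag is the one given above.

```shell
#!/bin/sh
# Image tag from the README above.
IMAGE="pytorch/pytorch:1.4-cuda10.1-cudnn7-runtime"

launch_container() {
    # Fail early with a clear message if nvidia-docker2 is not installed.
    if ! command -v nvidia-docker >/dev/null 2>&1; then
        echo "nvidia-docker not found; install nvidia-docker2 first" >&2
        return 1
    fi
    # --ipc=host is needed for PyTorch DataLoader shared memory;
    # --net=host lets multi-process training use host networking.
    nvidia-docker run -it -v /mnt:/mnt --ipc=host --net=host "$IMAGE" /bin/bash
}
```

Call `launch_container` from an interactive shell to drop into the container.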
Before running the WPipe program, run:
cd tool && sh init.sh
We run fine-tuning experiments using the CIFAR-10, CIFAR-100, and Oxford Flowers-102 datasets, and throughput experiments using the Oxford Flowers-102 dataset.
CIFAR-10 and CIFAR-100 can be downloaded from this website. Oxford Flowers-102 can be downloaded from this website.
We run fine-tuning and throughput experiments using a subset of the GLUE benchmark (QQP and MNLI). To download the GLUE dataset, use this script.
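A sketch of fetching only the two GLUE tasks used in the experiments. The `download_glue_data.py` helper name and its flags are an assumption (a commonly used community script); the README's own download script may differ, so the actual call is left commented out.

```shell
#!/bin/sh
# Where to place the data; override with DATA_DIR=... if desired.
DATA_DIR="${DATA_DIR:-glue_data}"
# Only the two tasks the experiments use.
TASKS="QQP,MNLI"

mkdir -p "$DATA_DIR"
echo "downloading GLUE tasks: $TASKS into $DATA_DIR"
# Hypothetical helper invocation; substitute the script linked in the README:
# python download_glue_data.py --data_dir "$DATA_DIR" --tasks "$TASKS"
```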
All experiments can be carried out using scripts in the experiments directory. You can perform an experiment as follows:
sh experiments/cv_throughput_single_node.sh
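To sweep every experiment in one go, the scripts in the experiments directory can be run in a loop with per-script logs. This is a convenience sketch; only cv_throughput_single_node.sh is named in the README, and the log layout is an assumption.

```shell
#!/bin/sh
# Collect one log file per experiment script.
LOG_DIR="logs"
mkdir -p "$LOG_DIR"

for script in experiments/*.sh; do
    name=$(basename "$script" .sh)
    echo "running $name"
    # Uncomment to actually execute each experiment:
    # sh "$script" >"$LOG_DIR/$name.log" 2>&1
done
```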