
Multi-gpu support for pytorch #153

Merged

Conversation

SteffenCzolbe
Contributor

The PR adds multi-GPU support to the PyTorch train script. This functionality is already present in the TensorFlow implementation, but was absent from the PyTorch one.

Command-line arguments are unchanged and in line with the TensorFlow implementation.

Newly supported features:

  • Train on a single GPU that isn't the first one. Example: the command-line argument "--gpu 3" trains only on CUDA device 3.
  • Train on any number of GPUs. Example: the command-line argument "--gpu 0,1" trains on GPUs 0 and 1. Parallelism is achieved by splitting the batch along the first dimension. If the batch size is less than the number of GPUs, an error is raised (behavior in line with the TensorFlow backend).
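The device selection and batch-size check described above can be sketched as follows. This is an illustrative sketch, not the PR's actual code; the helper names `parse_gpu_arg` and `check_batch_size` are hypothetical, and the commented `DataParallel` lines show the standard PyTorch pattern for splitting a batch along dimension 0 across devices.

```python
def parse_gpu_arg(gpu: str) -> list:
    """Turn a command-line value like '3' or '0,1' into a list of
    CUDA device indices (hypothetical helper, for illustration)."""
    return [int(g) for g in gpu.split(',')]


def check_batch_size(batch_size: int, device_ids: list) -> None:
    """DataParallel splits the batch along dim 0, so each GPU must
    receive at least one sample (mirroring the TensorFlow backend)."""
    if batch_size < len(device_ids):
        raise ValueError(
            'batch size (%d) must be at least the number of GPUs (%d)'
            % (batch_size, len(device_ids)))


# In the training script, the parsed ids would then be used roughly as:
#   device_ids = parse_gpu_arg(args.gpu)
#   check_batch_size(args.batch_size, device_ids)
#   device = torch.device('cuda:%d' % device_ids[0])
#   if len(device_ids) > 1:
#       model = torch.nn.DataParallel(model, device_ids=device_ids)
#   model.to(device)
```

With this pattern, "--gpu 3" yields a single-element list and no `DataParallel` wrapper, while "--gpu 0,1" wraps the model so each forward pass scatters the batch across both devices.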

Verification:

The implementation was verified by benchmarking training speed on synthetic data. An almost linear speedup was achieved when scaling from 1 to 4 GPUs.

@adalca
Collaborator

adalca commented Feb 24, 2020

Thank you @SteffenCzolbe , we'll take a look at this and do the pull in a bit.

@ahoopes ahoopes merged commit f61ec34 into voxelmorph:redesign May 5, 2020
