In this tutorial, we will learn how to use multiple GPUs with *DataParallel*.

It’s very easy to use GPUs with PyTorch. You can put the model on a GPU:

```
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
```

Then, you can copy all your tensors to the GPU:

```
mytensor = my_tensor.to(device)
```

Please note that calling *my_tensor.to(device)* returns a new copy of *my_tensor* on the GPU instead of modifying *my_tensor* in place. You need to assign the result to a new variable and use that tensor on the GPU.
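To make the difference concrete, here is a minimal sketch that contrasts a tensor's *.to()* with a module's *.to()*. The names `moved`, `layer`, and `returned` are illustrative and not part of the tutorial; the snippet falls back to the CPU when no GPU is present, in which case the tensor is already on the target device and *.to()* simply hands back the same tensor.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Tensors: .to() does not modify the original, so keep the return value.
my_tensor = torch.randn(4, 5)      # created on the CPU
moved = my_tensor.to(device)       # the tensor to use on `device`

# Modules: .to() moves the parameters in place and returns the same object.
layer = nn.Linear(5, 2)
returned = layer.to(device)

print("original tensor device:", my_tensor.device)   # stays on the CPU
print("moved tensor device:   ", moved.device)
print("module returned itself:", returned is layer)
```

This is why the tutorial can call *model.to(device)* without reassigning the model, while tensors need the assignment.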

It’s natural to execute your forward and backward passes on multiple GPUs. However, PyTorch will only use one GPU by default. You can easily run your operations on multiple GPUs by wrapping your model in *DataParallel* so that it runs in parallel:

```
model = nn.DataParallel(model)
```

That’s the core idea behind this tutorial. We will explore it in more detail below.

# Imports and parameters

Import PyTorch modules and define parameters.

```
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

# device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

# Dummy DataSet

Make a dummy (random) dataset. You just need to implement `__getitem__` and `__len__`.

```
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
```
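As a quick sanity check, the sketch below repeats the dataset with the tutorial's parameters and lists the batch shapes the loader yields. With 100 samples and a batch size of 30, the last batch holds only the 10 leftover samples. The variable names `loader` and `batch_shapes` are illustrative.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDataset(Dataset):
    """Same dummy dataset as above: `length` random vectors of width `size`."""

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


# input_size = 5, data_size = 100, batch_size = 30, as in the tutorial
loader = DataLoader(RandomDataset(5, 100), batch_size=30, shuffle=True)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)   # [(30, 5), (30, 5), (30, 5), (10, 5)]
```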

# Simple Model

For the demo, our model just gets an input, performs a linear operation, and gives an output. However, you can use *DataParallel* on any model (CNN, RNN, Capsule Net, etc.).

We’ve placed a print statement inside the model to monitor the size of input and output tensors. Please pay attention to what is printed at batch rank 0.

```
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output
```

# Create Model and DataParallel

This is the core part of the tutorial. First, we need to make a model instance and check whether we have multiple GPUs. If we do, we can wrap our model using *nn.DataParallel*. Then we can put our model on the GPUs with *model.to(device)*.

```
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
```

```
Let's use 3 GPUs!


DataParallel(
  (module): Model(
    (fc): Linear(in_features=5, out_features=2, bias=True)
  )
)
```
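The printed structure shows that the original network now lives under the wrapper's *module* attribute. The following sketch, which also runs on a machine without GPUs, illustrates what that means for parameter names: the wrapper prefixes them with `module.`, while the inner model keeps the plain names. The variable names `wrapped`, `wrapped_keys`, and `inner_keys` are illustrative, and the print statement is left out of the model for brevity.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    """Same single-layer model as in the tutorial, without the print."""

    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.fc(x)


model = Model(5, 2)
wrapped = nn.DataParallel(model)

# The wrapper stores the original model as `.module`.
print(wrapped.module is model)                    # True

wrapped_keys = list(wrapped.state_dict().keys())
inner_keys = list(wrapped.module.state_dict().keys())
print(wrapped_keys)   # ['module.fc.weight', 'module.fc.bias']
print(inner_keys)     # ['fc.weight', 'fc.bias']
```

Reaching through *wrapped.module* is one way to get at the unwrapped model, for example when its attributes or un-prefixed parameter names are needed.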

# Run the Model

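Now we can see the sizes of input and output tensors. In the run recorded here, however, the loop stopped with a CUDA out-of-memory error, so the output below shows that error instead of the tensor sizes.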

```
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
```

RuntimeError: CUDA error: out of memory (malloc at /opt/conda/conda-bld/pytorch_1556653183467/work/c10/cuda/CUDACachingAllocator.cpp:241)


# Summary

DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.
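The split-and-merge idea can be imitated on the CPU with plain tensor operations. The sketch below is only a stand-in, not DataParallel's internals: it assumes the batch is divided along dimension 0 the way *torch.chunk* divides it, and it uses a hypothetical `num_devices = 3` and a random `weight` matrix in place of real model replicas.

```python
import torch

num_devices = 3                      # hypothetical GPU count, matching the 3-GPU comment above
batch = torch.randn(30, 5)           # one full batch: batch_size = 30, input_size = 5
weight = torch.randn(2, 5)           # stand-in for a linear layer with output_size = 2

# "Scatter": split the batch along dim 0, one chunk per device.
chunks = torch.chunk(batch, num_devices, dim=0)
print([tuple(c.shape) for c in chunks])    # [(10, 5), (10, 5), (10, 5)]

# "Parallel apply": every replica runs the same computation on its own chunk.
outputs = [chunk @ weight.t() for chunk in chunks]

# "Gather": concatenate the partial results back along dim 0.
merged = torch.cat(outputs, dim=0)
print(tuple(merged.shape))                 # (30, 2)

# The final batch of the epoch has only 10 samples, so the chunks are uneven.
last_batch = torch.randn(10, 5)
last_sizes = [c.shape[0] for c in torch.chunk(last_batch, num_devices, dim=0)]
print(last_sizes)                          # [4, 4, 2]
```

The merged result has the same shape as running the whole batch through a single model, which is why code outside the model sees the full batch size while code inside each replica sees only its chunk.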

For more information, please check out https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html.