**Data parallelism** means every device (e.g. each GPU) runs a copy of the same model but is fed a different part of the data.

**Model parallelism** means every device works on the same data, but the model itself is split among the devices.
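Here is a rough CPU-only illustration of the two definitions. It is a sketch, not how PyTorch implements either one. For data parallelism it splits one batch between two hypothetical workers. For model parallelism it splits the output features of one `nn.Linear` layer between two workers, so each worker receives the full input. Both give the same result as the unsplit computation. On real hardware each piece would run on its own GPU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(30, 5)  # one batch: 30 samples, 5 features

# Data parallelism: every worker holds the SAME model and gets a DIFFERENT slice of the data.
model = nn.Linear(5, 2)
slices = x.chunk(2, dim=0)                  # two slices of 15 samples each
data_parallel_out = torch.cat([model(s) for s in slices], dim=0)

# Model parallelism: every worker sees the SAME data but holds only PART of the model.
# Here the layer's 4 output features are split 2 + 2 between two workers.
full = nn.Linear(5, 4)
w_a, w_b = full.weight.chunk(2, dim=0)      # each worker's rows of the weight matrix, shape (2, 5)
b_a, b_b = full.bias.chunk(2, dim=0)        # matching halves of the bias, shape (2,)
model_parallel_out = torch.cat([F.linear(x, w_a, b_a), F.linear(x, w_b, b_b)], dim=1)

print(data_parallel_out.shape, model_parallel_out.shape)  # torch.Size([30, 2]) torch.Size([30, 4])
```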

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

In [1]:
import os

In [2]:
os.environ['CUDA_VISIBLE_DEVICES']

'0'

In [3]:
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"  # must be set before CUDA is initialized in this process
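`CUDA_VISIBLE_DEVICES` also renumbers the GPUs a process can see. The listed devices become `cuda:0`, `cuda:1`, ... in the order given. The hypothetical helper `visible_to_physical` below sketches that mapping and returns the listed entry as a string. The IDs follow the CUDA runtime's enumeration, which can differ from `nvidia-smi` order unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.

```python
import os

def visible_to_physical(logical_index, env=None):
    """Hypothetical helper: return the device ID that ``cuda:<logical_index>``
    refers to, given a CUDA_VISIBLE_DEVICES string."""
    if env is None:
        env = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env is None:
        # Variable unset: every GPU is visible and no renumbering happens.
        return str(logical_index)
    ids = [s.strip() for s in env.split(",") if s.strip()]
    return ids[logical_index]

print(visible_to_physical(0, "2,3"))  # 2  (cuda:0 is GPU 2)
print(visible_to_physical(1, "2,3"))  # 3
print(visible_to_physical(1, "0,1"))  # 1  (this notebook's setting: no renumbering)
```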

In [4]:
import torch

In [5]:
device = torch.device("cuda:0")
# model.to(device)  # moving a model to the GPU is as simple as this

In [6]:
# mytensor = my_tensor.to(device) ## copy all your tensors to the GPU

Please note that calling ```my_tensor.to(device)``` returns a new copy of my_tensor on the GPU rather than modifying my_tensor in place. You need to assign the result to a variable and use that tensor on the GPU.
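Here is a small CPU-only check of this behaviour. It uses a dtype conversion in place of a device move so that it runs without a GPU, and `.to()` follows the same copy rules for both. The contrast is with `nn.Module.to()`, which modifies the module in place and returns it, so `model.to(device)` needs no reassignment.

```python
import torch
import torch.nn as nn

t = torch.zeros(3)                    # float32 tensor on the CPU
t64 = t.to(torch.float64)             # returns a NEW tensor; t itself is unchanged
print(t.dtype, t64.dtype)             # torch.float32 torch.float64

same = t.to(torch.float32)            # already float32 on the CPU: t itself is returned
print(same is t)                      # True

m = nn.Linear(2, 2)
returned = m.to(torch.float64)        # nn.Module.to converts the module in place...
print(returned is m, m.weight.dtype)  # ...and returns self: True torch.float64
```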

It is natural to run the forward and backward passes on multiple GPUs. However, PyTorch uses only one GPU by default. You can run your operations on multiple GPUs by wrapping your model in ```DataParallel```, which runs it in parallel on the available devices:

In [7]:
# model = nn.DataParallel(model)
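The wrapper has one practical consequence. The original model is available as `.module`, and the keys of the wrapper's `state_dict()` gain a `module.` prefix. The sketch below shows both. It also runs on a CPU-only machine, where `nn.DataParallel` simply holds on to the module. It also shows why saving `model.module.state_dict()` keeps a checkpoint loadable into an unwrapped model.

```python
import torch
import torch.nn as nn

net = nn.Linear(5, 2)
wrapped = nn.DataParallel(net)

# The original model is reachable as `.module`...
print(wrapped.module is net)                 # True
# ...and the wrapper's state_dict keys carry a "module." prefix.
print(list(wrapped.state_dict().keys()))     # ['module.weight', 'module.bias']

# The inner module's state_dict keeps the original key names,
# so it loads directly into a plain, unwrapped model.
plain = nn.Linear(5, 2)
plain.load_state_dict(wrapped.module.state_dict())
```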

In [8]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

In [9]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [10]:
device

device(type='cuda', index=0)

#### Dummy dataset

In [11]:
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
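With `data_size = 100` and `batch_size = 30`, the loader yields three full batches and a final partial batch of 10 samples, because `drop_last` defaults to `False`. The standalone check below uses the built-in `TensorDataset` instead of `RandomDataset` so that it is self-contained. `TensorDataset` yields tuples, hence `batch[0]`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Same sizes as above: 100 samples of 5 features, batches of 30.
loader = DataLoader(TensorDataset(torch.randn(100, 5)), batch_size=30, shuffle=True)

sizes = [batch[0].size(0) for batch in loader]
print(sizes)  # [30, 30, 30, 10]: the last batch holds the 100 - 3*30 leftover samples
```

Passing `drop_last=True` would discard that final batch of 10 instead.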

#### Simple Model (dense)

In [12]:
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.forward_count = 0  # counts forward() calls on this module instance
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        self.forward_count += 1
        print("\n[Forward Pass] Count", self.forward_count, "In Model: input size", input.size(),
              "output size", output.size())

        return output
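On a single, unwrapped module the counter simply accumulates, as the sketch below shows with a print-free variant of `Model`. The name `CountingModel` is used only for illustration. Under `nn.DataParallel`, the module is replicated onto each GPU on every forward call. The increment therefore lands on short-lived replicas rather than on the original `model`, which is why the output further down shows `Count 1` for every batch.

```python
import torch
import torch.nn as nn

class CountingModel(nn.Module):
    # Same idea as Model above, minus the print, so the counter is easy to inspect.
    def __init__(self, input_size, output_size):
        super().__init__()
        self.forward_count = 0
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        self.forward_count += 1   # mutates the module instance that actually runs forward
        return self.fc(x)

model = CountingModel(5, 2)   # deliberately NOT wrapped in nn.DataParallel
for batch_size in [30, 30, 30, 10]:
    model(torch.randn(batch_size, 5))
print(model.forward_count)  # 4
```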

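This is the core part of the tutorial. First, we need to create a model instance and check whether we have multiple GPUs. **If we have multiple GPUs, we can wrap our model using nn.DataParallel**. Then we can move our model to the GPU with ```model.to(device)```. DataParallel expects the wrapped module's parameters and buffers to be on the first device in ```device_ids``` before the forward call.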

In [13]:
model = Model(input_size, output_size)

In [14]:
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # DataParallel splits each batch along dim 0, e.g.
    # [30, ...] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)

Let's use 2 GPUs!


DataParallel(
  (module): Model(
    (fc): Linear(in_features=5, out_features=2, bias=True)
  )
)

In [15]:
model.device_ids

[0, 1]
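With `device_ids == [0, 1]`, each input batch is scattered along dim 0 into one chunk per GPU. The sketch below approximates the split sizes with `torch.chunk`. This is an assumption that holds for these evenly divisible batches, and it reproduces the 15/15 and 5/5 sizes printed in the output below.

```python
import torch

splits = []
# Batch sizes produced by the DataLoader above (100 samples, batch_size=30).
for batch_size in [30, 30, 30, 10]:
    batch = torch.randn(batch_size, 5)
    per_gpu = [c.size(0) for c in batch.chunk(2, dim=0)]   # 2 = len(device_ids)
    splits.append(per_gpu)
    print(batch_size, "->", per_gpu)
# 30 -> [15, 15]
# 30 -> [15, 15]
# 30 -> [15, 15]
# 10 -> [5, 5]
```

On 3 GPUs the same batch of 30 would split into `[10, 10, 10]`, matching the comment in the wrapping cell.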

In [16]:
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("\n[Top Level] Outside: input size", input.size(),
          "output_size", output.size())


[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Top Level] Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Top Level] Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

[Top Level] Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])

[Forward Pass] Count 1 In Model: input size torch.Size([5, 5]) outp