#### Using multiple GPUs with DataParallel
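- It's natural to run the forward and backward propagations on multiple GPUs.
- Using multiple GPUs for the data (data parallelism) and using them for the model (model parallelism) are different things and should be thought about separately.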

In [18]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

In [19]:
input_size = 5
output_size = 2
batch_size = 30
data_size = 100

In [20]:
class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)
    
    def __getitem__(self, index):
        return self.data[index]
    
    def __len__(self):
        return self.len

rand_loader = DataLoader(
    dataset=RandomDataset(input_size, data_size),
    batch_size=batch_size,
    shuffle=True,
)

In [21]:
class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)
    
    def forward(self, input):
        output = self.fc(input)
        print('In Model: input size', input.size(), "output size", output.size())
        return output

In [22]:
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "Gpus!")
    model = nn.DataParallel(model)

Let's use 2 Gpus!


In [23]:
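# Primary device: the first GPU if CUDA is available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)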

DataParallel(
  (module): Model(
    (fc): Linear(in_features=5, out_features=2, bias=True)
  )
)

In [24]:
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outsize: input size", input.size(), "output_size", output.size())

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outsize: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outsize: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outsize: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outsize: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
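
The sizes printed above follow from simple arithmetic: 100 samples with a batch size of 30 give batches of 30, 30, 30, and 10, and each batch is divided between the 2 GPUs. Below is a minimal pure-Python sketch of that arithmetic, assuming the batch is cut into near-equal chunks along dimension 0 in the style of `torch.chunk`. `split_sizes` is a hypothetical helper written for illustration, not a PyTorch function.

```python
import math


def split_sizes(batch, num_devices):
    """Chunk sizes when `batch` rows are split into at most `num_devices` pieces.

    Assumption: chunks have size ceil(batch / num_devices), except possibly the last.
    """
    chunk = math.ceil(batch / num_devices)
    sizes = []
    remaining = batch
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


data_size, batch_size, num_gpus = 100, 30, 2

# Batch sizes produced by the DataLoader (the last batch is smaller)
batches = [min(batch_size, data_size - start) for start in range(0, data_size, batch_size)]

# Rows each GPU receives for every batch
per_gpu = [split_sizes(b, num_gpus) for b in batches]

print(batches)  # [30, 30, 30, 10]
print(per_gpu)  # [[15, 15], [15, 15], [15, 15], [5, 5]]
```

Each inner list corresponds to one pair of "In Model" lines in the log, and each entry of `batches` to one "Outsize" line.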


- Model parallelism splits a single model across different GPUs, rather than replicating the entire model on each GPU.
    - With a 10-layer model, using only DataParallel means each GPU takes its own slice of the batch, but the whole model is still replicated on every GPU for training.
    - With model parallelism, in contrast, each of the two GPUs is responsible for 5 of the layers.
- The high-level idea of model parallelism is to place different sub-networks of a model onto different devices, and to implement the forward method accordingly so that intermediate outputs are moved across devices.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10,10).to('cuda:0')
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10,5).to('cuda:1')
    
    def forward(self, x):
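        x = self.relu(self.net1(x.to('cuda:0')))  # first stage runs on GPU 0
        x = self.net2(x.to('cuda:1'))  # second stage runs on GPU 1, so the forward pass is split across GPUs; would setting a default device handle this automatically?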
        return x
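
A model laid out like this can be trained with an ordinary loop; the one extra detail is that the labels have to sit on the same device as the outputs. The following is a minimal sketch of a single training step under some assumptions not in the original: `TwoStageModel` is a hypothetical stand-in with the same layers as `ToyModel`, the loss, optimizer, learning rate, and random data are arbitrary choices, and the code falls back to the CPU when fewer than two GPUs are visible so that it runs anywhere.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Assumption: use two GPUs when present, otherwise place both stages on the CPU.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")


class TwoStageModel(nn.Module):
    """Hypothetical stand-in for ToyModel with configurable devices."""

    def __init__(self):
        super().__init__()
        self.net1 = nn.Linear(10, 10).to(dev0)  # first stage
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to(dev1)   # second stage

    def forward(self, x):
        x = self.relu(self.net1(x.to(dev0)))
        return self.net2(x.to(dev1))  # move the activation to the second device


model = TwoStageModel()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

optimizer.zero_grad()
outputs = model(torch.randn(20, 10))
labels = torch.randn(20, 5).to(dev1)  # labels must be on the outputs' device
loss = loss_fn(outputs, labels)
loss.backward()   # gradients flow back through both stages
optimizer.step()

print(outputs.shape, outputs.device)
```

The device moves stay explicit in `forward`, while the backward pass and the optimizer step need no device-specific code.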
    