### Summary
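- Data parallelism vs. model parallelism
    - Data parallelism: the model is copied to each device, and the data is split/chunked along the batch dimension
      - the module is replicated on each device, and each replica handles a different portion of the input data
      - during the backward pass, gradients from each replica are summed into the original module
    - Model parallelism: the data is copied to each device, and the model is split/chunked (for the case where the model clearly does not fit on a single GPU)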
- DP -> DDP
    - DP: torch.nn.DataParallel
    - DDP: torch.nn.parallel.DistributedDataParallel
    - PyTorch recommends nn.parallel.DistributedDataParallel over multiprocessing or nn.DataParallel for multi-GPU training.
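
The two data-parallelism bullets above can be checked on a single CPU. The sketch below is not how nn.DataParallel is implemented: it simulates replicas with `copy.deepcopy` and assumes a loss that is a plain sum over samples, so the per-chunk gradients add up to the full-batch gradient. The names `layer`, `num_replicas`, and `summed_grad` are illustrative only.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(5, 2)
batch = torch.randn(32, 5)

# Reference: one backward pass over the full batch.
layer.zero_grad()
layer(batch).sum().backward()
full_grad = layer.weight.grad.clone()

# Simulated data parallelism: each replica sees one chunk of the batch.
num_replicas = 4
chunks = torch.chunk(batch, num_replicas, dim=0)  # 4 tensors of shape (8, 5)

summed_grad = torch.zeros_like(layer.weight)
for chunk in chunks:
    replica = copy.deepcopy(layer)  # stand-in for a per-device copy
    replica.zero_grad()
    replica(chunk).sum().backward()
    summed_grad += replica.weight.grad  # sum replica gradients

print([tuple(c.shape) for c in chunks])
print(torch.allclose(full_grad, summed_grad, atol=1e-5))
```

With a mean-reduced loss, the per-chunk gradients would also need to be weighted by chunk size before they match the full-batch gradient.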

In [1]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

input_size = 5
output_size = 2

batch_size = 32
data_size = 100

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda', index=0)

In [3]:
class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        # one sample of shape (input_size,), i.e. (5,)
        return self.data[index]

    def __len__(self):
        # number of samples, i.e. 100
        return self.len


rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size), batch_size=batch_size, shuffle=True)

In [4]:
next(iter(rand_loader)).shape

torch.Size([32, 5])
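
Since 100 samples do not divide evenly by a batch size of 32, the last batch is smaller than the rest. A small check, using `TensorDataset` as a stand-in for `RandomDataset` (an assumption made to keep the snippet self-contained):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 samples of 5 features, batched 32 at a time (drop_last defaults to False).
dataset = TensorDataset(torch.randn(100, 5))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = [batch[0].shape[0] for batch in loader]
print(batch_sizes)  # [32, 32, 32, 4]
```

The final batch of 4 is the one that each replica later sees as a much smaller slice.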

In [5]:
class Model(nn.Module):
    # Our model
    def __init__(self, input_size, output_size):
        # 5 -> 2
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        # Here the model will forward pass
        output = self.fc(input)
        print("\tIn Model: input size", input.size(), "output size", output.size())

        return output

### DataParallel

- https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html
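    - device_ids=None
        - which GPUs take part in training, e.g. device_ids=gpus
    - output_device=None
        - which GPU the outputs are gathered on, e.g. output_device=gpus[0]
    - dim=0
        - the dimension along which inputs are scattered (the batch dimension)
- The parallelized module must have its parameters and buffers on device_ids[0] before running (forward/backward) this DataParallel module.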
    - model.to('cuda:0')
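
As a minimal sketch of how these arguments fit together, the snippet below passes `device_ids`, `output_device`, and `dim` explicitly. The names `net` and `gpus` are illustrative, and the guard is an assumption added so the code also runs on a machine without CUDA, in which case the module is simply left unwrapped.

```python
import torch
import torch.nn as nn

net = nn.Linear(5, 2)
batch = torch.randn(32, 5)

gpus = list(range(torch.cuda.device_count()))  # e.g. [0, 1, 2, 3]
if gpus:
    # Parameters and buffers must be on device_ids[0] before the wrapper runs.
    net = net.to(f"cuda:{gpus[0]}")
    net = nn.DataParallel(net, device_ids=gpus, output_device=gpus[0], dim=0)
    batch = batch.to(f"cuda:{gpus[0]}")

# The batch is scattered along dim 0; outputs are gathered on output_device.
out = net(batch)
print(out.shape, out.device)
```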


In [6]:
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print('Use', torch.cuda.device_count(), 'GPUs')
    model = nn.DataParallel(model)

Use 4 GPUs


In [7]:
model

DataParallel(
  (module): Model(
    (fc): Linear(in_features=5, out_features=2, bias=True)
  )
)

In [8]:
model.to(device)

DataParallel(
  (module): Model(
    (fc): Linear(in_features=5, out_features=2, bias=True)
  )
)

### tensors: to(device)

In [9]:
a = torch.randn(3, 4)
print('a.is_cuda', a.is_cuda)
b = a.to('cuda:0')
print('a.is_cuda', a.is_cuda)
print('b.is_cuda', b.is_cuda)

# a and b are different tensors: a stays on the CPU, b is a copy on cuda:0

a.is_cuda False
a.is_cuda False
b.is_cuda True
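
The same behaviour can be observed without a GPU. In this sketch a dtype change stands in for a device move (an assumption for portability): `Tensor.to` returns the tensor itself when no conversion is needed, and a new tensor otherwise. The names `t`, `same`, and `other` are illustrative.

```python
import torch

t = torch.randn(3, 4)

same = t.to("cpu")           # already on the CPU with this dtype: no copy
other = t.to(torch.float64)  # conversion required: a new tensor is returned

print(same is t)   # True
print(other is t)  # False
print(t.dtype, other.dtype)
```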


### models: to(device)

model.to(device) modifies the original model in place: its parameters are moved, and the same module object is returned.

In [10]:
a = Model(3, 4)
print(next(a.parameters()).is_cuda)
b = a.to('cuda:0')
print(next(a.parameters()).is_cuda)
print(next(b.parameters()).is_cuda)

False
True
True
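
A CPU-only sketch of the same point, again using a dtype change as a stand-in for a device move (an assumption so it runs without a GPU): `nn.Module.to` converts the module's parameters in place and returns the module itself. The names `m` and `moved` are illustrative.

```python
import torch
import torch.nn as nn

m = nn.Linear(3, 4)
before = next(m.parameters()).dtype

moved = m.to(torch.float64)  # converts m's parameters in place
after = next(m.parameters()).dtype

print(moved is m)  # True
print(before, after)
```

This is why reassigning with `model = model.to(device)` and calling `model.to(device)` on its own both leave `model` on the target device, whereas a tensor has to be reassigned.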


In [11]:
for data in rand_loader:
    # the input can start on any device, including the CPU
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(), "output_size", output.size())

	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
Outside: input size torch.Size([32, 5]) output_size torch.Size([32, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
Outside: input size torch.Size([32, 5]) output_size torch.Size([32, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
	In Model: input size torch.Size([8, 5]) output size torch.Size(