Model is moved to GPU after the optimizer is instantiated, resulting in a performance hit. #82
Comments
Thank you for reporting this, @schlabrendorff. I'll look more into it and make a PR handling this soon.
Reproduced the issue with this script:

```python
import time

import torch
import torch.nn as nn
from torchvision.models import resnet152
from tqdm import tqdm

device = torch.device('cuda')


def run_training(model, optimizer, steps, batch_size=10):
    data = torch.randn(batch_size, 3, 224, 224).to(device)
    target = torch.LongTensor([1] * batch_size).to(device)
    loss_fn = nn.NLLLoss()
    for _ in tqdm(range(steps)):
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()


def optim_before_cuda(steps, batch_size):
    # model is moved to the GPU first, then the optimizer is constructed
    print('optim_before_cuda')
    model = resnet152()
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters())
    run_training(model, optimizer, steps, batch_size)


def optim_after_cuda(steps, batch_size):
    # optimizer is constructed first, then the model is moved to the GPU
    print('optim_after_cuda')
    model = resnet152()
    optimizer = torch.optim.Adam(model.parameters())
    model = model.to(device)
    run_training(model, optimizer, steps, batch_size)


if __name__ == '__main__':
    steps = 200
    batch_size = 16

    start = time.time()
    optim_after_cuda(steps, batch_size)
    # optim_before_cuda(steps, batch_size)
    end = time.time()
    print(end - start)
```

The difference in speed was small, but GPU utilization was significantly unstable when optimizer initialization preceded moving the model to the GPU.
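A hedged refinement of the timing in the script above: CUDA kernels launch asynchronously, so synchronizing the device before reading the clock (assuming a CUDA device is present) gives a more reliable wall-clock measurement than `time.time()` alone.

```python
# Sketch of a more reliable timing for the benchmark above.
# torch.cuda.synchronize() blocks until all queued GPU work has finished,
# so the measured interval includes the actual kernel execution time.
if torch.cuda.is_available():
    torch.cuda.synchronize()
start = time.time()
optim_after_cuda(steps, batch_size)
if torch.cuda.is_available():
    torch.cuda.synchronize()
end = time.time()
print(end - start)
```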
@schlabrendorff I made PR #83 handling this issue.
@SunQpark Looks good!! Thank you!
I noticed that the optimizer is instantiated before the model is moved to the GPU.
This is contrary to the PyTorch docs, which recommend moving the model to the GPU before constructing any optimizers for it.
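As an illustration (not code from this repository), here is a minimal sketch of the ordering the docs recommend, plus a quick check that the optimizer tracks the same Parameter objects the model actually uses after the move; the exact aliasing behaviour can depend on the PyTorch version.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Recommended order: move the model to the device first,
# then construct the optimizer from the already-moved parameters.
model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Quick sanity check: the optimizer should reference the same Parameter
# objects the model uses, and they should live on the target device.
opt_param = optimizer.param_groups[0]['params'][0]
model_param = next(model.parameters())
print(opt_param is model_param)  # expected: True
print(opt_param.device)          # expected: the target device
```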
I noticed the problem on my machine because I had fluctuating GPU utilization (checked with nvtop). The utilization jumped every couple of seconds from 10-20% to 80% and back. Moving the model to cuda beforehand (in train.py) fixed the issue for me; afterwards the utilization never dropped under 70%. Can you reproduce the behavior?
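For illustration only, a minimal sketch of the reordering described above, assuming a train.py that builds the model and optimizer; the helper name and signature are hypothetical, not the template's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical helper, not the real train.py: the only point is the ordering.
def build_model_and_optimizer(model: nn.Module, lr: float = 1e-3):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)                                 # move the model to the GPU first
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # then construct the optimizer
    return model, optimizer, device
```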