Error resuming Diffusion-InsGen #6
Hi @GilesBathgate, your fix is correct. The error on resuming originates from InsGen: genforce/insgen#6. I tried to fix it but couldn't work it out. I guess the error is from the
Thanks a lot! Will make the change.
The fix should probably be this:

--- a/diffusion-insgen/training/training_loop.py
+++ b/diffusion-insgen/training/training_loop.py
@@ -154,22 +154,22 @@ def training_loop(
     # Construct networks.
     if rank == 0:
         print('Constructing networks...')
     common_kwargs = dict(c_dim=training_set.label_dim, img_resolution=training_set.resolution, img_channels=training_set.num_channels)
     G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
     D = dnnlib.util.construct_class_by_name(**D_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
     G_ema = copy.deepcopy(G).eval()
     # Construct contrastive heads.
-    DHead = dnnlib.util.construct_class_by_name(**DHead_kwargs).train().to(device) if DHead_kwargs is not None else None
-    GHead = dnnlib.util.construct_class_by_name(**GHead_kwargs).train().to(device) if GHead_kwargs is not None else None
+    DHead = dnnlib.util.construct_class_by_name(**DHead_kwargs).train().requires_grad_(False).to(device) if DHead_kwargs is not None else None
+    GHead = dnnlib.util.construct_class_by_name(**GHead_kwargs).train().requires_grad_(False).to(device) if GHead_kwargs is not None else None
     D_ema = copy.deepcopy(D).eval()

     # Setup augmentation.
@@ -221,6 +224,8 @@ def training_loop(
         ddp_modules[name] = module

     # Distribute Heads across GPUs.
+    DHead.requires_grad_(True)
+    GHead.requires_grad_(True)
     if rank == 0:
         print(f'Distributing Contrastive Heads across {num_gpus} GPUS...')
     if num_gpus > 1:

This seems to fit the intent of the original StyleGAN code better.
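To illustrate why constructing DHead/GHead with grad enabled breaks resuming, here is a minimal standalone sketch (not using the repo's code) of the underlying PyTorch behavior: an in-place `copy_` into a leaf tensor that requires grad fails, which is exactly what `misc.copy_params_and_buffers` does when restoring a checkpoint.

```python
import torch

# A leaf tensor with requires_grad=True rejects an in-place copy_ under autograd,
# mimicking what happens when copy_params_and_buffers restores a checkpoint into
# a head whose parameters were constructed with grad enabled.
p = torch.zeros(3, requires_grad=True)
failed = False
try:
    p.copy_(torch.ones(3))  # RuntimeError: leaf Variable used in in-place op
except RuntimeError:
    failed = True

# With grad disabled first (as the patched construction does via
# .requires_grad_(False)), the same copy succeeds, and grad can be
# re-enabled afterwards before training resumes.
p.requires_grad_(False)
p.copy_(torch.ones(3))
p.requires_grad_(True)
```

This mirrors the patch above: keep `requires_grad` off while checkpoints are loaded, then switch it on just before the heads are distributed for training.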
@GilesBathgate Really appreciate your investigation here 💯. I will test the code and update accordingly.
This fix seems not to work when saving checkpoints. Do you know what the possible reason could be, @GilesBathgate?
I only have 1 GPU, so I was not running in distributed mode; perhaps that's why. I had to make another patch to support only 1 GPU.
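The single-GPU patch itself is not shown in the thread, but the usual StyleGAN-style approach is to wrap modules in DistributedDataParallel only when more than one GPU is in use. A hypothetical helper sketching that idea (the name `maybe_ddp` and its parameters are illustrative assumptions, not code from the repo):

```python
import torch

def maybe_ddp(module, num_gpus, device):
    """Wrap a module with DDP only for multi-GPU runs.

    Single-GPU runs return the module unchanged, avoiding DDP's
    process-group requirements (an assumed sketch, not the repo's patch).
    """
    if num_gpus > 1:
        return torch.nn.parallel.DistributedDataParallel(
            module, device_ids=[device], broadcast_buffers=False)
    return module

# Single-GPU path: the module is used as-is, no process group needed.
head = torch.nn.Linear(4, 4)
wrapped = maybe_ddp(head, num_gpus=1, device=None)
```

With `num_gpus == 1` this never touches the distributed machinery, which is the behavior a one-GPU patch would need.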
Thanks @GilesBathgate! I remember that you had one other fix, which found some in-place operation in InsGen's Contrastive_Head. The fix was deleted (I don't know why...). Do you mind sharing it again so I can try that one? I didn't find where it is, lol. Thanks again!
I proposed a change here:
Essentially: disable grad before copying, then re-enable it. However, I prefer the fix above, which should have the same effect, since grad should not be enabled before misc.copy_params_and_buffers is called, as is the case for the other modules. I don't think either of these fixes will solve your error in misc.check_ddp_consistency.
--- a/diffusion-insgen/torch_utils/misc.py
+++ b/diffusion-insgen/torch_utils/misc.py
@@ -150,7 +150,9 @@ def copy_params_and_buffers(src_module, dst_module, require_all=False):
     for name, tensor in named_params_and_buffers(dst_module):
         assert (name in src_tensors) or (not require_all)
         if name in src_tensors:
-            tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
+            requires_grad = tensor.requires_grad
+            with torch.no_grad():
+                tensor.copy_(src_tensors[name].detach())
+            tensor.requires_grad_(requires_grad)
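The no_grad pattern in that diff can be demonstrated in isolation. This is a simplified stand-in for the patched `copy_params_and_buffers` (parameters only, no buffers, and the helper name is made up for the sketch), showing that wrapping the in-place `copy_` in `torch.no_grad()` lets it succeed even on tensors that require grad, while preserving each tensor's grad flag:

```python
import torch

def copy_params_no_grad(src_module, dst_module):
    """Simplified sketch of the patched copy: do the in-place copy_ under
    no_grad so leaf tensors with requires_grad=True are accepted, then
    restore each tensor's original requires_grad flag."""
    src_tensors = dict(src_module.named_parameters())
    for name, tensor in dst_module.named_parameters():
        if name in src_tensors:
            requires_grad = tensor.requires_grad
            with torch.no_grad():
                tensor.copy_(src_tensors[name].detach())
            tensor.requires_grad_(requires_grad)

src = torch.nn.Linear(2, 2)
dst = torch.nn.Linear(2, 2)  # parameters require grad by default
copy_params_no_grad(src, dst)
```

Without the `torch.no_grad()` context, the `copy_` into `dst`'s grad-requiring parameters would raise the same RuntimeError that breaks resuming.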
I think the fix is just to add:
But then the following error occurs:
Furthermore, this issue also appears to be present in the upstream version of InsGen.