
Getting error in d_loss.backward() of first_stage training #260

Open · SandyPanda-MLDL opened this issue Jun 24, 2024 · 0 comments

@SandyPanda-MLDL
I am getting an error in the following part of the first-stage training code:

    # discriminator loss
    if epoch >= TMA_epoch:
        torch.autograd.set_detect_anomaly(True)
        optimizer.zero_grad()
        d_loss = dl(wav.detach().clone().unsqueeze(1).float(), y_rec.detach().clone()).mean()
        # d_loss = dl(wav.detach().unsqueeze(1).float(), y_rec.detach()).mean()
        # accelerator.backward(d_loss)
        print(f'Discriminator loss is {d_loss} and requires_grad is {d_loss.requires_grad} and epoch is {epoch}')
        d_loss.backward()
        optimizer.step('msd')
        optimizer.step('mpd')
        print(f'Discriminator loss is {d_loss} and requires_grad is {d_loss.requires_grad} and epoch is {epoch}')
    else:
        d_loss = 0
        print(f'Else discriminator loss is {d_loss} and epoch is {epoch}')

The error is:
File "train_first.py", line 293, in main
d_loss.backward()
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
self, gradient, retain_graph, create_graph, inputs=inputs
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/autograd/init.py", line 199, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 1, 6000, 5]] is at version 6; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
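
For context, this RuntimeError comes from autograd's version-counter check: a tensor that was saved during the forward pass for computing gradients was modified in place before backward() ran. A minimal, self-contained sketch (illustrative only, not taken from the StyleTTS code) that triggers the same class of error:

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x.exp()          # autograd saves y, since d(exp(x))/dx = exp(x) = y
    y.add_(1)            # in-place op bumps y's version counter from 0 to 1
    y.sum().backward()   # RuntimeError: ... is at version 1; expected version 0

With torch.autograd.set_detect_anomaly(True) enabled, as in the snippet above, PyTorch also prints the forward-pass traceback of the operation whose gradient failed, which helps locate the offending in-place op; the usual fix is to replace it with an out-of-place equivalent (e.g. y = y + 1) or to clone() the tensor before it is modified.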
