Issue with vggish checkpoint #13
Comments
Hi, I checked the code and I think you are right! Thanks a lot for the catch! I will commit the fixes.
Thank you very much for the very quick answer and fix :D
The loss goes to 'nan' when I load the correct ckpt. Do you have this problem? I trained on the VAS dataset.
Hi, I want to ask about the parameters of lpaps. The vggishish16 model is trained on vggsound. How did you get the parameters of the following layers? Did you directly use the pre-trained model from taming transformer?
You may train them by adapting the script from https://github.com/richzhang/PerceptualSimilarity.
Can you share the code you used to train lpaps on the vggsound dataset?
Ok, I managed to look into this issue a bit more. Thanks to your questions I discovered that this problem is actually deeper than I originally anticipated. It seems that I completely missed that What happens is that these layers are actually randomly initialized and, luckily, the model could still train to such great quality, thanks to the GAN loss. This means that you can just drop the perceptual loss from the model and it will train much faster and to the same performance. On the practical side, it seems that with this dorky loss you may still get a bit of a boost in quality.
Thanks for your reply. I understand it.
Today I had a chance to inspect the issue a bit more thanks to @jhyau. It seems that @jwliu-cc was right and these fixes cause codebook training to diverge to NaNs. For this reason, I am resetting the commits mentioned in this issue to the initial well-tested state, despite the nasty bug with vggish and lpaps checkpoint loading 🙁. Current solution: those who want to build upon SpecVQGAN can turn off the perceptual loss by setting its weight to zero and benefit from a significant speedup during training. This, however, would yield slightly different results which, according to our ablations, are still strong. I also added a notice about it to the README for other people to see.
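For readers wondering why a zero weight can also speed up training: below is a minimal sketch, not the actual SpecVQGAN loss code. The function and parameter names (`generator_loss`, `perceptual_fn`, `perceptual_weight`, `gan_weight`) are hypothetical. It assumes the loss module skips the perceptual network entirely when its weight is zero, so that forward pass is never computed.

```python
from typing import Callable


def generator_loss(
    rec_loss: float,
    gan_loss: float,
    perceptual_fn: Callable[[], float],
    perceptual_weight: float = 1.0,
    gan_weight: float = 1.0,
) -> float:
    """Combine loss terms; all names here are hypothetical, for illustration only."""
    total = rec_loss + gan_weight * gan_loss
    # Only evaluate the (expensive) perceptual term when it contributes.
    # With perceptual_weight == 0 the perceptual network is never called.
    if perceptual_weight > 0:
        total += perceptual_weight * perceptual_fn()
    return total


if __name__ == "__main__":
    calls = []

    def fake_perceptual() -> float:
        # Stand-in for a forward pass through a perceptual network.
        calls.append(1)
        return 0.5

    print(generator_loss(1.0, 0.25, fake_perceptual, perceptual_weight=0.0))
    print("perceptual calls:", len(calls))
```

Whether the real loss module skips the computation or merely multiplies by zero is something to check in the code before relying on the speedup.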
This reverts commit 3894458. Reverting due to seeing nans in loss during codebook training
Hello.
the vggishish_lpaps checkpoint is used here:
SpecVQGAN/specvqgan/modules/losses/lpaps.py
Line 35 in eee222d
SpecVQGAN/specvqgan/modules/losses/lpaps.py
Line 135 in eee222d
Errors are ignored in the code, but neither lpaps, nor vggishish manage to load it.
The checkpoint URL is here:
SpecVQGAN/specvqgan/util.py
Line 8 in eee222d
The vggish weights can be found under the 'model' key, but I cannot find the lpaps weights anywhere in there. Are they not required?
Best regards,
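A small sketch of how the mismatch described above can be made visible instead of being silently ignored. It works on plain dictionaries so it runs without PyTorch. The helper name `diff_checkpoint_keys` and the parameter names in the example are hypothetical, and the nested `'model'` key is taken from the description above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def diff_checkpoint_keys(
    expected: Iterable[str],
    checkpoint: Dict[str, Any],
    nested_key: str = "model",
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names for a checkpoint.

    `expected` are the names the model wants (e.g. its state dict keys);
    `checkpoint` is the loaded checkpoint dictionary.
    """
    # Unwrap the state dict if it is stored under a nested key such as 'model'.
    state = checkpoint.get(nested_key, checkpoint)
    if not isinstance(state, dict):
        state = checkpoint
    expected_set = set(expected)
    found = set(state)
    missing = sorted(expected_set - found)
    unexpected = sorted(found - expected_set)
    return missing, unexpected


if __name__ == "__main__":
    # Toy names only; the real layer names in lpaps.py may differ.
    expected_names = ["features.0.weight", "lin0.model.1.weight"]
    ckpt = {"model": {"features.0.weight": [0.0]}}
    missing, unexpected = diff_checkpoint_keys(expected_names, ckpt)
    if missing or unexpected:
        print("missing:", missing, "unexpected:", unexpected)
```

Failing loudly on a non-empty `missing` list, rather than ignoring the error, would have surfaced the fact that some layers stay randomly initialized.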