Issue with vggish checkpoint #13

luc-leonard · 2022-02-03T15:53:49Z

Hello.

The vggishish_lpaps checkpoint is used in two places:

SpecVQGAN/specvqgan/modules/losses/lpaps.py

Line 35 in eee222d

self.load_state_dict(torch.load(ckpt, map_location=torch.device("cpu")), strict=False)

SpecVQGAN/specvqgan/modules/losses/lpaps.py

Line 135 in eee222d

model.load_state_dict(ckpt, strict=False)

Because both calls pass strict=False, loading errors are silently ignored, but neither lpaps nor vggishish actually manages to load the checkpoint.

The checkpoint URL is here:

    
           'vggishish_lpaps': 'https://a3s.fi/swift/v1/AUTH_a235c0f452d648828f745589cde1219a/specvqgan_public/vggishish16.pt',

The vggish weights can be found under the 'model' key, but I cannot find the lpaps weights anywhere in the checkpoint. Are they not required?
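A minimal sketch of why nothing gets loaded, using plain dicts so it runs without the checkpoint. PyTorch's `load_state_dict(..., strict=False)` returns an object with `missing_keys` and `unexpected_keys`; the helper below mimics that report. The helper name `diff_keys`, the parameter names, and the `epoch` entry are all hypothetical; only the nesting under `'model'` comes from the description above.

```python
def diff_keys(expected_keys, state_dict):
    """Mimic the missing/unexpected report of load_state_dict(strict=False)."""
    expected = set(expected_keys)
    provided = set(state_dict)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    return missing, unexpected


# Hypothetical parameter names the module expects (illustration only).
expected = ["features.0.weight", "features.0.bias", "lin0.model.1.weight"]

# Assumed checkpoint layout: weights nested under 'model', no lpaps weights.
ckpt = {
    "model": {"features.0.weight": 0.0, "features.0.bias": 0.0},
    "epoch": 10,  # hypothetical extra entry
}

# Passing the top-level dict matches no parameter at all.
missing_top, unexpected_top = diff_keys(expected, ckpt)

# Passing ckpt["model"] matches the backbone but still lacks the lin layers.
missing_nested, unexpected_nested = diff_keys(expected, ckpt["model"])

print(missing_top, unexpected_top)
print(missing_nested, unexpected_nested)
```

With a real module, checking that the returned `missing_keys` list is empty (or simply using `strict=True`) would surface the same mismatch at load time.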

Best regards,

v-iashin · 2022-02-03T17:40:35Z

Hi, I checked the code and I think you are right! Thanks a lot for the catch! I will commit the fixes.

luc-leonard · 2022-02-03T18:07:03Z

Thank you very much for the very quick answer and fix :D

jwliu-cc · 2022-02-24T05:16:09Z

The loss goes to 'nan' when I load the correct ckpt. Do you have this problem? I trained on the VAS dataset.

yangdongchao · 2022-04-18T09:35:03Z

Hi, I want to ask about the parameters of lpaps. The vggishish16 model is trained on VGGSound. How did you obtain the parameters of the following layers? Did you directly use the pre-trained model from taming-transformers?

self.lin0 = NetLinLayer(self.chns[0], use_dropout=use_dropout)
self.lin1 = NetLinLayer(self.chns[1], use_dropout=use_dropout)
self.lin2 = NetLinLayer(self.chns[2], use_dropout=use_dropout)
self.lin3 = NetLinLayer(self.chns[3], use_dropout=use_dropout)
self.lin4 = NetLinLayer(self.chns[4], use_dropout=use_dropout)

v-iashin · 2022-04-18T12:52:19Z

You may train them by adapting the training script from https://github.com/richzhang/PerceptualSimilarity.

yangdongchao · 2022-04-18T13:15:52Z

Could you share the code you used to train lpaps on the VGGSound dataset?

v-iashin · 2022-04-18T18:34:01Z

OK, I managed to look into this issue a bit more.

Thanks to your questions I discovered that this problem is actually deeper than I originally anticipated. It seems that I completely missed that the NetLinLayer layers have trainable parameters and relied only on training VGGishish. I think that because the code did not complain about loading the checkpoint, as the topic starter noticed, I just moved on.

What happens is that these layers are actually randomly initialized and, luckily, the model could still train to such great quality thanks to the GAN loss. This means that you can just drop the perceptual loss from the model and it will train much faster and reach the same performance. On the practical side, it seems that keeping this dorky loss may still give you a small boost in quality.
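For a sense of scale, here is a rough parameter count for the five lin layers. It assumes each NetLinLayer is a bias-free 1x1 convolution with a single output channel, as in the reference LPIPS implementation, and that `chns` holds the VGG16 stage widths used there; whether vggishish uses the same widths is an assumption. The helper name `lin_layer_param_count` is hypothetical.

```python
# Stage widths of VGG16 as used by LPIPS (assumed to apply here as well).
chns = [64, 128, 256, 512, 512]


def lin_layer_param_count(chn_in, chn_out=1):
    """Weights in a bias-free 1x1 convolution: chn_in * chn_out * 1 * 1."""
    return chn_in * chn_out


per_layer = [lin_layer_param_count(c) for c in chns]
total = sum(per_layer)

print(per_layer)
print(total)
```

Under these assumptions the randomly initialized part is small next to the backbone, but it still determines how the per-layer feature differences are weighted in the final distance.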

yangdongchao · 2022-04-19T04:31:10Z

Thanks for your reply. I understand it.

v-iashin · 2022-06-12T07:28:47Z

Today I had a chance to inspect the issue a bit more thanks to @jhyau.

It seems that @jwliu-cc was right and these fixes let codebook training diverge to nans. For this reason, I am resetting the commits mentioned in this issue to the initial well-tested state despite having this nasty bug with vggish and lpaps checkpoint loading 🙁 .

Current solution:
perceptual_weight=0.0

This means that those who want to build upon SpecVQGAN could turn off the perceptual loss by setting the weight to zero and benefit from a significant speedup during training. This, however, would yield slightly different results which, according to our ablations, are still strong.
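A minimal sketch of why a zero weight can also save time, assuming the loss module skips the perceptual network entirely when `perceptual_weight` is not positive. The function `combined_loss` and the stand-in `fake_perceptual` are hypothetical and only illustrate the control flow, not the repository's actual loss code.

```python
def combined_loss(rec_loss, gan_loss, perceptual_fn, perceptual_weight):
    """Combine loss terms, skipping the perceptual network when its weight is zero."""
    if perceptual_weight > 0:
        p_loss = perceptual_fn()  # expensive forward pass through the feature network
    else:
        p_loss = 0.0  # nothing is computed, so the step is cheaper
    return rec_loss + perceptual_weight * p_loss + gan_loss


calls = []


def fake_perceptual():
    """Stand-in for the perceptual network; records that it was invoked."""
    calls.append(1)
    return 2.0


loss_off = combined_loss(1.0, 0.5, fake_perceptual, perceptual_weight=0.0)
calls_after_off = len(calls)
loss_on = combined_loss(1.0, 0.5, fake_perceptual, perceptual_weight=1.0)

print(loss_off, calls_after_off, loss_on, len(calls))
```

If the weight were only multiplied in without the guard, the result would be the same but the forward pass would still run, so the speedup depends on the guard being present.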

I also added a notice about it in README for other people to see.

v-iashin added a commit that referenced this issue Feb 3, 2022

lpaps ckpt, fixes in loading state dicts (#13)

3894458

jhyau mentioned this issue Jun 11, 2022

Loss becoming "nan" during codebook training? #20

Closed

jhyau added a commit to jhyau/SpecVQGAN that referenced this issue Jun 12, 2022

Revert "lpaps ckpt, fixes in loading state dicts (v-iashin#13)"

53f938e

This reverts commit 3894458. Reverting due to seeing nans in loss during codebook training
