BigGAN: consistency regularization (SimCLR-style) loss #11
We can do SimCLR on G by treating a minibatch of generated samples as negative samples for each other (the negative loss), and by generating slightly-variant sibling images from slightly-varied z vectors, which serve as positive samples for each other (the positive loss). This may enable removing the adversarial loss entirely, replacing it with a self-supervised perceptual loss.

One idea that the ICR paper introduces is zCR-GAN: a CR-GAN where the Generator is given its own consistency loss, in which 2 images generated from almost the same z latent vector are encouraged to have different D losses, as a way of fighting mode collapse and encouraging diversity.

This raises a question: if you have SimCLR implemented in D, learning to create embeddings using a contrastive loss on data-augmented images as outlined above (a souped-up CR-GAN), why not add an embedding-related loss to the outputs of G as well, similar to zCR-GAN? In this idea, G takes random z latents, generates images, and feeds them into D to get SimCLR embeddings. These embeddings can then be trained with a contrastive loss, updating the parameters of G (while freezing D) to push the minibatch of sampled images apart from each other (each sample is a negative sample for all the others).

This is fine, but using only negative samples would seem inadequate, and it might simply blow out G. We also need a source of positive examples. Data augmentation won't work as it does for D, because G is generating the same image, so there's nothing to train. However, G has data augmentation built in, in the form of small random changes to a given z latent vector. So G can generate positive samples simply by jittering each z and generating similar images. This provides both positive & negative samples for a SimCLR-like contrastive loss on G. One might wonder: is the adversarial loss even necessary at this point?
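A minimal PyTorch sketch of what such a G loss might look like, as an NT-Xent-style contrastive loss: everything here (`g_contrastive_loss`, `D_embed`, the jitter scale `sigma`) is my own naming and an assumption about the setup, not code from this thread.

```python
import torch
import torch.nn.functional as F

def g_contrastive_loss(G, D_embed, z, sigma=0.1, temperature=0.5):
    """Hypothetical contrastive loss on G: jittered-z siblings are positives,
    all other minibatch samples are negatives. D_embed is assumed frozen."""
    n = z.size(0)
    z_jit = z + sigma * torch.randn_like(z)          # G's built-in "data augmentation"
    # Embed both the originals and their jittered siblings: (2n, d)
    h = F.normalize(D_embed(torch.cat([G(z), G(z_jit)])), dim=1)
    sim = h @ h.t() / temperature                    # pairwise cosine similarities
    # Mask out self-similarity so each row only competes against other samples
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                          float('-inf'))
    # Row i's positive is its jittered sibling (i+n or i-n); the rest are negatives
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Minimizing this pulls each sibling pair together while pushing the rest of the minibatch apart, which is the positive/negative split described above.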
Why not have 'D' simply be a SimCLR CNN which constantly trains on real & fake images, while G does its generative variant, benefiting from the 'D' SimCLR embedding? No supervision beyond the two contrastive losses. So D loops through batches of reals/fakes, refining its SimCLR embedding; meanwhile, G keeps generating batches of fakes, pushing random z vectors as far apart as possible, while still generating similar images for neighboring z vectors. All very smooth and nonadversarial: the negative examples fight mode collapse, the positive examples create a meaningful latent space, and the fake images help bootstrap the SimCLR loss by providing lots of weird images to train on. So it's like zCR-GAN, except way cooler and contrastier. (The G loss is only the contrastive loss: G tries to make nearby z vectors get similar contrastive embeddings. My hypothesis is that only realistic images will produce well-behaved embeddings from the real-image-trained SimCLR D model, so G will constantly be molded towards realism.)

One question would be: should the fakes be fed into D during SimCLR training, or should it see only real images? It seems like D should see fakes in order to provide more of a signal to G (imagine the beginning, when G is only generating random static noise; a SimCLR model trained on real images might have little to say about those early samples), but that does introduce a dependency/feedback loop I'm not sure how works out: G will generate weird images for D, which could possibly be rewarded by changing the embedding, but maybe that's aligned in a good way?

The advantage of this (aside from being intellectually interesting) is that it may reduce mode dropping, as diversity in each minibatch is directly optimized for, and increase stability, as it is not (as) adversarial.

Prototype update: an initial attempt by gna, using a lightweight DCGAN on FFHQ, has been made (training D on reals only).
The sample is odd: it is clearly learning something, but why the tessellated hair/eyes? Suspect: the highly aggressive cropping of SimCLR is throwing away global structure. Needs bigger images, presumably. But if you use light cropping which includes most of the image, will there be enough data...? That might be a justification for training on G's fake samples too: it gives you an arbitrarily large number of whole (albeit low-quality) images to pseudo-data-augment the reals. gna is skeptical that training on fakes will do anything but confuse D, or potentially allow collapse to degenerate minima where D creates an 'easy' embedding that G can excel on, but I think this must be tested empirically.
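Putting the pieces together, the alternating nonadversarial loop might look like the following hypothetical sketch; `train_step`, the loss callables, and the `feed_fakes_to_D` flag (covering the reals-only vs. reals+fakes question above) are all my own stand-ins, not the prototype's code.

```python
import torch

def train_step(G, D, g_opt, d_opt, reals, z, simclr_loss, g_contrastive_loss,
               feed_fakes_to_D=False):
    """One step of the hypothetical two-contrastive-loss scheme: no adversarial loss."""
    # 1. D step: plain SimCLR on real images, optionally mixing in detached fakes
    #    (more signal for G early on, vs. the risk of degenerate 'easy' embeddings).
    d_opt.zero_grad()
    batch = torch.cat([reals, G(z).detach()]) if feed_fakes_to_D else reals
    d_loss = simclr_loss(D, batch)
    d_loss.backward()
    d_opt.step()

    # 2. G step: purely contrastive, with D's parameters frozen.
    for p in D.parameters():
        p.requires_grad_(False)
    g_opt.zero_grad()
    g_loss = g_contrastive_loss(G, D, z)
    g_loss.backward()           # gradients flow through frozen D into G
    g_opt.step()
    for p in D.parameters():
        p.requires_grad_(True)
    return d_loss.item(), g_loss.item()
```

The two phases never directly oppose each other: D only ever minimizes its own SimCLR objective, which is the hoped-for source of stability.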
Current status of the full SimCLR loss for BigGAN according to shawwn:
[screenshot]
gna has continued fiddling with a CLR-like GAN, moving over to a perceptual loss on the internal embeddings as trained by a SimCLR final loss, on the reasoning that the final embedding throws away too much global information, but may still induce useful intermediate representations:
[sample images]
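A sketch of what such a perceptual loss on intermediate SimCLR features might look like; `perceptual_distance`, the list-of-activations interface, and the layer indices are my assumptions, not gna's actual code.

```python
import torch
import torch.nn.functional as F

def perceptual_distance(feature_extractor, img_a, img_b, layers=(1, 3)):
    """Hypothetical perceptual loss: compare intermediate activations of a
    SimCLR-trained network rather than its final embedding, since the
    intermediate maps retain more global structure.

    feature_extractor(img) is assumed to return a list of per-layer activations.
    """
    feats_a = feature_extractor(img_a)
    feats_b = feature_extractor(img_b)
    return sum(F.mse_loss(feats_a[i], feats_b[i]) for i in layers)
```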
SimCLR news: @lucidrains has experimented with a StyleGAN2 implementation drawing on his SimCLR work, and SimCLR has now been confirmed to work for BigGAN and to help! The new paper "Image Augmentations for GAN Training", Zhao et al 2020b, reports:
Their SimCLR BigGAN, which they call "Cntr-GAN", builds on the fixed data augmentations earlier in the paper (discussed in more detail in #35), their previous work on additional consistency losses (essentially, jittering G/D and requiring the losses be similar/dissimilar for similar images), and then adds SimCLR for even further benefits. The gains: SimCLR on its own is a small improvement, but it comes on top of all of the other gains. (Interestingly, SimCLR benefits from MixUp almost as much as from the usual crop+scale+color-distortion SimCLR data augmentation, while the regular D data augs did not benefit at all from MixUp, suggesting that it's doing something qualitatively different.) The writeup is very brief and sketchy, unfortunately; however, it seems to be pretty much what we were trying. So if we can fix the final bugs, this should get us another few FID points, or a quality improvement of 10% or so.
Self-supervision/semi-supervised learning is ultra-hot now, with new SOTAs being set in DRL using shockingly simple methods, and self-supervised learning being competitive with classical supervised CNNs at ImageNet classification. Self-supervised auxiliary losses have also been slightly helpful in the latest variants of BigGAN.
Hypothetically, adding a self-supervised loss where the Discriminator is forced to learn more about images could stabilize training (by providing a second loss which is unrelated to the unstable zero-sum dynamics of GAN training) and make the D learn better semantics & meaningful classifications for teaching G.
Skylion did initial experiments with a simple rotation loss from SS-GAN, where the D tries to predict how an image has been randomly rotated. This helped a little bit.
SimCLR establishes that cropping & color-distorting an image and forcing the D to encode the variants in a similar way ('consistency') works extremely well for learning classification, and various DRL papers establish that even just cropping & consistency-loss training is amazingly effective in DRL. A prototype by lucidrains of just cropping+flipping showed some promise in BigGAN runs, where it seemed like the proto-CLR runs learned better overall structure, despite problems with balancing the proto-CLR loss with the regular classification loss and the slowdown an additional training phase introduces.
We would like to use full SimCLR-like distortion + consistency training on BigGAN to train D on distorted real & fake images (Zhao shows that doing it on both is better than on just reals for GANs).
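A minimal sketch of what that consistency term might look like; `augment`, `consistency_loss`, and the lambda weights are my own stand-ins, and the real SimCLR pipeline would use much stronger crop/color-distortion augmentations than the flip shown here.

```python
import torch

def augment(x):
    # Random horizontal flip; a stand-in for SimCLR's full
    # crop + color-distortion augmentation pipeline.
    flip = torch.rand(x.size(0), device=x.device) < 0.5
    x = x.clone()
    x[flip] = x[flip].flip(-1)
    return x

def consistency_loss(D, reals, fakes, lam_real=10.0, lam_fake=10.0):
    # Penalize D for changing its output when an image is augmented,
    # on both real and fake batches (per Zhao et al., both beats reals-only).
    loss_real = (D(reals) - D(augment(reals))).pow(2).mean()
    loss_fake = (D(fakes) - D(augment(fakes))).pow(2).mean()
    return lam_real * loss_real + lam_fake * loss_fake
```

This term would be added to D's ordinary GAN loss, so D is pushed to base its judgments on content that survives augmentation rather than on incidental pixel statistics.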