BigGAN: non-normal latent space (binomial mixture?) #34

Open
gwern opened this issue Jun 4, 2020 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers priority: low

Comments


gwern commented Jun 4, 2020

Feature: experiment with z which are a mix of normals, censored normals, binomials, and categoricals.

Typically, in almost all GANs, the z random noise is just a bunch of Gaussian variables (N(0,1)). This is a sheer default (when in doubt, make a random variable Gaussian), but there is no reason z couldn't be something totally different. StyleGAN, for example, finds it important to transform z into w using no fewer than 8 layers (#26), which suggests that the standard z is... not great, and that it affects the results quite a bit: both the quality of generated samples and the controllability of the model for editing (everyone finds that edits on w work far better than edits on z in StyleGAN). The w mapping is learning to transform the Gaussians into some more appropriate distribution, potentially a much rougher, spikier, and more discrete one?

The intuition here is that using just normals forces the GAN to find 128 or 512 or whatever 'factors' which can vary smoothly; but we want a GAN to learn things like 'wearing glasses', which are inherently discrete: a face either is or is not wearing glasses, and there is no such thing as wearing +0.712σ glasses, or wearing −0.12σ of glasses-ness, even though that is more or less what using only normals forces the GAN to model. So you get weird things where face interpolations look like raccoons halfway through, as you force the GAN to generate partial-glasses-ness. If the GAN had access to a binary variable, it could assign that to 'glasses' and simply snap between glasses/no-glasses while interpolating in the latent space, without any nonsensical intermediates being forced by smoothness. (Since the G is not being forced to generate artifactual images which the D can easily detect & penalize, it may also have an easier time learning.)

mooch's BigGAN paper experiments with the latent space as a way to get something better than a giant w (which we have so far found a little hard to stabilize inside BigGAN training). In particular, he finds that binary 0/1 (Bernoulli) variables and rectified (censored) normals perform surprisingly well:

[Screenshot: latent-distribution ablation results from the BigGAN paper]

A mix of Gaussians and binary variables, inspired by InfoGAN (where the z is the usual Gaussian provided for randomness & modeling nuisance variables, and c is a vector of binary variables which are the human-meaningful latent variables one hopes the GAN will automatically learn), also performs well.

mooch did not thoroughly explore this space because he wanted to use truncation, so there is potential for improvements here. (We could, for example, use a mix of Bernoullis and censored normals, and simply apply the truncation trick only to the censored normals; see the sketch below.)
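
A minimal sketch of that idea (not actual compare_gan code; the 50/50 split, the Bernoulli p = 0.5, and the truncation value are illustrative assumptions):

```python
import tensorflow as tf

def sample_truncated_mixed_z(batch_size, z_dim, truncation=0.5):
  """Hypothetical sampler: first half censored (rectified) normals with the
  truncation trick applied, second half Bernoulli 0/1s left untruncated."""
  half = z_dim // 2
  # Truncation trick on the continuous part only: draw from a truncated normal
  # (tails resampled) and shrink by `truncation`, then censor at zero.
  censored = tf.maximum(
      truncation * tf.random.truncated_normal([batch_size, half]), 0.0)
  # Discrete part: Bernoulli(0.5) as 0/1 floats, untouched by truncation.
  bernoullis = tf.cast(
      tf.random.uniform([batch_size, z_dim - half]) < 0.5, tf.float32)
  return tf.concat([censored, bernoullis], axis=1)
```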

One suggestion would be to expand the latent z a lot by tacking a few hundred censored normals or binomials onto the existing normals, feeding a mix into each BigGAN block.

Simply modifying the z code should be easy. compare_gan already provides a z_generator abstraction which lets us pass in distribution_fn=tf.random.uniform, distribution_fn=tf.random.normal, etc. tf.random supports categorical/Bernoulli sampling and normals (but not, it seems, censored normals, only truncated normals, which are different; censoring is easy, though: if x < 0 then 0, ie. max(0, x)). So we could define a distribution which, say, returns a first half of censored normals and a second half of Bernoullis.
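
As a sketch, such a distribution_fn might look like the following (assuming z_generator calls it the way it would call tf.random.normal, ie. with a [batch_size, z_dim] shape as the first argument; the even split and p = 0.5 are arbitrary illustrative choices):

```python
import tensorflow as tf

def censored_bernoulli_z(shape, dtype=tf.float32, name=None, **unused_kwargs):
  """Drop-in distribution_fn: first half censored normals (max(0, N(0,1))),
  second half Bernoulli(0.5) 0/1 variables. Any extra kwargs (eg. stddev)
  that z_generator might forward are accepted and ignored."""
  with tf.name_scope(name or "censored_bernoulli_z"):
    batch_size, z_dim = shape[0], shape[1]
    half = z_dim // 2
    censored = tf.maximum(tf.random.normal([batch_size, half], dtype=dtype), 0.0)
    bernoullis = tf.cast(
        tf.random.uniform([batch_size, z_dim - half]) < 0.5, dtype)
    return tf.concat([censored, bernoullis], axis=1)

# Hypothetical usage, depending on z_generator's exact signature:
# z = z_generator([batch_size, z_dim], distribution_fn=censored_bernoulli_z)
```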


gwern commented Jun 9, 2020

Update: @aydao has experimented with a half-binomial/half-censored-normal z (non-hierarchical, for simplicity). The BigGAN appears to be stable and trains similarly, without any obvious flaws.


gwern commented Jun 10, 2020

Hierarchical z and hierarchical z with w (#26) samples at ~27k iterations (non-EMA):

[Image: hierarchical-z samples at ~27k iterations]

[Image: hierarchical-z-with-w samples at ~27k iterations]

The w version in particular seems like it might be better than a regular w? If so, this suggests that the mixture helps stabilize w: instead of a large number of ambiguous, highly-symmetrical normals, which small updates can easily permute, it has a mixture, which is harder to shuffle around (eg. any latent variable which gets mapped early on to a binary variable will, because it is discrete, be 'stuck', since it will be hard to swap it with another binary variable, much less a censored normal).


gwern commented Jun 10, 2020

Further update at ~45k iterations, mixture with w and mixture with normal z:

[Image: EMA samples at ~45k iterations, mixture z with w]

[Image: EMA samples at ~45k iterations, mixture z without w]

The w version looks much better, while the non-w (z-only) version looks the way a BigGAN usually does at this point: very smeary and unstable. Strikingly, w runs are usually even more smeary than that, as discussed in the w issue, for hundreds of thousands of iterations; this kind of sharpness in EMA samples from a w run is unprecedented. It seems the mixture z is enormously stabilizing for w, and improves it a lot.

I suggest that we use it as a default from here on out.


gwern commented Jul 15, 2020

An example of the kind of pathology that sparsity/binaries/censored normals in a mixed latent can help with is discrete features like the horns in Arfafax's This Pony Does Not Exist S2. Consider the interpolation video: https://cdn.discordapp.com/attachments/706488721410621451/725770677629485244/random_grid_2006_1.mp4 https://www.dropbox.com/s/j3sn26323sxupkl/random_grid_2006_1.mp4?dl=0

Watch the unicorn horns. You either have a horn or you do not; there is no such thing as having 0.55 of a horn, or having −0.1σ of horns. But because the interpolation must be smooth and horns must be represented by a Gaussian or a combination of Gaussians, all the points in between horn/no-horn must map to some image, so you get weird intermediate images where the horn is just sort of a hair-colored blob or something. These obviously look bad, and yet they must make up a good chunk of the latent space!

If z contained some binary variables or censored normals, this could be handled much more sensibly: no-horn could be mapped to the 0 of a censored normal, and positive values could then denote how visibly large the horn is, for example.


gwern commented Jul 15, 2020

@aydao's primary patch for the censored-normal mixture latent appears to be aydao/compare_gan@7cf79b5.

His w branch appears to be https://github.com/aydao/compare_gan/blob/ayworking/compare_gan/architectures/resnet_biggan_mapping.py; patch: aydao/compare_gan@954e9b4
