
HyperNetwork (CNN+SIREN) on larger images? #15

Closed
trougnouf opened this issue Jul 23, 2020 · 2 comments

Comments

@trougnouf

Does it seem feasible to train the CNN+SIREN scheme on larger images? The given example (train_img_neural_process.py) uses 32x32 pixel images, and I haven't managed to generate visually pleasing results at a more useful size (e.g. 256x256 px).

Either the image quality is very poor, or memory blows up when I try to increase the number of parameters.

Another example (train_img.py) uses the same SIREN network to generate larger images (512x512), so the SIREN itself shouldn't be the problem; the weight-generation process is. It seems that the output of the CNN is reduced to only 256 features, first in ConvImgEncoder, which does something like:

import torch; batch_size = 4
torch.nn.Linear(1024, 1)(torch.rand([batch_size, 256, 32, 32]).view(batch_size, 256, -1)).squeeze(-1).shape
# -> torch.Size([batch_size, 256])

Then the hypernetwork (FCBlock) works from this 256-dimensional embedding, with its capacity set by the hyper hidden features of its hidden layers.

This seems like very little capacity when we are generating weights for a network that has 256*256 weights per layer. I can see it working at 32x32 px, but it makes sense that it would not scale up.
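To make the mismatch concrete, here is a rough back-of-the-envelope sketch (the hidden width and the two-layer head are assumptions for illustration, not the repo's exact FCBlock): a head that maps the 256-dimensional embedding to the weights of a single 256x256 SIREN layer already needs on the order of 17M parameters.

```python
import torch

# Rough sketch (not the repo's exact FCBlock): a hypernetwork head that maps the
# 256-d image embedding to the weights of one 256x256 SIREN layer.
embedding_dim = 256                        # output of ConvImgEncoder
hyper_hidden = 256                         # assumed hyper hidden features
siren_weights_per_layer = 256 * 256 + 256  # weight matrix + bias of one SIREN layer

head = torch.nn.Sequential(
    torch.nn.Linear(embedding_dim, hyper_hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(hyper_hidden, siren_weights_per_layer),
)

n_params = sum(p.numel() for p in head.parameters())
print(n_params)  # ~17M parameters for ONE hypo-layer
```

Almost all of those parameters sit in the final linear layer that emits the flattened weight matrix, which is why increasing the hypernetwork width blows up memory so quickly.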

Do you have any insight as to how to generate the SIREN weights from the output of a CNN without blowing up memory?

@alexanderbergman7
Collaborator

Hi,

Thanks for the question; this is a good observation and something we've thought about a bit. As you mention, the capacity of the SIREN network itself (the hypo-network) is likely not the problem, as similarly sized SIRENs are capable of perfectly reconstructing high-resolution images.

We've previously run experiments using an auto-decoder architecture for generalization: instead of an encoder that maps input images to a latent space, each image in the training set is assigned a latent code that we jointly optimize with the hypernetwork. In this case, the generalization scheme was able to reconstruct high-resolution images with higher fidelity, but the latent space learned was not especially useful as a prior. This suggests that the encoder is likely the bottleneck: the encoder trades off some training reconstruction quality for a latent space that can generalize to unseen images.
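For reference, a minimal auto-decoder sketch (an illustration of the idea, not our actual training code; the module names, sizes, and the toy loss are assumptions):

```python
import torch

# Auto-decoder sketch: every training image gets its own latent code, optimized
# jointly with the hypernetwork instead of being predicted by a CNN encoder.
num_images, latent_dim = 10_000, 256
latent_codes = torch.nn.Embedding(num_images, latent_dim)  # one code per training image
torch.nn.init.normal_(latent_codes.weight, std=0.01)

# `hypernet` stands in for the module mapping a latent code to SIREN weights.
hypernet = torch.nn.Linear(latent_dim, 256 * 256 + 256)    # toy stand-in, one hypo-layer

optimizer = torch.optim.Adam(
    list(hypernet.parameters()) + list(latent_codes.parameters()), lr=1e-4
)

# In a training step, look up the codes for the images in the batch and let the
# reconstruction loss backpropagate into both the hypernet and the codes.
image_idx = torch.randint(0, num_images, (4,))
z = latent_codes(image_idx)            # (4, 256)
siren_params = hypernet(z)             # generated weights for the hypo-network
loss = siren_params.pow(2).mean()      # placeholder for the image reconstruction loss
loss.backward()                        # gradients reach latent_codes.weight as well
optimizer.step()
```

The point is that the optimizer updates both the hypernetwork and the per-image codes, so no encoder has to compress a high-resolution image into 256 numbers in a single forward pass.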

I'm not entirely sure how to improve the convolutional encoder without blowing up the memory cost. Have you tried the set encoder instead of the convolutional encoder? The set encoder tended to "under-fit" and blur images, but perhaps it doesn't scale as poorly with input size.
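For illustration, a minimal set-encoder sketch (assuming the usual per-pixel MLP plus mean pooling; the layer sizes are assumptions, not necessarily the repo's exact set encoder):

```python
import torch

# Set-encoder sketch: a shared MLP is applied to each (coordinate, value) pair,
# then the features are mean-pooled into a single latent vector per image.
class SetEncoder(torch.nn.Module):
    def __init__(self, coord_dim=2, value_dim=3, latent_dim=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(coord_dim + value_dim, latent_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, coords, values):
        # coords: (batch, num_pixels, coord_dim), values: (batch, num_pixels, value_dim)
        features = self.net(torch.cat([coords, values], dim=-1))
        return features.mean(dim=1)  # permutation-invariant pooling -> (batch, latent_dim)

enc = SetEncoder()
coords = torch.rand(4, 256 * 256, 2)   # a 256x256 image as a set of pixels
values = torch.rand(4, 256 * 256, 3)
print(enc(coords, values).shape)       # torch.Size([4, 256])
```

Because the per-pixel MLP is shared and the pooling is a mean, the parameter count is independent of image resolution; only the activation memory grows with the number of pixels.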

@trougnouf
Author

Thank you @alexanderbergman7!
Sorry, I meant to respond earlier, but I haven't experimented with this as much as I had hoped, so I don't have results to report yet.
I was unable to get better results just by increasing the number of parameters or changing the architecture of the CNN (or of the hypernetwork). Thank you for the suggestion to try the set encoder (a linear network all along); I have not tried that yet.
It is especially interesting that this hypernetwork architecture is able to reconstruct high-quality images given an optimized latent code.
