
Does SIREN only work well with over-parameterised networks? #31

Open

MengZephyr opened this issue Dec 18, 2020 · 2 comments

Comments

@MengZephyr

Hello,

I spent a week testing SIREN and trying to introduce it into my system. To my knowledge, a ReLU network is at least able to produce a result with averaged or balanced patterns from the training data.

I did a small test: jointly training a neural feature image and a CNN auto-encoder with sine as the activation. With just 2 cat images as the reconstruction target, to my surprise, an over-parameterized network together with the 2 corresponding neural images very quickly (about 1000 iterations) produced results with beautiful high-frequency hair patterns. Then I began to reduce the size and dimension of the neural image. The loss still decreased quickly at first, but at some iteration it suddenly jumped up and gave a very poor result. Here I attach the intermediate results:
[attached image: intermediate reconstruction results]

Perhaps I am thinking naively, but is it because the sine activation is very sensitive to the gradient step, so that any wrong step may lead the result into a bad local minimum? Have you tested the generalisation of SIREN? Or does SIREN only work well with over-parameterised networks?

@VovaTch

VovaTch commented Feb 15, 2022

I've only come across SIREN very recently, but I've been experimenting independently with a similar type of network, so maybe I can help. Yes, this is a problem I've encountered too, where my fitted images go haywire and dissolve into noise. What worked for me is a specific training setup: OneCycleLR (PyTorch) in conjunction with AdamW. Using weight decay and amsgrad seems essential (usually a 1e-3 max learning rate and 1e-6 weight decay), otherwise the output does indeed dissolve into noise. Clipping the gradient norm to about 0.1 to 1 also helps; a minimal sketch of this setup is below.

Keep in mind that, mathematically, smaller images tend to require higher-frequency sinusoids to fit, and the derivative of a high-frequency sinusoid can grow very large if it is not multiplied by a small constant to compensate (the derivative of sin(ωx) is ω·cos(ωx), so gradients scale with ω).
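Not my exact code, just a minimal PyTorch sketch of the recipe described above (AdamW with weight decay and amsgrad, OneCycleLR, and gradient-norm clipping); the function name fit_siren and the default hyper-parameters are placeholders for illustration, not anything from the SIREN repository.

import torch
from torch import nn

def fit_siren(model, coords, target, n_iters=1000):
    # model: a SIREN network, coords: input coordinates, target: pixel values to fit
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                                  weight_decay=1e-6, amsgrad=True)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3,
                                                    total_steps=n_iters)
    loss_fn = nn.MSELoss()
    for _ in range(n_iters):
        pred = model(coords)
        loss = loss_fn(pred, target)
        optimizer.zero_grad()
        loss.backward()
        # clip the gradient norm; high-frequency sine layers can produce very large gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
        optimizer.step()
        scheduler.step()
    return model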

@ivanstepanovftw

Try setting a smaller c in the initializer. It was originally proposed as 6; a lower value like 4 or 3.2 helps to stabilize the gradient flow, e.g.

import math

self.c = 4  # originally 6; lowering to 4 or 3.2 narrows the initialization range

if self.is_first:
    # first layer: weights uniform in [-1/fan_in, 1/fan_in]
    bound = 1 / fan_in
else:
    # hidden layers: weights uniform in [-sqrt(c/fan_in)/omega_0, sqrt(c/fan_in)/omega_0]
    bound = math.sqrt(self.c / fan_in) / self.omega_0
x.uniform_(-bound, bound)  # x is the layer's weight tensor
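For context, here is a self-contained sketch of a sine layer using this initialization. It follows the layout of the common SIREN reference implementations, but the class name SineLayer, the c argument, and omega_0 = 30 are assumptions for illustration, not this repository's exact code.

import math
import torch
from torch import nn

class SineLayer(nn.Module):
    def __init__(self, in_features, out_features, is_first=False, omega_0=30.0, c=4.0):
        super().__init__()
        self.is_first = is_first
        self.omega_0 = omega_0
        self.c = c  # 6 in the original scheme; 4 or 3.2 gives a narrower init range
        self.linear = nn.Linear(in_features, out_features)
        self.init_weights()

    def init_weights(self):
        fan_in = self.linear.in_features
        with torch.no_grad():
            if self.is_first:
                bound = 1 / fan_in
            else:
                bound = math.sqrt(self.c / fan_in) / self.omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        # sin(omega_0 * (Wx + b))
        return torch.sin(self.omega_0 * self.linear(x))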
