Question about first layer #49

Open

Luciennnnnnn opened this issue Sep 26, 2021 · 10 comments

@Luciennnnnnn

Hi,
I have encountered a problem in my reading.
As you state in Appendix 1, for an input drawn uniformly at random from the interval [-1, 1], pushing this input through a sine nonlinearity yields an arcsine distribution as the input to the dot products in the later layers. According to this, a correct SIREN should be composed as sin->linear->sin->linear->...->linear. However, what you actually do in your code is linear->sin->linear->sin->...->linear. My question is why you chose this implementation, since the first one follows the distribution assumption correctly and the second one does not.

Please tell me the right answer or point out my mistakes, thanks!

Best regards,
Xin Luo.

@Luciennnnnnn
Author

Waiting for answers

@Luciennnnnnn
Author

Luciennnnnnn commented Sep 29, 2021

Also, why does the last linear layer of SIREN divide by omega in its initialization without multiplying by omega in the forward calculation?

@pielbia

pielbia commented Jan 17, 2022

Regarding your first question about the architecture: the linear module performs the operation Wx+b, which is the argument of the following sine activation function sin(Wx+b), hence the sin module follows the linear module. According to the paper, with a uniformly distributed input x in the interval [-1, 1] you should get a normally distributed output from the linear module and an arcsine-distributed output from the sine activation layer. You should be able to find the details in the appendix of the paper.
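
As a minimal sketch of that composition (my own illustration in PyTorch, not the repo's exact code):

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer (sketch): a linear map Wx + b followed by a sine activation."""
    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # linear -> sine: sin(omega_0 * (Wx + b))
        return torch.sin(self.omega_0 * self.linear(x))

# The full network is then linear->sin, linear->sin, ..., ending in a plain linear layer:
siren = nn.Sequential(
    SineLayer(2, 256),
    SineLayer(256, 256),
    nn.Linear(256, 1),
)
```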

@Luciennnnnnn
Author

@pielbia Thank you for your reply. You said "with a uniformly distributed input x in the interval [-1, 1] you should get a normally distributed output from the linear module", but in Appendix 1.1 (Overview of the proof) the authors state: "The second layer computes a linear combination of such arcsine distributed outputs, ..., this linear combination will be normally distributed". The analysis is based on the input being arcsine distributed, not uniformly distributed.

@pielbia

pielbia commented Jan 18, 2022

@LuoXin-s The input to the second layer is arcsine distributed because it comes out of the sine module of the first layer. I was talking about the input to the first layer, i.e. the input to the network. Consider that each layer is a combination of a linear module followed by a sine module. From the overview of the proof you cited: "we consider an input in the interval [−1, 1]. We assume it is drawn uniformly at random, since we interpret it as a 'normalized coordinate' in our applications"; this is the input I was referring to, the input to the network/first layer. After a linear combination of this input with the weights and biases, it is pushed through the sine nonlinearity. The output of the sine is arcsine distributed and provides the input to the second layer you were referring to.
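
If it helps, here is a quick numerical sanity check of that chain (my own sketch with made-up dimensions, not the authors' code): a uniform input goes through one linear combination (roughly Gaussian by the central limit theorem) and then through the sine, whose output moves toward the arcsine distribution on [-1, 1].

```python
import torch

torch.manual_seed(0)
n_samples, in_features, omega_0 = 200_000, 64, 30.0    # hypothetical sizes

x = 2 * torch.rand(n_samples, in_features) - 1          # uniform input on [-1, 1]
w = (2 * torch.rand(in_features) - 1) / in_features     # one neuron's weights (first-layer-style init)

pre = omega_0 * (x @ w)     # linear combination, scaled by omega_0: roughly Gaussian
post = torch.sin(pre)       # sine output: roughly arcsine on [-1, 1]

print("pre-activation  std:", pre.std().item())
print("post-activation std:", post.std().item())  # an arcsine on [-1, 1] has std sqrt(1/2) ≈ 0.707
```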

@Luciennnnnnn
Author

@pielbia My point is that what the authors prove in the paper is that a linear combination of arcsine-distributed inputs is normally distributed with a corresponding variance, that this normal distribution becomes arcsine again after the sine, and that this argument applies recursively to the following layers. In the first layer, the difference is that the input is uniformly distributed, not arcsine distributed. After the linear combination it may still be normally distributed by the central limit theorem, but its variance need not be the same as the variance obtained from an arcsine-distributed input. So if we also initialize the first layer according to an analysis that assumes an arcsine-distributed input, this may be wrong.
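
Concretely (my own back-of-the-envelope numbers, not a quote from the paper), over the same interval [-1, 1]:

```math
\operatorname{Var}[X_{\mathrm{uniform}}] = \frac{(1-(-1))^2}{12} = \frac{1}{3},
\qquad
\operatorname{Var}[X_{\mathrm{arcsine}}] = \frac{1}{2}
```

so the pre-activation variance in the first layer comes out a constant factor 2/3 smaller than the arcsine-based recurrence would predict.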

@pielbia

pielbia commented Jan 18, 2022

@LuoXin-s I see your point. The distributions after the linear combination are Gaussian in both cases, but you're right that they have different variances. Theorem 1.8 states that the input is uniform on [-1, 1], so it looks to me like the fact that the input to the network follows a uniform rather than an arcsine distribution was taken into account. And the two distributions, if defined over the same interval, have the same mean and variances differing by a constant factor (Lemma 1.3 and 1.7). The initialization of the first layer is different from the initialization of the other layers: the weights of the first layer also follow a uniform distribution, but over a larger interval. What I don't get is where the interval used for the weights of the first layer comes from. The weights of the other layers seem to follow the initialization scheme from the paper. Also, to answer your other question, the weights are multiplied by omega in the forward method of the sin module in module.py.
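
To make that concrete, this is roughly the scheme as I read it from the code (a sketch with a hypothetical helper name, not a verbatim copy):

```python
import numpy as np
import torch
import torch.nn as nn

def init_siren_linear_(linear: nn.Linear, is_first: bool, omega_0: float = 30.0):
    # Sketch of the SIREN initialization as I understand it (hypothetical helper).
    in_features = linear.in_features
    with torch.no_grad():
        if is_first:
            # First layer: uniform over the larger interval [-1/fan_in, 1/fan_in].
            linear.weight.uniform_(-1 / in_features, 1 / in_features)
        else:
            # Later sine layers: uniform over +-sqrt(6/fan_in)/omega_0, i.e. the
            # paper's scheme with an extra 1/omega_0 factor that is compensated in
            # the forward pass, where the pre-activation is multiplied by omega_0
            # before the sine: sin(omega_0 * (W x + b)).
            bound = np.sqrt(6 / in_features) / omega_0
            linear.weight.uniform_(-bound, bound)

# e.g. init_siren_linear_(nn.Linear(2, 256), is_first=True)
```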

@Luciennnnnnn
Author

@pielbia I see. Since the mean and variance of the arcsine and uniform distributions differ only by a constant factor, the output after the linear combination may also differ only by a constant factor. That seems intuitive to me, but regarding omega the authors argue from a different viewpoint in Section 3.2; there are some subtle problems there, and it is strange. For the second question, the code I referenced is the Colab version; you can check there that the last linear layer of SIREN also divides by omega in its initialization without multiplying by omega in the calculation.
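
For reference, the cell I mean looks roughly like this (a paraphrase from memory, not an exact copy of the Colab):

```python
import numpy as np
import torch
import torch.nn as nn

hidden_features, hidden_omega_0 = 256, 30.0

final_linear = nn.Linear(hidden_features, 1)
with torch.no_grad():
    # The last layer's weights are also divided by omega in the initialization ...
    bound = np.sqrt(6 / hidden_features) / hidden_omega_0
    final_linear.weight.uniform_(-bound, bound)
# ... but it is a plain nn.Linear, so nothing multiplies its output by omega_0
# in the forward pass.
```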

@ivanstepanovftw

@Luciennnnnnn I have tried to reproduce the authors' activation distributions after the dot product and the nonlinearity. This is what I got:

[image: activation distributions after dot product and nonlinearity]
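
Roughly the kind of check I ran (a self-contained sketch, not my exact script; the layer sizes and omega_0 = 30 are assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
omega_0 = 30.0
widths = [2, 256, 256, 256]          # input coordinates plus three hidden widths (assumed)

# Build the linear layers and apply a SIREN-style init (wider interval for the first layer).
linears = [nn.Linear(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]
with torch.no_grad():
    linears[0].weight.uniform_(-1 / widths[0], 1 / widths[0])
    for lin in linears[1:]:
        bound = (6 / lin.in_features) ** 0.5 / omega_0
        lin.weight.uniform_(-bound, bound)

# Push uniform coordinates through and look at the distribution after each
# dot product (pre-activation) and after the sine nonlinearity.
x = 2 * torch.rand(20_000, widths[0]) - 1
with torch.no_grad():
    for i, lin in enumerate(linears):
        pre = omega_0 * lin(x)       # dot product scaled by omega_0
        post = torch.sin(pre)        # nonlinearity output
        print(f"layer {i}: pre std {pre.std().item():.3f}, post std {post.std().item():.3f}")
        x = post                     # becomes the next layer's input
```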

@ivanstepanovftw

@vsitzmann, hello! Could you answer the questions regarding your paper and the inconsistencies with the implementation? You know your work better than anyone.
