
Conversation

@seanpmorgan (Member) commented Apr 30, 2019

Closes #216

Looks like assigning the variables using operations during init created an issue with the gradients. Setting their values inside of tf.function, as was done before, appears to fix it. I'm not thrilled about the lack of test coverage, but I'm also not excited to build a regression test into our unit tests.

https://colab.research.google.com/drive/1Md7SnyEC5bUfkME1Akth4KKM24ac26NK

After this bugfix is in, I'll work on publishing a 0.3 release.
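
For reference, roughly the kind of check the notebook runs (a sketch included here for context, not the actual notebook contents; the wrapped layer and shapes are arbitrary):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Wrap a layer in WeightNormalization and check that gradients come back.
model = tf.keras.Sequential(
    [tfa.layers.WeightNormalization(tf.keras.layers.Dense(2))])
x = tf.random.normal([4, 3])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(model(x) ** 2)
grads = tape.gradient(loss, model.trainable_variables)
print(all(g is not None for g in grads))  # True once the fix is in
```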

@facaiy (Member) commented Apr 30, 2019

Thanks for pinging me, Sean; I'll look into it tomorrow. By the way, is it related to #220?

@facaiy requested a review from @qlzh727 on April 30, 2019 03:39
@WindQAQ (Member) commented Apr 30, 2019

> Thanks for pinging me, Sean; I'll look into it tomorrow. By the way, is it related to #220?

No, it's just a typo in the example :-)

@WindQAQ added the layers label on Apr 30, 2019
@seanpmorgan (Member, Author)
> Thanks for pinging me, Sean; I'll look into it tomorrow. By the way, is it related to #220?

No problem, the request was automatic from CODEOWNERS. @qlzh727 might want to review since this code was recently worked on.

Changes:

  1. data_init also calls init_norm, because init_norm was previously called in build.
  2. init_norm now calls _compute_weights, since _compute_weights previously ran right after it in build.
  3. The variable assigns were replaced, since they were causing broken gradients (see the sketch after this list).
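
To illustrate point 3, here is a simplified sketch of the idea (hypothetical variable names, not the exact TFA code): the kernel is built as a tensor from g and v with ordinary ops, so the gradient path stays intact, whereas writing the result back with .assign() would cut it.

```python
import tensorflow as tf

v = tf.Variable(tf.random.normal([3, 2]))  # direction
g = tf.Variable(tf.ones([2]))              # magnitude

with tf.GradientTape() as tape:
    # Build the kernel as a tensor: kernel = g * v / ||v||.
    kernel = tf.nn.l2_normalize(v, axis=0) * g
    loss = tf.reduce_sum(kernel)
grads = tape.gradient(loss, [v, g])
print([grad is not None for grad in grads])  # [True, True] -- gradients flow
```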

@qlzh727 (Member) commented Apr 30, 2019

I am OK with point 3, which changes the .assign() to =, but I am still a bit confused about 1 and 2.

a. Both data_init and init_norm are trying to initialize the value of g, and I don't see the reason to make init_norm call _compute_weights(), which will update self.layer.kernel.
b. From my understanding, init_norm and data_init are trying to do the same thing, which is initializing the g value. In data_init's case, does it also require init_norm()?
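
For context on the two paths being discussed, a conceptual sketch of what each is meant to do, following the weight-normalization paper (Salimans & Kingma, 2016); this is a simplification, and the real TFA methods differ in their details:

```python
import tensorflow as tf

def init_norm_sketch(v):
    # Norm-based init: start g at the per-output-unit norm of v, so the
    # reparameterized kernel g * v / ||v|| initially equals v itself.
    return tf.norm(tf.reshape(v, [-1, v.shape[-1]]), axis=0)

def data_init_sketch(pre_activations):
    # Data-dependent init: choose a scale for g and a bias so that the
    # pre-activations on an initial batch are zero-mean, unit-variance.
    mean, variance = tf.nn.moments(pre_activations, axes=[0])
    scale = 1.0 / tf.sqrt(variance + 1e-10)
    return scale, -mean * scale
```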

@seanpmorgan (Member, Author) commented Apr 30, 2019

> I am OK with point 3, which changes the .assign() to =, but I am still a bit confused about 1 and 2.
>
> a. Both data_init and init_norm are trying to initialize the value of g, and I don't see the reason to make init_norm call _compute_weights(), which will update self.layer.kernel.
> b. From my understanding, init_norm and data_init are trying to do the same thing, which is initializing the g value. In data_init's case, does it also require init_norm()?

@qlzh727 Ah, sorry, that's a bit of laziness on my part. During troubleshooting I just made everything match the same steps as before to help debug. I've removed changes 1 and 2 and confirmed the expected results:
https://colab.research.google.com/drive/1Md7SnyEC5bUfkME1Akth4KKM24ac26NK

@seanpmorgan merged commit 377edb7 into tensorflow:master on Apr 30, 2019
@seanpmorgan deleted the fix-weightnorm branch on April 30, 2019 15:54
@facaiy (Member) commented May 1, 2019

Is it a bug? I thought we were encouraged to use assign rather than =, if I'm not wrong. cc @alextp

@alextp commented May 1, 2019

Using assign sets the value of a tf.Variable; using = sets a Python variable to refer to that tensor. Assigning to tf.Variables is not differentiable, but building a tensor from operations is.

Doing self.bias = ... inside a tf.function seems very dangerous, because you've now kept a reference to a tensor defined inside a function, which you're not allowed to use outside the function.

I recommend this code be changed to just say bias = instead of self.bias =. Then I believe it is correct.
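
A tiny standalone demonstration of the distinction (a sketch, not code from this PR):

```python
import tensorflow as tf

v = tf.Variable(1.0)
g = tf.Variable(3.0)

# Building a tensor from a variable with ops is differentiable.
with tf.GradientTape() as tape:
    w = g * 2.0          # w is a tensor computed from g
    loss = w * w
print(tape.gradient(loss, g))  # tf.Tensor(24.0, ...) -- gradient flows

# Writing the result into a variable with assign is not differentiable.
with tf.GradientTape() as tape:
    v.assign(g * 2.0)    # the assign op cuts the gradient path
    loss = v * v
print(tape.gradient(loss, g))  # None -- no gradient back to g
```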
