
What is the purpose of _built_kernel_divergence and _built_bias_divergence? #894

Closed

nbro opened this issue Apr 22, 2020 · 1 comment

nbro (Contributor) commented Apr 22, 2020

If you look at the source code of _ConvVariational, you will see two mysterious private fields, self._built_kernel_divergence and self._built_bias_divergence, whose purpose and behaviour are completely obscure.

Why?

Look at the following code:

    if not self._built_kernel_divergence:
      self._apply_divergence(self.kernel_divergence_fn,
                             self.kernel_posterior,
                             self.kernel_prior,
                             self.kernel_posterior_tensor,
                             name='divergence_kernel')
      self._built_kernel_divergence = True
    if not self._built_bias_divergence:
      self._apply_divergence(self.bias_divergence_fn,
                             self.bias_posterior,
                             self.bias_prior,
                             self.bias_posterior_tensor,
                             name='divergence_bias')
      self._built_bias_divergence = True

Each if block executes only while its flag is False and, once executed, sets the flag to True, so each block runs at most once.

After searching a bit, I found that self._built_kernel_divergence and self._built_bias_divergence are used only in the call method (the code above is taken from call) and in the build method, where they are initialized to False. Assuming build is called only once when the model is built, this means that on every forward pass after the first, self._apply_divergence is not called. But that cannot be the whole story: the KL divergence terms keep showing up in model.losses, and self._apply_divergence is the only place that adds them to self.losses (which feed into model.losses).
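From the code, the apparent intent is to register each divergence as a loss exactly once and rely on Keras to keep returning the recorded tensor. Here is a minimal toy layer (plain tf.keras, hypothetical names, no TFP) that mimics the flag pattern and shows where it is fragile:

    import tensorflow as tf

    class FlaggedLayer(tf.keras.layers.Layer):
      """Toy layer mimicking the _built_*_divergence flag pattern."""

      def build(self, input_shape):
        self.w = self.add_weight('w', shape=(int(input_shape[-1]), 1))
        self._built_divergence = False  # same role as the flags above
        super().build(input_shape)

      def call(self, inputs):
        if not self._built_divergence:
          # Stand-in for _apply_divergence: register a loss that depends
          # only on the layer's variables, then flip the flag so this
          # branch never runs again.
          self.add_loss(tf.reduce_sum(tf.square(self.w)))
          self._built_divergence = True
        return tf.matmul(inputs, self.w)

    layer = FlaggedLayer()
    x = tf.ones((2, 3))
    layer(x)
    print(layer.losses)  # the "divergence" registered on the first call
    layer(x)
    print(layer.losses)  # depending on the TF version, eager Keras may have
                         # cleared call-time losses at the start of this call,
                         # in which case the term has silently vanished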

What's going on here, @jvdillon, @davmre, @csuter, @brianwa84, @jburnim? Can you please help?

I really need to change the behaviour of a custom layer that inherits from Convolution2DFlipout dynamically, from a callback. I tried to do that by giving this subclass A a non-trainable variable. I can change that variable, but I am getting weird results: the loss that the optimizer prints in the progress bar does not correspond to the loss I compute manually (see https://stackoverflow.com/q/61371627/3924118, https://stackoverflow.com/q/61372167/3924118, https://stackoverflow.com/q/61357111/3924118 and #887).
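The kind of setup I mean is roughly the following (a sketch, not my exact code; scaled_kl and KLSwitchCallback are made-up names). Because the divergence fn may be evaluated only once (see the flags above), reading the variable inside the fn is what should keep the registered loss sensitive to later assignments:

    import tensorflow as tf
    import tensorflow_probability as tfp

    # Non-trainable switch that a callback can flip during training.
    kl_switch = tf.Variable(1.0, trainable=False, name='kl_switch')

    def scaled_kl(q, p, _):
      # Same signature as the default kernel_divergence_fn; the switch is
      # read inside the fn so the resulting loss depends on the variable.
      return kl_switch * tfp.distributions.kl_divergence(q, p)

    layer = tfp.layers.Convolution2DFlipout(
        filters=8, kernel_size=3, kernel_divergence_fn=scaled_kl)

    class KLSwitchCallback(tf.keras.callbacks.Callback):
      def on_epoch_begin(self, epoch, logs=None):
        # Arbitrary schedule: turn the KL term off after the first epoch.
        kl_switch.assign(1.0 if epoch == 0 else 0.0)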

This is so weird! It's as if the code I think I am executing is not the code that actually runs.

I am using TensorFlow 2.1 and TFP 0.9.0, and I had to pass experimental_run_tf_function=False to compile to avoid the error NotImplementedError: Cannot convert a symbolic Tensor (truediv:0) to a numpy array. (see e.g. #519, tensorflow/tensorflow#33729)
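Concretely, the compile call looks roughly like this (the model is assumed built elsewhere and the optimizer/loss are placeholders; the only point here is the flag):

    import tensorflow as tf

    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        # TF 2.1 workaround for the NotImplementedError mentioned above.
        experimental_run_tf_function=False,
    )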

nbro (Contributor, Author) commented Apr 26, 2020

See 1574c1d, which removes the use of these flags from the dense layers but not from the convolutional layers. Why?
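For comparison, after that commit the dense layers' call path applies the divergences unconditionally on every forward pass. A rough reconstruction (not copied verbatim from the commit):

    # Sketch of _DenseVariational.call after 1574c1d: the flag guards are
    # gone, so _apply_divergence runs on every forward pass.
    def call(self, inputs):
      outputs = self._apply_variational_kernel(inputs)
      outputs = self._apply_variational_bias(outputs)
      if self.activation is not None:
        outputs = self.activation(outputs)
      self._apply_divergence(self.kernel_divergence_fn,
                             self.kernel_posterior,
                             self.kernel_prior,
                             self.kernel_posterior_tensor,
                             name='divergence_kernel')
      self._apply_divergence(self.bias_divergence_fn,
                             self.bias_posterior,
                             self.bias_prior,
                             self.bias_posterior_tensor,
                             name='divergence_bias')
      return outputs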
