
Transformed Variable not trainable in Keras model #946

keyonvafa opened this issue May 23, 2020 · 6 comments
@keyonvafa (Contributor) commented May 23, 2020

Hi,
I am trying to train a positive variable using tfp.util.TransformedVariable as an attribute of a tf.keras.Model object. However, the model does not recognize it as a trainable variable, and it does not receive gradients. This behavior holds for tensorflow_probability==0.10.0 and tensorflow==2.2.0, as well as for the nightly builds of both.

Here is a colab notebook illustrating this behavior: https://colab.research.google.com/drive/1XGCcm8l0OGRiy35lr3XcHAyZMuBNpsIB?usp=sharing

In this example, we are trying to train both an unconstrained variable (loc) and a constrained variable (scale). Only the loc variable updates.

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.loc = tf.Variable(tf.ones(shape=[5]), name="loc")
    self.scale = tfp.util.TransformedVariable(
        tf.ones([5]),
        bijector=tfp.bijectors.Softplus(),
        name="scale") 
    self.distribution = tfp.distributions.Normal(loc=self.loc, scale=self.scale)

  def call(self, inputs):
    samples = self.distribution.sample()
    assigned_means = tf.gather(samples, inputs)
    return tfp.distributions.Normal(loc=assigned_means, scale=1.)

model = Model()
print(model.trainable_weights)  # only 'loc' shows up

optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
loss = lambda x, rv: -tf.reduce_sum(rv.log_prob(x))
inputs = np.array([0, 1, 2, 3, 4]).astype(np.int32)
outputs = np.array([0., 1., 2., 3., 4.]).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices((inputs, outputs))
dataset = dataset.batch(5)
model.compile(optimizer=optimizer, loss=loss)
model.fit(dataset, epochs=100, verbose=0)

# Check if the location parameters have moved from their original values.
assert(not (np.isclose(model.loc.numpy(), np.ones(5))).all())  # Passes

# Check if the scale parameters have moved from their original values.
assert(not (np.isclose(model.scale.numpy(), np.ones(5))).all())  # Fails

Thanks!

@brianwa84 (Contributor) commented May 23, 2020 via email

@keyonvafa (Contributor, Author) commented
Thank you. That works for me. Hopefully it will eventually be possible for Keras to recognize variables inside a tf.Module.
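
(For context, here is a minimal sketch of this kind of attribute-assignment workaround, i.e. the "foo trick" mentioned later in this thread: the TransformedVariable's underlying tf.Variables are assigned to an extra attribute so Keras' automatic tracking picks them up. The attribute name _tracked_scale_vars is purely illustrative, and this may not be exactly what was suggested over email.)

import tensorflow as tf
import tensorflow_probability as tfp

class Model(tf.keras.Model):
  def __init__(self):
    super().__init__()
    self.loc = tf.Variable(tf.ones(shape=[5]), name="loc")
    self.scale = tfp.util.TransformedVariable(
        tf.ones([5]),
        bijector=tfp.bijectors.Softplus(),
        name="scale")
    # Hypothetical workaround: expose the TransformedVariable's underlying
    # tf.Variable(s) on an attribute so Keras tracks them directly.
    self._tracked_scale_vars = list(self.scale.trainable_variables)
    self.distribution = tfp.distributions.Normal(loc=self.loc, scale=self.scale)

With a change like this, model.trainable_weights should also include the pre-transformed scale variable, at the cost of it appearing under a somewhat arbitrary attribute.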

@krzysztofrusek commented
Hello,

I have the same problem with tfp.experimental.nn.util.RandomVariable (tf.__version__, tfp.__version__ == ('2.3.0', '0.11.0')).

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfn = tfp.experimental.nn

def random_variable_scope(next_creator, **kwargs):
  # Wrap every variable created inside the scope in a RandomVariable whose
  # Normal distribution's loc is an ordinary tf.Variable.
  iv = kwargs['initial_value']
  if callable(iv):
    return tfn.util.RandomVariable(tfd.Normal(tf.Variable(iv()), 1.))
  return next_creator(**kwargs)


with tf.variable_creator_scope(random_variable_scope):
  d = tf.keras.layers.Dense(2)
  d(tf.zeros([3, 4]))


[type(v) for v in d.variables]

gives

[tensorflow_probability.python.experimental.nn.util.random_variable.RandomVariable,
 tensorflow_probability.python.experimental.nn.util.random_variable.RandomVariable]

The trick of assigning the underlying variables to an extra attribute (foo) kind of works, but it pollutes the variables list:

d.foo = [ v.variables for v in d.variables]
[type(v) for v in d.variables]
[tensorflow_probability.python.experimental.nn.util.random_variable.RandomVariable,
 tensorflow_probability.python.experimental.nn.util.random_variable.RandomVariable,
 tensorflow.python.ops.resource_variable_ops.ResourceVariable,
 tensorflow.python.ops.resource_variable_ops.ResourceVariable]

@st-- (Contributor) commented Feb 19, 2021

@keyonvafa this is a core TensorFlow issue (see tensorflow/tensorflow#47264). You can work around it with the TrackableLayer code I posted in tensorflow/tensorflow#47264 (comment); you only need to change the first few lines of your Model class definition to

class Model(tf.keras.Model, TrackableLayer):
  def __init__(self):
    super().__init__()  # this is the recommended style in Python3 anyways

With your original Model definition, model.trainable_variables is missing the scale Variable:

[<tf.Variable 'loc:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>]

When inheriting from TrackableLayer, model.trainable_variables now returns

[<tf.Variable 'loc:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>,
 <tf.Variable 'scale:0' shape=(5,) dtype=float32, numpy=
 array([0.54132485, 0.54132485, 0.54132485, 0.54132485, 0.54132485],
       dtype=float32)>]

as expected.

@krzysztofrusek you can apply a similar workaround to your issue by using a TrackableDense instead, defined by

class TrackableDense(tf.keras.layers.Dense, TrackableLayer):
    pass

though I'm not sure it resolves your duplicated-variable issue. 🤔
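
A minimal usage sketch, assuming the random_variable_scope creator from the earlier comment and the TrackableLayer mixin from the linked comment are both in scope:

class TrackableDense(tf.keras.layers.Dense, TrackableLayer):
    pass

with tf.variable_creator_scope(random_variable_scope):
  d = TrackableDense(2)
  d(tf.zeros([3, 4]))

# Inspect which variables Keras now tracks (the RandomVariable wrappers
# and/or their underlying tf.Variables).
print([type(v) for v in d.variables])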

@keyonvafa (Contributor, Author) commented

Thank you @st-- ! I'll give that a try.

@st-- (Contributor) commented Feb 23, 2021

Also, it looks like this issue will finally be fixed by TensorFlow 2.5: tensorflow/tensorflow#47264 (comment)
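
If that lands, the original repro should work unchanged on TF >= 2.5; a quick check (assuming the Model class from the first comment):

model = Model()
# Once the tf.Module tracking fix is in place, both 'loc' and the
# pre-transformed 'scale' variable should show up here.
print(model.trainable_weights)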
