sampled_softmax_loss weights and logits don't get gradients #41792

Open
w4nderlust opened this issue Jul 28, 2020 · 5 comments
Labels: comp:keras (Keras related issues), TF 2.10, type:bug (Bug)

Comments

@w4nderlust

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04
  • TensorFlow version: 2.2.0
  • Python version: 3.8.2
  • CUDA/cuDNN version: 10.2 / 7.6.2
  • GPU model and memory: TITAN X

Describe the current behavior

In order to use tf.nn.sampled_softmax_loss, weights and biases need to be provided as inputs. I believe that internally rows from those tensors are selected based on the sampled classes and the computation is performed.
The problem is that if you create a model with a final Dense layer and provide the weights and biases of that layer as input to tf.nn.sampled_softmax_loss, you end up receiving a warning that gradients for them are not computed:

WARNING:tensorflow:Gradients do not exist for variables ['my_model/dense_1/kernel:0', 'my_model/dense_1/bias:0'] when minimizing the loss.

As a consequence, they never get updated during training.
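
For reference, here is a minimal standalone check (a sketch only; layer and variable names are illustrative and not part of the repro below) that shows the gradients coming back as None when the weights and biases passed to the loss are taken from get_weights():

import tensorflow as tf

num_classes, hidden_dim, batch = 500, 10, 4
dense_out = tf.keras.layers.Dense(num_classes)
dense_out.build((None, hidden_dim))  # create kernel and bias

hidden = tf.random.normal((batch, hidden_dim))
labels = tf.random.uniform((batch, 1), maxval=num_classes, dtype=tf.int64)

with tf.GradientTape() as tape:
    # get_weights() returns NumPy copies, so the tape cannot trace them
    weights = tf.transpose(dense_out.get_weights()[0])
    biases = dense_out.get_weights()[1]
    loss = tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases, labels=labels,
        inputs=hidden, num_sampled=5, num_classes=num_classes)

grads = tape.gradient(loss, [dense_out.kernel, dense_out.bias])
print(grads)  # [None, None], consistent with the warning above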

Describe the expected behavior

Gradients for those tensors should be computed and they should get updated.

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

num_classes = 500
num_epochs = 3
num_samples = 10000
batch_size = 10
learning_rate = 0.001

y = np.random.randint(0, num_classes, num_samples, dtype=np.int64)
x = np.expand_dims(y.astype(np.float32), -1)

x_test = x[:10]
y_test = y[:10]


class MyModel(Model):

    def __init__(self, num_classes, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dense1 = Dense(10)
        self.dense2 = Dense(num_classes)
        self.first_step = True

    def call(self, inputs, training=None, mask=None):
        hidden = self.dense1(inputs)
        if training and not self.first_step:
            return None, hidden
        else:
            logits = self.dense2(hidden)
            return logits, hidden


class SampledSoftmaxCrossEntropyLoss(tf.keras.losses.Loss):
    def __init__(self, decoder_obj=None, num_classes=0):
        super().__init__()
        self.decoder_obj = decoder_obj
        self.num_classes = num_classes

    def call(self, labels, hidden):
        labels = tf.cast(tf.expand_dims(labels, -1), tf.int64)

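        # NOTE: get_weights() returns NumPy copies of the layer's kernel and bias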
        weights = tf.transpose(self.decoder_obj.get_weights()[0])
        biases = self.decoder_obj.get_weights()[1]

        sampled_values = tf.random.uniform_candidate_sampler(
            true_classes=labels,
            num_true=1,
            num_sampled=5,
            range_max=self.num_classes,
            unique=False
        )

        loss_val = tf.nn.sampled_softmax_loss(
            weights=weights,
            biases=biases,
            labels=labels,
            inputs=hidden,
            num_sampled=5,
            num_classes=self.num_classes,
            sampled_values=sampled_values)

        return loss_val


my_model = MyModel(num_classes)
optimizer = SGD(learning_rate=learning_rate)
sampled_loss = SampledSoftmaxCrossEntropyLoss(
    decoder_obj=my_model.dense2, num_classes=num_classes)


def train_step(model, loss, optimizer, inputs, targets):
    with tf.GradientTape() as tape:
        logits, hidden = model(inputs, training=True)
        loss_val = loss(targets, hidden)
    grads = tape.gradient(loss_val, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_val


def predict(model, inputs):
    logits, _ = model(inputs, training=True)
    predictions = tf.argmax(logits, -1)
    return predictions


x_batches = np.split(x, 100)
y_batches = np.split(y, 100)

print(x_test)
print(predict(my_model, x_test))

first_batch = True
for i in range(num_epochs):
    for x_batch, y_batch in zip(x_batches, y_batches):
        if first_batch:
            print("Weights and biases after first batch")
            print(my_model.dense2.get_weights()[0])
            print(my_model.dense2.get_weights()[1])
            first_batch = False

        loss_val = train_step(my_model, sampled_loss, optimizer, x_batch,
                              y_batch)
        print(loss_val)

print(x_test)
print(predict(my_model, x_test))

print("Weights and biases after training")
print(my_model.dense2.get_weights()[0])
print(my_model.dense2.get_weights()[1])
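
For completeness: if the missing gradients are simply due to get_weights() returning detached NumPy copies of the variables, one possible workaround (a sketch only, not a confirmed fix) is to reference the layer's tf.Variable objects directly inside the loss, so the tape can trace back to the output layer's kernel and bias. A drop-in replacement for the call method of SampledSoftmaxCrossEntropyLoss above could look like this:

    def call(self, labels, hidden):
        labels = tf.cast(tf.expand_dims(labels, -1), tf.int64)

        # Reference the layer's variables instead of NumPy copies from
        # get_weights(). Dense stores its kernel as [hidden_dim, num_classes];
        # sampled_softmax_loss expects [num_classes, hidden_dim], hence the
        # transpose.
        weights = tf.transpose(self.decoder_obj.kernel)
        biases = self.decoder_obj.bias

        return tf.nn.sampled_softmax_loss(
            weights=weights,
            biases=biases,
            labels=labels,
            inputs=hidden,
            num_sampled=5,
            num_classes=self.num_classes)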
@w4nderlust w4nderlust added the type:bug Bug label Jul 28, 2020
@ravikyram ravikyram added comp:apis Highlevel API related issues TF 2.2 Issues related to TF 2.2 labels Jul 28, 2020
@ravikyram (Contributor)

@w4nderlust

I have tried in Colab with TF versions 2.2 and 2.3-rc2. Please find the gist here. Are you also seeing the same behavior?
Thanks!

@ravikyram ravikyram added the stat:awaiting response Status - Awaiting response from author label Jul 28, 2020
@w4nderlust (Author)

Yes, that's exactly the output I get (just with a different random initialization).
As you can see, the weights and biases after the first epoch and at the end of training are the same, and you get those warnings that they are receiving no gradients.
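
A quick programmatic version of this check (a sketch; it is meant to run right after the repro script above and reuses my_model, x_batches, y_batches, train_step, sampled_loss and optimizer from it):

w0, b0 = [np.copy(a) for a in my_model.dense2.get_weights()]

for x_batch, y_batch in zip(x_batches, y_batches):
    train_step(my_model, sampled_loss, optimizer, x_batch, y_batch)

w1, b1 = my_model.dense2.get_weights()
# Both comparisons print True: the output layer's kernel and bias never change.
print(np.allclose(w0, w1), np.allclose(b0, b1))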

@ravikyram ravikyram removed the stat:awaiting response Status - Awaiting response from author label Jul 28, 2020
@jvishnuvardhan jvishnuvardhan added comp:keras Keras related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed comp:apis Highlevel API related issues labels Jul 29, 2020
@sushreebarsa (Contributor)

Was able to reproduce the issue in TF v2.5, please find the gist here. Thanks!

@mohantym mohantym self-assigned this Nov 4, 2021
@mohantym mohantym removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 4, 2021
@mohantym (Contributor) commented Nov 4, 2021

I was able to reproduce the issue on TF 2.12.0-dev20221114. Attaching gist for reference.

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Nov 4, 2021
@w4nderlust (Author)

@mohantym from your gist it looks to me that the weights and biases are the same after the first batch and at the end of training:

Weights and biases after first batch
[[ 0.05411606  0.02750688 -0.09201522 ...  0.00735597  0.00862968
  -0.06153766]
 [ 0.10168394  0.10543302 -0.04004839 ... -0.05372164  0.06464603
   0.03657628]
 [ 0.03811692 -0.07817607  0.02010193 ...  0.05285154  0.04165239
  -0.01438953]
 ...
 [-0.02329516  0.03987963  0.02113827 ... -0.03183416  0.02946573
   0.00674187]
 [-0.07360842 -0.10110037 -0.06190708 ...  0.09768424 -0.00933281
   0.0934676 ]
 [ 0.07280309  0.10233886  0.04173826 ... -0.09212768  0.08369612
   0.01230942]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0...]

Weights and biases after training
[[ 0.05411606  0.02750688 -0.09201522 ...  0.00735597  0.00862968
  -0.06153766]
 [ 0.10168394  0.10543302 -0.04004839 ... -0.05372164  0.06464603
   0.03657628]
 [ 0.03811692 -0.07817607  0.02010193 ...  0.05285154  0.04165239
  -0.01438953]
 ...
 [-0.02329516  0.03987963  0.02113827 ... -0.03183416  0.02946573
   0.00674187]
 [-0.07360842 -0.10110037 -0.06190708 ...  0.09768424 -0.00933281
   0.0934676 ]
 [ 0.07280309  0.10233886  0.04173826 ... -0.09212768  0.08369612
   0.01230942]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. ...]

@mohantym mohantym removed the stat:awaiting response Status - Awaiting response from author label Nov 5, 2021
@mohantym mohantym removed their assignment Dec 1, 2021
@tilakrayal tilakrayal added TF 2.10 and removed TF 2.2 Issues related to TF 2.2 labels Nov 15, 2022