sampled_softmax_loss weights and logits don't get gradients #41792

Open
w4nderlust opened this issue Jul 28, 2020 · 5 comments
Labels: comp:keras (Keras related issues), TF 2.10, type:bug (Bug)

Comments

@w4nderlust

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04
  • TensorFlow version: 2.2.0
  • Python version: 3.8.2
  • CUDA/cuDNN version: 10.2 / 7.6.2
  • GPU model and memory: TITAN X

Describe the current behavior

In order to use tf.nn.sampled_softmax_loss, weights and biases need to be provided as inputs. I believe that internally rows from those tensors are selected based on the sampled classes and the computation is performed.
The problem is that if you create a model with a final Dense layer and provide the weights and biases of that layer as input to tf.nn.sampled_softmax_loss, you end up receiving a warning that gradients for them are not computed:

WARNING:tensorflow:Gradients do not exist for variables ['my_model/dense_1/kernel:0', 'my_model/dense_1/bias:0'] when minimizing the loss.

As a consequence, they never get updated during training.
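
For reference, here is a minimal standalone check (a sketch only; layer and variable names are illustrative and not part of the repro below) that shows the gradients coming back as None when the weights and biases passed to the loss are taken from get_weights():

import tensorflow as tf

num_classes, hidden_dim, batch = 500, 10, 4
dense_out = tf.keras.layers.Dense(num_classes)
dense_out.build((None, hidden_dim))  # create kernel and bias

hidden = tf.random.normal((batch, hidden_dim))
labels = tf.random.uniform((batch, 1), maxval=num_classes, dtype=tf.int64)

with tf.GradientTape() as tape:
    # get_weights() returns NumPy copies, so the tape cannot trace them
    weights = tf.transpose(dense_out.get_weights()[0])
    biases = dense_out.get_weights()[1]
    loss = tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases, labels=labels,
        inputs=hidden, num_sampled=5, num_classes=num_classes)

grads = tape.gradient(loss, [dense_out.kernel, dense_out.bias])
print(grads)  # [None, None], consistent with the warning above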

Describe the expected behavior

Gradients for those tensors should be computed and they should get updated.

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

num_classes = 500
num_epochs = 3
num_samples = 10000
batch_size = 10
learning_rate = 0.001

y = np.random.randint(0, num_classes, num_samples, dtype=np.int64)
x = np.expand_dims(y.astype(np.float32), -1)

x_test = x[:10]
y_test = y[:10]


class MyModel(Model):

    def __init__(self, num_classes, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dense1 = Dense(10)
        self.dense2 = Dense(num_classes)
        self.first_step = True

    def call(self, inputs, training=None, mask=None):
        hidden = self.dense1(inputs)
        if training and not self.first_step:
            return None, hidden
        else:
            logits = self.dense2(hidden)
            return logits, hidden


class SampledSoftmaxCrossEntropyLoss(tf.keras.losses.Loss):
    def __init__(self, decoder_obj=None, num_classes=0):
        super().__init__()
        self.decoder_obj = decoder_obj
        self.num_classes = num_classes

    def call(self, labels, hidden):
        labels = tf.cast(tf.expand_dims(labels, -1), tf.int64)

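        # NOTE: get_weights() returns NumPy copies of the layer's kernel and bias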
        weights = tf.transpose(self.decoder_obj.get_weights()[0])
        biases = self.decoder_obj.get_weights()[1]

        sampled_values = tf.random.uniform_candidate_sampler(
            true_classes=labels,
            num_true=1,
            num_sampled=5,
            range_max=self.num_classes,
            unique=False
        )

        loss_val = tf.nn.sampled_softmax_loss(
            weights=weights,
            biases=biases,
            labels=labels,
            inputs=hidden,
            num_sampled=5,
            num_classes=self.num_classes,
            sampled_values=sampled_values)

        return loss_val


my_model = MyModel(num_classes)
optimizer = SGD(learning_rate=learning_rate)
sampled_loss = SampledSoftmaxCrossEntropyLoss(
    decoder_obj=my_model.dense2, num_classes=num_classes)


def train_step(model, loss, optimizer, inputs, targets):
    with tf.GradientTape() as tape:
        logits, hidden = model(inputs, training=True)
        loss_val = loss(targets, hidden)
    grads = tape.gradient(loss_val, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_val


def predict(model, inputs):
    logits, _ = model(inputs, training=True)
    predictions = tf.argmax(logits, -1)
    return predictions


x_batches = np.split(x, 100)
y_batches = np.split(y, 100)

print(x_test)
print(predict(my_model, x_test))

first_batch = True
for i in range(num_epochs):
    for x_batch, y_batch in zip(x_batches, y_batches):
        if first_batch:
            print("Weights and biases after first batch")
            print(my_model.dense2.get_weights()[0])
            print(my_model.dense2.get_weights()[1])
            first_batch = False

        loss_val = train_step(my_model, sampled_loss, optimizer, x_batch,
                              y_batch)
        print(loss_val)

print(x_test)
print(predict(my_model, x_test))

print("Weights and biases after training")
print(my_model.dense2.get_weights()[0])
print(my_model.dense2.get_weights()[1])
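
For completeness: if the missing gradients are simply due to get_weights() returning detached NumPy copies of the variables, one possible workaround (a sketch only, not a confirmed fix) is to reference the layer's tf.Variable objects directly inside the loss, so the tape can trace back to the output layer's kernel and bias. A drop-in replacement for the call method of SampledSoftmaxCrossEntropyLoss above could look like this:

    def call(self, labels, hidden):
        labels = tf.cast(tf.expand_dims(labels, -1), tf.int64)

        # Reference the layer's variables instead of NumPy copies from
        # get_weights(). Dense stores its kernel as [hidden_dim, num_classes];
        # sampled_softmax_loss expects [num_classes, hidden_dim], hence the
        # transpose.
        weights = tf.transpose(self.decoder_obj.kernel)
        biases = self.decoder_obj.bias

        return tf.nn.sampled_softmax_loss(
            weights=weights,
            biases=biases,
            labels=labels,
            inputs=hidden,
            num_sampled=5,
            num_classes=self.num_classes)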
@w4nderlust w4nderlust added the type:bug Bug label Jul 28, 2020
@ravikyram ravikyram added comp:apis Highlevel API related issues TF 2.2 Issues related to TF 2.2 labels Jul 28, 2020
@ravikyram (Contributor)

@w4nderlust

I have tried in Colab with TF versions 2.2 and 2.3-rc2. Please find the gist here. Are you also seeing the same behavior?
Thanks!

@ravikyram ravikyram added the stat:awaiting response Status - Awaiting response from author label Jul 28, 2020
@w4nderlust (Author)

Yes, that's exactly the output I get (just with a different random initialization).
As you can see, the weights and biases after the first epoch and at the end of training are the same, and you get those warnings that they are receiving no gradients.
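
A quick programmatic version of this check (a sketch; it is meant to run right after the repro script above and reuses my_model, x_batches, y_batches, train_step, sampled_loss and optimizer from it):

w0, b0 = [np.copy(a) for a in my_model.dense2.get_weights()]

for x_batch, y_batch in zip(x_batches, y_batches):
    train_step(my_model, sampled_loss, optimizer, x_batch, y_batch)

w1, b1 = my_model.dense2.get_weights()
# Both comparisons print True: the output layer's kernel and bias never change.
print(np.allclose(w0, w1), np.allclose(b0, b1))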

@ravikyram ravikyram removed the stat:awaiting response Status - Awaiting response from author label Jul 28, 2020
@jvishnuvardhan jvishnuvardhan added comp:keras Keras related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed comp:apis Highlevel API related issues labels Jul 29, 2020
@sushreebarsa (Contributor)

Was able to reproduce the issue in TF v2.5, please find the gist here. Thanks!

@mohantym mohantym self-assigned this Nov 4, 2021
@mohantym mohantym removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 4, 2021
@mohantym (Contributor) commented Nov 4, 2021

I was able to reproduce the issue on TF 2.12.0-dev20221114. Attaching gist for reference.

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Nov 4, 2021
@w4nderlust (Author)

@mohantym from your gist it looks to me that the weights and biases are the same after the first batch and at the end of training:

Weights and biases after first batch
[[ 0.05411606  0.02750688 -0.09201522 ...  0.00735597  0.00862968
  -0.06153766]
 [ 0.10168394  0.10543302 -0.04004839 ... -0.05372164  0.06464603
   0.03657628]
 [ 0.03811692 -0.07817607  0.02010193 ...  0.05285154  0.04165239
  -0.01438953]
 ...
 [-0.02329516  0.03987963  0.02113827 ... -0.03183416  0.02946573
   0.00674187]
 [-0.07360842 -0.10110037 -0.06190708 ...  0.09768424 -0.00933281
   0.0934676 ]
 [ 0.07280309  0.10233886  0.04173826 ... -0.09212768  0.08369612
   0.01230942]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0...]

Weights and biases after training
[[ 0.05411606  0.02750688 -0.09201522 ...  0.00735597  0.00862968
  -0.06153766]
 [ 0.10168394  0.10543302 -0.04004839 ... -0.05372164  0.06464603
   0.03657628]
 [ 0.03811692 -0.07817607  0.02010193 ...  0.05285154  0.04165239
  -0.01438953]
 ...
 [-0.02329516  0.03987963  0.02113827 ... -0.03183416  0.02946573
   0.00674187]
 [-0.07360842 -0.10110037 -0.06190708 ...  0.09768424 -0.00933281
   0.0934676 ]
 [ 0.07280309  0.10233886  0.04173826 ... -0.09212768  0.08369612
   0.01230942]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. ...]

@mohantym mohantym removed the stat:awaiting response Status - Awaiting response from author label Nov 5, 2021
@mohantym mohantym removed their assignment Dec 1, 2021
@tilakrayal tilakrayal added TF 2.10 and removed TF 2.2 Issues related to TF 2.2 labels Nov 15, 2022