Mysterious bunch of meta_optimizer.cc:801 errors #52124

Closed
sbushmanov opened this issue Sep 24, 2021 · 3 comments

@sbushmanov

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow compiled from: source (master), via Docker
  • TensorFlow version: 2.7.0
  • Python version: 3.9.7
  • Bazel version: 3.7.2
  • CUDA/cuDNN version: 11.4/8.2
  • GPU model and memory: NVIDIA 1080 Ti

I'm getting the following error messages, and I don't understand whether I should pay attention to them or disregard them:

2021-09-24 18:59:43.243685: E tensorflow/core/framework/resource_handle.cc:39] A ref-counted ResourceHandle cannot be serialized losslesslyDeserializing the result is a failure: ShuffleDatasetV3/SeedGenerator_2
2021-09-24 18:59:53.258130: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 2799 of 10000
2021-09-24 19:00:03.260611: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 5660 of 10000
2021-09-24 19:00:13.259760: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 8506 of 10000
2021-09-24 19:00:18.494994: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:405] Shuffle buffer filled.
2021-09-24 19:00:20.879185: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Node 'model_lstm_partitionedcall_24_RetVal': Connecting to invalid output 27 of source node model/lstm/PartitionedCall which has 27 outputs.
2021-09-24 19:00:20.901732: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: OUT_OF_RANGE: src_output = 27, but num_outputs is only 27
2021-09-24 19:00:20.929210: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Node 'model_lstm_partitionedcall_24_RetVal': Connecting to invalid output 27 of source node model/lstm/PartitionedCall which has 27 outputs.
2021-09-24 19:00:20.970470: W tensorflow/core/common_runtime/process_function_library_runtime.cc:859] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 0 of node model_lstm_partitionedcall_2_RetVal was passed bool from model/lstm/PartitionedCall:5 incompatible with expected int32.
2021-09-24 19:00:21.214614: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
2021-09-24 19:00:21.275349: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
2021-09-24 19:00:21.331375: W tensorflow/core/common_runtime/process_function_library_runtime.cc:859] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
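
In case it is useful for triage, the only way I know of to check whether a specific grappler pass is behind these messages is to switch it off explicitly. This is just a sketch (I have not verified that it changes anything); the option names are the ones tf.config.optimizer.set_experimental_options accepts:

import tensorflow as tf

# Turn off the two grappler passes named in the log above
# (function_optimizer and layout) and rerun the training loop.
# This only changes graph rewriting, not model semantics.
tf.config.optimizer.set_experimental_options({
    'function_optimization': False,
    'layout_optimizer': False,
})

# The currently active settings can be inspected with:
print(tf.config.optimizer.get_experimental_options())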

This happens while running a custom training loop:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import Mean

# encoder, decoder, and make_dss() are defined earlier (not shown here)
num_epochs = 20
optimizer = Adam(learning_rate=0.001)
train_loss_result = []
val_loss_result = []


@tf.function
def ger_preproc(ger):
    inputs = ger[:, :-1]
    outputs = ger[:, 1:]
    return inputs, outputs


@tf.function
def masked_loss(true_german, predicted_german):
    loss = SparseCategoricalCrossentropy(from_logits=True, reduction='none')(true_german, predicted_german)
    mask = tf.cast(true_german != 0, tf.float32)
    loss *= mask
    return tf.reduce_mean(loss)


@tf.function
def forward_pass(eng_inputs, ger_inputs):
    g_in, g_out = ger_preproc(ger_inputs)
    hidden_state, cell_state = encoder(eng_inputs)
    predicted_german, _, _ = decoder(g_in, hidden_state, cell_state)
    current_loss = masked_loss(g_out, predicted_german)
    return current_loss


for epoch in range(num_epochs):
    train_ds, val_ds = make_dss()
    train_loss = Mean()
    val_loss = Mean()

    # train
    for eng_inputs, ger_inputs in train_ds:
        with tf.GradientTape() as t:
            current_loss = forward_pass(eng_inputs, ger_inputs)
        trainable_vars = encoder.trainable_variables + decoder.trainable_variables
        grads = t.gradient(current_loss, trainable_vars)
        optimizer.apply_gradients(zip(grads, trainable_vars))
        train_loss(current_loss)

    # validate
    for eng_inputs, ger_inputs in val_ds:
        current_loss = forward_pass(eng_inputs, ger_inputs)
        val_loss(current_loss)
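
For completeness, the variant I would compare against is the same step with the tape traced inside a single tf.function rather than only forward_pass. This is only a sketch reusing encoder, decoder and optimizer from above; I have not checked whether it changes the log output:

@tf.function
def train_step(eng_inputs, ger_inputs):
    # Same computation as the loop body above, but the tape and the
    # gradient update are compiled into one graph.
    with tf.GradientTape() as t:
        current_loss = forward_pass(eng_inputs, ger_inputs)
    trainable_vars = encoder.trainable_variables + decoder.trainable_variables
    grads = t.gradient(current_loss, trainable_vars)
    optimizer.apply_gradients(zip(grads, trainable_vars))
    return current_loss

# inside the epoch loop:
# for eng_inputs, ger_inputs in train_ds:
#     train_loss(train_step(eng_inputs, ger_inputs))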

The funny thing here is that they have similar 801 errors:

2021-08-31 11:08:27.919851: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: Invalid argument: Input 6 of node gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/PartitionedCall was passed variant from gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/TensorListPopBack_2:1 incompatible with expected float.
2021-08-31 11:08:28.004195: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] shape_optimizer failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.044145: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.169643: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: Invalid argument: Input 6 of node gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/PartitionedCall was passed variant from gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/TensorListPopBack_2:1 incompatible with expected float.
2021-08-31 11:08:28.227653: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] shape_optimizer failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.301920: W tensorflow/core/common_runtime/process_function_library_runtime.cc:841] Ignoring multi-device function optimization failure: Invalid argument: Input 1 of node while/body/_1/while/TensorListPushBack_56 was passed float from while/body/_1/while/decoder_1/gru_3/PartitionedCall:6 incompatible with expected variant.
{'batch_loss': <tf.Tensor: shape=(), dtype=float32, numpy=4.0628138>}

which they explain will disappear after a couple of training loops.

Interestingly enough, when I run the same code on Google Colab, I don't get any of these errors.
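
To compare the two environments, this is roughly what I would print on both the local build and Colab; it is only a sketch for gathering build details, nothing from the original run:

import tensorflow as tf

# Report version, build configuration (CUDA/cuDNN used at compile time)
# and visible GPUs, to spot differences between the environments.
print(tf.__version__)
print(tf.sysconfig.get_build_info())
print(tf.config.list_physical_devices('GPU'))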

Questions:

  1. What do these errors mean, and should I pay attention to them?
  2. Does the absence of these errors on Colab mean there is a problem with my locally compiled TF?
@sbushmanov sbushmanov added the type:performance Performance Issue label Sep 24, 2021
@sbushmanov sbushmanov changed the title Mysterious meta_optimizer.cc:801 error Mysterious bunch of meta_optimizer.cc:801 errors Sep 24, 2021
@mohantym mohantym added the 2.6.0 label Sep 27, 2021
@mohantym
Contributor

Hi @sbushmanov! Could you please provide a Colab gist with the same code? It will help expedite resolving the issue.

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Sep 27, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 4, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
