Mysterious bunch of meta_optimizer.cc:801 errors #52124

Closed
sbushmanov opened this issue Sep 24, 2021 · 3 comments

@sbushmanov

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow compiled from: source (master), via Docker
  • TensorFlow version: 2.7.0
  • Python version: 3.9.7
  • Bazel version: 3.7.2
  • CUDA/cuDNN version: 11.4/8.2
  • GPU model and memory: NVIDIA 1080 Ti

I'm getting the following error messages, and I don't understand whether I should pay attention to them or disregard them:

2021-09-24 18:59:43.243685: E tensorflow/core/framework/resource_handle.cc:39] A ref-counted ResourceHandle cannot be serialized losslesslyDeserializing the result is a failure: ShuffleDatasetV3/SeedGenerator_2
2021-09-24 18:59:53.258130: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 2799 of 10000
2021-09-24 19:00:03.260611: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 5660 of 10000
2021-09-24 19:00:13.259760: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 8506 of 10000
2021-09-24 19:00:18.494994: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:405] Shuffle buffer filled.
2021-09-24 19:00:20.879185: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Node 'model_lstm_partitionedcall_24_RetVal': Connecting to invalid output 27 of source node model/lstm/PartitionedCall which has 27 outputs.
2021-09-24 19:00:20.901732: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: OUT_OF_RANGE: src_output = 27, but num_outputs is only 27
2021-09-24 19:00:20.929210: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Node 'model_lstm_partitionedcall_24_RetVal': Connecting to invalid output 27 of source node model/lstm/PartitionedCall which has 27 outputs.
2021-09-24 19:00:20.970470: W tensorflow/core/common_runtime/process_function_library_runtime.cc:859] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 0 of node model_lstm_partitionedcall_2_RetVal was passed bool from model/lstm/PartitionedCall:5 incompatible with expected int32.
2021-09-24 19:00:21.214614: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
2021-09-24 19:00:21.275349: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
2021-09-24 19:00:21.331375: W tensorflow/core/common_runtime/process_function_library_runtime.cc:859] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 5 of node gradients/decoder/lstm_1/PartitionedCall_grad/PartitionedCall was passed int32 from gradients_decoder_lstm_1_partitionedcall_grad_decoder_lstm_1_partitionedcall:0 incompatible with expected bool.
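
In case it is useful for triage, the only way I know of to check whether a specific grappler pass is behind these messages is to switch it off explicitly. This is just a sketch (I have not verified that it changes anything); the option names are the ones tf.config.optimizer.set_experimental_options accepts:

import tensorflow as tf

# Turn off the two grappler passes named in the log above
# (function_optimizer and layout) and rerun the training loop.
# This only changes graph rewriting, not model semantics.
tf.config.optimizer.set_experimental_options({
    'function_optimization': False,
    'layout_optimizer': False,
})

# The currently active settings can be inspected with:
print(tf.config.optimizer.get_experimental_options())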

This happens while running a custom training loop:

import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import Mean

# encoder, decoder, and make_dss() are defined earlier (not shown here)
num_epochs = 20
optimizer = Adam(learning_rate=0.001)
train_loss_result = []
val_loss_result = []


@tf.function
def ger_preproc(ger):
    inputs = ger[:, :-1]
    outputs = ger[:, 1:]
    return inputs, outputs


@tf.function
def masked_loss(true_german, predicted_german):
    loss = SparseCategoricalCrossentropy(from_logits=True, reduction='none')(true_german, predicted_german)
    mask = tf.cast(true_german != 0, tf.float32)
    loss *= mask
    return tf.reduce_mean(loss)


@tf.function
def forward_pass(eng_inputs, ger_inputs):
    g_in, g_out = ger_preproc(ger_inputs)
    hidden_state, cell_state = encoder(eng_inputs)
    predicted_german, _, _ = decoder(g_in, hidden_state, cell_state)
    current_loss = masked_loss(g_out, predicted_german)
    return current_loss


for epoch in range(num_epochs):
    train_ds, val_ds = make_dss()
    train_loss = Mean()
    val_loss = Mean()

    # train
    for eng_inputs, ger_inputs in train_ds:
        with tf.GradientTape() as t:
            current_loss = forward_pass(eng_inputs, ger_inputs)
        trainable_vars = encoder.trainable_variables + decoder.trainable_variables
        grads = t.gradient(current_loss, trainable_vars)
        optimizer.apply_gradients(zip(grads, trainable_vars))
        train_loss(current_loss)

    # validate
    for eng_inputs, ger_inputs in val_ds:
        current_loss = forward_pass(eng_inputs, ger_inputs)
        val_loss(current_loss)
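
For completeness, the variant I would compare against is the same step with the tape traced inside a single tf.function rather than only forward_pass. This is only a sketch reusing encoder, decoder and optimizer from above; I have not checked whether it changes the log output:

@tf.function
def train_step(eng_inputs, ger_inputs):
    # Same computation as the loop body above, but the tape and the
    # gradient update are compiled into one graph.
    with tf.GradientTape() as t:
        current_loss = forward_pass(eng_inputs, ger_inputs)
    trainable_vars = encoder.trainable_variables + decoder.trainable_variables
    grads = t.gradient(current_loss, trainable_vars)
    optimizer.apply_gradients(zip(grads, trainable_vars))
    return current_loss

# inside the epoch loop:
# for eng_inputs, ger_inputs in train_ds:
#     train_loss(train_step(eng_inputs, ger_inputs))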

The funny thing here is that they have similar 801 errors:

2021-08-31 11:08:27.919851: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: Invalid argument: Input 6 of node gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/PartitionedCall was passed variant from gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/TensorListPopBack_2:1 incompatible with expected float.
2021-08-31 11:08:28.004195: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] shape_optimizer failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.044145: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.169643: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] function_optimizer failed: Invalid argument: Input 6 of node gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/PartitionedCall was passed variant from gradient_tape/while/while_grad/body/_531/gradient_tape/while/gradients/while/decoder_1/gru_3/PartitionedCall_grad/TensorListPopBack_2:1 incompatible with expected float.
2021-08-31 11:08:28.227653: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] shape_optimizer failed: Out of range: src_output = 25, but num_outputs is only 25
2021-08-31 11:08:28.301920: W tensorflow/core/common_runtime/process_function_library_runtime.cc:841] Ignoring multi-device function optimization failure: Invalid argument: Input 1 of node while/body/_1/while/TensorListPushBack_56 was passed float from while/body/_1/while/decoder_1/gru_3/PartitionedCall:6 incompatible with expected variant.
{'batch_loss': <tf.Tensor: shape=(), dtype=float32, numpy=4.0628138>}

which they explain will disappear after a couple of training loops.

Interestingly enough, when I run the same code on Google Colab, I don't get any of these errors.
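
To compare the two environments, this is roughly what I would print on both the local build and Colab; it is only a sketch for gathering build details, nothing from the original run:

import tensorflow as tf

# Report version, build configuration (CUDA/cuDNN used at compile time)
# and visible GPUs, to spot differences between the environments.
print(tf.__version__)
print(tf.sysconfig.get_build_info())
print(tf.config.list_physical_devices('GPU'))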

Questions:

  1. What do these errors mean, and should I pay attention to them?
  2. Does the absence of these errors on Colab mean there is a problem with my locally compiled TF?
@sbushmanov sbushmanov added the type:performance Performance Issue label Sep 24, 2021
@sbushmanov sbushmanov changed the title Mysterious meta_optimizer.cc:801 error Mysterious bunch of meta_optimizer.cc:801 errors Sep 24, 2021
@mohantym mohantym added the 2.6.0 label Sep 27, 2021
@mohantym
Contributor

Hi @sbushmanov! Could you please provide a Colab gist with the same code? It will help expedite resolving the issue.

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Sep 27, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 4, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
