E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] layout failed: Invalid argument: size of values 0 does not match size of permutation 4. #34499
Comments
Same issue for me.

@wkdgnsgo, please paste the standalone code to reproduce the issue.

I have the exact same issue training a cycleGAN; my operations are built in
I get the same error message (/warning?). Quite a large I2I translation project, so it's hard to make minimal reproducing code. An issue that might be related is a shape error that is produced in part of the subclassed model:

```python
self.loss_object = tf.keras.losses.MeanSquaredError()
self.optimizer = tf.keras.optimizers.Adam(
    tf.keras.optimizers.schedules.ExponentialDecay(...))

def train_step(x, y, cross_loss_weight):
    with tf.GradientTape() as tape:
        y_hat, x_hat = f_x(x), f_y(y)
        x_tilde, y_tilde = f_y(y_hat), f_x(x_hat)
        fx_loss = {
            "cross": self.loss_object(y, y_hat, cross_loss_weight),
            "cycle": self.loss_object(x, x_tilde),
            "reg": sum(self._fx.losses),  # regularization from submodel
        }
        fx_loss = {key: self.lambdas[key] * value for key, value in fx_loss.items()}
        fx_total_loss = sum(fx_loss.values())
        # ... same for f_y ...
    gradient_targets = self._fx.trainable_variables + self._fy.trainable_variables
    gradients = tape.gradient(loss_value, gradient_targets)
    self.optimizer.apply_gradients(zip(gradients, gradient_targets))
```

So it seems the use is similar to @miguelalba96 with a cyclic loss term, but no adversarial term in my case. The model is hardly a DAG, but rather two DAGs trained in conjunction. Can the issue be related to this?
I am training a VAE and I encountered the same issue and managed to remove the error, but I do not understand what caused it. When I run this code I get the error:

```python
@tf.function
def forward(x_real):
    eps = tf.random.normal([FLAGS.batch_size, FLAGS.latent_dim])
    z_mu, z_log_sigma = E(x_real, training=True)
    z = z_mu + tf.exp(z_log_sigma) * eps
    x_real_mu = D(z, training=True)
    kl_loss = tf.reduce_mean(kl_divergence(z_mu, z_log_sigma))
    ll_loss = tf.reduce_mean(negative_log_likelyhood(x_real, x_real_mu))
    return x_real_mu, ll_loss, kl_loss
```

but the error vanishes if I run this one:

```python
@tf.function
def forward(x_real):
    eps = tf.random.normal([FLAGS.batch_size, FLAGS.latent_dim])
    z_mu, z_log_sigma = E(x_real, training=True)
    z = z_mu + tf.exp(z_log_sigma) * eps
    x_real_mu = D(z, training=True)
    kl_loss = tf.reduce_mean(kl_divergence(z_mu, z_log_sigma))
    ll_loss = tf.reduce_mean(negative_log_likelyhood(x_real + 0.0, x_real_mu))  # <-- change here
    return x_real_mu, ll_loss, kl_loss
```

I wonder why the first version raises the error...
@AntoinePlumerault, could you provide the complete code snippet to replicate the reported issue? Thanks!
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
I confirm the same issue. I am working with Python 3.7 and the tensorflow-gpu 2.0 release installed from pip. As @AntoinePlumerault mentioned, I could solve the issue by changing line 173 to this:
Similar issue. Here is my solution. In my case the error was caused by the following code:

Which gives:

It looks like in the layout optimization stage tensors are transposed from NHWC to NCHW for performance, but some of the transposes fail. The best solution is to write your model in NCHW format, so you can skip the layout optimization stage. Here is a tutorial on what TF does in the graph optimization stage.
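The NHWC-to-NCHW conversion mentioned above is just a rank-4 axis permutation, `[0, 3, 1, 2]`. A plain-Python sketch (no TensorFlow) of that permutation also suggests why the error message reads "size of values 0 does not match size of permutation 4": the layout pass appears to apply a 4-element permutation to a shape with zero known dimensions. The function below is illustrative, not Grappler's actual code:

```python
def permute_shape(shape, perm):
    # Reorder shape dimensions by the given permutation, mimicking
    # the check behind the error message in this issue.
    if len(shape) != len(perm):
        raise ValueError(
            f"size of values {len(shape)} does not match "
            f"size of permutation {len(perm)}")
    return [shape[p] for p in perm]

nhwc = [32, 224, 224, 3]  # batch, height, width, channels
print(permute_shape(nhwc, [0, 3, 1, 2]))  # [32, 3, 224, 224] (NCHW)
```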
This problem seems to come back in the latest TF versions.
Another workaround is to disable the layout optimizer:
I solved it using this workaround in Colab:

```python
file_path = '/content/models/research/object_detection/model_main_tf2.py'

# Read the content of the file
with open(file_path, 'r') as file:
    content = file.readlines()

# Find the index of the import tensorflow line
import_line_index = next(
    i for i, line in enumerate(content) if 'import tensorflow' in line)

# Define the fix command to insert
fix_command = "tf.config.optimizer.set_experimental_options({'layout_optimizer': False})\n"

# Insert the fix command after the TensorFlow import
content.insert(import_line_index + 1, fix_command)

# Write the modified content back to the file
with open(file_path, 'w') as file:
    file.writelines(content)
```
- added option for batch_normalization
- added option for different activation functions (https://keras.io/api/layers/activations/#available-activations)
- added option for dropout (generic/pixel-wise and spatial/feature-wise) including the option to adjust the dropout rate
- had to add a workaround to get dropout to work as mentioned here: tensorflow/tensorflow#34499 (comment)
- simplified _conv_block
- removed outdated comments
- improved typing
Already did this and the problem persists. I reduced the batch size and it is running; hoping it runs smoothly now. Anyone care to share their experience? ^^
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

1. TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`
2. TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`
Describe the current behavior
Originally, I built this model in TensorFlow 1.1x, and I transferred it to TF 2.0 manually to use tf.keras. It works, but it shows this error message (E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] layout failed: Invalid argument: size of values 0 does not match size of permutation 4.) and its performance is worse than in TF 1.1x.
I suspect that this error somehow interrupts training.
I didn't put any permutation layer in my model, so it is hard to find the cause.
Describe the expected behavior
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.