
USE latest version can't be used with MirroredStrategy ("Trying to access a placeholder that is not supposed to be executed.") #515

Closed
eduardofv opened this issue Feb 20, 2020 · 23 comments
Labels: bug (Something isn't working), hub (For all issues related to tf hub library and tf hub tutorials or examples posted by hub team), stat:awaiting tensorflower, subtype:text-embedding

Comments

@eduardofv

The latest versions of USE throw this error when used in a MirroredStrategy. (tf: 2.1.0, keras: 2.2.4-tf, hub: 0.7.0)

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow_hub as hub
import tensorflow_text as text

# These USE models fail with "InvalidArgumentError: assertion failed: [Trying to access a placeholder that is not supposed to be executed]"
LM = "https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3"
#LM = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"
#LM = "https://tfhub.dev/google/universal-sentence-encoder-large/5"
#LM = "https://tfhub.dev/google/universal-sentence-encoder/4"
DIM = 512

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = keras.models.Sequential()
    model.add(
        hub.KerasLayer(LM,
                       output_shape=DIM,
                       input_shape=[],
                       dtype=tf.string)
    )
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer="adam", loss="binary_crossentropy")

model.summary()

Throws:

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-2-6b41a257fb9c> in <module>()
     32                        output_shape=DIM,
     33                        input_shape=[],
---> 34                        dtype=tf.string)
     35     )
     36     model.add(keras.layers.Dense(1, activation='sigmoid'))

12 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError:  assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from cross-replica context in an in-replica context.]
	 [[node Assert/Assert (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_hub/module_v2.py:95) ]] [Op:__inference_restored_function_body_45855]

Function call stack:
restored_function_body

Interestingly, NNLM works fine, and so does USE-large v3 (albeit with some warnings; not sure whether they affect its later performance). Also, if OneDeviceStrategy or no strategy is used, everything works.

Check this colab with code and test cases: https://colab.research.google.com/drive/1_YaGYje4tXPyQDx_hYaj9VNLAA5hi3Dg
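For reference, a minimal sketch of the OneDeviceStrategy case mentioned above, which, per the report, builds and compiles without the assertion (the device string is an assumption; everything else mirrors the snippet above):

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow_hub as hub
import tensorflow_text as text  # registers the ops needed by the multilingual USE models

LM = "https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3"
DIM = 512

# Same model as above, but under OneDeviceStrategy instead of MirroredStrategy.
strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
with strategy.scope():
    model = keras.models.Sequential([
        hub.KerasLayer(LM, output_shape=DIM, input_shape=[], dtype=tf.string),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

model.summary()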

@gowthamkpr gowthamkpr self-assigned this Feb 21, 2020
@gowthamkpr gowthamkpr added subtype:text-embedding bug Something isn't working hub For all issues related to tf hub library and tf hub tutorials or examples posted by hub team labels Feb 21, 2020
@mercarikaicheung

Any updates on this? I am also facing this issue.

@arnoegw arnoegw assigned arnoegw and unassigned vbardiovskyg Mar 11, 2020
@arnoegw
Contributor

arnoegw commented Mar 11, 2020

Confirmed: there is an issue. Thank you, @eduardofv, for the very clear and reproducible report!

@mercarikaicheung, there is no useful update yet, sorry.

@guptapriya

@arnoegw, looking at the stack trace, it seems that model.add(hub.KerasLayer(...)) is actually trying to execute some ops. Is that expected? Usually I would expect loading a model to only construct it, not execute it. Also, do you know what the differences between USE versions 5 and 3 are (since 3 seems to work but not 5)?

@eduardofv
Author

From the documentation of USE large: https://tfhub.dev/google/universal-sentence-encoder-large/5

Changelog

Version 1

  • Initial release.

Version 2

  • Exposed internal variables as Trainable.

Version 3

  • Fixed batch invariant bug. This version was retrained and its embedding space differs from previous versions.

Version 4

  • Retrained using TF2.

Version 5

@mercarikaicheung

Hi everyone,
I was wondering if there are any updates on this?

@guptapriya

Not from the distribution strategy side, since it appears that this model might have been saved in a non-standard way. The TF Hub team is working on a new version of USE that will be built natively in TF2 and should work correctly.

@statikkkkk

I am running into the same issue. Any timeline on the fixes, @guptapriya?

@arnoegw
Contributor

arnoegw commented Jul 10, 2020

@jaxlaw?

@crccw
Member

crccw commented Jul 10, 2020

Hi all, the issue was that this model was converted from a TF1 SavedModel to a TF2 SavedModel. There are some tricky issues with loading such models under tf.distribute.Strategy. The team is working on a TF2-native version of the model.

@r-wheeler

r-wheeler commented Jul 31, 2020

I am able to reproduce this same error using BERT on one machine with 8 V100 GPUs using MirroredStrategy. It did not occur when running the same code (and data) on 4 V100 GPUs.

Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from the cross-replica context in an in-replica context.]
     [[{{node bert_embeddings/keras_layer/StatefulPartitionedCall/Assert/Assert}}]]
     [[cond/else/_144/Maximum/_388]]
  (1) Invalid argument:  assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from the cross-replica context in an in-replica context.]
     [[{{node bert_embeddings/keras_layer/StatefulPartitionedCall/Assert/Assert}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_call_240207]

Function call stack:
call -> call

@RobRomijnders

Changing from strategy.scope to strategy.run resolved the issue for me. Not sure if this solution is related to the thread though.

@eduardofv
Author

eduardofv commented Oct 1, 2020

> Changing from strategy.scope to strategy.run resolved the issue for me. Not sure if this solution is related to the thread though.

Great. Currently I'm not working on this but if I return to it I'll check this solution. Thanks

@dkorkinof

Hi all, any news on this issue?

@RobRomijnders

@dkorkinof What issue are you facing exactly? Irrespective of the distribution strategy, defining ops within strategy.scope and accessing them via strategy.run resolved the issue for me.
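To make that pattern concrete, here is a minimal sketch: the hub layer is defined inside strategy.scope(), and the forward pass is dispatched through strategy.run(). The embed_fn helper and the example sentences are illustrative, and whether this avoids the load-time assertion for the TF1-converted USE models discussed above is unclear.

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers ops required by the multilingual USE models

strategy = tf.distribute.MirroredStrategy()

# Define the ops (here, the hub layer) within the strategy scope...
with strategy.scope():
    embed = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# ...and access them via strategy.run rather than calling them directly.
@tf.function
def embed_fn(sentences):
    return embed(sentences)

per_replica_embeddings = strategy.run(embed_fn, args=(tf.constant(["dog", "Welpen sind nett."]),))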

@dkorkinof

Basically, I am getting the exact same error as @eduardofv.
More specifically, with TF 2.3.1 and Hub 0.10.0, the following code:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

Throws this exception:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
  File "python3.7/site-packages/tensorflow_hub/module_v2.py", line 106, in load
    obj = tf.compat.v1.saved_model.load_v2(module_path, tags=tags)
  File "python3.7/site-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "python3.7/site-packages/tensorflow/python/saved_model/load.py", line 633, in load_internal
    ckpt_options)
  File "python3.7/site-packages/tensorflow/python/saved_model/load.py", line 135, in __init__
    init_op = node._initialize()  # pylint: disable=protected-access
  File "python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "python3.7/site-packages/tensorflow/python/eager/def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from the cross-replica context in an in-replica context.]
	 [[node Assert/Assert (defined at python3.7/site-packages/tensorflow_hub/module_v2.py:106) ]] [Op:__inference_restored_function_body_49292]

Function call stack:
restored_function_body

It seems what @crccw mentioned checks out, as the model's TensorFlow version appears as 1.15.0.
However, automatically converting the TF1 model to TF2 is different from what is mentioned in the release notes: "Version 4: Retrained using TF2."
Has there been any progress since @crccw's last post in July?
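(For reference, one way to read the TF version a hub model was exported with is from the SavedModel proto itself; a minimal sketch, assuming hub.resolve is available in your tensorflow_hub version:)

import tensorflow_hub as hub
from tensorflow.core.protobuf import saved_model_pb2

# Resolve the handle to the local cache directory (downloads the model if needed).
path = hub.resolve("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

saved_model = saved_model_pb2.SavedModel()
with open(path + "/saved_model.pb", "rb") as f:
    saved_model.ParseFromString(f.read())

# The exporter's TF version is recorded in the meta graph, e.g. "1.15.0".
print(saved_model.meta_graphs[0].meta_info_def.tensorflow_version)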

@arnoegw arnoegw assigned crccw and unassigned arnoegw Feb 10, 2021
@arnoegw
Contributor

arnoegw commented Feb 10, 2021

Thank you, @jaxlaw!

In light of @crccw's comment above, I think that's as much of a resolution as we can hope for.

@arnoegw arnoegw closed this as completed Feb 10, 2021
@yhethanchen-tw

Hmm... after downloading the new model, I still face the same issue.

@AlexSchumi

Does anyone have updates on this issue?

@MorganR
Contributor

MorganR commented May 4, 2021

It looks like this is working with the new model. Example:

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow_hub as hub
import tensorflow_text as text

def prepare_model():
  inputs = keras.Input(shape=[], dtype=tf.string)
  preprocessor = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-preprocess/2")(inputs)
  embedding = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-base/1")(preprocessor)
  return keras.Model(inputs=inputs, outputs=embedding["default"])

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
  model = prepare_model()
  model.compile(optimizer="adam", loss="binary_crossentropy")

# Make sure you're running the model outside of the strategy's scope!
model(tf.constant(["dog", "cat"]))

If folks are still running into issues, could you please share some example code and specific error messages?

@mercarikaicheung

Still having this error on TF 2.3 with https://tfhub.dev/google/universal-sentence-encoder/4:

InvalidArgumentError: assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from the cross-replica context in an in-replica context.]
	 [[node Assert/Assert (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:114) ]] [Op:__inference_restored_function_body_39384]

Function call stack:
restored_function_body

@EdwardCuiPeacock

This is still an issue for me with TF 2.3 and the latest version of USE: https://tfhub.dev/google/universal-sentence-encoder-large/5

InvalidArgumentError:  assertion failed: [Trying to access a placeholder that is not supposed to be executed. This means you are executing a graph generated from the cross-replica context in an in-replica context.]
	 [[node Assert/Assert (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:114) ]] [Op:__inference_restored_function_body_73316]

Function call stack:
restored_function_body

@freshforlife

Here is a minimal reproducible example for tf.distribute.MirroredStrategy() with the latest USE version, which throws an error.

import sys
import datetime, os

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow_hub as hub

## These are my tensorflow, keras, tensorflow_hub and python versions:

print(tf.__version__, hub.__version__, tf.keras.__version__, sys.version)
## 2.2.0 0.8.0  2.3.0-tf  3.8.0 (default, Nov  6 2019, 21:49:08)

## Text and labels for classification
text = [

    "SOCCER",
    "CRICKET",
    "BASKETBALL",

    "400 METRES SPRINT",
    "JAVELIN THROW",
    "TRIPLE JUMP",


    "FREESTYLE RELAY",
    "BACKSTROKE",
    "MEDLEY RELAY",

    "SAILING",
    "ROWING",
    "SURFING",
]

label = [0,0,0,1,1,1,2,2,2,3,3,3]

# Definition of hub.KerasLayer with USE large
module_obj = 'https://tfhub.dev/google/universal-sentence-encoder-large/5'
embed = hub.KerasLayer(module_obj, trainable=True)

def create_model():
    return tf.keras.models.Sequential(
            [
                tf.keras.layers.Input(shape = [], dtype=tf.string),
                embed,
                tf.keras.layers.Dense(4,activation='softmax')]
            )


mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = create_model()
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Output after this:

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
ValueError: Variable (<tf.Variable 'Embeddings/sharded_0:0' shape=(13334, 320) dtype=float32, numpy=
array([[-1.1641915e+00, -4.0162230e+00, -1.0308318e+00, ...,
1.1064901e-01, 2.9283254e+00, 1.4386557e+00],
[-3.7163803e-01, -1.8228565e-01, -4.0679860e-01, ...,
1.6463491e-01, -1.2669672e-01, -1.8727861e-01],
[ 6.4132787e-02, 2.4964781e-01, -5.7500858e-02, ...,
2.7740359e-01, -7.3342794e-01, -1.5283586e-01],
...,
[-4.7713269e-02, 1.7435255e-02, 1.7971721e-01, ...,
-2.6605502e-02, -6.8220109e-02, 5.4901250e-02],
[-8.2942925e-02, 8.3685674e-02, 4.9772050e-02, ...,
9.7135836e-03, -3.4118034e-02, -7.6729544e-03],
[-1.7106453e-02, 9.3901977e-02, -1.6374167e-02, ...,
4.9962573e-02, 9.2947654e-02, -1.7278243e-03]], dtype=float32)>) was not created in the distribution strategy scope of (<tensorflow.python.distribute.mirrored_strategy.MirroredStrategy object at 0x7f79a45c9790>). It is most likely due to not all layers or the model or optimizer being created outside the distribution strategy scope. Try to make sure your code looks similar to the following.
with strategy.scope():
  model=_create_model()
  model.compile(...)

@arnoegw, @MorganR: any insights on how to resolve this error?
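For what it's worth, the ValueError above is a different failure from the placeholder assertion discussed earlier in this thread: embed = hub.KerasLayer(...) creates its variables before the MirroredStrategy scope is entered. Below is a minimal sketch of the adjustment the error message itself suggests (moving the layer creation inside the scope); whether the original placeholder assertion then resurfaces for this model is a separate question.

import tensorflow as tf
import tensorflow_hub as hub

module_obj = 'https://tfhub.dev/google/universal-sentence-encoder-large/5'

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # Create the hub layer inside the scope so its variables are created under the strategy.
    embed = hub.KerasLayer(module_obj, trainable=True)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=[], dtype=tf.string),
        embed,
        tf.keras.layers.Dense(4, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])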
