Error in graph analyzer when using universal sentence encoder from tf_hub as per tutorial #160
Comments
I'd like to add that tf.keras layers also don't work within the preprocess_fn. For example, hub.KerasLayer(module_url), or even tf.keras.layers.Embedding, does not work, since it throws errors regarding tf.Sessions. I tried using a tf.function decorator, but that also did not work and threw errors about missing graphs.
@jusjosgra,
@rmothukuru, thanks for updating the status of this. Could you explain what `status: awaiting tensorflower` means, please?
@jusjosgra,
Ah, it's tensorflow-er. I thought it was tensor-flower, like a module for distributed growth or something. I saw there are a lot of issues awaiting tensorflowers; how many people are supporting the Transform project? We are hoping to include it in production workflows, but it seems a bit too unstable at the moment. I realize it's pre-1.0.
We're having the exact same issue, both locally and in Dataflow.
We're having the exact same issue, both locally and in Dataflow, too.
Could you let me know what version of Transform/TFX you are using? A recent commit should have fixed this. The commit is in the Transform 0.23 release and should be in the TFX 0.23 release. Thanks!
@varshaan, I tried again after installing TFX 0.23, but a new error occurs:

```
Tensor EncoderDNN/CNN_layers/LayerNorm/beta is not found in b'gs://bts_pan//transform_temp_dir/tftransform_tmp/906257a511934b83a52c74996b94ba03/variables/variables' checkpoint {'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_9': [30, 512], 'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_7': [30, 512], ...
```

I am currently using the "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3" model.
Could you please post a snippet of your preprocessing_fn and the code where this error is being raised? Is it being raised in Transform or during training?
@varshaan! Here is my preprocessing_fn:

```python
encoder = None

def preprocess_fn(input_features):
    import tensorflow_transform as tft
    embedding = embed_text(input_features['data'])
    output_features = {
        'id': input_features['id'],
        'logkey': input_features['logkey'],
        'data': input_features['data'],
        'embedding': embedding
    }
    return output_features

def embed_text(text):
    import numpy as np
    import tensorflow_hub as hub
    import tensorflow_text
    global encoder
    use_url = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"
    if encoder is None:
        encoder = hub.load(use_url)
    outputs = encoder(text)
    return outputs
```

The pipeline:

```python
transform_temp_dir = DEST_DIR + '/transform_temp_dir'
with tft_beam.Context(transform_temp_dir):
    pipeline = beam.Pipeline(runner, options=opts)
    raw_data = (
        pipeline
        | 'Read from BigQuery' >> beam.io.ReadFromBigQuery(
            project='project-m', query=sql, use_standard_sql=True,
            validate=True, flatten_results=False)
    )
    dataset = (raw_data, get_metadata())
    result1 = (
        raw_data
        | 'Write raw data to gcs' >> beam.io.WriteToText(
            DEST_DIR + job_name + '/raw_output' + '/output',
            file_name_suffix='.txt')
    )
    transformed_dataset, _ = (
        dataset
        | 'Embedding data' >> tft_beam.AnalyzeAndTransformDataset(preprocess_fn)
    )
```

The error:

```
/opt/conda/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-x86_64-linux-gnu.so in apache_beam.runners.common.SimpleInvoker.invoke_process()
/opt/conda/lib/python3.7/site-packages/apache_beam/transforms/core.py in (x)
/opt/conda/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py in _infer_metadata_from_saved_model(saved_model_dir)
/opt/conda/lib/python3.7/site-packages/tensorflow_transform/saved/saved_transform_io.py in partially_apply_saved_transform_internal(saved_model_dir, logical_input_map, tensor_replacement_map)
/opt/conda/lib/python3.7/site-packages/tensorflow_transform/saved/saved_transform_io.py in _partially_apply_saved_transform_impl(saved_model_dir, logical_input_map, tensor_replacement_map)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py in init_from_checkpoint(ckpt_dir_or_file, assignment_map)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py in merge_call(self, merge_fn, args, kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py in _merge_call(self, merge_fn, args, kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py in (_)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py in _init_from_checkpoint(ckpt_dir_or_file, assignment_map)

ValueError: Tensor EncoderDNN/CNN_layers/LayerNorm/beta is not found in b'gs://bts_pan//transform_temp_dir/tftransform_tmp/359590ecb90e4a188b379704e7852ad2/variables/variables' checkpoint {'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_9': [30, 512], 'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_7': [30, 512], 'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_6': [30, 512], 'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_4': [30, 512], 'EncoderDNN/DNN/ResidualHidden_3/dense/kernel/part_3': [30, 512], ...
```

Full sample code below:

```python
from time import time
import tensorflow as tf
import apache_beam as beam
import tensorflow_transform.beam as tft_beam
import tensorflow_transform.coders as tft_coders
from apache_beam.options.pipeline_options import PipelineOptions
import tempfile

model = None

def embed_text(text):
    import tensorflow_hub as hub
    import tensorflow_text
    global model
    if model is None:
        model = hub.load(
            'https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')
    embedding = model(text)
    return embedding

def get_metadata():
    from tensorflow_transform.tf_metadata import dataset_schema
    from tensorflow_transform.tf_metadata import dataset_metadata
    metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({
        'id': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation()),
        'text': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation())
    }))
    return metadata

def preprocess_fn(input_features):
    text_integerized = embed_text(input_features['text'])
    output_features = {
        'id': input_features['id'],
        'embedding': text_integerized
    }
    return output_features

def run(runner):
    pipeline_options = beam.pipeline.PipelineOptions(None)
    DEST_DIR = "gs://daehwan/"
    job_name = 'dataflow-use-multilingual-{}'.format(str(time())[:10])
    options = {
        'runner': runner,
        # 'num_workers': 10,
        # 'machine_type': 'n1-highmem-16',
        'staging_location': DEST_DIR + 'staging',
        'temp_location': DEST_DIR + 'tmp',
        'job_name': job_name,
        'project': 'project',
        'region': 'us-central1',
        # 'teardown_policy': 'TEARDOWN_ALWAYS',
        # 'no_save_main_session': True,
        'save_main_session': False,
        'service_account_email': 'id@project.iam.gserviceaccount.com',
        'setup_file': './setup.py'
    }
    opts = beam.pipeline.PipelineOptions(flags=[], **options)
    pipeline = beam.Pipeline(runner, options=opts)
    transform_temp_dir = DEST_DIR + '/transform_temp_dir'
    with tft_beam.Context(transform_temp_dir):
        articles = (
            pipeline
            | beam.Create([
                {'id': '01', 'text': 'To be, or not to be: that is the question: '},
                {'id': '02', 'text': "Whether 'tis nobler in the mind to suffer "},
                {'id': '03', 'text': 'The slings and arrows of outrageous fortune, '},
                {'id': '04', 'text': 'Or to take arms against a sea of troubles, '},
            ]))
        articles_dataset = (articles, get_metadata())
        transformed_dataset, transform_fn = (
            articles_dataset
            | 'Extract embeddings' >> tft_beam.AnalyzeAndTransformDataset(preprocess_fn)
        )
        transformed_data, transformed_metadata = transformed_dataset
        # _ = (
        #     transformed_data | 'Write embeddings to TFRecords' >> beam.io.tfrecordio.WriteToTFRecord(
        #         file_path_prefix='{0}'.format(known_args.output_dir),
        #         file_name_suffix='.tfrecords',
        #         coder=tft_coders.example_proto_coder.ExampleProtoCoder(
        #             transformed_metadata.schema),
        #         num_shards=1
        #     )
        # )
    result = pipeline.run()
    # Note: the Beam API method is wait_until_finish(), not wait_until_finished().
    result.wait_until_finish()

runner = 'DirectRunner'
# runner = 'DataflowRunner'
run(runner)
```
Both hub modules mentioned in this issue are TF 2 hub modules. There is a new (still experimental) parameter in tft_beam.Context: force_tf_compat_v1. Setting this to False will trace the preprocessing fn using TF 2. I tested that these hub modules work post commit 25170b6 with force_tf_compat_v1 set to False. Please re-open if you see any issues with this path.
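For anyone landing here, a minimal sketch of passing that flag (assuming a Transform build that includes the experimental parameter; `raw_data`, `metadata`, `preprocessing_fn`, and `transform_temp_dir` stand for your own pipeline pieces):

```python
import tensorflow_transform.beam as tft_beam

# force_tf_compat_v1=False traces preprocessing_fn natively in TF 2,
# which TF 2 hub modules such as the USE require.
with tft_beam.Context(temp_dir=transform_temp_dir,
                      force_tf_compat_v1=False):
    transformed_dataset, transform_fn = (
        (raw_data, metadata)
        | 'AnalyzeAndTransform' >> tft_beam.AnalyzeAndTransformDataset(
            preprocessing_fn))
```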
Using very similar code here, setting force_tf_compat_v1=False did solve the Tensor Not Found error (I'm using the USE v5, which is a TF 2 hub module), but now there's an assertion error like the one below. Any insights on this?

```
AssertionError: Tried to export a function which references untracked object Tensor("7946:0", shape=(), dtype=resource). TensorFlow objects (e.g. tf.Variable) captured by functions must be tracked by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly. [while running 'Extract embeddings/AnalyzeDataset/CreateSavedModel/CreateSavedModel']
```
I only see a v4 here: https://tfhub.dev/google/universal-sentence-encoder/4. Could you give me a link to your hub module? Also, could you please give me details as to which version of Transform you are using? The fix to this issue is not yet in a release.
Sure! The transformer-based USE v5 is available here: https://tfhub.dev/google/universal-sentence-encoder-large/5. I'm using tensorflow-transform version 0.24.1.
I tested that hub module and it works at the GitHub master branch. As mentioned in my previous comment, the commit is not in a release yet. If possible, you could try using Transform from master. Alternatively, you can wait and try it with 0.25. If you still face an error, please open a new issue, as this one is currently marked closed. Thanks!
I am getting an error running the following code in direct runner
code:
called with:
and
error:
It looks like the graph analyzer is expecting a list of ops with a type attribute but is being passed a tensor instead. It is unclear to me what is going wrong here. Any help would be greatly appreciated!