
Transform fails when setting force_tf_compat_v1=False #3272

Closed
ConverJens opened this issue Feb 19, 2021 · 40 comments

@ConverJens
Contributor

ConverJens commented Feb 19, 2021

System information

  • Have I specified the code to reproduce the issue (Yes/No): Yes
  • Environment in which the code is executed (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc.): KubeFlow
  • TensorFlow version (you are using): 2.4.0
  • TFX Version: 0.27.0
  • Python version: 3.7

Describe the current behavior
Using tf.strings.substr operations in the preprocessing_fn passed to Transform fails when setting force_tf_compat_v1=False while running in KubeFlow. Setting force_tf_compat_v1=True works.
Note: in interactive mode, using force_tf_compat_v1=True causes Python to crash, so this can't be tested properly there.

Describe the expected behavior
Using native TF2 behaviour should work.

Standalone code to reproduce the issue
Use the attached file (transform.py.zip) as the preprocessing module in KubeFlow, with data generated by running:

data_path = "data.csv"
df = pd.DataFrame(data=[[random.randint(0,100), '2021-01-12T11:34:08'] for i in range(0, 100)], columns=["random_int", "datetime"])
df.head()
df.to_csv(os.path.join(data_root, data_path), index=False)
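
For context, a minimal sketch of how the Transform component is wired up with this flag (this wiring is not taken from the attachment; the upstream components, module file path, and exact constructor arguments are assumptions and may differ by TFX version):

from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen, Transform

# Hypothetical pipeline wiring around the attached preprocessing module.
example_gen = CsvExampleGen(input_base=data_root)  # reads the data.csv generated above
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='transform.py',
    force_tf_compat_v1=False)  # fails; with force_tf_compat_v1=True it works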

Other info / logs
These are the operations I'm using in my preprocessing_fn:

dt_str = tf.constant('2021-01-12T11:34:08')

year_str = tf.strings.substr(dt_str, pos=0, len=4, unit='UTF8_CHAR')
month_str = tf.strings.substr(dt_str, pos=5, len=2, unit='UTF8_CHAR')
day_str = tf.strings.substr(dt_str, pos=8, len=2, unit='UTF8_CHAR')
hour_str = tf.strings.substr(dt_str, pos=11, len=2, unit='UTF8_CHAR')
minute_str = tf.strings.substr(dt_str, pos=14, len=2, unit='UTF8_CHAR')
second_str = tf.strings.substr(dt_str, pos=17, len=2, unit='UTF8_CHAR')
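
For reference, a minimal sketch of how these operations might be assembled into a preprocessing_fn (this is not the attached transform.py; the feature names follow the generated CSV, and the cast via tf.strings.to_number is an assumption based on the StringToNumber op appearing in the traces below):

import tensorflow as tf

def preprocessing_fn(inputs):
    # inputs["datetime"] holds strings like b'2021-01-12T11:34:08'.
    dt = inputs["datetime"]
    outputs = {"random_int": inputs["random_int"]}
    for name, pos, length in [("year", 0, 4), ("month", 5, 2), ("day", 8, 2),
                              ("hour", 11, 2), ("minute", 14, 2), ("second", 17, 2)]:
        part = tf.strings.substr(dt, pos=pos, len=length, unit="UTF8_CHAR")
        outputs[name] = tf.strings.to_number(part, out_type=tf.int64)
    return outputs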

Running locally, this works in both eager and graph mode (toggled via tf.config.run_functions_eagerly(True/False)).
However, when running it through the Transform component with force_tf_compat_v1=False, it fails with the following message:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main
    execution_info = launcher.launch()
  File "/usr/local/lib/python3.7/dist-packages/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch
    copy.deepcopy(execution_decision.exec_properties))
  File "/usr/local/lib/python3.7/dist-packages/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor
    copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))
  File "/pipeline/kubeflow/custom_components/transform_master/executor.py", line 493, in Do
    self.Transform(label_inputs, label_outputs, status_file)
  File "/pipeline/kubeflow/custom_components/transform_master/executor.py", line 1074, in Transform
    len(analyze_data_paths))
  File "/pipeline/kubeflow/custom_components/transform_master/executor.py", line 1209, in _RunBeamImpl
    preprocessing_fn, pipeline=pipeline))
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/transforms/ptransform.py", line 1058, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/transforms/ptransform.py", line 573, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/pipeline.py", line 646, in apply
    return self.apply(transform, pvalueish)
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/pipeline.py", line 689, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/runners/runner.py", line 188, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/runners/runner.py", line 218, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_transform/beam/impl.py", line 1140, in expand
    self).expand(self._make_parent_dataset(dataset))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_transform/beam/impl.py", line 1087, in expand
    evaluate_schema_overrides=False)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_transform/schema_inference.py", line 196, in infer_feature_schema_v2
    metadata = collections.defaultdict(list, concrete_metadata_fn(inputs))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1669, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1679, in _call_impl
    cancellation_manager)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1762, in _call_with_structured_signature
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  pos 5 out of range for string at index 0
	 [[node Substr_1 (defined at pipeline/kubeflow/model/preprocessing/transform.py:96) ]] [Op:__inference_metadata_fn_1820]
Errors may have originated from an input operation.
Input Source operations connected to node Substr_1:
 inputs_copy (defined at usr/local/lib/python3.7/dist-packages/tensorflow_transform/tf_utils.py:81)
Function call stack:
metadata_fn

When setting force_tf_compat_v1=True it works as expected.

@zoyahav
Member

zoyahav commented Feb 22, 2021

Can you share with us statistics about this feature from StatisticsGen?
Specifically average/min/max length of the string in inputs["datetime"], and what percentage of it is missing in the dataset?

@ConverJens
Contributor Author

@zoyahav This is an extremely small test dataset, ~300 data points, so min = max = average length = 19 and no values are missing, as confirmed by the StatisticsGen output. As mentioned, this works perfectly when setting force_tf_compat_v1=True.

@ConverJens
Contributor Author

@zoyahav This is the output from StatisticsGen:
(screenshot attached: Screenshot 2021-02-22 at 14 55 52)

@varshaan

Thanks! I am looking into this. Will provide an update when I have a fix.

@ConverJens
Contributor Author

@varshaan That's great! Let me know if you need anything else from me.

@ConverJens
Contributor Author

@varshaan Any update on this?

@varshaan

Hi, sorry, I was away last week and wasn't able to leave a comment here. I submitted a change to tf.Transform that I think should address this. Could you try using the latest TFX nightly from here?

@ConverJens
Contributor Author

@varshaan That's great! Our pipelines are still on TFX 0.27.0 and aren't fully TFX >= 0.28.0 compatible. I will bump TFX and try your fix as soon as I can and report back with the result.

@ConverJens
Contributor Author

@varshaan Now I've managed to test this, and unfortunately the issue seems to remain for me.
If I try the nightly release 20210318, I get the following (new) error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module>
    main()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main
    execution_info = launcher.launch()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch
    copy.deepcopy(execution_decision.exec_properties))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor
    copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 486, in Do
    self.Transform(label_inputs, label_outputs, status_file)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 1009, in Transform
    len(analyze_data_paths))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 1143, in _RunBeamImpl
    preprocessing_fn, pipeline=pipeline))
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 1058, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 573, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 646, in apply
    return self.apply(transform, pvalueish)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 689, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 188, in apply
    return m(transform, input, options)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 218, in apply_PTransform
    return transform.expand(input)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1173, in expand
    self).expand(self._make_parent_dataset(dataset))
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1120, in expand
    evaluate_schema_overrides=False)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/schema_inference.py", line 197, in infer_feature_schema_v2
    metadata = collections.defaultdict(list, optimized_concrete_fn())
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1669, in __call__
    return self._call_impl(args, kwargs)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
    args, kwargs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1687, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1736, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  StringToNumberOp could not correctly convert string: 
	 [[node StringToNumber (defined at root/pyenv/lib/python3.7/site-packages/tensorflow_transform/saved/saved_transform_io_v2.py:430) ]] [Op:__inference_pruned_2112]
Function call stack:
pruned

And if I run the exact same code and data with 0.29.0rc0 I get an error that seems very similar to the one I first posted:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 364, in <module>
    main()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 357, in main
    execution_info = launcher.launch()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch
    copy.deepcopy(execution_decision.exec_properties))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor
    copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 466, in Do
    self.Transform(label_inputs, label_outputs, status_file)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 985, in Transform
    len(analyze_data_paths))
  File "/root/pyenv/lib/python3.7/site-packages/tfx/components/transform/executor.py", line 1119, in _RunBeamImpl
    preprocessing_fn, pipeline=pipeline))
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 1058, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 573, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 646, in apply
    return self.apply(transform, pvalueish)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 689, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 188, in apply
    return m(transform, input, options)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 218, in apply_PTransform
    return transform.expand(input)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1178, in expand
    self).expand(self._make_parent_dataset(dataset))
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 1125, in expand
    evaluate_schema_overrides=False)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_transform/schema_inference.py", line 197, in infer_feature_schema_v2
    metadata = collections.defaultdict(list, optimized_concrete_fn())
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1669, in __call__
    return self._call_impl(args, kwargs)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
    args, kwargs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1687, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1736, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  pos 5 out of range for string b'' at index 0
	 [[node Substr_1 (defined at root/pyenv/lib/python3.7/site-packages/tensorflow_transform/saved/saved_transform_io_v2.py:430) ]] [Op:__inference_pruned_2112]
Function call stack:
pruned

I would be very appreciative if you could have another look at this!

@ConverJens
Contributor Author

@varshaan Any update on this?

@varshaan

Sorry, I missed your previous comment. Looking into this now.

@ConverJens
Contributor Author

No worries, thanks for investigating!

@ConverJens
Contributor Author

@varshaan Any update? Have you been able to reproduce?

@varshaan

Yes, I was able to repro. I have a change in progress to address this. I will let you know once it's submitted.

@ConverJens
Contributor Author

Great, thanks!

@varshaan

Just leaving an update. I'm still working on fully testing this as the change is a bit non-trivial. I will try and get it submitted soon.

@ConverJens
Contributor Author

@varshaan Thanks for the update!

@ConverJens
Contributor Author

@varshaan Any update on this? Has your fix made it into the 1.0.0rc1 release?

@axeltidemann
Contributor

I have the exact same issue in the 1.0.0rc1 release, upgrading from 0.27.0.

@varshaan

varshaan commented Jun 16, 2021

This commit is supposed to address this: tensorflow/transform@b878c66

It is not in 1.0, but it should be in the next release. You can try testing with a nightly build that includes that commit, if possible.

@ConverJens
Contributor Author

@varshaan Transform nightly version 1.1.0.dev20210617 works for me! Great job!

@axeltidemann Do you have the possibility to test this as well?

@axeltidemann
Contributor

Yes, this works for me using TFX 1.1.0.dev20210617.

@varshaan

Thanks for verifying. I will close this issue.


@ConverJens
Contributor Author

I believe this issue has resurfaced for me in TFX 1.3.0.

@axeltidemann Is it the same for you?

@varshaan Has there been a regression in the latest TFX releases?

@axeltidemann
Contributor

@ConverJens I don't know for 1.3.0, I am on TFX 1.2.0 which does not have this issue.

@varshaan

varshaan commented Nov 3, 2021

There should be no regression, as far as I am aware. The regression tests I added with the previous fix still pass. Could you provide a minimal repro so I can test it?

@ConverJens
Contributor Author

@varshaan Thanks for the quick response! I will test it a bit more and, if the issue persists, provide you with a minimal example.

@ConverJens
Contributor Author

@varshaan I've been able to reproduce the issue now with TFX 1.4.0. It reappears when disable_statistics=True is set on the Transform component. Otherwise it is exactly the same data and preprocessing logic as before.

I'm attaching a notebook (with data generation) which reproduces the issue locally for me.

transform_compat_tf1_issue.ipynb.zip
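
For clarity, a minimal sketch of the configuration that triggers this (the component wiring and names here are assumptions; the actual repro is in the attached notebook):

from tfx.components import Transform

# Hypothetical wiring; example_gen and schema_gen are assumed upstream components.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='transform.py',
    force_tf_compat_v1=False,
    disable_statistics=True)  # with disable_statistics=False the pipeline succeeds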

varshaan reopened this Nov 29, 2021
@varshaan

Thanks for the repro! I'll take a look at this and get back to you by end of week.

@varshaan

varshaan commented Dec 6, 2021

This issue reproduces as far back as TFX 1.2 if disable_statistics=True, so this isn't a regression; the original fix missed something in this code path. Thanks for reporting. I will send out a fix for this and update here after that.

@ConverJens
Contributor Author

@varshaan Great, thanks!

@ConverJens
Contributor Author

@varshaan Do you have an update for this issue?

@varshaan

Hi, yes, I debugged this further, and while the error message is the same, the source of the error is different. It'll take me a couple more weeks to figure out exactly how to address what's happening in the disable_statistics=True case. Sorry about that, but I hope to have a fix by the end of this month.

@ConverJens
Contributor Author

@varshaan No worries! Great work and thanks for the update!

@varshaan

An update, I have a change in review to address this. Will comment back here once it is submitted.

@ConverJens
Contributor Author

Great! I'll happily test it as soon as a nightly build with the fix is out.


@varshaan

varshaan commented Feb 2, 2022

Hi, the commit to address this was submitted earlier today [1]. Feel free to re-open this issue if you find that it didn't solve your problem.

[1] tensorflow/transform@718e394

@ConverJens
Contributor Author

@varshaan This issue now seems to be resolved. Thank you very much for your help!
