Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TensorFlow 1.13 support #56

Merged
merged 4 commits into from
Feb 20, 2019
Merged

TensorFlow 1.13 support #56

merged 4 commits into from
Feb 20, 2019

Conversation

yongtang
Copy link
Member

This PR tries to add TensorFlow 1.13 support. As 1.13 has not been released yet, this PR is a working in progress until the release of 1.13.

Since 1.13 have breaking changes for Dataset API, this PR can only be merged with 1.13 released.

This PR fixes #54.

@yongtang
Copy link
Member Author

Looks like almost all tests passed. Though Travis CI's python 3.6 will crash when import tf-nightly.

We could wait until tf 1.13rc is out to take another look. We also could decide which version or platform we plan to support.

@yongtang
Copy link
Member Author

When installing the generated whl files, the .so encountered an issue of:

_video_ops_libav_9.20.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS8_EE

Need some further investigation.

@yongtang
Copy link
Member Author

Think I found where undefined symbol comes from. In setup.py the requirement is tensorflow, not tf-nightly. So even if we install tf-nightly it will still reinstall tensorflow. That will cause the api mismatch.

Change setup.py to use tf-nightly will fix the issue.

@@ -48,7 +49,7 @@ def test_sequence_file_dataset(self):

dataset = hadoop_dataset_ops.SequenceFileDataset(filenames).repeat(
num_repeats)
iterator = dataset.make_initializable_iterator()
iterator = dataset_ops.make_initializable_iterator(dataset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be better to use tf.compat.v1.data.make_one_shot_iterator(dataset) as it's mentioned in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/ops/dataset_ops.py#L1358. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dmitrievanthony. Updated and using tensorflow.compat.v1.data now.

@@ -66,7 +67,7 @@ def _check_dataset(self, dataset):
self.assertEqual(dtypes.string, dataset.output_types["val"]["NAME"])
self.assertEqual(dtypes.int64, dataset.output_types["val"]["VAL"])

it = dataset.make_one_shot_iterator()
it = dataset_ops.make_one_shot_iterator(dataset)
Copy link
Collaborator

@dmitrievanthony dmitrievanthony Jan 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to update README.md (in ignite module and in the root folder) as well. We can do it in the same PR (you can take this version as a reference) or I can do it in a separate PR.

@BryanCutler
Copy link
Member

@yongtang I figured out the issue with the ArrowDatasets and will put up the fix soon

@yongtang
Copy link
Member Author

I think we are almost complete for 1.13 support. Unfortunately R may need to be updated, but this could be done later.

Also this PR will not be merged until TensorFlow 1.13 is released (At the moment only 1.13rc0). I assume it will happen in several weeks or a month or two.

@yongtang
Copy link
Member Author

Also cc @terrytangyuan @BryanCutler FYI.

@yongtang
Copy link
Member Author

yongtang commented Feb 7, 2019

I am continuing to play with tf 1.12.0 vs. 1.13.0. Noticed that the C++ binaries build against 1.12.0 will not work with 1.13.0, as the following has changed:

# 1.12.0:
tensorflow::kernel_factory::OpKernelRegistrar::InitInternal( \
  tensorflow::KernelDef const*, absl::string_view, \
  tensorflow::OpKernel* (*)(tensorflow::OpKernelConstruction*))
# 1.13.0
tensorflow::kernel_factory::OpKernelRegistrar::InitInternal( \
  tensorflow::KernelDef const*, absl::string_view, \
  std::unique_ptr<tensorflow::kernel_factory::OpKernelFactory, \
  std::default_delete<tensorflow::kernel_factory::OpKernelFactory> >)

That means we do have to release a binary against 1.13.0 even if we use tensorflow's public endpoint Python API.

@terrytangyuan
Copy link
Member

terrytangyuan commented Feb 7, 2019

The class name has changed. We probably need to update this line: https://github.com/tensorflow/io/blob/master/R-package/R/dataset_utils.R#L10

Try change it to something like:

inherits(x, c("tensorflow.python.data.ops.dataset_ops.Dataset", "tensorflow.compat.v1.data.Dataset")) || is_tfio_dataset(x)

If this doesn't work, temporary solution is to have that is_dataset() function return TRUE to unblock this PR and I'll look into it later:

is_dataset <- function (x) {
  # inherits(x, "tensorflow.python.data.ops.dataset_ops.Dataset") || is_tfio_dataset(x)
  TRUE
}

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor comment

self._output_types = output_types
self._output_shapes = output_shapes or \
nest.map_structure(
lambda _: tensorflow.TensorShape(None), self._output_types)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong, but I don't think these need to be initialized here and could still be in the base class constructor, even though the call needs to be last?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I just updated the PR and removed unneeded initialization (other than one place which is needed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

@yongtang
Copy link
Member Author

yongtang commented Feb 7, 2019

Thanks @terrytangyuan. I added the follwoing

inherits(x, c("tensorflow.python.data.ops.dataset_ops.Dataset", "tensorflow.python.data.ops.dataset_ops.DatasetV1", "tensorflow.python.data.ops.dataset_ops.DatasetV2")) || is_tfio_dataset(x)

and verified locally that the above check passes. However, the program still throw out Provided dataset is not a TensorFlow Dataset error even if I delete them completely.

I would guess that is because tfdatasets also performs the check:
https://github.com/rstudio/tfdatasets/blob/2af1c0fb6753fc5c6178dd0981125ff100761fc0/R/utils.R#L70-L71

Since last tfdatasets was released several months ago, I think tfdatasets may need to be updated as well to fix the R issue?

@BryanCutler
Copy link
Member

Thanks for looking into this @yongtang !

@terrytangyuan
Copy link
Member

@yongtang I doubt that it's related to tfdatasets since we are using our own is_dataset(). If the fix doesn't work, feel free to have it return TRUE and track this in a separate issue. I'll help look into further later. As long as class(dataset) <- c("tf_dataset", class(dataset)) logic is happening in as_tf_dataset(), this should be safe. This function is internal and for package developer only so I don't expect that random objects will be passed into as_tf_dataset().

@yongtang
Copy link
Member Author

yongtang commented Feb 8, 2019

@terrytangyuan I tried locally with TRUE and still it thrown out error. On the other hand, if I install tfdatasets from the GitHub (master branch HEAD):

R -e 'library("devtools"); install_github("rstudio/tfdatasets", ref="c6fc59b")'

Then it seems the tests are passing. Wondering if this helps.

@terrytangyuan
Copy link
Member

terrytangyuan commented Feb 8, 2019

@yongtang I see. Then it might be using the is_dataset() from tfdatasets. I’ll double check on this. In the meantime, let’s just use your fix for now. Thanks for looking into this!

Copy link
Collaborator

@dmitrievanthony dmitrievanthony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a comment regarding tensorflow version in REQUIRED_PACKAGES.

setup.py Outdated
REQUIRED_PACKAGES = [
'tensorflow == 1.12.0',
'tensorflow >= 1.13.0rc0',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it makes sense to add upper bound as well?

@dmitrievanthony
Copy link
Collaborator

BTW, @yongtang are you planning to port these changes into tensorflow/contrib?

I see that last changes I made to recover ignite and kafka functionality in tensorflow were cherry-picked into 1.13 and so that 1.13.0rc0 works well. Unfortunately, it looks like folks changed underlying API again and our code is broken in 1.13.0rc1. I could fix this issue ad-hock, but we could also cherry-pick changes from this request into tensorflow/contrib also. What will be the best chose, what do you think?

@yongtang
Copy link
Member Author

yongtang commented Feb 8, 2019

@dmitrievanthony My understanding is that tensorflow 1.13 will likely be the last release before 2.0. And starting from tensorflow 2.0, the contrib directory will be deprecated and removed from tensorflow repo.

In that case, an ad-hoc fix on tensorflow/contrib to r1.13 branch directly (so that some ops are working fine) might be an easier path.

/cc @martinwicke @ewilderj to comment on the timeline of 1.13 and 2.0.

@martinwicke
Copy link
Member

Sadly, it's very likely we'll have at least one more 1.x release before 2.0 actually lands (outside of pre-releases).

@yongtang
Copy link
Member Author

Both TF 1.13.0rc2 and TF 2.0 (tf-nightly-2.0-preview) build with tensorflow-io succeed now. Think we are ready for 1.13 and 2.0.

Note:

  1. This PR is on top of another PR Travis CI Enhancement #94
  2. The code base is compatible with 1.13 and 2.0. However,
  3. the binary built with TF 1.13.0rc2 is not compatible with TF 2.0 (C++ API issue?), see TensorFlow 2.0 support #55 (comment)

I think the plan will be to rebuild another version of tensorflow-io when 2.0 is released.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Use GitHub version of tfdataset in R

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@yongtang
Copy link
Member Author

Will merge this PR in for now. We want to start testing tf 1.13 now (as tf 1.13 release is imminent).

I will create a follow up PR to change the pip requirement to tensorflow>=1.13.0, < 1.14.0 once TF 1.13 is released.

@yongtang yongtang merged commit 02d4cb0 into tensorflow:master Feb 20, 2019
@yongtang yongtang deleted the 1.13 branch February 20, 2019 16:49
i-ony pushed a commit to i-ony/io that referenced this pull request Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TensorFlow 1.13 support
5 participants