
NotImplementedError: unable to open file: libtensorflow_io.so on a fresh GCP Deep Learning VM with TensorFlow 2.3 #1100

Closed
carlthome opened this issue Aug 28, 2020 · 7 comments

@carlthome

I just set up a GCE instance based on the GCP Deep Learning images, specifically this one:

======================================
Welcome to the Google Deep Learning VM
======================================

Version: tf2-2-3-gpu.2-3.m55
Based on: Debian GNU/Linux 9.13 (stretch) (GNU/Linux 4.9.0-13-amd64 x86_64\n)

which comes with TF 2.3 and TFIO 0.15:

(base) carl@carl-gpu:~$ pip show tensorflow-io
Name: tensorflow-io
Version: 0.15.0
Summary: TensorFlow IO
Home-page: https://github.com/tensorflow/io
Author: Google Inc.
Author-email: opensource@google.com
License: UNKNOWN
Location: /opt/conda/lib/python3.7/site-packages
Requires: tensorflow
Required-by: 

(base) carl@carl-gpu:~$ pip show tensorflow
Name: tensorflow
Version: 2.3.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /opt/conda/lib/python3.7/site-packages
Requires: wheel, opt-einsum, tensorboard, h5py, wrapt, absl-py, astunparse, keras-preprocessing, tensorflow-estimator, numpy, six, protobuf, termcolor, google-pasta, grpcio, gast, scipy
Required-by: witwidget, tfx, tfx-bsl, tensorflow-transform, tensorflow-serving-api, tensorflow-model-analysis, tensorflow-io, tensorflow-enterprise-addons, tensorflow-data-validation, ml-metadata, Keras

But when I try to import tensorflow-io, I get a low-level error: a symbol is missing from a TFIO-specific shared library, and I'm a bit lost at this stage. I assume these images are meant to work out of the box, but it doesn't look like they do. TensorFlow core works fine.

(base) carl@carl-gpu:~$ ipython
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.17.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf; tf.config.list_physical_devices('GPU')
2020-08-28 09:41:20.126173: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-28 09:41:21.640498: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-28 09:41:22.414635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-28 09:41:22.415310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-08-28 09:41:22.415346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-28 09:41:22.418158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-28 09:41:22.419414: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-28 09:41:22.419721: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-28 09:41:22.422613: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-28 09:41:22.423358: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-28 09:41:22.423510: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-28 09:41:22.423620: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-28 09:41:22.424274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-28 09:41:22.424860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
Out[1]: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [2]: import tensorflow_io as tfio
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-2-30740b9d9b9a> in <module>
----> 1 import tensorflow_io as tfio

/opt/conda/lib/python3.7/site-packages/tensorflow_io/__init__.py in <module>
     15 """tensorflow_io"""
     16 
---> 17 from tensorflow_io.core.python.api.v0 import *  # pylint: disable=wildcard-import
     18 from tensorflow_io.core.python.api.version import VERSION as __version__
     19 

/opt/conda/lib/python3.7/site-packages/tensorflow_io/core/python/api/v0/__init__.py in <module>
     16 
     17 # tensorflow_io.core.python.ops is implicitly imported (along with file system)
---> 18 from tensorflow_io.core.python.ops.io_dataset import IODataset
     19 from tensorflow_io.core.python.ops.io_tensor import IOTensor
     20 

/opt/conda/lib/python3.7/site-packages/tensorflow_io/core/python/ops/__init__.py in <module>
     69 
     70 
---> 71 core_ops = _load_library("libtensorflow_io.so")

/opt/conda/lib/python3.7/site-packages/tensorflow_io/core/python/ops/__init__.py in _load_library(filename, lib)
     65     raise NotImplementedError(
     66         "unable to open file: "
---> 67         + "{}, from paths: {}\ncaused by: {}".format(filename, filenames, errs)
     68     )
     69 

NotImplementedError: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/core/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/core/python/ops/libtensorflow_io.so: undefined symbol: _ZN10tensorflow4data15DatasetOpKernel11TraceStringEPNS_15OpKernelContextEb']
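The traceback shows the loading pattern behind this message: each candidate path for `libtensorflow_io.so` is tried, every loader error is collected, and the whole list is re-raised as one `NotImplementedError`. The sketch below mimics that pattern with `ctypes.CDLL` as a stand-in loader; tensorflow-io's real `_load_library` may load the library through a different mechanism, and `load_first_available` is a hypothetical name. An `undefined symbol` entry under `caused by` means the file was found but was built against a different TensorFlow ABI than the one installed.

```python
import ctypes
from typing import List, Sequence


def load_first_available(filename: str, paths: Sequence[str]) -> ctypes.CDLL:
    """Hypothetical helper: try each candidate path in order and return the
    first shared library that loads; otherwise raise a single error that
    summarizes every failure, using the message format from the traceback."""
    errs: List[str] = []
    for path in paths:
        try:
            return ctypes.CDLL(path)
        except OSError as e:
            # ctypes reports dlopen() failures (missing file, unresolved
            # symbols, ...) as OSError; keep each message for the summary.
            errs.append(str(e))
    raise NotImplementedError(
        "unable to open file: "
        + "{}, from paths: {}\ncaused by: {}".format(filename, list(paths), errs)
    )
```

Because the per-path errors are kept rather than discarded, the final message points at the real cause (here, the missing `DatasetOpKernel::TraceString` symbol) instead of a generic "not found".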
@yongtang
Member

@vlasenkoalexey Do you know more about the GCP VMs? Is the TensorFlow installation on GCP VMs the same as the open-source TensorFlow release (https://github.com/tensorflow/tensorflow/releases/tag/v2.3.0)?

@vlasenkoalexey
Contributor

On GCP VMs we install stock TF.IO but recompile TF from source. They should still be ABI-compatible.
I'll follow up on this.

@vlasenkoalexey vlasenkoalexey self-assigned this Sep 8, 2020
@vlasenkoalexey
Contributor

vlasenkoalexey commented Sep 8, 2020

It turned out that the DLVM 2.3 image and container have tensorflow-io 0.14.0 installed by mistake.
The fix is to upgrade tensorflow-io to 0.15.0; see https://github.com/vlasenkoalexey/bigquery_perftest#tfe-23--fixed-tfio-dependency and the corresponding Dockerfile.
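The root cause here was a version mismatch: a tensorflow-io build has to match the TensorFlow minor version it was compiled against. A minimal sketch of a pre-import check follows. The compatibility table is an assumption: it records only the pairing confirmed in this thread (TensorFlow 2.3.x with tensorflow-io 0.15.x) and should be extended from the compatibility table in the tensorflow-io README. `COMPATIBLE_TFIO`, `major_minor`, and `check_tfio_compat` are hypothetical names.

```python
from typing import Dict, FrozenSet, Optional

# Assumed compatibility table: TensorFlow "major.minor" -> tensorflow-io
# "major.minor" releases known to work with it. Only the pairing confirmed
# in this thread is listed; extend it from the tensorflow-io README.
COMPATIBLE_TFIO: Dict[str, FrozenSet[str]] = {
    "2.3": frozenset({"0.15"}),
}


def major_minor(version: str) -> str:
    """Reduce a version such as '2.3.0' to '2.3' or '0.15.0' to '0.15'."""
    return ".".join(version.split(".")[:2])


def check_tfio_compat(tf_version: str, tfio_version: str) -> Optional[bool]:
    """Return True if the pair is listed as compatible, False if this
    TensorFlow version is listed but the tensorflow-io version is not,
    and None if the TensorFlow version is missing from the table."""
    allowed = COMPATIBLE_TFIO.get(major_minor(tf_version))
    if allowed is None:
        return None
    return major_minor(tfio_version) in allowed
```

With this table, the broken DLVM combination (`"2.3.0"`, `"0.14.0"`) is reported as `False`. On Python 3.8+ the installed versions can be read with `importlib.metadata.version("tensorflow")` and `importlib.metadata.version("tensorflow-io")`; the Python 3.7 environment in this report would need the `importlib_metadata` backport or `pkg_resources` instead.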

An updated DLVM 2.3 image and container will be released soon.

@vlasenkoalexey
Contributor

Actually, there is another issue: upgrading tensorflow-io only works in the CPU image/container; the GPU one is still broken. Will follow up.

@vlasenkoalexey
Contributor

An obvious workaround is to uninstall tensorflow and tensorflow-io and reinstall them from pip:
pip uninstall tensorflow
pip uninstall tensorflow-io
pip install tensorflow-gpu
pip install --no-deps tensorflow-io
I tested that this works in the container; it should work for the image as well.
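After reinstalling, one way to confirm the fix is to import `tensorflow_io` in a fresh interpreter, so that state from an existing IPython session cannot mask or cause a failure. A minimal sketch using only the standard library; `try_import` is a hypothetical helper. It returns the last line of stderr on failure, which for the error in this report would be the `caused by: [...]` line naming the undefined symbol.

```python
import subprocess
import sys
from typing import Tuple


def try_import(module: str) -> Tuple[bool, str]:
    """Hypothetical smoke test: run `import <module>` in a fresh interpreter
    and return (succeeded, last line of stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.7
    )
    stderr_lines = result.stderr.strip().splitlines()
    last_line = stderr_lines[-1] if stderr_lines else ""
    return result.returncode == 0, last_line


if __name__ == "__main__":
    ok, err = try_import("tensorflow_io")
    print("tensorflow_io imports cleanly" if ok else "import failed: " + err)
```

Only the exit code decides success: TensorFlow writes informational log lines to stderr even when the import works, so stderr content alone is not a failure signal.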

@YuxuanChen

YuxuanChen commented Sep 24, 2020

Hi carlthome,

Thank you for notifying us of this bug. We've released a new version of DLVMs / DL containers that should contain a fix.

To get the new version, you can recreate the notebook using the latest DLVM or DL container, and don't hesitate to let us know if the problem persists or if you run into any new bugs.

@vlasenkoalexey
Contributor

Confirmed that it works with the containers now; the images should be fixed as well, so I'm closing this issue.
