
XLA: Could not open input file: Is a directory #8947

Closed
Earthson opened this issue Apr 4, 2017 · 7 comments
Labels
stat:awaiting response Status - Awaiting response from author

Comments

@Earthson
Contributor

Earthson commented Apr 4, 2017

XLA failed with Could not open input file: Is a directory

Environment info

Operating System: Ubuntu 16.04

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

.opt/anaconda/lib64/libcudadevrt.a
.opt/anaconda/lib64/libcudart.so
.opt/anaconda/lib64/libcudart.so.8.0
.opt/anaconda/lib64/libcudart.so.8.0.61
.opt/anaconda/lib64/libcudart_static.a
.opt/anaconda/lib64/libcudnn.so
.opt/anaconda/lib64/libcudnn.so.6
.opt/anaconda/lib64/libcudnn.so.6.0.20
.opt/anaconda/lib64/libcudnn_static.a

code setup

If installed from binary pip package, provide:

  1. http://q-phantom.com/conda/linux-64/tensorflow-1.1.0rc0-py36_3.tar.bz2
  2. 1.1.0-rc0

code init

import os
import tensorflow as tf
import tensorlayer as tl

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

tf.reset_default_graph()
tl.layers.set_name_reuse(True)
# c_network, label_index, and feature_index are defined elsewhere in my code.
placehold_mapping, networks = c_network(None, label_indices=label_index, feature_indices=feature_index)
network = networks[0]
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
sess = tf.Session(config=config)
tl.layers.initialize_global_variables(sess)

Log

2017-04-04 16:26:48.275644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-04-04 16:26:48.275648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
2017-04-04 16:26:48.275653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0)
2017-04-04 16:26:48.479102: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-04 16:26:48.479122: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 16 visible devices
2017-04-04 16:26:48.481008: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0x5dd4360 executing computations on platform Host. Devices:
2017-04-04 16:26:48.481021: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): <undefined>, <undefined>
2017-04-04 16:26:48.481138: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-04 16:26:48.481146: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 16 visible devices
2017-04-04 16:26:48.482239: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0x5f0f950 executing computations on platform CUDA. Devices:
2017-04-04 16:26:48.482248: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
GEN DATASET: 0.00 seconds elapsed
ROUND:  0
2017-04-04 16:26:57.149563: F tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/utils.cc:31] -1:-1: Could not open input file: Is a directory
@gunan
Contributor

gunan commented Apr 4, 2017

This is a package we do not maintain.
I am not sure when it was built, or which commit it was synced to.
I would recommend reaching out to the maintainers of the package you installed.

Either they can escalate to us with information about how they built the package, or you can share with us the information you get from them.

@Earthson
Contributor Author

Earthson commented Apr 6, 2017

The package is built with Bazel, using the following script:

TF_ROOT_DIR=$HOME/git/tensorflow

mkdir -p $HOME/git

if [ -d $TF_ROOT_DIR ]; then
  cd $TF_ROOT_DIR
  git pull
else
  cd $HOME/git
  git clone https://github.com/tensorflow/tensorflow
  cd $TF_ROOT_DIR
fi

git checkout r1.1

echo  $PREFIX

bazel clean
echo $PYTHON_BIN_PATH
PYTHON_BIN_PATH=$(which python) \
PYTHON_LIB_PATH=$PREFIX/lib/python3.6/site-packages \
TF_NEED_MKL=1 \
MKL_INSTALL_PATH=$PREFIX \
CC_OPT_FLAGS="-march=native" \
TF_NEED_JEMALLOC=1 \
TF_NEED_GCP=0 \
TF_NEED_HDFS=0 \
TF_ENABLE_XLA=1 \
TF_NEED_OPENCL=0 \
TF_NEED_CUDA=1 \
GCC_HOST_COMPILER_PATH=$(which gcc) \
TF_CUDA_VERSION="8.0" \
CUDA_TOOLKIT_PATH=$PREFIX \
TF_CUDNN_VERSION=6 \
CUDNN_INSTALL_PATH=$PREFIX \
TF_CUDA_COMPUTE_CAPABILITIES=6.1 \
./configure

bazel build -c opt --copt=-march=native --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k  //tensorflow/tools/pip_package:build_pip_package
rm -rf /tmp/tensorflow_pkg
mkdir -p /tmp/tensorflow_pkg
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install $(ls /tmp/tensorflow_pkg/tensorflow*)

@gunan
Contributor

gunan commented Apr 6, 2017

Thanks for the information.
@tatatodd @hawkinsp is this an issue we have seen before?

@av8ramit Looks like this is an issue in the release branch. We may think about a cherrypick based on the investigation.

@tatatodd
Contributor

tatatodd commented Apr 6, 2017

No, (at least) I have never seen this issue before.

Note that the code example is setting CUDA_VISIBLE_DEVICES=0, and then enabling session-level JIT. Enabling session-level JIT only supports GPU, as explained here (in the starred blue box):
https://www.tensorflow.org/performance/xla/jit#turning_on_jit_compilation

It would be nice not to return such a cryptic error, but at a high level, setting CUDA_VISIBLE_DEVICES=0 and then enabling session-level JIT is at best not going to turn XLA on anyway. I'd advise against doing this.

@tatatodd
Contributor

tatatodd commented Apr 6, 2017

Oops, sorry, brain freeze. I just realized CUDA_VISIBLE_DEVICES=0 is selecting the 0th device, and the logs show it is being detected.

So my response is back to - "no, I've never seen this, we should probably debug".
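To illustrate the point in the correction above: CUDA_VISIBLE_DEVICES=0 selects GPU index 0 rather than disabling GPUs; it is an empty value that hides all devices. A minimal sketch of the distinction, using echo as a stand-in for the Python process (the variable must be set before the process starts, since the CUDA runtime reads it at startup):

```shell
# "0" exposes only GPU index 0 to the child process; "" exposes no GPUs at all.
CUDA_VISIBLE_DEVICES=0 sh -c 'echo "visible GPUs: $CUDA_VISIBLE_DEVICES"'
CUDA_VISIBLE_DEVICES="" sh -c 'echo "visible GPUs: $CUDA_VISIBLE_DEVICES"'
```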

@asimshankar
Contributor

I suspect it's because TensorFlow cannot find the CUDA libraries, though I can't be certain, since I don't know what $PREFIX is in your snippet above. To confirm, try running the program with the TF_CPP_MIN_VLOG_LEVEL environment variable set to 1 (set it before starting Python).

In particular, I'm interested in the log messages from gpu_backend_lib.cc, which might help figure out which file it's trying to load (and failing on).
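A sketch of that debugging step (train.py is a hypothetical name standing in for the failing program; substitute your own script):

```shell
# Enable verbose TensorFlow C++ logging; per the suggestion above, the
# variable must be set before Python starts.
export TF_CPP_MIN_VLOG_LEVEL=1

# Re-run the failing program and keep a copy of the output.
python train.py 2>&1 | tee xla_debug.log

# Look for the gpu_backend_lib.cc messages of interest; grep returns
# nonzero when nothing matches, hence the || true.
grep gpu_backend_lib xla_debug.log || true
```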

@asimshankar asimshankar added the stat:awaiting response Status - Awaiting response from author label Apr 8, 2017
drpngx pushed a commit to drpngx/tensorflow that referenced this issue Apr 11, 2017
Will help with issues like tensorflow#8947
Change: 152733558
@asimshankar
Contributor

Closing due to inactivity. If you're still running into this, please feel free to file an updated issue (including any output from suggestions above). Thanks!
