
how to run gpu_implementation on GPU #26

Open

ZaneH1992 opened this issue Jun 2, 2019 · 3 comments


ZaneH1992 commented Jun 2, 2019

When I used an NVIDIA 1080 Ti, I was able to compile gym_tensorflow.so and run the experiment. The environment is tensorflow-gpu 1.8.0 with CUDA 9.0. But when I switched to a 2080 Ti, the experiment ran into trouble as follows:

2019-06-02 18:03:10.727082: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasSgemmBatched: CUBLAS_STATUS_EXECUTION_FAILED
2019-06-02 18:03:10.727109: E tensorflow/stream_executor/cuda/cuda_blas.cc:2413] Internal: failed BLAS call, see log for details
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[1,441,256], b.shape=[64,256,16], idx.shape=[1], m=441, n=16, k=256, batch_size=1
[[Node: model/Model/conv1/IndexedBatchMatMul = IndexedBatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/Model/conv1/Reshape_2, model/Model/conv1/Reshape, _arg_model/Placeholder_0_0/_37)]]
[[Node: model/Identity_1/_59 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_210_model/Identity_1", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/root/hz/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 94, in _loop
rews, is_done, _ = self.sess.run([self.rew_op, self.done_op, self.incr_counter], {self.placeholder_indices: indices})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)

I tried upgrading tensorflow-gpu to 1.12 and 1.13, but under those versions gym_tensorflow.so could not be compiled. I also found a similar issue at https://github.com/qqwweee/keras-yolo3/issues/332, but still had no luck after installing the patches for CUDA 9.
I wonder if there is a solution.
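
To help narrow this down, here is a minimal sketch (not part of the original report, assuming TF 1.x) that runs a plain batched matmul on the GPU with shapes similar to those in the error log, independent of the custom IndexedBatchMatMul op. The allow_growth setting is a workaround often suggested for this class of cuBLAS failures, not a fix confirmed in this thread.

```python
# Minimal sketch (assumption, not from the thread), TF 1.x:
# exercise a plain batched matmul on the GPU to check whether
# cuBLAS batched GEMM works at all outside the custom op.
import numpy as np
import tensorflow as tf

a = tf.placeholder(tf.float32, [1, 441, 256])
b = tf.placeholder(tf.float32, [1, 256, 16])
with tf.device("/gpu:0"):
    c = tf.matmul(a, b)  # rank-3 inputs -> BatchMatMul, backed by a cuBLAS batched GEMM

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # commonly suggested workaround for this class of cuBLAS errors
with tf.Session(config=config) as sess:
    out = sess.run(c, {a: np.random.rand(1, 441, 256).astype(np.float32),
                       b: np.random.rand(1, 256, 16).astype(np.float32)})
    print(out.shape)  # expect (1, 441, 16)
```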

fps7806 (Contributor) commented Jun 3, 2019

I think the 2080 Ti requires CUDA 10, and we haven't tested this code with CUDA 10 yet, so there might be some problems.
I'll see if we can fix it and let you know if we find a solution.
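
For reference, a quick way to confirm what the card reports (a sketch, not from the thread, assuming TF 1.x): Turing GPUs such as the RTX 2080 Ti and Tesla T4 have compute capability 7.5, which CUDA 9.0 does not support; support was added in CUDA 10.

```python
# Sketch (assumption): list the GPUs TensorFlow sees and their compute capability.
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TF version:", tf.__version__)
for d in device_lib.list_local_devices():
    if d.device_type == "GPU":
        # physical_device_desc looks like:
        # "device: 0, name: GeForce RTX 2080 Ti, pci bus id: ..., compute capability: 7.5"
        print(d.name, "->", d.physical_device_desc)
```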


ssrs5566 commented Jun 4, 2019

Great, thank you. When using CUDA 10.0 and TensorFlow 1.13.0, gym_tensorflow could not be compiled.
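
As a starting point for rebuilding the op against a newer TensorFlow, the flags the installed TF expects can be printed like this (a sketch, untested against the gym_tensorflow Makefile; the exact Makefile adjustments needed are an assumption):

```python
# Sketch under assumptions: the include paths, linked libraries, and C++ ABI
# flag change between TF releases such as 1.8 and 1.13, and the custom-op
# build flags would have to match the installed TensorFlow.
import tensorflow as tf

print(" ".join(tf.sysconfig.get_compile_flags()))  # e.g. -I<tf>/include -D_GLIBCXX_USE_CXX11_ABI=0
print(" ".join(tf.sysconfig.get_link_flags()))     # e.g. -L<tf> -ltensorflow_framework
```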

@deyandyankov

Any news on this? I am hitting the same issue with a Tesla T4 and various versions of tensorflow-gpu. So far I have only tried CUDA 9.0, though.
