Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't allocate memory from GPU error in digits.py #2219

Closed
neurotenguin opened this issue May 4, 2016 · 3 comments
Closed

Can't allocate memory from GPU error in digits.py #2219

neurotenguin opened this issue May 4, 2016 · 3 comments

Comments

@neurotenguin
Copy link

neurotenguin commented May 4, 2016

Environment info

Operating System: 16.04

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

-rw-r--r--   1 root root    322936 Eyl 19  2015 libcudadevrt.a
lrwxrwxrwx   1 root root        16 Mar 30 15:25 libcudart.so -> libcudart.so.7.5
lrwxrwxrwx   1 root root        19 Mar 30 15:25 libcudart.so.7.5 -> libcudart.so.7.5.18
-rw-r--r--   1 root root    383336 Eyl 19  2015 libcudart.so.7.5.18
-rw-r--r--   1 root root    720192 Eyl 19  2015 libcudart_static.a
lrwxrwxrwx   1 root root        12 Nis 14 18:53 libcuda.so -> libcuda.so.1
lrwxrwxrwx   1 root root        17 Nis 14 18:53 libcuda.so.1 -> libcuda.so.361.42
-rw-r--r--   1 root root  16881416 Mar 23 02:42 libcuda.so.361.42
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so.4
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so.4.0.7
-rwxr-xr-x   1 root root  59823168 Nis 30 11:12 libcudnn.so.5
-rwxr-xr-x   1 root root  59823168 Nis 30 11:12 libcudnn.so.5.0.4
-rw-r--r--   1 root root  62025862 Nis 30 11:36 libcudnn_static.a

Installed using:

sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl```
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
0.8.0

Steps to reproduce

(tensorflow)username@pcname:~/Research/ai/tf_examples$ python digits.py 
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.291
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.53GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 5.53G (5935898624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Step #100, epoch #8, avg. train loss: 2.79814
Step #200, epoch #16, avg. train loss: 1.27029
Step #300, epoch #25, avg. train loss: 0.98960
Step #400, epoch #33, avg. train loss: 0.84844
Step #500, epoch #41, avg. train loss: 0.75324
Accuracy: 0.744444

Running digits.py throws the "failed to allocate 5.53G" (the available memory on GPU is 6GB).

Possible solution

I can restrict the allocated memory using

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

but I am wondering if there is any other way to handle this error.

@zheng-xq
Copy link
Contributor

zheng-xq commented May 4, 2016

Is this error fatal in your log? Depends on which part of TF triggers this error, it might or might not be a problem.

@terrytangyuan
Copy link
Member

@neurotenguin Are you running skflow digits.py example? Try change the RunConfig and pass it to your estimator before training. See example here.

@neurotenguin
Copy link
Author

@zheng-xq Not fatal, but I don't get CUDA_ERROR_OUT_OF_MEMORY error in other tutorials. Need to trace it in the cuda_driver.cc to see the how gpu memory requests are handled.

@terrytangyuan Yep. Thanks for pointing out that example. I thought maybe there was some autoscaling magic going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants