Can't allocate memory from GPU error in digits.py #2219

neurotenguin · 2016-05-04T16:29:51Z

Environment info

Operating System: 16.04

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

-rw-r--r--   1 root root    322936 Eyl 19  2015 libcudadevrt.a
lrwxrwxrwx   1 root root        16 Mar 30 15:25 libcudart.so -> libcudart.so.7.5
lrwxrwxrwx   1 root root        19 Mar 30 15:25 libcudart.so.7.5 -> libcudart.so.7.5.18
-rw-r--r--   1 root root    383336 Eyl 19  2015 libcudart.so.7.5.18
-rw-r--r--   1 root root    720192 Eyl 19  2015 libcudart_static.a
lrwxrwxrwx   1 root root        12 Nis 14 18:53 libcuda.so -> libcuda.so.1
lrwxrwxrwx   1 root root        17 Nis 14 18:53 libcuda.so.1 -> libcuda.so.361.42
-rw-r--r--   1 root root  16881416 Mar 23 02:42 libcuda.so.361.42
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so.4
-rwxr-xr-x   1 root root  61453024 Nis 30 11:36 libcudnn.so.4.0.7
-rwxr-xr-x   1 root root  59823168 Nis 30 11:12 libcudnn.so.5
-rwxr-xr-x   1 root root  59823168 Nis 30 11:12 libcudnn.so.5.0.4
-rw-r--r--   1 root root  62025862 Nis 30 11:36 libcudnn_static.a

Installed using:

sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl```

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
0.8.0

Steps to reproduce

(tensorflow)username@pcname:~/Research/ai/tf_examples$ python digits.py 
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.291
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.53GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 5.53G (5935898624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Step #100, epoch #8, avg. train loss: 2.79814
Step #200, epoch #16, avg. train loss: 1.27029
Step #300, epoch #25, avg. train loss: 0.98960
Step #400, epoch #33, avg. train loss: 0.84844
Step #500, epoch #41, avg. train loss: 0.75324
Accuracy: 0.744444

Running digits.py throws the "failed to allocate 5.53G" (the available memory on GPU is 6GB).

Possible solution

I can restrict the allocated memory using

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

but I am wondering if there is any other way to handle this error.

The text was updated successfully, but these errors were encountered:

zheng-xq · 2016-05-04T17:32:21Z

Is this error fatal in your log? Depends on which part of TF triggers this error, it might or might not be a problem.

terrytangyuan · 2016-05-04T17:47:27Z

@neurotenguin Are you running skflow digits.py example? Try change the RunConfig and pass it to your estimator before training. See example here.

neurotenguin · 2016-05-04T17:55:56Z

@zheng-xq Not fatal, but I don't get CUDA_ERROR_OUT_OF_MEMORY error in other tutorials. Need to trace it in the cuda_driver.cc to see the how gpu memory requests are handled.

@terrytangyuan Yep. Thanks for pointing out that example. I thought maybe there was some autoscaling magic going on.

neurotenguin closed this as completed May 4, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't allocate memory from GPU error in digits.py #2219

Can't allocate memory from GPU error in digits.py #2219

neurotenguin commented May 4, 2016 •

edited

zheng-xq commented May 4, 2016

terrytangyuan commented May 4, 2016

neurotenguin commented May 4, 2016

Can't allocate memory from GPU error in digits.py #2219

Can't allocate memory from GPU error in digits.py #2219

Comments

neurotenguin commented May 4, 2016 • edited

Environment info

Steps to reproduce

Possible solution

zheng-xq commented May 4, 2016

terrytangyuan commented May 4, 2016

neurotenguin commented May 4, 2016

neurotenguin commented May 4, 2016 •

edited