
Allocating all memory, CUDA OOM #64284

Closed · dennisushi opened this issue Mar 22, 2024 · 4 comments
Labels: comp:gpu (GPU related issues), stale (to be closed automatically if no activity), stat:awaiting response (awaiting response from author), TF 1.13, type:bug

Comments

@dennisushi

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

1.13, 1.10

Custom code

Yes

OS platform and distribution

Linux, Ubuntu 20

Mobile device

No response

Python version

3.8

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

11.7

GPU model and memory

V100, 34GB

Current behavior?

TF tries to allocate all of the GPU's memory, even though no function that should place any data on the GPU has been called.

Standalone code to reproduce the issue

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf
from keras import backend as K

tf.config.experimental_run_functions_eagerly(not True)
message = "No GPU found. To actually train on CPU remove this assert."
assert tf.config.experimental.list_physical_devices("GPU"), message

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    print("Found GPUs: ", gpus)
    # Restrict TensorFlow to only use the first GPU
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print("Setting memory growth for ", gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        print("Setting visible devices to ", gpus[0])
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)

Relevant log output

WARNING:tensorflow:From mvmwm/_tf_error_test.py:6: experimental_run_functions_eagerly (from tensorflow.python.eager.polymorphic_function.quarantine) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.run_functions_eagerly` instead of the experimental version.
Found GPUs:  [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Setting memory growth for  PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
Setting visible devices to  PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
2024-03-22 14:06:37.455972: F tensorflow/tsl/platform/statusor.cc:33] Attempting to fetch value instead of handling error INTERNAL: failed initializing StreamExecutor for CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 34087305216
Aborted (core dumped)
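
Two mitigations commonly suggested for this kind of allocation failure, sketched below assuming a TF 2.x runtime and not verified against this exact setup, are forcing on-demand growth through the environment, or capping the allocation with a virtual-device memory limit (the two are mutually exclusive on the same GPU):

import os

# Mitigation A: force on-demand growth. This must be set before TensorFlow
# initializes the CUDA context (i.e. before the first GPU-touching call),
# otherwise it has no effect.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf

# Mitigation B, to use *instead of* A: cap the allocation with a virtual
# device. The 4096 MB limit is an arbitrary example value.
# gpus = tf.config.list_physical_devices("GPU")
# tf.config.set_logical_device_configuration(
#     gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])

print(tf.config.list_physical_devices("GPU"))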
@google-ml-butler google-ml-butler bot added the type:bug Bug label Mar 22, 2024
@sushreebarsa sushreebarsa added comp:gpu GPU related issues TF 1.13 Issues related to TF 1.13 labels Mar 27, 2024
@sushreebarsa (Contributor)

@dennisushi I wasn't able to replicate the issue on Colab using TF v2.15; please find the gist here.
Kindly use the latest TF version, as TF v1.x is no longer actively supported. Thank you!
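
For reference, the TF 2.x pattern being suggested looks roughly like the following minimal sketch (the device index and the tensor shape are placeholder values, not taken from the gist):

import tensorflow as tf  # assumes a TF 2.x install

gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Memory growth still lives under tf.config.experimental in TF 2.x.
    tf.config.experimental.set_memory_growth(gpu, True)

if gpus:
    # Restrict the process to the first GPU, as in the original repro.
    tf.config.set_visible_devices(gpus[0], "GPU")

# A small allocation should now grow GPU memory on demand rather than
# reserving the whole card up front.
x = tf.random.normal((1024, 1024))
print(float(tf.reduce_sum(x)))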

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Mar 27, 2024
github-actions bot commented Apr 4, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Apr 4, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

