
Loaded trained model allocates most of GPU memory #7224

Open
Borda opened this issue Jul 16, 2019 · 3 comments

Borda commented Jul 16, 2019

I want to run multiple object detection instances on a single GPU, but each of them always allocates a large share of the remaining GPU memory. I was playing with small pre-trained models such as ssd_mobilenet_v1_coco from the model ZOO, following this tutorial.

I am wondering why the same script, loading the same model, allocates 70% of 4 GB (GTX 1050) in one run and 66% of 11 GB (RTX 2080) in another...
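For context, this is the kind of per-process cap I assumed would let several instances share one card — a minimal sketch using the same TF 1.x session options as the script below (the 0.2 fraction is just an illustrative value, not something from the tutorial):

import tensorflow as tf

config = tf.ConfigProto()
# Hard-cap this process at ~20% of the GPU memory instead of the default
# behaviour of reserving (nearly) all of it up front.
config.gpu_options.per_process_gpu_memory_fraction = 0.2
# Alternatively, start small and grow allocations on demand.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the detection graph here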


System information

  • OS Platform and Distribution: Linux-4.15.0-54-generic-x86_64-with-LinuxMint-19.1-tessa
  • TensorFlow version: ("b'v1.13.1-0-g6612da8951'", '1.13.1')
  • Bazel version (if compiling from source): no
  • CUDA/cuDNN version: Driver 418.56, CUDA 10.1
  • GPU model and memory: GeForce GTX 1050 with 4042MiB

Describe the problem

I have downloaded the simple pre-trained object detection model ssd_mobilenet_v1_coco and run prediction on a sample image. The loaded model takes 70% of 4 GB (GTX 1050) in one case and 66% of 11 GB (RTX 2080) in the other.
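To separate what the TF allocator itself holds from what nvidia-smi reports (which also includes the CUDA context and cuDNN workspaces), I believe something like this should work — a sketch assuming tf.contrib.memory_stats is available in this 1.13 build:

import tensorflow as tf
from tensorflow.contrib.memory_stats import BytesInUse, MaxBytesInUse

with tf.device('/device:GPU:0'):
    # Ops that report the allocator's current and peak usage on GPU:0.
    bytes_in_use = BytesInUse()
    max_bytes_in_use = MaxBytesInUse()

with tf.Session() as sess:
    # ... run the detection first, then:
    print('current MiB:', sess.run(bytes_in_use) / 1024 ** 2)
    print('peak MiB:', sess.run(max_bytes_in_use) / 1024 ** 2)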

Source code / logs

import os
import sys
import tarfile
import six.moves.urllib as urllib

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Make the TF models research checkout importable (machine-specific path).
sys.path.append("/home/jb/Workspace/tfmodels/research")
# Imported as in the tutorial; not used in this minimal repro.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# Download and unpack the pre-trained model archive.
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)

# Extract only the frozen inference graph from the archive.
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

# Load the frozen graph definition into a fresh tf.Graph.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(graph_def, name='')

# matplotlib returns PNG images as floats in [0, 1]; convert to uint8,
# which is what the model's image_tensor expects.
image_np = plt.imread('people.png')
if image_np.max() < 1.5:
    image_np = np.clip(np.round(image_np * 255), 0, 255).astype(np.uint8)

config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=False,
)
config.gpu_options.force_gpu_compatible = False
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

# with detection_graph.as_default():
with tf.Session(config=config, graph=detection_graph) as sess:

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

    # Each score represents the confidence for the corresponding detected object.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run([boxes, scores, classes, num_detections],
                                                        feed_dict={image_tensor: image_np_expanded})

I got the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8    N/A /  N/A |   2835MiB /  4042MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     32433      C   /usr/bin/python3.6                          2551MiB |
+-----------------------------------------------------------------------------+

and

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 35%   53C    P8    24W / 260W |   6973MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1     22749      C   /usr/bin/python3                            6963MiB |
+-----------------------------------------------------------------------------+
@Borda changed the title from "Loaded trained model allocates whole GPU memory" to "Loaded trained model allocates most of GPU memory" Jul 16, 2019
@tensorflowbutler added the stat:awaiting response label Jul 17, 2019
tensorflowbutler (Member) commented

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

  • What is the top-level directory of the model you are using
  • Have I written custom code
  • TensorFlow installed from
  • Exact command to reproduce


Borda commented Jul 18, 2019

  • What is the top-level directory of the model you are using
    There is no such directory; the model is downloaded on the fly into the same folder the code runs from.
  • Have I written custom code
    Kind of; I took inspiration from the referred post and trimmed it down to a minimum.
  • TensorFlow installed from
    pip
  • Exact command to reproduce
    Run the attached code in the PyCharm debugger.
@tensorflowbutler ^^

@tensorflowbutler removed the stat:awaiting response label Jul 19, 2019
@gowthamkpr added the models:research label Sep 10, 2019

Borda commented Oct 25, 2019

@tensorflowbutler @gowthamkpr any update here?

@jaeyounkim added the models:research:odapi label and removed the models:research label Jun 25, 2021