
Loaded trained model allocates most of GPU memory #7224

Open
Borda opened this issue Jul 16, 2019 · 3 comments

Borda commented Jul 16, 2019

I want to run multiple object detection instances on a single GPU, but each of them always allocates a large share of the remaining GPU memory. I was playing with small pre-trained models such as ssd_mobilenet_v1_coco from the model ZOO, following this tutorial.

I am wondering why the same script, loading the same model, allocates 70% of 4 GB (GTX 1050) in one run and 66% of 11 GB (RTX 2080) in another...
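For context, this is the kind of per-process cap I assumed would let several instances share one card — a minimal sketch using the same TF 1.x session options as the script below (the 0.2 fraction is just an illustrative value, not something from the tutorial):

import tensorflow as tf

config = tf.ConfigProto()
# Hard-cap this process at ~20% of the GPU memory instead of the default
# behaviour of reserving (nearly) all of it up front.
config.gpu_options.per_process_gpu_memory_fraction = 0.2
# Alternatively, start small and grow allocations on demand.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the detection graph here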


System information

  • OS Platform and Distribution: Linux-4.15.0-54-generic-x86_64-with-LinuxMint-19.1-tessa
  • TensorFlow version: ("b'v1.13.1-0-g6612da8951'", '1.13.1')
  • Bazel version (if compiling from source): no
  • CUDA/cuDNN version: Driver 418.56, CUDA 10.1
  • GPU model and memory: GeForce GTX 1050 with 4042MiB

Describe the problem

I have downloaded the simple pre-trained object detection model ssd_mobilenet_v1_coco and run prediction on a sample image. The loaded model takes 70% of 4 GB (GTX 1050) in one case and 66% of 11 GB (RTX 2080) in the other.
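To separate what the TF allocator itself holds from what nvidia-smi reports (which also includes the CUDA context and cuDNN workspaces), I believe something like this should work — a sketch assuming tf.contrib.memory_stats is available in this 1.13 build:

import tensorflow as tf
from tensorflow.contrib.memory_stats import BytesInUse, MaxBytesInUse

with tf.device('/device:GPU:0'):
    # Ops that report the allocator's current and peak usage on GPU:0.
    bytes_in_use = BytesInUse()
    max_bytes_in_use = MaxBytesInUse()

with tf.Session() as sess:
    # ... run the detection first, then:
    print('current MiB:', sess.run(bytes_in_use) / 1024 ** 2)
    print('peak MiB:', sess.run(max_bytes_in_use) / 1024 ** 2)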

Source code / logs

import os
import sys
import tarfile
import six.moves.urllib as urllib

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Make the TF models research checkout importable (machine-specific path).
sys.path.append("/home/jb/Workspace/tfmodels/research")
# Imported as in the tutorial; not used in this minimal repro.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# Download and unpack the pre-trained model archive.
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)

# Extract only the frozen inference graph from the archive.
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

# Load the frozen graph definition into a fresh tf.Graph.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(graph_def, name='')

# matplotlib returns PNG images as floats in [0, 1]; convert to uint8,
# which is what the model's image_tensor expects.
image_np = plt.imread('people.png')
if image_np.max() < 1.5:
    image_np = np.clip(np.round(image_np * 255), 0, 255).astype(np.uint8)

config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=False,
)
config.gpu_options.force_gpu_compatible = False
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

# with detection_graph.as_default():
with tf.Session(config=config, graph=detection_graph) as sess:

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

    # Each score represents the confidence for the corresponding detected object.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run([boxes, scores, classes, num_detections],
                                                        feed_dict={image_tensor: image_np_expanded})

I got the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8    N/A /  N/A |   2835MiB /  4042MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     32433      C   /usr/bin/python3.6                          2551MiB |
+-----------------------------------------------------------------------------+

and

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 35%   53C    P8    24W / 260W |   6973MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1     22749      C   /usr/bin/python3                            6963MiB |
+-----------------------------------------------------------------------------+
@Borda changed the title from "Loaded trained model allocates whole GPU memory" to "Loaded trained model allocates most of GPU memory" Jul 16, 2019
@tensorflowbutler added the stat:awaiting response label Jul 17, 2019
tensorflowbutler (Member) commented

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

  • What is the top-level directory of the model you are using
  • Have I written custom code
  • TensorFlow installed from
  • Exact command to reproduce


Borda commented Jul 18, 2019

  • What is the top-level directory of the model you are using
    There is no such directory; the model is downloaded on the fly into the same folder the code runs from.
  • Have I written custom code
    Kind of; I took inspiration from the referred post and trimmed it down to a minimum.
  • TensorFlow installed from
    pip
  • Exact command to reproduce
    Run the attached code in the PyCharm debugger.
@tensorflowbutler ^^

@tensorflowbutler removed the stat:awaiting response label Jul 19, 2019
@gowthamkpr added the models:research label Sep 10, 2019

Borda commented Oct 25, 2019

@tensorflowbutler @gowthamkpr any update here?

@jaeyounkim added the models:research:odapi label and removed the models:research label Jun 25, 2021