I want to run multiple object detection instances on a single GPU, but each of them always allocates a certain percentage of the remaining GPU resources. I was experimenting with small pre-trained models such as ssd_mobilenet_v1_coco from the model ZOO, and I was following this tutorial.
I am wondering why the same script loading the same model takes 70% of 4 GB on a GTX 1050 but 66% of 11 GB on an RTX 2080...
System information
OS Platform and Distribution: Linux-4.15.0-54-generic-x86_64-with-LinuxMint-19.1-tessa
GPU model and memory: GeForce GTX 1050 with 4042MiB
Describe the problem
I downloaded the pre-trained object detection model ssd_mobilenet_v1_coco and ran a prediction on a sample image. The loaded model takes 70% of 4 GB (GTX 1050) in one run and 66% of 11 GB (RTX 2080) in another.
Source code / logs
import os
import sys
import tarfile

import six.moves.urllib as urllib
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

sys.path.append("/home/jb/Workspace/tfmodels/research")

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# Download the model archive and extract only the frozen graph.
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

# Load the frozen graph into a fresh tf.Graph.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(graph_def, name='')

# Read the test image; convert float [0, 1] pixels to uint8 [0, 255] if needed.
image_np = plt.imread('people.png')
if image_np.max() < 1.5:
    image_np = np.clip(np.round(image_np * 255), 0, 255).astype(np.uint8)

config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=False,
)
config.gpu_options.force_gpu_compatible = False
# Don't pre-allocate memory; allocate as needed.
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

# with detection_graph.as_default():
with tf.Session(config=config, graph=detection_graph) as sess:
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
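For running several instances side by side, the commented-out per_process_gpu_memory_fraction line above is the kind of hard cap I have in mind. A minimal sketch of that idea (TF 1.x session config; the 0.25 fraction and the constant op are illustrative only):

import tensorflow as tf

# Cap this process at a fixed share of total GPU memory, instead of letting
# it take a percentage of whatever happens to be free at startup.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True                     # grow allocations as needed...
config.gpu_options.per_process_gpu_memory_fraction = 0.25  # ...but never beyond ~25%

with tf.Session(config=config) as sess:
    # Any graph works here; a constant op keeps the sketch self-contained.
    print(sess.run(tf.constant('session capped at ~25% of GPU memory')))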
I got the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:01:00.0 Off | N/A |
| N/A 44C P8 N/A / N/A | 2835MiB / 4042MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 32433 C /usr/bin/python3.6 2551MiB |
+-----------------------------------------------------------------------------+
and
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 1 GeForce RTX 208... Off | 00000000:41:00.0 Off | N/A |
| 35% 53C P8 24W / 260W | 6973MiB / 10986MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 22749 C /usr/bin/python3 6963MiB |
+-----------------------------------------------------------------------------+
Borda changed the title from "Loaded trained model allocates whole GPU memory" to "Loaded trained model allocates most of GPU memory" on Jul 16, 2019.
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
TensorFlow installed from
Exact command to reproduce
What is the top-level directory of the model you are using
There is no such directory; the model is downloaded on the fly into the same folder the code runs from.
Have I written custom code
Kind of; I took inspiration from the referred post and trimmed it down to a minimum.
TensorFlow installed from
pip
Exact command to reproduce
Run the attached code in the PyCharm debugger.
@tensorflowbutler ^^
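For context, this is roughly how I plan to launch the multiple instances: one detector per process, so each TF session gets its own capped allocator. A sketch only; detector.py and its --gpu-fraction flag are hypothetical placeholders standing in for the script above:

import subprocess

# Hypothetical launcher: each worker runs the detection script in a separate
# process, giving every TF session an independent, capped share of the GPU.
N_WORKERS = 3
procs = [
    subprocess.Popen(
        ['python3', 'detector.py',
         '--gpu-fraction', str(round(1.0 / (N_WORKERS + 1), 2))])
    for _ in range(N_WORKERS)
]
for proc in procs:
    proc.wait()  # block until every detector process has finished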