Memory leak: tf1 trained saved_model in tf2 for prediction #10759

Open

purvang3 opened this issue Sep 1, 2022 · 6 comments
@purvang3

purvang3 commented Sep 1, 2022

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/...

2. Describe the bug

A clear and concise description of what the bug is.

I previously trained an ssd_inception_v2 model in TensorFlow 1.14. The export contains a frozen inference graph and a saved_model directory with protobuf files and variables. I am now running TensorFlow 2.6.0. Loading the TF 1.14 saved_model into TF 2.6 works without problems and inference runs smoothly, but over time CPU memory keeps increasing and the prediction script eventually crashes because memory is exhausted. I have also tried loading the frozen graph (.pb) instead of saved_model.pb and the problem still exists. Any help would be appreciated. With "htop", the MEM% column keeps increasing while the reproduction script in section 3 below is running.
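
For reference, a minimal sketch of the frozen-graph path mentioned above; the file name frozen_inference_graph.pb is an assumption based on the standard object detection export and should be adjusted to the actual export:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Read the serialized GraphDef from the frozen graph file.
graph_def = tf.GraphDef()
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Import it into a fresh graph; tensors can then be looked up by name,
# e.g. 'image_tensor:0' and 'detection_boxes:0', as in the script below.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")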

3. Steps to reproduce

Steps to reproduce the behavior.
Any TensorFlow 1 trained model that has a saved_model directory after training.
Sample: wget http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz

Use the saved_model directory.

import numpy as np

import tensorflow.compat.v1 as tf

# Run the TF1-style graph/session API inside TF2.
tf.disable_v2_behavior()
tf.config.set_soft_device_placement(True)

# Cache for the output-tensor lookup so it is only done once.
output_tensor_dict_global = None

model_path = "saved_model_path"

gpus = tf.config.experimental.list_physical_devices('GPU')


def get_output_tensor_dict():
    """Look up the detection output tensors in the default graph (cached)."""
    global output_tensor_dict_global
    if output_tensor_dict_global:
        return output_tensor_dict_global

    ops = tf.get_default_graph().get_operations()
    all_tensor_names = {output.name for op in ops for output in op.outputs}
    tensor_dict = {}
    for key in ['num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks']:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
            tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
    output_tensor_dict_global = tensor_dict
    return output_tensor_dict_global


# Allocate GPU memory on demand instead of reserving it all up front.
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)

with tf.Graph().as_default() as g:
    with tf.Session() as sess:
        # Load the TF 1.x-exported SavedModel into this session's graph.
        detection_graph = tf.saved_model.load(sess, ["serve"], model_path)

        while True:
            # Random dummy input frame (720x1280x3).
            img_np = np.random.randn(720, 1280, 3)
            tensor_dict = get_output_tensor_dict()
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            output_dict = sess.run(
                tensor_dict,
                feed_dict={image_tensor: np.expand_dims(img_np, axis=0)})
            print(output_dict.keys())

I have tested the same code with TensorFlow 2.9.0 and the problem still exists.
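
To quantify the growth, one could log the process's resident set size once per loop iteration; a minimal sketch using psutil (an assumed extra dependency, not part of the original script):

import os

import psutil

_process = psutil.Process(os.getpid())

def log_rss(step):
    # Resident set size (host memory) of this process, in MiB.
    rss_mib = _process.memory_info().rss / (1024 * 1024)
    print(f"step {step}: RSS = {rss_mib:.1f} MiB")

Calling log_rss(step) at the end of each iteration of the while loop above should show whether host memory climbs with every sess.run call.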

4. Expected behavior

A clear and concise description of what you expected to happen.
Memory consumption should be constant.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device name if the issue happens on a mobile device:
  • TensorFlow installed from (source or binary): TensorFlow Docker image 2.6.0
  • TensorFlow version (use command below): 2.6.0
  • Python version: 3.6.9
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory: RTX 3070, 8GB
@purvang3 purvang3 added models:official models that come under official repository type:bug Bug in the code labels Sep 1, 2022
@saberkun saberkun added models:research:odapi ODAPI and removed models:official models that come under official repository labels Sep 4, 2022
@sushreebarsa sushreebarsa self-assigned this Sep 7, 2022
@sushreebarsa sushreebarsa assigned tombstone, jch1 and pkulzc and unassigned sushreebarsa Sep 7, 2022
@muxamilian

I encountered the same memory leak and tried the same steps as you: it leaks with the saved model and with the frozen inference graph, and it doesn't matter whether it runs in eager mode or graph mode; the memory leak is always there.
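
For reference, a minimal sketch of the eager-mode (TF2-native) variant referred to above; the signature key 'serving_default' and the input name 'inputs' are assumptions about this particular export and should be checked against the loaded model:

import numpy as np
import tensorflow as tf  # TF2, eager mode

model = tf.saved_model.load("saved_model_path")   # same directory as above
infer = model.signatures["serving_default"]       # signature key assumed
print(infer.structured_input_signature)           # shows the real input name(s)

img = np.random.randint(0, 255, size=(1, 720, 1280, 3), dtype=np.uint8)
outputs = infer(inputs=tf.constant(img))          # input name assumed
print(list(outputs.keys()))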

@muxamilian

muxamilian commented Sep 7, 2022

When disabling the GPU, the memory leak disappears.

try:
    # Disable all GPUS
    tf.config.set_visible_devices([], 'GPU')
    visible_devices = tf.config.get_visible_devices()
    for device in visible_devices:
        assert device.device_type != 'GPU'
except:
    # Invalid device or cannot modify virtual devices once initialized.
    pass

The CUDA version is 11.2, cuDNN is 8100, and TensorFlow is 2.7.1, but the leak also occurs with the newest TensorFlow.

@sushreebarsa
Contributor

sushreebarsa commented Sep 7, 2022

@purvang3 Could you refer to the comment above and let us know if it helps? I was able to reproduce the issue on Colab using TF v2.9. Please find the gist here for reference.
Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response Waiting on input from the contributor label Sep 7, 2022
@muxamilian

Well, it "helps", but then the model doesn't run on the GPU anymore, so it's certainly not a fix or workaround.

@sushreebarsa sushreebarsa removed the stat:awaiting response Waiting on input from the contributor label Sep 7, 2022
@seel-channel

"Use the tf.config.experimental.set_memory_growth function to allow memory to be allocated as needed instead of allocating all GPU memory at the start."

    # Avoid VRAM leak: allocate GPU memory on demand instead of all at once.
    physical_devices = tf.config.list_physical_devices('GPU')
    for device in physical_devices:
        tf.config.experimental.set_memory_growth(device, True)

    # (Snippet is taken from a class method, hence self._model.)
    self._model = tf.compat.v2.saved_model.load(model_path)
[Before / after screenshots of memory usage]

@muxamilian

This is a memory leak of CPU memory, not GPU memory. It also occurs when disabling the GPU altogether.
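
For anyone reproducing the "GPU disabled altogether" case, an alternative to the tf.config.set_visible_devices snippet above is to hide the GPUs via an environment variable; a minimal sketch, with the usual caveat that the variable must be set before TensorFlow initializes the GPU (ideally before the import):

import os

# Hide all GPUs from TensorFlow / CUDA for this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # expected: []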
