Description
OpenVINO Version
2025.02
Operating System
Other (Please specify in description)
Device used for inference
CPU
Framework
ONNX
Model used
A PaddleOCR model (https://github.com/PaddlePaddle/PaddleOCR) converted to ONNX
Issue description
Operating System: Ubuntu 22.04
Cloud: AWS
Compute: c7i.large (Intel Xeon Scalable (Sapphire Rapids))
I've created a FastAPI endpoint to serve a model trained with PaddleOCR. I converted the model to ONNX and load it with OpenVINO. However, I noticed that memory usage grows significantly over time: around 1 GB per hour, which is unsustainable.
Over the last few days I have verified that it is not any of my other related processes; memory only grows like this while ML inference is running.
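For context, the serving path looks roughly like the sketch below. The route name, input shape, and model filename are placeholders for illustration; only the per-request call into OpenVINOInferSession (defined under "Step-by-step reproduction") mirrors my actual service.

import numpy as np
from fastapi import FastAPI

app = FastAPI()
# OpenVINOInferSession is the custom session shown in the reproduction section.
session = OpenVINOInferSession({"model_path": "model.onnx"})

@app.post("/infer")
def infer() -> dict:
    # Stand-in input: the real endpoint decodes an uploaded image and
    # preprocesses it into the tensor shape the PaddleOCR model expects.
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
    output = session(dummy)
    return {"output_shape": list(output.shape)}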

I also notice that when I profile the code with memray flamegraph --leaks, there is a huge amount of uncollected objects originating from C (native) code.
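The memray capture can be reproduced outside the server with memray's Python API. The iteration count, input shape, and file name below are assumptions; the flamegraph step is the same command mentioned above.

import numpy as np
import memray

session = OpenVINOInferSession({"model_path": "model.onnx"})

# trace_python_allocators helps memray attribute Python-level allocations
# when generating leak reports.
with memray.Tracker("inference.bin", native_traces=True, trace_python_allocators=True):
    for _ in range(1000):
        dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
        session(dummy)

# Then: memray flamegraph --leaks inference.bin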

Step-by-step reproduction
RapidOCR calls my custom inference session, shown below (imports added for completeness; InferSession and OpenVINOError are defined elsewhere in my code):

import os
import traceback

import numpy as np
from openvino import Core


class OpenVINOInferSession(InferSession):
    def __init__(self, config):
        core = Core()

        self._verify_model(config["model_path"])
        self.model_onnx = core.read_model(config["model_path"])

        cpu_nums = os.cpu_count()
        infer_num_threads = config.get("inference_num_threads", cpu_nums)
        core.set_property("CPU", {"INFERENCE_NUM_THREADS": str(infer_num_threads)})

        self.compile_model = core.compile_model(model=self.model_onnx, device_name="CPU")

    def __call__(self, input_content: np.ndarray) -> np.ndarray:
        request = None
        try:
            # A fresh infer request is created for every call.
            request = self.compile_model.create_infer_request()
            request.infer(input_content)
            data = request.get_output_tensor().data
            return data
        except Exception as e:
            error_info = traceback.format_exc()
            raise OpenVINOError(error_info) from e
        finally:
            del request
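A minimal driver that reproduces the growth without FastAPI, assuming the class above and a converted PaddleOCR model; the model filename, input shape, and iteration count are placeholders.

import numpy as np
import psutil

session = OpenVINOInferSession({"model_path": "model.onnx"})
proc = psutil.Process()

for i in range(10_000):
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
    session(dummy)
    if i % 500 == 0:
        # Print resident set size to watch the growth over time.
        rss_mib = proc.memory_info().rss / (1024 * 1024)
        print(f"iteration {i}: RSS = {rss_mib:.1f} MiB")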
Relevant log output
Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.