Program crashes (segmentation fault) during interrupted load tests using TensorRT/CUDA EP #24601
Comments
I compiled ONNX Runtime with Debug flags, resulting in more informative stack traces:
If I understand correctly, you would like to be able to interrupt Locust tests without onnxruntime crashing?
Yes, but to clarify: interrupting a Locust test resembles real-world scenarios where many requests are in flight to ONNX Runtime and a network issue between ONNX Runtime and the client breaks the connection.
But even if the connection is lost, the thread should continue to run. I don't know Locust, but I wonder how the thread running onnxruntime is terminated. Maybe the memory holding the data onnxruntime is working with is freed before onnxruntime finishes.
To gracefully shut down a running inference, use the terminate flag, which is a member of RunOptions.
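The idea behind the terminate flag is cooperative cancellation: the long-running call polls a shared flag and exits cleanly when it is set, rather than being killed mid-flight. A minimal std-only Rust sketch of that pattern (the `run_inference` function is a hypothetical stand-in for an onnxruntime `Run()` call, not the real API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a long-running inference that polls a
// terminate flag, mirroring how RunOptions' terminate flag lets
// onnxruntime abort a Run() cooperatively instead of crashing.
fn run_inference(terminate: Arc<AtomicBool>) -> Result<u64, &'static str> {
    let mut steps = 0u64;
    for _ in 0..1_000 {
        if terminate.load(Ordering::Relaxed) {
            return Err("inference terminated"); // clean early exit
        }
        thread::sleep(Duration::from_millis(1)); // simulate one kernel launch
        steps += 1;
    }
    Ok(steps)
}

fn main() {
    let terminate = Arc::new(AtomicBool::new(false));
    let worker = {
        let t = Arc::clone(&terminate);
        thread::spawn(move || run_inference(t))
    };
    // Simulate the client disconnecting shortly after the request starts.
    thread::sleep(Duration::from_millis(20));
    terminate.store(true, Ordering::Relaxed);
    let result = worker.join().unwrap();
    assert_eq!(result, Err("inference terminated"));
    println!("worker exited cleanly: {:?}", result);
}
```

With the real API, the request handler would set the terminate flag on the in-flight `RunOptions` when it detects the client has gone away, so the inference returns an error instead of racing against freed memory.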
I have tested many scenarios and realized that when the connection is interrupted on the client side (Locust), the gRPC server (tonic) cancels the in-flight request. Meanwhile, the payload is still being processed by ONNX Runtime, and some memory may be freed before ONNX Runtime finishes execution, which leads to the crash.
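One way to avoid this race is to make sure the inference owns its input buffers, so that cancelling the request handler cannot free memory the native code is still reading. A minimal std-only Rust sketch of the pattern (`spawn_inference` is a hypothetical helper, and `sleep` stands in for the actual `Run()` call):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical fix sketch: the inference thread takes *ownership* of the
// input buffer, so even if the request handler is cancelled and stops
// listening, the buffer stays alive until inference finishes.
fn spawn_inference(input: Vec<f32>) -> mpsc::Receiver<f32> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // `input` was moved into this thread; dropping the receiver early
        // cannot free it while we are still reading from it.
        thread::sleep(Duration::from_millis(10)); // simulate Run()
        let sum: f32 = input.iter().sum();
        let _ = tx.send(sum); // send fails harmlessly if the caller went away
    });
    rx
}

fn main() {
    // Normal request: the caller waits for the result.
    let rx = spawn_inference(vec![1.0, 2.0, 3.0]);
    assert_eq!(rx.recv().unwrap(), 6.0);

    // Interrupted request: the caller drops the receiver immediately,
    // as happens when tonic cancels the handler after a disconnect.
    // The inference thread still owns its data and exits cleanly.
    let rx = spawn_inference(vec![4.0, 5.0]);
    drop(rx);
    thread::sleep(Duration::from_millis(50)); // let the worker finish
    println!("no crash after cancelled request");
}
```

In an async tonic handler, the equivalent is moving the input tensors into a `spawn_blocking`-style task rather than borrowing them from the request future, so cancellation only drops the channel, never the data under the native code's feet.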
Describe the issue
I have a program written using the Rust bindings (ort) for ONNX Runtime (C/C++ backend). The program serves 3 models with the TensorRT/CUDA execution provider, each configured with intra_threads=1 (greater values face the same issue). The service is exposed via a gRPC server built with tonic.
When performing load testing with Locust (a Python load testing tool) at 128 concurrent users (CCU), I observe the following behavior:
If I interrupt the Locust test (by clicking the "Stop" button), ~80% of the time the program crashes with a segmentation fault (not a Rust panic, suggesting the issue originates in ONNX Runtime's native code).
If I let Locust run the test continuously for a full day (without interruption), the program runs without any errors.
As an alternative test, I wrote a separate program using a ThreadPool (128 workers) to call the gRPC server. When this test runs to completion without interruption, I never encounter any errors. This has been confirmed across multiple test runs.
Stack Trace:
Occasionally, I also encounter this error:
To reproduce
Let me know if you need further details to reproduce the issue.
My machine information:
OS: Ubuntu 22.04
GPU: Nvidia A40
CPU: AMD EPYC Processor (with IBPB) (2 Socket with 16 core each)
RAM: 256GB
Urgency
This issue prevents the program from running in production.
Platform
Linux
OS Version
Ubuntu 22.04.5 LTS
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.21.1
ONNX Runtime API
Other / Unknown
Architecture
X64
Execution Provider
CUDA, TensorRT
Execution Provider Library Version
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 && Tensorrt@8.6.1.6-1+cuda11.8