-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add L0 memory growth test #4259
Conversation
based on @dyastremsky : #3827 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! This passed for 1M iterations for me with the memory usage bouncing around within a few decimal points of 17MB. Currently running it for 20M Iterations, which will likely run for a while.
Added two small comments. Let me know what you think. You can make the updates and I can review this ticket or I can do it and we can get someone else to review this test.
qa/L0_java_memory_growth/Simple.java
Outdated
TRITONSERVER_InferenceResponseError(completed_response), | ||
"response status"); | ||
|
||
Check( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make Check only run based on a command line boolean. That way we can break up the test into a sanity test to make sure that the full Simple example has no memory leak (run for a small number of iterations) and a high-iteration memory leak test just for the inference. Right now, I suspect Check is likely taking up a lot of the time per inference, which I suspect we don't want. What do you think?
qa/L0_java_memory_growth/test.sh
Outdated
# Create local model repository | ||
rm -r models | ||
cp -r `pwd`/../L0_simple_ensemble/models . | ||
mkdir ${MODEL_REPO}/ensemble_add_sub_int32_int32_int32/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's either change the previous line to only copy the simple model from models, or make this line remove L0_simple_ensemble. It's not part of the test anymore, so we don't need to load it.
Amazing work on the PR! Heads up that the test failed on 20M iterations due to a larger memory allocation range (it's saying a 7.7MB, 32% diff). There are two parts to this:
|
67293c0
to
062a7a8
Compare
I don't see a way to fix #2 with DoubleSummaryStatistics. The ideal would be to use a percentile (e.g. comparing the 90th percentile value versus the median to get the difference), but that'd be hard to do without holding all our data in a data structure, which we don't want to do (especially for the long-running memory growth test). We need to figure out a way to identify and reject/ignore outliers. |
Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
No description provided.