Add L0 memory growth test #4259

Merged
merged 8 commits into from
Apr 25, 2022

jbkyang-nvi

No description provided.

jbkyang-nvi commented Apr 21, 2022

@dyastremsky dyastremsky left a comment

Beautiful! This passed for 1M iterations for me, with the memory usage staying within a fraction of a megabyte of 17MB. I'm currently running it for 20M iterations, which will likely take a while.

Added two small comments. Let me know what you think. Either you make the updates and I review this PR, or I make them and we get someone else to review the test.

TRITONSERVER_InferenceResponseError(completed_response),
"response status");

Check(

We should probably make Check run only when a command-line boolean flag is set. That way we can split the test in two: a sanity test that runs the full Simple example for a small number of iterations to make sure it has no memory leak, and a high-iteration memory leak test that covers only the inference. Right now, I suspect Check takes up a lot of the time per inference, which we probably don't want. What do you think?
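
As a rough sketch of that idea (not the code in this PR), the loop could take a boolean parsed from the command line and skip the check when it is off. The flag name `--check-results` and the helper class `CheckFlag` are hypothetical, and the `Runnable` arguments stand in for the real inference call and the real Check:

```java
import java.util.Arrays;

// Hypothetical helper: the flag name and class name are assumptions, not part of the PR.
final class CheckFlag {
    static final String FLAG = "--check-results";

    // Returns true only when the flag appears among the command-line arguments.
    static boolean isEnabled(String[] args) {
        return Arrays.asList(args).contains(FLAG);
    }

    // Runs `inference` once per iteration and runs `check` after it only when enabled.
    // Returns how many times the check actually ran.
    static int runLoop(int iterations, boolean checkEnabled, Runnable inference, Runnable check) {
        int checksRun = 0;
        for (int i = 0; i < iterations; i++) {
            inference.run();
            if (checkEnabled) {
                check.run();
                checksRun++;
            }
        }
        return checksRun;
    }
}
```

With something like this, the short sanity run would pass the flag and the high-iteration run would leave it off, so the long run times only the inference path.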

# Create local model repository
rm -r models
cp -r `pwd`/../L0_simple_ensemble/models .
mkdir ${MODEL_REPO}/ensemble_add_sub_int32_int32_int32/1

Let's either change the previous line to only copy the simple model from models, or make this line remove L0_simple_ensemble. It's not part of the test anymore, so we don't need to load it.

@dyastremsky

dyastremsky commented Apr 21, 2022

Amazing work on the PR! Heads up that the test failed at 20M iterations because the memory allocation range was larger than the tolerance allows (it reports a 7.7MB, 32% diff). There are two parts to this:

  1. The memory oscillates in the 14-17MB range. The current 10% tolerance should be good, though we can increase it to 20% if we want to make sure this test doesn't fail intermittently.
  2. Outliers. Out of 6,750 memory readings, 3 were higher (22-23MB). I'll look into whether DoubleSummaryStatistics can provide percentiles or get rid of outliers; if not, whether there's a better way to track memory statistics.
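
To illustrate why a handful of outliers is enough to fail a tolerance check built on `DoubleSummaryStatistics`, here is a minimal sketch. The formula (largest deviation from the average, relative to the average) and the class name `MemoryTolerance` are assumptions for illustration; the test in this PR may compute its diff differently:

```java
import java.util.DoubleSummaryStatistics;

// Hypothetical helper: the deviation formula is an assumption, not the PR's actual check.
final class MemoryTolerance {
    // Largest distance of min or max from the average, as a fraction of the average.
    static double relativeDeviation(DoubleSummaryStatistics stats) {
        if (stats.getCount() == 0 || stats.getAverage() <= 0.0) {
            return 0.0;
        }
        double avg = stats.getAverage();
        double worst = Math.max(stats.getMax() - avg, avg - stats.getMin());
        return worst / avg;
    }

    // True when the readings stay within `tolerance` (e.g. 0.10 for 10%) of the average.
    static boolean withinTolerance(DoubleSummaryStatistics stats, double tolerance) {
        return relativeDeviation(stats) <= tolerance;
    }
}
```

Because only min, max, sum, and count are tracked, a single high reading moves `getMax()` permanently, no matter how many ordinary readings surround it.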

@dyastremsky

I don't see a way to fix item 2 (the outliers) with DoubleSummaryStatistics. The ideal would be to use a percentile (e.g. comparing the 90th percentile value against the median to get the difference), but that would be hard to do without holding all our data in a data structure, which we don't want to do (especially for the long-running memory growth test). We need to figure out a way to identify and reject or ignore outliers.
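
One possible direction, sketched here under assumptions and not taken from this PR, is a fixed-width histogram: it counts readings per bucket instead of storing them, so memory use stays constant however long the test runs, and percentiles can be read from the counts to within one bucket width. The class name `MemoryHistogram`, the 0.5MB bucket width, and the 64MB range in the usage note are illustrative choices:

```java
// Hypothetical constant-memory percentile estimate. Readings are counted in
// fixed-width buckets, so results are accurate only to one bucket width.
final class MemoryHistogram {
    private final double bucketWidthMb;
    private final long[] counts;
    private long total;

    MemoryHistogram(double maxMb, double bucketWidthMb) {
        this.bucketWidthMb = bucketWidthMb;
        this.counts = new long[(int) Math.ceil(maxMb / bucketWidthMb) + 1];
    }

    // Records one memory reading, in megabytes.
    void accept(double megabytes) {
        int index = (int) (Math.max(0.0, megabytes) / bucketWidthMb);
        counts[Math.min(index, counts.length - 1)]++; // last bucket catches overflow
        total++;
    }

    // Returns the upper edge of the bucket holding the requested percentile (0 < p <= 100).
    double percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no readings recorded");
        }
        long rank = (long) Math.ceil(p / 100.0 * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return (i + 1) * bucketWidthMb;
            }
        }
        return counts.length * bucketWidthMb;
    }
}
```

For example, with `new MemoryHistogram(64.0, 0.5)`, the test could compare `percentile(99.0)` against `percentile(50.0)` instead of max against min, so that 3 high readings out of 6,750 would fall above the 99th percentile and no longer decide the result.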

@jbkyang-nvi jbkyang-nvi merged commit b52e44e into java-api-prerelease Apr 25, 2022
@jbkyang-nvi jbkyang-nvi deleted the kyang-memory-growth branch April 25, 2022 18:07
dyastremsky pushed a commit that referenced this pull request Apr 26, 2022
Only cpu memory growth test
Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
dyastremsky pushed a commit that referenced this pull request Apr 26, 2022
Only cpu memory growth test
Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
jbkyang-nvi added a commit that referenced this pull request May 4, 2022
Only cpu memory growth test
Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
jbkyang-nvi added a commit that referenced this pull request May 10, 2022
Only cpu memory growth test
Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>