Add L0 memory growth test #4259

jbkyang-nvi · 2022-04-21T02:16:19Z

No description provided.

jbkyang-nvi · 2022-04-21T02:25:20Z

Based on @dyastremsky's PR: #3827
dyas-memory-growth...kyang-memory-growth
and @jackyh's PR: #4054

dyastremsky

Beautiful! This passed for 1M iterations for me, with the memory usage bouncing around within a few decimal points of 17MB. I'm currently running it for 20M iterations, which will likely take a while.

Added two small comments. Let me know what you think. You can make the updates and I can review this ticket or I can do it and we can get someone else to review this test.

dyastremsky · 2022-04-21T19:39:54Z

qa/L0_java_memory_growth/Simple.java

+            TRITONSERVER_InferenceResponseError(completed_response),
+            "response status");
+
+        Check(


We should probably make Check run only when a command-line boolean is set. That way we can split the test in two: a sanity test that runs the full Simple example for a small number of iterations to confirm it has no memory leak, and a high-iteration memory leak test that covers only the inference. Right now, I suspect Check takes up a lot of the time per inference, which we probably don't want. What do you think?
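A minimal sketch of the gating suggested above, not the actual Simple.java code. The `--check` flag name, the `CheckFlagSketch` class, and `validateOutputs` are hypothetical stand-ins; the real test would call its existing Check helper on the inference outputs instead of the placeholder arithmetic used here.

```java
import java.util.Arrays;

class CheckFlagSketch {
    // Counts how often validation ran, so the effect of the flag is observable.
    static int validationsRun = 0;

    /** Returns true when the hypothetical "--check" flag is present in args. */
    static boolean parseCheckFlag(String[] args) {
        return Arrays.asList(args).contains("--check");
    }

    /** Stand-in for the per-inference output validation. */
    static void validateOutputs(int[] in0, int[] in1, int[] sum) {
        validationsRun++;
        for (int i = 0; i < in0.length; i++) {
            if (in0[i] + in1[i] != sum[i]) {
                throw new IllegalStateException("incorrect sum at index " + i);
            }
        }
    }

    /** Runs the loop; validation happens only when checkOutputs is true. */
    static void runIterations(int iterations, boolean checkOutputs) {
        int[] in0 = {1, 2, 3};
        int[] in1 = {4, 5, 6};
        for (int i = 0; i < iterations; i++) {
            // Placeholder for a real inference call.
            int[] sum = new int[in0.length];
            for (int j = 0; j < sum.length; j++) {
                sum[j] = in0[j] + in1[j];
            }
            if (checkOutputs) {
                validateOutputs(in0, in1, sum);
            }
        }
    }

    public static void main(String[] args) {
        runIterations(1000, parseCheckFlag(args));
        System.out.println("validations run: " + validationsRun);
    }
}
```

With this shape, the short sanity run would pass the flag and the long memory growth run would omit it, so the per-inference cost of validation is only paid where it is wanted.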

dyastremsky · 2022-04-21T19:43:42Z

qa/L0_java_memory_growth/test.sh

+# Create local model repository
+rm -r models
+cp -r `pwd`/../L0_simple_ensemble/models .
+mkdir ${MODEL_REPO}/ensemble_add_sub_int32_int32_int32/1


Let's either change the previous line to only copy the simple model from models, or make this line remove L0_simple_ensemble. It's not part of the test anymore, so we don't need to load it.

dyastremsky · 2022-04-21T22:14:16Z

Amazing work on the PR! Heads up that the test failed on 20M iterations due to a larger memory allocation range (it's saying a 7.7MB, 32% diff). There are two parts to this:

1. The memory oscillates in the 14-17MB range. The current 10% tolerance should be good, though we can increase this to 20% if we want to ensure there's no intermittence on this test.
2. Outliers. Out of 6,750 memory readings, 3 were higher (22-23MB). I'll look into whether DoubleSummaryStatistics can provide percentiles or get rid of outliers; if not, whether there's a better way to track memory statistics.
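To illustrate why a few outliers can fail a min/max tolerance check, here is a rough sketch built on `java.util.DoubleSummaryStatistics`. The spread formula `(max - min) / max`, the class name, and the sample readings are assumptions made for illustration; the thread does not show how the test actually computes its percentage diff.

```java
import java.util.DoubleSummaryStatistics;

class MemoryToleranceSketch {
    /** Relative spread of the readings. The (max - min) / max formula is an assumption. */
    static double relativeSpread(DoubleSummaryStatistics stats) {
        if (stats.getCount() == 0 || stats.getMax() <= 0.0) {
            return 0.0;
        }
        return (stats.getMax() - stats.getMin()) / stats.getMax();
    }

    /** True when the spread stays within the given tolerance (0.10 means 10%). */
    static boolean withinTolerance(DoubleSummaryStatistics stats, double tolerance) {
        return relativeSpread(stats) <= tolerance;
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        // Made-up readings in MB, roughly in the steady range described above.
        for (double mb : new double[] {15.0, 16.0, 15.5, 16.2}) {
            stats.accept(mb);
        }
        System.out.println("steady readings pass: " + withinTolerance(stats, 0.10));

        // A single high reading changes max, and with it the whole spread.
        stats.accept(23.0);
        System.out.println("with one outlier pass: " + withinTolerance(stats, 0.10));
    }
}
```

Because `DoubleSummaryStatistics` keeps only count, sum, min, and max, one extreme reading is enough to flip the result, no matter how many steady readings preceded it.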

dyastremsky · 2022-04-21T22:28:13Z

I don't see a way to fix the second point (outliers) with DoubleSummaryStatistics. The ideal would be to use a percentile (e.g. comparing the 90th percentile value against the median to get the difference), but that would be hard to do without holding all our data in a data structure, which we don't want to do (especially for the long-running memory growth test). We need to figure out a way to identify and reject or ignore outliers.
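One possible direction, offered as a sketch and not as what the PR ended up doing: a fixed-width histogram gives approximate percentiles in constant memory, so no individual readings need to be stored. The class name, the bin width, and the upper bound are all hypothetical choices, and the returned percentile is only accurate to within one bin width.

```java
class StreamingPercentileSketch {
    private final double binWidthMb;
    private final long[] bins;
    private long count = 0;

    /** maxMb is an assumed upper bound; larger readings are clamped into the last bin. */
    StreamingPercentileSketch(double maxMb, double binWidthMb) {
        this.binWidthMb = binWidthMb;
        this.bins = new long[(int) Math.ceil(maxMb / binWidthMb) + 1];
    }

    /** Records one memory reading (in MB) by incrementing its bin. */
    void accept(double mb) {
        int idx = (int) (mb / binWidthMb);
        idx = Math.max(0, Math.min(idx, bins.length - 1));
        bins[idx]++;
        count++;
    }

    /** Approximate percentile for 0 < p <= 100: upper edge of the bin holding that reading. */
    double percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no readings recorded");
        }
        long target = (long) Math.ceil(p / 100.0 * count);
        long seen = 0;
        for (int i = 0; i < bins.length; i++) {
            seen += bins[i];
            if (seen >= target) {
                return (i + 1) * binWidthMb;
            }
        }
        return bins.length * binWidthMb;
    }

    public static void main(String[] args) {
        StreamingPercentileSketch hist = new StreamingPercentileSketch(64.0, 0.5);
        // Synthetic data shaped like the run described above: mostly 14-16MB, 3 outliers.
        for (int i = 0; i < 6747; i++) {
            hist.accept(14.0 + (i % 3));
        }
        for (int i = 0; i < 3; i++) {
            hist.accept(23.0);
        }
        System.out.println("median ~ " + hist.percentile(50) + " MB");
        System.out.println("p99    ~ " + hist.percentile(99) + " MB");
        System.out.println("max    ~ " + hist.percentile(100) + " MB");
    }
}
```

Comparing a high percentile against the median, instead of max against min, would ignore the 3 outliers out of 6,750 readings (about 0.044% of the data) while still catching sustained growth, which shifts the upper percentiles over time.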

Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>

dyastremsky and others added 7 commits January 19, 2022 10:28

Added memory growth tests.

ae09d66

Fix copyright.

667d18c

Made the number of iterations configurable.

c8c07ef

Added explicit garbage collection.

f87a25a

remove cuda and tensorrt dependencies

fc59bc6

cpu only simple

cb42613

added jack's bug fix and fixed sed bug

90dbebe

jbkyang-nvi requested a review from dyastremsky April 21, 2022 02:16

dyastremsky reviewed Apr 21, 2022

dyastremsky approved these changes Apr 21, 2022

addressed comments

062a7a8

jbkyang-nvi force-pushed the kyang-memory-growth branch from 67293c0 to 062a7a8 Compare April 21, 2022 22:27

jbkyang-nvi merged commit b52e44e into java-api-prerelease Apr 25, 2022

jbkyang-nvi deleted the kyang-memory-growth branch April 25, 2022 18:07

This was referenced Apr 25, 2022

Memory Growth Tests for Java API #3827

Closed

add java binding memory growth test #4054

Closed

dyastremsky pushed a commit that referenced this pull request Apr 26, 2022

Add L0 memory growth test (#4259)

7cf5e74

Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>

dyastremsky pushed a commit that referenced this pull request Apr 26, 2022

Add L0 memory growth test (#4259)

913f924

Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>

jbkyang-nvi added a commit that referenced this pull request May 4, 2022

Add L0 memory growth test (#4259)

7623d73

Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>

jbkyang-nvi added a commit that referenced this pull request May 10, 2022

Add L0 memory growth test (#4259)

47e7d83

Only cpu memory growth test Co-authored-by: dyastremsky@nvidia.com <dyastremsky@nvidia.com>
