-
Notifications
You must be signed in to change notification settings - Fork 74.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new op BytesInUse, similar to MaxBytesInUse #13107
Conversation
@yaroslavvb, thanks for your PR! By analyzing the history of the files in this pull request, we identified @wujingyue, @lukeiwanski and @tensorflower-gardener to be potential reviewers. |
Can one of the admins verify this patch? |
Note that bytes_in_use is also what's used by timeline for memory tracking -- 7b5f590 |
} | ||
}; | ||
|
||
// register this op on GPU only, see comment for MaxBytesInUse for reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "// Register"
|
||
#ifdef TENSORFLOW_USE_SYCL | ||
REGISTER_KERNEL_BUILDER( | ||
Name("MaxBytesInUse").Device(DEVICE_SYCL).HostMemory("out"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "BytesInUse" instead of "MaxBytesInUse"?
@@ -16,6 +16,31 @@ limitations under the License. | |||
|
|||
namespace tensorflow { | |||
|
|||
// Op that measures current memory in bytes. | |||
class BytesInUseOp : public MemoryStatsOp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, I suggest move this op after MemoryStatsOp.
self.assertGreaterEqual(max_bytes_in_use, matrix_size_in_bytes * 3) | ||
self.assertLess(max_bytes_in_use, matrix_size_in_bytes * 4) | ||
self.assertGreaterEqual(bytes_in_use, matrix_size_in_bytes * 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests you added don't distinguish BytesInUse from MaxBytesInUse. One idea is to launch a second session with a smaller graph. MaxBytesInUse should return the larger size, and BytesInUse should return the smaller size.
OK, I guess for adding new test I'll actually need to setup TensorFlow development env :) I'll ping this thread after updating |
dafaf99
to
7e6fd8a
Compare
added test to make sure intermediate memory usage gets measured correctly, ptal |
Jenkins, test this please. |
# intermediate result allocates 1 matrix, max usage is at least 2 | ||
self.assertGreaterEqual(bytes_in_use, matrix_size_in_bytes * 1) | ||
self.assertLess(bytes_in_use, matrix_size_in_bytes * 2) | ||
self.assertGreaterEqual(max_bytes_in_use, matrix_size_in_bytes * 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this more strict? max_bytes_in_use should >= matrix_size_in_bytes * 3, because as you said it's sticky to the allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it that way because I wasn't sure about guarantees about lifetime of the allocator. If the allocator gets reset between run calls this test would be too strict. I'll assume allocator lifetime is guaranteed to span 2 session.run calls and update this in a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments.
Jenkins, test this please. |
MacOS failures unrelated to this cl (GCS cloud tests) |
That is a known flaky test so re-running the tests. Jenkins, test this please. |
Imported from GitHub PR openxla/xla#13107 Copybara import of the project: -- 6555b6d216bb18b1f98eb36c5dac3f58f4d09c05 by mmakevic <Milica.Makevic@amd.com>: Fix reduce_row_vectorized.hlo.test Merging this change closes #13107 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13107 from ROCm:ci_reduce_row_vectorized 6555b6d216bb18b1f98eb36c5dac3f58f4d09c05 PiperOrigin-RevId: 638442671
Imported from GitHub PR openxla/xla#13107 Copybara import of the project: -- 6555b6d216bb18b1f98eb36c5dac3f58f4d09c05 by mmakevic <Milica.Makevic@amd.com>: Fix reduce_row_vectorized.hlo.test Merging this change closes #13107 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13107 from ROCm:ci_reduce_row_vectorized 6555b6d216bb18b1f98eb36c5dac3f58f4d09c05 PiperOrigin-RevId: 638442671
Imported from GitHub PR openxla/xla#13107 Copybara import of the project: -- 6555b6d216bb18b1f98eb36c5dac3f58f4d09c05 by mmakevic <Milica.Makevic@amd.com>: Fix reduce_row_vectorized.hlo.test Merging this change closes #13107 PiperOrigin-RevId: 638453162
Adding BytesInUse
This is more useful than MaxBytesInUse for getting peak memory for a given session.run call because the latter gives maximum memory usage over lifetime of allocator, which can span multiple session.run calls/multiple session objects