-
Notifications
You must be signed in to change notification settings - Fork 955
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)
src/engine.ts, line 283 at r2 (raw file):
const original = this.runKernel; this.runKernel = <T extends Tensor|Tensor[], I extends NamedTensorMap>(
Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review - I think this is ready for another look.
Reviewable status: 0 of 1 approvals obtained
src/engine.ts, line 283 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.
Does that make sense?
Done
Makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @nsthorat, and @dsmilkov)
src/engine.ts, line 50 at r3 (raw file):
type KernelProfile = { name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][]; outputShape: number[]
this will be number[] | number[][]
src/engine.ts, line 54 at r3 (raw file):
export type ProfileInfo = { newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[];
peakBytes
src/engine.ts, line 56 at r3 (raw file):
newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[]; // tslint:disable-next-line:no-any result: any
make this a TensorContainer and then remove the any linter above
src/engine.ts, line 213 at r3 (raw file):
bytesAdded, bytesUsed: bytesAdded + inputKeys.reduce(
prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?
src/engine.ts, line 219 at r3 (raw file):
0), inputShapes: inputKeys.map(key => inputs[key].shape), outputShape: (result as Tensor).shape
turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?
src/engine.ts, line 300 at r3 (raw file):
} async profile(query: () => void): Promise<ProfileInfo> {
query should return a TensorContainer
src/engine.ts, line 307 at r3 (raw file):
this.activeProfile.kernels = []; this.activeProfile.result = await query();
query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).
src/engine_test.ts, line 376 at r3 (raw file):
expect(result.newBytes).toBe(12); expect(result.peak).toBe(24); expect(result.kernels[0].bytesAdded).toBe(12);
assert the whole result.kernels object since there are multiple kernels run
src/engine_test.ts, line 389 at r3 (raw file):
expect(result.newBytes).toBe(32); expect(result.peak).toBe(32); expect(result.kernels.find(d => d.name === 'matMul').bytesAdded).toBe(8);
instead of doing this can you assert the whole result.kernels object?
src/environment.ts, line 122 at r3 (raw file):
* - `kernels`: an array of objects for each kernel involved that reports * their input and output shapes and number of bytes used. * - `peak`: the maximum number of bytes used in any kernel
peakBytes
src/environment.ts, line 123 at r3 (raw file):
* their input and output shapes and number of bytes used. * - `peak`: the maximum number of bytes used in any kernel */
Can you also add a snippet here so they run in our API docs?
src/environment.ts, line 123 at r3 (raw file):
* their input and output shapes and number of bytes used. * - `peak`: the maximum number of bytes used in any kernel */
Add a section about the kernels as well
src/environment.ts, line 125 at r3 (raw file):
*/ /** @doc {heading: 'Performance', subheading: 'Memory'} */ static profile(f: () => void): Promise<ProfileInfo> {
f should return a TensorContainer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r3.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)
src/engine.ts, line 49 at r3 (raw file):
type KernelProfile = { name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];
How about renaming bytesUsed
to totalBytesSnapshot
to be more explicit that we are interested in the absolute value of memory at that time?
src/engine.ts, line 50 at r3 (raw file):
type KernelProfile = { name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][]; outputShape: number[]
For consistency with ProfileInfo, add tensorsAdded
and totalTensorsSnapshot
src/engine_test.ts, line 376 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
assert the whole result.kernels object since there are multiple kernels run
+1. Even better to assert to the whole result object. (e.g. right now result.newTensors
is not being asserted)
src/environment.ts, line 124 at r3 (raw file):
* - `peak`: the maximum number of bytes used in any kernel */ /** @doc {heading: 'Performance', subheading: 'Memory'} */
let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)
src/engine.ts, line 49 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
How about renaming
bytesUsed
tototalBytesSnapshot
to be more explicit that we are interested in the absolute value of memory at that time?
Done
src/engine.ts, line 50 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
this will be number[] | number[][]
Done
src/engine.ts, line 50 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
For consistency with ProfileInfo, add
tensorsAdded
andtotalTensorsSnapshot
Done
src/engine.ts, line 54 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
peakBytes
Done
src/engine.ts, line 56 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
make this a TensorContainer and then remove the any linter above
Done
src/engine.ts, line 213 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?
Done
src/engine.ts, line 219 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?
Done
src/engine.ts, line 300 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
query should return a TensorContainer
Done
src/engine.ts, line 307 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).
Done
src/engine_test.ts, line 376 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
+1. Even better to assert to the whole result object. (e.g. right now
result.newTensors
is not being asserted)
Done
src/engine_test.ts, line 389 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
instead of doing this can you assert the whole result.kernels object?
Done
src/environment.ts, line 122 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
peakBytes
Done
src/environment.ts, line 123 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Can you also add a snippet here so they run in our API docs?
Is there an issue with multiline snippets?
src/environment.ts, line 123 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Add a section about the kernels as well
Done
src/environment.ts, line 124 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.
Done
src/environment.ts, line 125 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
f should return a TensorContainer
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 12 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)
src/engine.ts, line 56 at r3 (raw file):
Previously, annxingyuan wrote…
Done
semicolon here
src/engine_test.ts, line 396 at r4 (raw file):
'outputShape': [3] } ]);
expect result.result to equal something you know about
src/engine_test.ts, line 448 at r4 (raw file):
} ]); });
expect result.result
src/environment.ts, line 123 at r3 (raw file):
Previously, annxingyuan wrote…
Is there an issue with multiline snippets?
nope you should be good
src/environment.ts, line 120 at r4 (raw file):
* - `newBytes`: tne number of new bytes allocated * - `newTensors`: the number of new tensors created * - `peakBytes`: the maximum number of bytes used in any kernel
update this, should say something like "peak number of bytes allocated"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained
src/engine.ts, line 56 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
semicolon here
Done
src/engine_test.ts, line 396 at r4 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
expect result.result to equal something you know about
Done
src/engine_test.ts, line 448 at r4 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
expect result.result
Done
src/environment.ts, line 120 at r4 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
update this, should say something like "peak number of bytes allocated"
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained
src/environment.ts, line 120 at r4 (raw file):
Previously, annxingyuan wrote…
Done
This doesn't look done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained
src/environment.ts, line 120 at r4 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
This doesn't look done :)
Done
i think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r5.
Reviewable status: 0 of 1 approvals obtained
Description
This PR implements a
tf.profile
function, described here: tensorflow/tfjs#563For example:
Then, profile looks like this:
Here's another example:
Then, profile looks like this:
These examples are also in engine_test.ts.
This doesn't cover everything specified in the original GitHub issue, but I wanted to get thoughts on the overall approach / ask a few questions before going further:
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md
This change is