Memory profiling #1247

annxingyuan · 2018-08-25T20:05:00Z

Description

This PR implements a tf.profile function, described here: tensorflow/tfjs#563

For example:

const profile = await tf.profile(() => {
  const x = tf.tensor1d([1, 2, 3]);
  let x2 = x.square();
  x2.dispose();
  x2 = x.square();
  x2.dispose();
  return x;
});

Then, profile looks like this:

{  
   "newBytes":12,
   "newTensors":1,
   "peak":24,
   "average":24,
   "kernels":[  
      {  
         "name":"square",
         "bytesAdded":12,
         "bytesUsed":24,
         "inputShapes":[[3]],
         "outputShape":[3]
      },
      {  
         "name":"square",
         "bytesAdded":12,
         "bytesUsed":24,
         "inputShapes":[[3]],
         "outputShape":[3]
      }
   ],
   "result":{ ... }
}

Here's another example:

const profile = await tf.profile(() => {
  const a = tf.tensor2d([1, 2], [1, 2]);
  const b = tf.tensor2d([1, 2, 3, 4], [2, 2]);
  const c = a.matMul(b);
  return c;
});

Then, profile looks like this:

{  
   "newBytes":32,
   "newTensors":3,
   "peak":32,
   "average":32,
   "kernels":[  
      {  
         "name":"matMul",
         "bytesAdded":8,
         "bytesUsed":32,
         "inputShapes":[  
            [1, 2],
            [2, 2]
         ],
         "outputShape":[1, 2]
      }
   ],
   "result":{ ...}
}
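
The byte counts in the two profiles above can be checked by hand. The sketch below assumes every tensor in the examples is float32 (4 bytes per element); `bytesForShape` is a hypothetical helper written for this illustration, not part of the library.

```typescript
// Assumption: all tensors in the examples are float32, i.e. 4 bytes per element.
const BYTES_PER_FLOAT32 = 4;

// Bytes needed for a dense tensor of the given shape (hypothetical helper).
function bytesForShape(shape: number[]): number {
  const numElements = shape.reduce((acc, dim) => acc * dim, 1);
  return numElements * BYTES_PER_FLOAT32;
}

// First example: x has shape [3], and each x.square() output also has shape [3].
const xBytes = bytesForShape([3]);              // newBytes: only x survives
const peakFirst = xBytes + bytesForShape([3]);  // peak: x plus one live x2

// Second example: a is [1, 2], b is [2, 2], and c = a.matMul(b) is [1, 2].
const aBytes = bytesForShape([1, 2]);
const bBytes = bytesForShape([2, 2]);
const cBytes = bytesForShape([1, 2]);           // bytesAdded by matMul
const totalSecond = aBytes + bBytes + cBytes;   // newBytes and peak

console.log({xBytes, peakFirst, cBytes, totalSecond});
```

Under that assumption the values come out to 12, 24, 8 and 32, matching newBytes and peak in the first profile and bytesAdded and newBytes in the second.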

These examples are also in engine_test.ts.

This doesn't cover everything specified in the original GitHub issue, but I wanted to get thoughts on the overall approach / ask a few questions before going further:

1. I saw that tf.memory returns both numBytes and numBytesInGPU. Do we want to make that distinction for tf.profile?
2. I temporarily wrap the runKernel function (in engine.ts) in order to monitor memory usage per kernel. I was trying to avoid adding code outside the profile function, but is there an approach that better fits with existing patterns?
3. I wasn't sure how to calculate "averageBytes". According to the GitHub issue Nikhil created, it seems like we want the average number of bytes used across (1) ops and (2) tensor initializations by the user. I see that engine.ts has a registerTensor function, but it doesn't seem to distinguish between these two cases. But maybe I'm thinking about this in the wrong way?

nsthorat

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

src/engine.ts, line 283 at r2 (raw file):

    const original = this.runKernel;

    this.runKernel = <T extends Tensor|Tensor[], I extends NamedTensorMap>(

Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.

Does that make sense?
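
A minimal sketch of the approach suggested above, assuming a stripped-down stand-in for the engine. `MiniEngine`, `KernelRecord` and the `forward` callback are hypothetical names for illustration; the real runKernel signature in engine.ts is different.

```typescript
// Hypothetical, stripped-down stand-in for the engine; not the real tfjs Engine.
interface KernelRecord {
  name: string;
  bytesAdded: number;
  bytesUsed: number;
}

class MiniEngine {
  numBytes = 0;                          // running total of allocated bytes
  private profiling = false;             // the "bit" that says a profile is active
  private records: KernelRecord[] = [];  // internal state updated by runKernel

  // Stand-in for a kernel call: `forward` returns how many bytes it allocated.
  runKernel(name: string, forward: () => number): void {
    const bytesBefore = this.numBytes;
    this.numBytes += forward();
    if (this.profiling) {
      this.records.push({
        name,
        bytesAdded: this.numBytes - bytesBefore,
        bytesUsed: this.numBytes,
      });
    }
  }

  // Stand-in for disposing a tensor of the given size.
  dispose(bytes: number): void {
    this.numBytes -= bytes;
  }

  // Set the bit, run the query, then read off the recorded state.
  profile(query: () => void): KernelRecord[] {
    this.profiling = true;
    this.records = [];
    try {
      query();
    } finally {
      this.profiling = false;
    }
    return this.records;
  }
}

const engine = new MiniEngine();
engine.numBytes = 12;  // pretend a 12-byte input tensor already exists
const kernels = engine.profile(() => {
  engine.runKernel('square', () => 12);
  engine.dispose(12);
  engine.runKernel('square', () => 12);
  engine.dispose(12);
});
console.log(kernels);
```

Compared with temporarily replacing runKernel, the flag keeps a single code path: the kernel call is never swapped out, and there is nothing to restore if the query throws.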

annxingyuan

Thanks for the review - I think this is ready for another look.

Reviewable status: 0 of 1 approvals obtained

src/engine.ts, line 283 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.

Does that make sense?

Done

Makes sense!

nsthorat

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @nsthorat, and @dsmilkov)

src/engine.ts, line 50 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];
  outputShape: number[]

this will be number[] | number[][]

src/engine.ts, line 54 at r3 (raw file):

export type ProfileInfo = {
  newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[];

peakBytes

src/engine.ts, line 56 at r3 (raw file):

  newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[];
  // tslint:disable-next-line:no-any
  result: any

make this a TensorContainer and then remove the any linter above

src/engine.ts, line 213 at r3 (raw file):

        bytesAdded,
        bytesUsed: bytesAdded +
            inputKeys.reduce(

prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?

src/engine.ts, line 219 at r3 (raw file):

                0),
        inputShapes: inputKeys.map(key => inputs[key].shape),
        outputShape: (result as Tensor).shape

turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?
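
One way to read this comment together with the earlier note on line 50 is that outputShape becomes number[] for a single tensor and number[][] for an array of tensors. A small sketch under that reading; `TensorLike` and `outputShapeOf` are hypothetical names, not taken from the PR.

```typescript
// Hypothetical minimal tensor type; only the `shape` field matters here.
interface TensorLike {
  shape: number[];
}

// Returns one shape for a single tensor, or a list of shapes for an array.
function outputShapeOf(result: TensorLike | TensorLike[]): number[] | number[][] {
  return Array.isArray(result) ? result.map(t => t.shape) : result.shape;
}

const single = outputShapeOf({shape: [3]});
const multiple = outputShapeOf([{shape: [1, 2]}, {shape: [2]}]);
console.log(single, multiple);
```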

src/engine.ts, line 300 at r3 (raw file):

  }

  async profile(query: () => void): Promise<ProfileInfo> {

query should return a TensorContainer

src/engine.ts, line 307 at r3 (raw file):

    this.activeProfile.kernels = [];
    this.activeProfile.result = await query();

query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).
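
A sketch of the signature described above: the query runs synchronously and is not awaited, while the function still returns a Promise that resolves immediately. `Container`, `MiniProfileInfo` and `profileSync` are hypothetical stand-ins; the library's TensorContainer type is richer than the one assumed here.

```typescript
// Hypothetical stand-in for TensorContainer (assumption for this sketch).
type Container = number[] | number[][];

interface MiniProfileInfo {
  result: Container;
}

function profileSync(query: () => Container): Promise<MiniProfileInfo> {
  // The query is called synchronously; there is no await here.
  const result = query();
  // Resolve immediately, keeping the Promise signature for later async timing.
  return Promise.resolve({result});
}

// With this signature an async query should be rejected by the compiler,
// because a Promise is not assignable to Container:
// profileSync(async () => [1, 2, 3]);

profileSync(() => [1, 2, 3]).then(info => console.log(info.result));
```

Keeping the Promise return type means callers already write `await`, so switching to asynchronous GPU timing later would not change the call sites.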

src/engine_test.ts, line 376 at r3 (raw file):

    expect(result.newBytes).toBe(12);
    expect(result.peak).toBe(24);
    expect(result.kernels[0].bytesAdded).toBe(12);

assert the whole result.kernels object since there are multiple kernels run

src/engine_test.ts, line 389 at r3 (raw file):

    expect(result.newBytes).toBe(32);
    expect(result.peak).toBe(32);
    expect(result.kernels.find(d => d.name === 'matMul').bytesAdded).toBe(8);

instead of doing this can you assert the whole result.kernels object?

src/environment.ts, line 122 at r3 (raw file):

   * - `kernels`: an array of objects for each kernel involved that reports
   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel

peakBytes

src/environment.ts, line 123 at r3 (raw file):

   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel
   */

Can you also add a snippet here so they run in our API docs?

src/environment.ts, line 123 at r3 (raw file):

   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel
   */

Add a section about the kernels as well

src/environment.ts, line 125 at r3 (raw file):

   */
  /** @doc {heading: 'Performance', subheading: 'Memory'} */
  static profile(f: () => void): Promise<ProfileInfo> {

f should return a TensorContainer

dsmilkov

Reviewed 1 of 7 files at r3.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)

src/engine.ts, line 49 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];

How about renaming bytesUsed to totalBytesSnapshot to be more explicit that we are interested in the absolute value of memory at that time?

src/engine.ts, line 50 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];
  outputShape: number[]

For consistency with ProfileInfo, add tensorsAdded and totalTensorsSnapshot
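
Taken together, the naming suggestions in this round of review would give types roughly like the following. This is a sketch assembled from the comments, not a copy of the merged code; `TensorContainerLike` is a placeholder for the library's TensorContainer type, and the sample values are illustrative, derived from the matMul example in the description.

```typescript
// Placeholder for the library's TensorContainer type (assumption for this sketch).
type TensorContainerLike = unknown;

type KernelProfile = {
  name: string;
  bytesAdded: number;
  totalBytesSnapshot: number;   // renamed from bytesUsed
  tensorsAdded: number;         // added for consistency with ProfileInfo
  totalTensorsSnapshot: number; // added for consistency with ProfileInfo
  inputShapes: number[][];
  outputShape: number[] | number[][];
};

type ProfileInfo = {
  newBytes: number;
  newTensors: number;
  peakBytes: number;            // renamed from peak
  kernels: KernelProfile[];
  result: TensorContainerLike;  // replaces `any`
};

// Illustrative values for the matMul example: a, b and c are the three tensors.
const sample: ProfileInfo = {
  newBytes: 32,
  newTensors: 3,
  peakBytes: 32,
  kernels: [{
    name: 'matMul',
    bytesAdded: 8,
    totalBytesSnapshot: 32,
    tensorsAdded: 1,
    totalTensorsSnapshot: 3,
    inputShapes: [[1, 2], [2, 2]],
    outputShape: [1, 2],
  }],
  result: null,
};
console.log(sample.kernels[0].totalBytesSnapshot);
```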

src/engine_test.ts, line 376 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

assert the whole result.kernels object since there are multiple kernels run

+1. Even better to assert the whole result object (e.g. right now result.newTensors is not being asserted).
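
To illustrate why the whole-object assertion is stronger, the sketch below uses a hypothetical profile whose newTensors value is wrong. The field-by-field checks in the style of the r3 test still pass, while a whole-object comparison fails. JSON.stringify is only a stand-in here for a deep-equality matcher such as Jasmine's `toEqual`, and unlike such a matcher it is sensitive to key order.

```typescript
// Hypothetical profile output with a wrong newTensors value (expected 1).
const actualProfile = {
  newBytes: 12,
  newTensors: 2,
  peak: 24,
  kernels: [{bytesAdded: 12, bytesUsed: 24}],
};

const expectedProfile = {
  newBytes: 12,
  newTensors: 1,
  peak: 24,
  kernels: [{bytesAdded: 12, bytesUsed: 24}],
};

// Field-by-field checks: newTensors is never inspected, so the bug slips by.
const piecemealPasses =
    actualProfile.newBytes === 12 &&
    actualProfile.peak === 24 &&
    actualProfile.kernels[0].bytesAdded === 12;

// Whole-object comparison: any differing field makes it fail.
const wholeObjectPasses =
    JSON.stringify(actualProfile) === JSON.stringify(expectedProfile);

console.log({piecemealPasses, wholeObjectPasses});
```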

src/environment.ts, line 124 at r3 (raw file):

   * - `peak`: the maximum number of bytes used in any kernel
   */
  /** @doc {heading: 'Performance', subheading: 'Memory'} */

let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

src/engine.ts, line 49 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

How about renaming bytesUsed to totalBytesSnapshot to be more explicit that we are interested in the absolute value of memory at that time?

Done

src/engine.ts, line 50 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

this will be number[] | number[][]

Done

src/engine.ts, line 50 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

For consistency with ProfileInfo, add tensorsAdded and totalTensorsSnapshot

Done

src/engine.ts, line 54 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

peakBytes

Done

src/engine.ts, line 56 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

make this a TensorContainer and then remove the any linter above

Done

src/engine.ts, line 213 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?

Done

src/engine.ts, line 219 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?

Done

src/engine.ts, line 300 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

query should return a TensorContainer

Done

src/engine.ts, line 307 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).

Done

src/engine_test.ts, line 376 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

+1. Even better to assert the whole result object (e.g. right now result.newTensors is not being asserted).

Done

src/engine_test.ts, line 389 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

instead of doing this can you assert the whole result.kernels object?

Done

src/environment.ts, line 122 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

peakBytes

Done

src/environment.ts, line 123 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Can you also add a snippet here so they run in our API docs?

Is there an issue with multiline snippets?

src/environment.ts, line 123 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Add a section about the kernels as well

Done

src/environment.ts, line 124 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.

Done

src/environment.ts, line 125 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

f should return a TensorContainer

Done

nsthorat

Reviewed 1 of 12 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

src/engine.ts, line 56 at r3 (raw file):

Previously, annxingyuan wrote…

Done

semicolon here

src/engine_test.ts, line 396 at r4 (raw file):

        'outputShape': [3]
      }
    ]);

expect result.result to equal something you know about

src/engine_test.ts, line 448 at r4 (raw file):

      }
    ]);
  });

expect result.result

src/environment.ts, line 123 at r3 (raw file):

Previously, annxingyuan wrote…

Is there an issue with multiline snippets?

nope you should be good

src/environment.ts, line 120 at r4 (raw file):

   * - `newBytes`: the number of new bytes allocated
   * - `newTensors`: the number of new tensors created
   * - `peakBytes`: the maximum number of bytes used in any kernel

update this, should say something like "peak number of bytes allocated"
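
One way to make "peak number of bytes allocated" concrete is to take the maximum of the total-bytes snapshots recorded after each kernel. This is an assumption about how the value could be derived, not a statement of what the PR does; `peakOf` and the snapshot values are hypothetical.

```typescript
// Peak of the per-kernel total-bytes snapshots; 0 when no kernels ran.
function peakOf(totalBytesSnapshots: number[]): number {
  return totalBytesSnapshots.reduce((peak, bytes) => Math.max(peak, bytes), 0);
}

// Hypothetical snapshots taken after three kernels.
const peakBytes = peakOf([24, 36, 24]);
const peakWhenEmpty = peakOf([]);
console.log({peakBytes, peakWhenEmpty});
```

Note that a peak computed this way only sees memory at kernel boundaries, so allocations made and freed inside a single kernel would not show up.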

annxingyuan

Reviewable status: 0 of 1 approvals obtained

src/engine.ts, line 56 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

semicolon here

Done

src/engine_test.ts, line 396 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

expect result.result to equal something you know about

Done

src/engine_test.ts, line 448 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

expect result.result

Done

src/environment.ts, line 120 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

update this, should say something like "peak number of bytes allocated"

Done

nsthorat

Reviewable status: 0 of 1 approvals obtained

src/environment.ts, line 120 at r4 (raw file):

Previously, annxingyuan wrote…

Done

This doesn't look done :)

annxingyuan

Reviewable status: 0 of 1 approvals obtained

src/environment.ts, line 120 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

This doesn't look done :)

Done

i think!

nsthorat

Reviewed 1 of 3 files at r5.
Reviewable status: 0 of 1 approvals obtained

annxingyuan and others added 20 commits August 22, 2018 09:49

setup

992bb1e

added stubs to the interface

4c046d7

improve notes

d6737d9

return start/end byte allocations

25d124f

basic kernel watching

22c1a6b

add kernel name

ba31e83

report bytes used

edf5959

added typescript interfaces

e2a362d

add average and peak properties

be0c7aa

enforce runkernel interface

8693bba

add test cases

0fdceab

make test async

aab5af3

undo setup code

de5d0e8

undo changes accidentally made during development

e8f4f3d

undo version bump

4611bd2

undelete yarn.lock

a5ad560

revive newline

a7c3d74

add comment

cce7756

add shapes

8c260e2

modify return info

ce6fd9c

nsthorat reviewed Aug 27, 2018

annxingyuan added 9 commits August 30, 2018 09:52

Merge branch 'master' into profile

90af8de

Merge branch 'master' into profile

101d65c

formatting

ee3aed5

linting

a5c2c79

change matmul profile test to select for the right kernel

45d4aac

add docs and nix average for now

6eb0f82

record information directly in runkernel

95cfddc

typescript fix

ecc7628

undo uneeded change

a5343e0

annxingyuan commented Aug 31, 2018

annxingyuan changed the title ~~WIP Memory profiling~~ Memory profiling Aug 31, 2018

annxingyuan added 2 commits August 31, 2018 13:44

Merge branch 'master' into profile

2684286

Merge branch 'master' into profile

a70f88c

dsmilkov requested review from dsmilkov and nsthorat September 4, 2018 17:16

nsthorat reviewed Sep 5, 2018

dsmilkov reviewed Sep 5, 2018

annxingyuan added 7 commits September 6, 2018 14:07

Merge branch 'master' into profile

08cb84a

update bytesused and tensorsused to be snapshots

5bbaff5

account for return array type and specify query return type as tensor…

52c7038

…container

assert entire kernels objects

18dde1f

update docs

b9fcb48

remove calls to fit

0367347

missing semicolon

7b534dc

annxingyuan commented Sep 6, 2018

Merge branch 'master' into profile

8e34ed8

nsthorat reviewed Sep 7, 2018

review

86d0d59

annxingyuan commented Sep 7, 2018

nsthorat reviewed Sep 7, 2018

edit copy

135c831

annxingyuan commented Sep 7, 2018

nsthorat approved these changes Sep 7, 2018

annxingyuan merged commit 3280806 into master Sep 7, 2018

annxingyuan deleted the profile branch September 7, 2018 17:34
