Skip to content
This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Memory profiling #1247

Merged
merged 41 commits into from
Sep 7, 2018
Merged

Memory profiling #1247

merged 41 commits into from
Sep 7, 2018

Conversation

annxingyuan
Copy link
Collaborator

@annxingyuan annxingyuan commented Aug 25, 2018

Description

This PR implements a tf.profile function, described here: tensorflow/tfjs#563

For example:

const profile = tf.profile(() => {
  const x = tf.tensor1d([1, 2, 3]);
  let x2 = x.square();
  x2.dispose();
  x2 = x.square();
  x2.dispose();
  return x;
});

Then, profile looks like this:

{  
   "newBytes":12,
   "newTensors":1,
   "peak":24,
   "average":24,
   "kernels":[  
      {  
         "name":"square",
         "bytesAdded":12,
         "bytesUsed":24,
         "inputShapes":[[3]],
         "outputShape":[3]
      },
      {  
         "name":"square",
         "bytesAdded":12,
         "bytesUsed":24,
         "inputShapes":[[3]],
         "outputShape":[3]
      }
   ],
   "result":{ ... }
}

Here's another example:

const profile = await tf.profile(() => {
  const a = tf.tensor2d([1, 2], [1, 2]);
  const b = tf.tensor2d([1, 2, 3, 4], [2, 2]);
  const c = a.matMul(b);
  return c;
});

Then, profile looks like this:

{  
   "newBytes":32,
   "newTensors":3,
   "peak":32,
   "average":32,
   "kernels":[  
      {  
         "name":"matMul",
         "bytesAdded":8,
         "bytesUsed":32,
         "inputShapes":[  
            [1, 2],
            [2, 2]
         ],
         "outputShape":[1, 2]
      }
   ],
   "result":{ ...}
}

These examples are also in engine_test.ts.

This doesn't cover everything specified in the original GitHub issue, but I wanted to get thoughts on the overall approach / ask a few questions before going further:

  • I saw that tf.memory returns both numBytes and numBytesInGPU - do we want to make that distinction for tf.profile?
  • I temporarily wrap the runKernel (in engine.ts) function in order to monitor memory usage per kernel. I was trying to avoid adding code outside the profile function, but is there an approach that better fits with existing patterns?
  • I wasn't sure how to calculate "averageBytes". According to the GitHub issue Nikhil created, it seems like we want the average number of bytes used across (1) ops and (2) tensor initializations by the user. I see that engine.ts has a registerTensor function, but it doesn't seem to distinguish between these two cases. But maybe I'm thinking about this in the wrong way?

For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md


This change is Reviewable

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)


src/engine.ts, line 283 at r2 (raw file):

    const original = this.runKernel;

    this.runKernel = <T extends Tensor|Tensor[], I extends NamedTensorMap>(

Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.

Does that make sense?

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review - I think this is ready for another look.

Reviewable status: 0 of 1 approvals obtained


src/engine.ts, line 283 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Instead of doing it this way, I think we should build the hook directly into runKernel. When a profile begins, we basically just set a bit saying we're profiling. Then inside the regular runKernel call, we update some internal state, and when the profile actually ends we read off that state.

Does that make sense?

Done

Makes sense!

@annxingyuan annxingyuan changed the title WIP Memory profiling Memory profiling Aug 31, 2018
Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @nsthorat, and @dsmilkov)


src/engine.ts, line 50 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];
  outputShape: number[]

this will be number[] | number[][]


src/engine.ts, line 54 at r3 (raw file):

export type ProfileInfo = {
  newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[];

peakBytes


src/engine.ts, line 56 at r3 (raw file):

  newBytes: number; newTensors: number; peak: number; kernels: KernelProfile[];
  // tslint:disable-next-line:no-any
  result: any

make this a TensorContainer and then remove the any linter above


src/engine.ts, line 213 at r3 (raw file):

        bytesAdded,
        bytesUsed: bytesAdded +
            inputKeys.reduce(

prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?


src/engine.ts, line 219 at r3 (raw file):

                0),
        inputShapes: inputKeys.map(key => inputs[key].shape),
        outputShape: (result as Tensor).shape

turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?


src/engine.ts, line 300 at r3 (raw file):

  }

  async profile(query: () => void): Promise<ProfileInfo> {

query should return a TensorContainer


src/engine.ts, line 307 at r3 (raw file):

    this.activeProfile.kernels = [];
    this.activeProfile.result = await query();

query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).


src/engine_test.ts, line 376 at r3 (raw file):

    expect(result.newBytes).toBe(12);
    expect(result.peak).toBe(24);
    expect(result.kernels[0].bytesAdded).toBe(12);

assert the whole result.kernels object since there are multiple kernels run


src/engine_test.ts, line 389 at r3 (raw file):

    expect(result.newBytes).toBe(32);
    expect(result.peak).toBe(32);
    expect(result.kernels.find(d => d.name === 'matMul').bytesAdded).toBe(8);

instead of doing this can you assert the whole result.kernels object?


src/environment.ts, line 122 at r3 (raw file):

   * - `kernels`: an array of objects for each kernel involved that reports
   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel

peakBytes


src/environment.ts, line 123 at r3 (raw file):

   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel
   */

Can you also add a snippet here so they run in our API docs?


src/environment.ts, line 123 at r3 (raw file):

   * their input and output shapes and number of bytes used.
   * - `peak`: the maximum number of bytes used in any kernel
   */

Add a section about the kernels as well


src/environment.ts, line 125 at r3 (raw file):

   */
  /** @doc {heading: 'Performance', subheading: 'Memory'} */
  static profile(f: () => void): Promise<ProfileInfo> {

f should return a TensorContainer

Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r3.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)


src/engine.ts, line 49 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];

How about renaming bytesUsed to totalBytesSnapshot to be more explicit that we are interested in the absolute value of memory at that time?


src/engine.ts, line 50 at r3 (raw file):

type KernelProfile = {
  name: string; bytesAdded: number; bytesUsed: number; inputShapes: number[][];
  outputShape: number[]

For consistency with ProfileInfo, add tensorsAdded and totalTensorsSnapshot


src/engine_test.ts, line 376 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

assert the whole result.kernels object since there are multiple kernels run

+1. Even better to assert to the whole result object. (e.g. right now result.newTensors is not being asserted)


src/environment.ts, line 124 at r3 (raw file):

   * - `peak`: the maximum number of bytes used in any kernel
   */
  /** @doc {heading: 'Performance', subheading: 'Memory'} */

let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)


src/engine.ts, line 49 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

How about renaming bytesUsed to totalBytesSnapshot to be more explicit that we are interested in the absolute value of memory at that time?

Done


src/engine.ts, line 50 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

this will be number[] | number[][]

Done


src/engine.ts, line 50 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

For consistency with ProfileInfo, add tensorsAdded and totalTensorsSnapshot

Done


src/engine.ts, line 54 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

peakBytes

Done


src/engine.ts, line 56 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

make this a TensorContainer and then remove the any linter above

Done


src/engine.ts, line 213 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

prefer forEach over reduce, in general, but I think you can simply use this.numBytes here right?

Done


src/engine.ts, line 219 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

turns out result can be a Tensor[] , can you make outputShape be Tensor | Tensor[] and make sure this lines up?

Done


src/engine.ts, line 300 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

query should return a TensorContainer

Done


src/engine.ts, line 307 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

query should not be async, so you should not await it (if you use a TensorContainer this will be enforced by the types). However, when you move over to using GPU timing for a profile, this will have to be async so you should keep the function signature of profile as returning a Promise (but for now resolve it immediately).

Done


src/engine_test.ts, line 376 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

+1. Even better to assert to the whole result object. (e.g. right now result.newTensors is not being asserted)

Done


src/engine_test.ts, line 389 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

instead of doing this can you assert the whole result.kernels object?

Done


src/environment.ts, line 122 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

peakBytes

Done


src/environment.ts, line 123 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Can you also add a snippet here so they run in our API docs?

Is there an issue with multiline snippets?


src/environment.ts, line 123 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Add a section about the kernels as well

Done


src/environment.ts, line 124 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

let's change subheading to 'Profile' since it's going to be both Memory and Timing, not just memory.

Done


src/environment.ts, line 125 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

f should return a TensorContainer

Done

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 12 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)


src/engine.ts, line 56 at r3 (raw file):

Previously, annxingyuan wrote…

Done

semicolon here


src/engine_test.ts, line 396 at r4 (raw file):

        'outputShape': [3]
      }
    ]);

expect result.result to equal something you know about


src/engine_test.ts, line 448 at r4 (raw file):

      }
    ]);
  });

expect result.result


src/environment.ts, line 123 at r3 (raw file):

Previously, annxingyuan wrote…

Is there an issue with multiline snippets?

nope you should be good


src/environment.ts, line 120 at r4 (raw file):

   * - `newBytes`: tne number of new bytes allocated
   * - `newTensors`: the number of new tensors created
   * - `peakBytes`: the maximum number of bytes used in any kernel

update this, should say something like "peak number of bytes allocated"

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained


src/engine.ts, line 56 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

semicolon here

Done


src/engine_test.ts, line 396 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

expect result.result to equal something you know about

Done


src/engine_test.ts, line 448 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

expect result.result

Done


src/environment.ts, line 120 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

update this, should say something like "peak number of bytes allocated"

Done

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained


src/environment.ts, line 120 at r4 (raw file):

Previously, annxingyuan wrote…

Done

This doesn't look done :)

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained


src/environment.ts, line 120 at r4 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

This doesn't look done :)

Done

i think!

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r5.
Reviewable status: 0 of 1 approvals obtained

@annxingyuan annxingyuan merged commit 3280806 into master Sep 7, 2018
@annxingyuan annxingyuan deleted the profile branch September 7, 2018 17:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants